Genre classification of English text with fastText (Text Classification)

### Getting and preparing data
$ wget https://dl.fbaipublicfiles.com/fasttext/data/cooking.stackexchange.tar.gz && tar xvzf cooking.stackexchange.tar.gz
$ ls
cooking.stackexchange.id
$ head cooking.stackexchange.txt
__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?
__label__food-safety __label__acidity Dangerous pathogens capable of growing in acidic environments
__label__cast-iron __label__stove How do I cover up the white spots on my cast iron stove?

The "__label__" prefix is how fastText tells a label apart from an ordinary word.
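
To make the prefix convention concrete, here is a minimal sketch that separates labels from text in one line of the data file. It only illustrates the format and is not fastText's own parser; `parse_line` and `LABEL_PREFIX` are names made up for this example.

```python
# Hypothetical helper, for illustration only (not part of the fasttext package).
LABEL_PREFIX = "__label__"


def parse_line(line):
    """Split one line of the data file into (labels, text)."""
    labels, words = [], []
    for token in line.split():
        if token.startswith(LABEL_PREFIX):
            # Drop the prefix and keep the bare label name.
            labels.append(token[len(LABEL_PREFIX):])
        else:
            words.append(token)
    return labels, " ".join(words)


labels, text = parse_line(
    "__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?"
)
print(labels)  # ['sauce', 'cheese']
print(text)
```

A line can carry several labels, which is why the helper returns a list.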

$ wc cooking.stackexchange.txt
15404 169582 1401900 cooking.stackexchange.txt
-> split into training and validation sets
$ head -n 12404 cooking.stackexchange.txt > cooking.train
$ tail -n 3000 cooking.stackexchange.txt > cooking.valid
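
The same head/tail split can be sketched in Python. This is only an equivalent of the two shell commands above, run on a stand-in list rather than the real file; `split_train_valid` is a hypothetical helper name.

```python
def split_train_valid(lines, n_valid):
    """Return (train, valid); the last n_valid lines become the validation set."""
    if not 0 <= n_valid <= len(lines):
        raise ValueError("n_valid must be between 0 and len(lines)")
    cut = len(lines) - n_valid
    return lines[:cut], lines[cut:]


# Stand-in data with the same line count that wc reported (15404).
lines = [f"__label__x example {i}\n" for i in range(15404)]
train, valid = split_train_valid(lines, 3000)
print(len(train), len(valid))  # 12404 3000
```

For the real data, reading `cooking.stackexchange.txt` with `readlines()` and writing the two lists out to `cooking.train` and `cooking.valid` should give the same files as the shell version.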

### train_supervised
training.py

import fasttext
model = fasttext.train_supervised(input="cooking.train")

$ python3 training.py
Read 0M words
Number of words: 14543
Number of labels: 735
Floating point exception

Why?!

Machine learning at Facebook

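Machine learning at Facebook
→ The goal is to understand all of the content on the social networks it runs
 L Post recommendations
 L Face and object detection
 L Translation
 L Fake news detection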

Looking at it this way, Facebook comes across as solid rather than flashy.

Facebook SDK with Composer

Installing the Facebook SDK using Composer
https://developers.facebook.com/docs/php/gettingstarted

[vagrant@localhost facebook]$ curl -sS https://getcomposer.org/installer | php
All settings correct for using Composer
Downloading 1.2.2...

Composer successfully installed to: /home/vagrant/facebook/composer.phar
Use it: php composer.phar

Next, create composer.json

{
  "require" : {
    "facebook/php-sdk-v4" : "~5.0"
  }
}

After that, run php composer.phar install and you're done.

Add autoload to composer.json

{
  "require" : {
    "facebook/php-sdk-v4" : "~5.0"
  },
  "autoload":{
    "psr-4": {
      "MyApp\\": "lib/"
    }
  }
}

Command line

[vagrant@localhost facebook]$ php composer.phar dump-autoload
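
As a rough illustration of what the "psr-4" entry means, the sketch below shows how a PSR-4 prefix maps a fully qualified class name to a file path. It is written in Python only to show the mapping rule and is not Composer's actual autoloader; `psr4_path` and the class names are made up for this example.

```python
def psr4_path(class_name, prefix="MyApp\\", base_dir="lib/"):
    """Map a class name to a file path the way a PSR-4 prefix rule would."""
    if not class_name.startswith(prefix):
        return None  # This prefix rule does not apply to the class.
    relative = class_name[len(prefix):]
    # Namespace separators become directory separators, plus the .php suffix.
    return base_dir + relative.replace("\\", "/") + ".php"


print(psr4_path("MyApp\\Example"))       # lib/Example.php
print(psr4_path("MyApp\\Sub\\Example"))  # lib/Sub/Example.php
```

So with the composer.json above, a class in the MyApp namespace is expected to live under lib/, in a file named after the class.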