Running NLTK on amzn2

$ pip3 install nltk

### Word tokenization

Download the punkt resource, which nltk.word_tokenize uses:

import nltk
nltk.download('punkt')

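### Getting part-of-speech tags

Download the averaged_perceptron_tagger resource, which nltk.pos_tag uses:

import nltk
nltk.download('averaged_perceptron_tagger')

First, tokenization. Save the following as app.py:

# -*- coding:utf-8 -*-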

import nltk
nltk.download('punkt')
s = "The Brooklyn Nets appeared to be well on their way to taking a 3-0 series lead over the Boston Celtics Friday night, as they erupted out of the TD Garden gates on a 19-4 run in the first four minutes of action."
morph = nltk.word_tokenize(s)
print(morph)

$ python3 app.py
[nltk_data] Downloading package punkt to /home/vagrant/nltk_data…
[nltk_data] Package punkt is already up-to-date!
['The', 'Brooklyn', 'Nets', 'appeared', 'to', 'be', 'well', 'on', 'their', 'way', 'to', 'taking', 'a', '3-0', 'series', 'lead', 'over', 'the', 'Boston', 'Celtics', 'Friday', 'night', ',', 'as', 'they', 'erupted', 'out', 'of', 'the', 'TD', 'Garden', 'gates', 'on', 'a', '19-4', 'run', 'in', 'the', 'first', 'four', 'minutes', 'of', 'action', '.']

Whoaaaaaaaaaa!
This is seriously amazing!
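
To see what the tokenizer adds over a plain whitespace split, the sketch below compares `str.split()` with the token list printed above. It uses only the standard library; `nltk_tokens` is copied by hand from the output of the run above rather than recomputed, and the variable names are just for illustration.

```python
# Compare naive whitespace splitting with the NLTK tokens shown above.
s = (
    "The Brooklyn Nets appeared to be well on their way to taking a 3-0 "
    "series lead over the Boston Celtics Friday night, as they erupted out "
    "of the TD Garden gates on a 19-4 run in the first four minutes of action."
)

# Naive approach: split on whitespace only.
naive = s.split()

# Tokens copied from the nltk.word_tokenize output printed above.
nltk_tokens = [
    'The', 'Brooklyn', 'Nets', 'appeared', 'to', 'be', 'well', 'on', 'their',
    'way', 'to', 'taking', 'a', '3-0', 'series', 'lead', 'over', 'the',
    'Boston', 'Celtics', 'Friday', 'night', ',', 'as', 'they', 'erupted',
    'out', 'of', 'the', 'TD', 'Garden', 'gates', 'on', 'a', '19-4', 'run',
    'in', 'the', 'first', 'four', 'minutes', 'of', 'action', '.',
]

# Tokens that appear in one list but not the other.
only_naive = [t for t in naive if t not in nltk_tokens]
only_nltk = [t for t in nltk_tokens if t not in naive]

print(len(naive), len(nltk_tokens))
print(only_naive)
print(only_nltk)
```

The whitespace split leaves punctuation glued to the neighbouring word ("night," and "action."), while the NLTK result has the comma and the period as separate tokens. Hyphenated scores such as "3-0" stay as one token either way.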

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
s = "The Brooklyn Nets appeared to be well on their way to taking a 3-0 series lead over the Boston Celtics Friday night, as they erupted out of the TD Garden gates on a 19-4 run in the first four minutes of action."
morph = nltk.word_tokenize(s)
pos = nltk.pos_tag(morph)
print(pos)

$ python3 app.py
[nltk_data] Downloading package punkt to /home/vagrant/nltk_data…
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /home/vagrant/nltk_data…
[nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
[('The', 'DT'), ('Brooklyn', 'NNP'), ('Nets', 'NNPS'), ('appeared', 'VBD'), ('to', 'TO'), ('be', 'VB'), ('well', 'RB'), ('on', 'IN'), ('their', 'PRP$'), ('way', 'NN'), ('to', 'TO'), ('taking', 'VBG'), ('a', 'DT'), ('3-0', 'JJ'), ('series', 'NN'), ('lead', 'NN'), ('over', 'IN'), ('the', 'DT'), ('Boston', 'NNP'), ('Celtics', 'NNPS'), ('Friday', 'NNP'), ('night', 'NN'), (',', ','), ('as', 'IN'), ('they', 'PRP'), ('erupted', 'VBD'), ('out', 'IN'), ('of', 'IN'), ('the', 'DT'), ('TD', 'NNP'), ('Garden', 'NNP'), ('gates', 'NNS'), ('on', 'IN'), ('a', 'DT'), ('19-4', 'JJ'), ('run', 'NN'), ('in', 'IN'), ('the', 'DT'), ('first', 'JJ'), ('four', 'CD'), ('minutes', 'NNS'), ('of', 'IN'), ('action', 'NN'), ('.', '.')]
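
Since the result is a plain list of (token, tag) pairs, it can be summarized with the standard library alone. A minimal sketch, assuming the pairs printed above are pasted in as literal data (in a real script `tagged` would be the return value of `nltk.pos_tag`); the names `tagged`, `tag_counts` and `proper_nouns` are made up for this example.

```python
from collections import Counter

# (token, tag) pairs copied from the nltk.pos_tag output printed above.
tagged = [
    ('The', 'DT'), ('Brooklyn', 'NNP'), ('Nets', 'NNPS'), ('appeared', 'VBD'),
    ('to', 'TO'), ('be', 'VB'), ('well', 'RB'), ('on', 'IN'),
    ('their', 'PRP$'), ('way', 'NN'), ('to', 'TO'), ('taking', 'VBG'),
    ('a', 'DT'), ('3-0', 'JJ'), ('series', 'NN'), ('lead', 'NN'),
    ('over', 'IN'), ('the', 'DT'), ('Boston', 'NNP'), ('Celtics', 'NNPS'),
    ('Friday', 'NNP'), ('night', 'NN'), (',', ','), ('as', 'IN'),
    ('they', 'PRP'), ('erupted', 'VBD'), ('out', 'IN'), ('of', 'IN'),
    ('the', 'DT'), ('TD', 'NNP'), ('Garden', 'NNP'), ('gates', 'NNS'),
    ('on', 'IN'), ('a', 'DT'), ('19-4', 'JJ'), ('run', 'NN'), ('in', 'IN'),
    ('the', 'DT'), ('first', 'JJ'), ('four', 'CD'), ('minutes', 'NNS'),
    ('of', 'IN'), ('action', 'NN'), ('.', '.'),
]

# How often each tag occurs in the sentence.
tag_counts = Counter(tag for _, tag in tagged)

# Proper nouns: Penn Treebank tags NNP (singular) and NNPS (plural).
proper_nouns = [token for token, tag in tagged if tag.startswith('NNP')]

print(tag_counts['IN'], tag_counts['NN'], tag_counts['NNP'])
print(proper_nouns)
```

Filtering on the tag prefix picks up both NNP and NNPS, so the team names "Nets" and "Celtics" are kept alongside "Brooklyn" and "Boston".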

Then again, anyone can make sense of English.
Rather than English, I'd like to try word segmentation on Chinese.
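
Chinese text has no spaces between words, so it needs a segmentation step of its own. As a rough illustration only, here is a toy forward maximum matching segmenter in plain Python. This is not NLTK and not something from the runs above: the function name `forward_max_match`, the `max_len` parameter and the tiny `vocab` set are all hypothetical, and a real segmenter would use a much larger dictionary or a statistical model.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy dictionary segmentation: take the longest known word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the dictionary matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, just enough for the sample sentence.
vocab = {"我", "喜欢", "自然", "语言", "处理", "自然语言"}

tokens = forward_max_match("我喜欢自然语言处理", vocab)
print(tokens)
```

Because the longest match wins, "自然语言" is kept as one word even though "自然" and "语言" are also in the dictionary. That greedy choice is the main weakness of this approach, since the longest match is not always the right one.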