$ pip3 install nltk
### 分かち書き
nltk.download('punkt')
### 品詞の取得
import nltk nltk.download('averaged_perceptron_tagger')
# -*- coding:utf-8 -*- import nltk nltk.download('punkt') s = "The Brooklyn Nets appeared to be well on their way to taking a 3-0 series lead over the Boston Celtics Friday night, as they erupted out of the TD Garden gates on a 19-4 run in the first four minutes of action." morph = nltk.word_tokenize(s) print(morph)
$ python3 app.py
[nltk_data] Downloading package punkt to /home/vagrant/nltk_data…
[nltk_data] Package punkt is already up-to-date!
[‘The’, ‘Brooklyn’, ‘Nets’, ‘appeared’, ‘to’, ‘be’, ‘well’, ‘on’, ‘their’, ‘way’, ‘to’, ‘taking’, ‘a’, ‘3-0’, ‘series’, ‘lead’, ‘over’, ‘the’, ‘Boston’, ‘Celtics’, ‘Friday’, ‘night’, ‘,’, ‘as’, ‘they’, ‘erupted’, ‘out’, ‘of’, ‘the’, ‘TD’, ‘Garden’, ‘gates’, ‘on’, ‘a’, ’19-4′, ‘run’, ‘in’, ‘the’, ‘first’, ‘four’, ‘minutes’, ‘of’, ‘action’, ‘.’]
うおおおおおおおおおおおおおお
ガチSugeeeeeeeeeeeeee
import nltk nltk.download('punkt') nltk.download('averaged_perceptron_tagger') s = "The Brooklyn Nets appeared to be well on their way to taking a 3-0 series lead over the Boston Celtics Friday night, as they erupted out of the TD Garden gates on a 19-4 run in the first four minutes of action." morph = nltk.word_tokenize(s) pos = nltk.pos_tag(morph) print(pos)
$ python3 app.py
[nltk_data] Downloading package punkt to /home/vagrant/nltk_data…
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /home/vagrant/nltk_data…
[nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
[(‘The’, ‘DT’), (‘Brooklyn’, ‘NNP’), (‘Nets’, ‘NNPS’), (‘appeared’, ‘VBD’), (‘to’, ‘TO’), (‘be’, ‘VB’), (‘well’, ‘RB’), (‘on’, ‘IN’), (‘their’, ‘PRP$’), (‘way’, ‘NN’), (‘to’, ‘TO’), (‘taking’, ‘VBG’), (‘a’, ‘DT’), (‘3-0’, ‘JJ’), (‘series’, ‘NN’), (‘lead’, ‘NN’), (‘over’, ‘IN’), (‘the’, ‘DT’), (‘Boston’, ‘NNP’), (‘Celtics’, ‘NNPS’), (‘Friday’, ‘NNP’), (‘night’, ‘NN’), (‘,’, ‘,’), (‘as’, ‘IN’), (‘they’, ‘PRP’), (‘erupted’, ‘VBD’), (‘out’, ‘IN’), (‘of’, ‘IN’), (‘the’, ‘DT’), (‘TD’, ‘NNP’), (‘Garden’, ‘NNP’), (‘gates’, ‘NNS’), (‘on’, ‘IN’), (‘a’, ‘DT’), (’19-4′, ‘JJ’), (‘run’, ‘NN’), (‘in’, ‘IN’), (‘the’, ‘DT’), (‘first’, ‘JJ’), (‘four’, ‘CD’), (‘minutes’, ‘NNS’), (‘of’, ‘IN’), (‘action’, ‘NN’), (‘.’, ‘.’)]
英語は誰でも意味わかるからな。
英語じゃなくて、中国語で分かち書きをやりたいな。