NLTK has a built-in pretrained sentiment analyzer, VADER (Valence Aware Dictionary and sEntiment Reasoner).
import nltk
from pprint import pprint
from nltk.sentiment import SentimentIntensityAnalyzer

# One-time setup: the analyzer needs the VADER lexicon.
nltk.download("vader_lexicon")

sia = SentimentIntensityAnalyzer()
pprint(sia.polarity_scores("Wow, NLTK is really powerful!"))
$ python3 app.py
{'compound': 0.8012, 'neg': 0.0, 'neu': 0.295, 'pos': 0.705}
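compound is the overall score, ranging from -1 (most negative) to 1 (most positive). It is a normalized sum of the word-level valences rather than a plain average.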
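To make the compound range concrete, here is a minimal sketch of the normalization VADER applies to the summed word valences, assuming its default constant alpha = 15. The function names `normalize_valence` and `label_sentiment` are illustrative and not part of NLTK's API, and the ±0.05 cutoffs are the conventional thresholds suggested for VADER, not something enforced by the library.

```python
import math


def normalize_valence(total: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return total / math.sqrt(total * total + alpha)


def label_sentiment(compound: float, cutoff: float = 0.05) -> str:
    """Map a compound score to 'positive', 'negative', or 'neutral'."""
    if compound >= cutoff:
        return "positive"
    if compound <= -cutoff:
        return "negative"
    return "neutral"


# Hypothetical valence sums, not real VADER output.
for total in (-4.0, 0.0, 1.5):
    compound = normalize_valence(total)
    print(f"{total:+.1f} -> {compound:+.4f} ({label_sentiment(compound)})")
```

With the real analyzer, the same labeling could be applied as label_sentiment(sia.polarity_scores(text)["compound"]).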
Twitter corpus
from random import shuffle

# Requires nltk.download("twitter_samples").
# "://" is rewritten, likely so that URLs are not read as the ":/" emoticon.
tweets = [t.replace("://", "//") for t in nltk.corpus.twitter_samples.strings()]

def is_positive(tweet: str) -> bool:
    """True if tweet has positive compound sentiment, False otherwise."""
    return sia.polarity_scores(tweet)["compound"] > 0

shuffle(tweets)
for tweet in tweets[:10]:
    print(">", is_positive(tweet), tweet)
$ python3 app.py
> False Most Tory voters not concerned which benefits Tories will cut. Benefits don’t figure in the lives if most Tory voters. #Labour #NHS #carers
> False .@uberuk you cancelled my ice cream uber order. Everyone else in the office got it but me. 🙁
> False oh no i’m too early 🙁
> False I don’t know what I’m doing for #BlockJam at all since my schedule’s just whacked right now 🙁
> False What should i do .
BAD VS PARTY AGAIN :(((((((
> True @Shadypenguinn take care! 🙂
> True Thanks to amazing 4000 Followers on Instagram
If you´re not among them yet,
feel free to connect :-)… http//t.co/ILy03AtJ83
> False RT @mac123_m: Ed Miliband has spelt it out again. No deals with the SNP.
There’s a choice:
Vote SNP get Tories
Vote LAB and get LAB http//…
> True @gus33000 but Disk Management is same since NT4 iirc 😀
Also, what UX refinements were in zdps?
> False RT @KevinJPringle: One of many bizarre things about @Ed_Miliband’s anti-SNP stance is he doesn’t reject deal with LibDems, who imposed aust…
# Movie reviews corpus; requires nltk.download("movie_reviews").
positive_review_ids = nltk.corpus.movie_reviews.fileids(categories=["pos"])
negative_review_ids = nltk.corpus.movie_reviews.fileids(categories=["neg"])
all_review_ids = positive_review_ids + negative_review_ids
from statistics import mean

def is_positive(review_id: str) -> bool:
    """True if the average of all sentence compound scores is positive."""
    text = nltk.corpus.movie_reviews.raw(review_id)
    # sent_tokenize needs NLTK's Punkt tokenizer data to be downloaded.
    scores = [
        sia.polarity_scores(sentence)["compound"]
        for sentence in nltk.sent_tokenize(text)
    ]
    return mean(scores) > 0
shuffle(all_review_ids)
correct = 0
for review_id in all_review_ids:
    if is_positive(review_id):
        if review_id in positive_review_ids:
            correct += 1
    else:
        if review_id in negative_review_ids:
            correct += 1

print(f"{correct / len(all_review_ids):.2%} correct")
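A single accuracy figure does not show whether the mistakes fall mostly on positive or on negative reviews. As a minimal sketch of a per-class breakdown: the helper name `per_class_accuracy` is hypothetical, and the toy lists below stand in for real predictions and labels.

```python
from collections import Counter


def per_class_accuracy(predicted, actual):
    """Return the fraction of correct predictions for each actual label."""
    counts = Counter(zip(actual, predicted))  # keys are (actual, predicted)
    result = {}
    for label in (True, False):
        total = counts[(label, True)] + counts[(label, False)]
        result[label] = counts[(label, label)] / total if total else float("nan")
    return result


# Hypothetical toy data, not results from the movie_reviews corpus.
predicted = [True, True, False, True]
actual = [True, False, False, True]
print(per_class_accuracy(predicted, actual))
```

For the loop above, predicted would hold is_positive(review_id) for each review and actual would hold whether that review_id is in positive_review_ids.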
It's nice that the corpora are already available.