[NLTK] sentiment analysis – ソフトウェアエンジニアの技術ブログ：Software engineer tech blog

NLTK has a built-in pretrained sentiment analyzer, VADER(Valence Aware Dictionary and sEntiment Reasoner)

import nltk
from pprint import pprint
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
pprint(sia.polarity_scores("Wow, NLTK is really powerful!"))

$ python3 app.py
{‘compound’: 0.8012, ‘neg’: 0.0, ‘neu’: 0.295, ‘pos’: 0.705}

compoundはaverageで-1から1までを示す

twitter corpus

tweets = [t.replace("://", "//") for t in nltk.corpus.twitter_samples.strings()]

def is_positive(tweet: str) -> bool:
    """True if tweet has positive compound sentiment, False otherwise."""
    return sia.polarity_scores(tweet)["compound"] > 0

shuffle(tweets)
for tweet in tweets[:10]:
    print(">", is_positive(tweet), tweet)

$ python3 app.py
> False Most Tory voters not concerned which benefits Tories will cut. Benefits don’t figure in the lives if most Tory voters. #Labour #NHS #carers
> False .@uberuk you cancelled my ice cream uber order. Everyone else in the office got it but me. 🙁
> False oh no i’m too early 🙁
> False I don’t know what I’m doing for #BlockJam at all since my schedule’s just whacked right now 🙁
> False What should i do .

BAD VS PARTY AGAIN :(((((((
> True @Shadypenguinn take care! 🙂
> True Thanks to amazing 4000 Followers on Instagram
If you´re not among them yet,
feel free to connect :-)… http//t.co/ILy03AtJ83
> False RT @mac123_m: Ed Miliband has spelt it out again. No deals with the SNP.
There’s a choice:
Vote SNP get Tories
Vote LAB and get LAB http//…
> True @gus33000 but Disk Management is same since NT4 iirc 😀
Also, what UX refinements were in zdps?
> False RT @KevinJPringle: One of many bizarre things about @Ed_Miliband’s anti-SNP stance is he doesn’t reject deal with LibDems, who imposed aust…

postivie_review_ids = nltk.corpus.movie_reviews.fileids(categories=["pos"])
negative_review_ids = nltk.corpus.movie_reviews.fileids(categories=["neg"])
all_review_ids = positive_review_ids + negative_review_ids

def is_positive(review_id: str) -> bool:
	"""True if the average of all sentence compound scores is positive. """
	text = nltk.corpus.movie_reviews.raw(review_id)
	scores = [
		sia.polarity_scores(sentence)["compound"]
		for sentence in nltk.sent_tokenize(text)
	]
	return mean(scores) > 0

shuffle(all_review_ids)
correct = 0
for review_id in all_review_ids:
	if is_positive(review_id):
		if review in positive_review_ids:
			correct += 1
	else:
		if review in negative_review_ids:
			correct += 1

print(F"{correct / len(all_review_ids):.2%} correct")

既にcorpusがあるのは良いですね。