NLTK has a built-in pretrained sentiment analyzer, VADER (Valence Aware Dictionary and sEntiment Reasoner).
import nltk
from pprint import pprint
from nltk.sentiment import SentimentIntensityAnalyzer

# The analyzer needs the VADER lexicon: nltk.download("vader_lexicon")
sia = SentimentIntensityAnalyzer()
pprint(sia.polarity_scores("Wow, NLTK is really powerful!"))
$ python3 app.py
{'compound': 0.8012, 'neg': 0.0, 'neu': 0.295, 'pos': 0.705}
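The compound score is an aggregate of the word-level valence scores, normalized to the range -1 to 1 (it is not a plain average). neg, neu, and pos are proportions that add up to 1.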
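To make the -1 to 1 range concrete, here is a minimal sketch of the normalization step. It assumes the formula used in the reference VADER implementation as I understand it: the summed valence x is mapped to x / sqrt(x^2 + alpha) with alpha = 15. The function name normalize_valence and the sample sums are illustrative only and are not taken from NLTK's API.

```python
import math


def normalize_valence(total: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into the open interval (-1, 1).

    Assumption: alpha = 15 mirrors the default in the reference VADER code.
    """
    return total / math.sqrt(total * total + alpha)


# Illustrative valence sums (made up, not produced by the analyzer).
for total in (-4.0, 0.0, 1.0, 4.0):
    print(total, round(normalize_valence(total), 4))
```

However large the summed valence gets, the result only approaches -1 or 1, which is why strongly worded text saturates near the ends of the scale.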
Twitter corpus
from random import shuffle

# Requires nltk.download("twitter_samples"); "://" in each URL is replaced with "//".
tweets = [t.replace("://", "//") for t in nltk.corpus.twitter_samples.strings()]

def is_positive(tweet: str) -> bool:
    """True if tweet has positive compound sentiment, False otherwise."""
    return sia.polarity_scores(tweet)["compound"] > 0

shuffle(tweets)
for tweet in tweets[:10]:
    print(">", is_positive(tweet), tweet)
$ python3 app.py
> False Most Tory voters not concerned which benefits Tories will cut. Benefits don’t figure in the lives if most Tory voters. #Labour #NHS #carers
> False .@uberuk you cancelled my ice cream uber order. Everyone else in the office got it but me. 🙁
> False oh no i’m too early 🙁
> False I don’t know what I’m doing for #BlockJam at all since my schedule’s just whacked right now 🙁
> False What should i do .
BAD VS PARTY AGAIN :(((((((
> True @Shadypenguinn take care! 🙂
> True Thanks to amazing 4000 Followers on Instagram
If you´re not among them yet,
feel free to connect :-)… http//t.co/ILy03AtJ83
> False RT @mac123_m: Ed Miliband has spelt it out again. No deals with the SNP.
There’s a choice:
Vote SNP get Tories
Vote LAB and get LAB http//…
> True @gus33000 but Disk Management is same since NT4 iirc 😀
Also, what UX refinements were in zdps?
> False RT @KevinJPringle: One of many bizarre things about @Ed_Miliband’s anti-SNP stance is he doesn’t reject deal with LibDems, who imposed aust…
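Because is_positive only checks compound > 0, a tweet whose compound score is exactly 0 is reported as False, the same as a clearly negative one. A common convention, which I believe comes from the VADER project's own documentation, is to use thresholds of ±0.05 and add a neutral class. A minimal sketch under that assumption; label_from_compound is a hypothetical helper, and apart from 0.8012 (the score from the first example) the sample scores are made up.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score to 'positive', 'negative', or 'neutral'.

    Assumption: the +/-0.05 cutoff follows the commonly cited VADER convention.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# In practice, pass sia.polarity_scores(text)["compound"] instead of literals.
for score in (0.8012, 0.0, -0.4):
    print(score, label_from_compound(score))
```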
from random import shuffle
from statistics import mean

# Needs the movie_reviews corpus and the Punkt sentence tokenizer data (via nltk.download).
positive_review_ids = nltk.corpus.movie_reviews.fileids(categories=["pos"])
negative_review_ids = nltk.corpus.movie_reviews.fileids(categories=["neg"])
all_review_ids = positive_review_ids + negative_review_ids

def is_positive(review_id: str) -> bool:
    """True if the average of all sentence compound scores is positive."""
    text = nltk.corpus.movie_reviews.raw(review_id)
    scores = [
        sia.polarity_scores(sentence)["compound"]
        for sentence in nltk.sent_tokenize(text)
    ]
    return mean(scores) > 0

shuffle(all_review_ids)
correct = 0
for review_id in all_review_ids:
    if is_positive(review_id):
        if review_id in positive_review_ids:
            correct += 1
    else:
        if review_id in negative_review_ids:
            correct += 1

print(f"{correct / len(all_review_ids):.2%} correct")
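The accuracy bookkeeping in the loop above does not depend on VADER itself, so it can be pulled out and checked on toy data. A minimal sketch: accuracy, toy_classifier, and the sample strings are all hypothetical and simply stand in for is_positive and the review ids.

```python
from typing import Callable, Iterable, Tuple


def accuracy(
    predict: Callable[[str], bool],
    labeled: Iterable[Tuple[str, bool]],
) -> float:
    """Fraction of (item, label) pairs where predict(item) equals the label."""
    pairs = list(labeled)
    if not pairs:
        raise ValueError("need at least one labeled item")
    correct = sum(predict(item) == label for item, label in pairs)
    return correct / len(pairs)


def toy_classifier(text: str) -> bool:
    """Toy stand-in for is_positive: positive if the text contains 'good'."""
    return "good" in text


# Made-up labeled samples; True means the item is labeled positive.
samples = [
    ("good film", True),
    ("bad film", False),
    ("not good", False),
    ("dull", False),
]
print(f"{accuracy(toy_classifier, samples):.2%} correct")
```

With the corpus, the labeled pairs could be built as [(rid, True) for rid in positive_review_ids] + [(rid, False) for rid in negative_review_ids], which also avoids the repeated list membership checks inside the loop.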
It's nice that the corpora are already available.