Text Learning

Learning from TEXT

– Nice day
– A very nice day
-> SVM -> {o, x}

Problem: an SVM needs a fixed input dimension, but sentences vary in length

BAG OF WORDS: just a frequency count of each vocabulary word; word order is ignored
Nice day
nice:1, very:0, day:1, he:0, she:0, love:0
Mr day loves a nice day
nice:1, very:0, day:2, he:0, she:0, love:1 ("loves" counted as "love")
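
The counts above can be reproduced with a few lines of plain Python. This is only a sketch under assumptions: VOCAB mirrors the six words in the notes, and NORMALIZE is a hypothetical stand-in for a real stemmer so that "loves" is counted as "love". Both names are made up for illustration.

```python
from collections import Counter

# Fixed vocabulary taken from the notes; its length is the SVM input dimension.
VOCAB = ["nice", "very", "day", "he", "she", "love"]

# Hypothetical stand-in for a stemmer: maps an inflected form to its vocabulary form.
NORMALIZE = {"loves": "love"}

def bag_of_words(text, vocab=VOCAB):
    """Return one count per vocabulary word, ignoring word order."""
    tokens = [NORMALIZE.get(tok, tok) for tok in text.lower().split()]
    counts = Counter(tokens)
    return [counts[word] for word in vocab]

print(bag_of_words("Nice day"))                 # [1, 0, 1, 0, 0, 0]
print(bag_of_words("Mr day loves a nice day"))  # [1, 0, 2, 0, 0, 1]
```

Both sentences map to a vector of length 6 regardless of how many words they contain, which is what makes them usable as SVM input.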

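import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")  # one-time download of the stopword corpus
sw = stopwords.words("english")
sw[0]    # first stopword in the list
sw[10]   # eleventh stopword in the list
len(sw)  # number of English stopwords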
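
Stopwords are typically dropped before counting, since they occur often and carry little information. A minimal sketch, assuming a tiny hand-picked list: STOP_WORDS is a hypothetical stand-in for the NLTK list so the snippet runs without a download, and remove_stopwords is a made-up helper name.

```python
# Hypothetical stand-in for stopwords.words("english"), kept tiny on purpose.
STOP_WORDS = {"a", "the", "he", "she", "is", "in", "very"}

def remove_stopwords(text, stop_words=STOP_WORDS):
    """Lowercase, split on whitespace, and drop tokens found in stop_words."""
    return [tok for tok in text.lower().split() if tok not in stop_words]

print(remove_stopwords("A very nice day"))  # ['nice', 'day']
```

With the real list, passing stop_words=set(sw) should work the same way.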

Vocabulary: not all unique words carry a different meaning; words that share a stem can be counted as one
unresponsive, response, responsivity, responsiveness, respond
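
One way to collapse such word families is a stemmer. The sketch below assumes NLTK's SnowballStemmer, which is one possible choice and is not named in the notes; it is rule-based and only strips suffixes, so a prefix such as "un" is kept and not every related word has to end up with the same stem.

```python
from nltk.stem.snowball import SnowballStemmer

# The English Snowball stemmer needs no corpus download.
stemmer = SnowballStemmer("english")

words = ["unresponsive", "response", "responsivity", "responsiveness", "respond"]

# Map each word to its stem; words with equal stems share one vocabulary entry.
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
```

Stemming before the bag-of-words count shrinks the vocabulary, and with it the input dimension.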