Language Model
- probabilistic
- word-based
- learned
P(word1, word2…)
L = {s1, s2, …}
(vs. hand-coded logical trees)
P(w1, w2, … wn) = P(w1:n) = ∏i P(wi | w1:i-1)  (chain rule)
Markov assumption
P(wi|w1:i-1) = P(wi|wi-k:i-1)
stationarity assumption: P(wi|wi-1) = P(wj|wj-1) for all positions i, j
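A minimal Python sketch of these assumptions with a made-up toy corpus: a first-order Markov (bigram) model estimates P(wi | wi-1) from counts, and stationarity means the same count table is reused at every position.

```python
from collections import Counter

# Toy corpus (illustrative only)
corpus = "the cat sat on the mat the cat ate".split()

# Count bigrams and unigrams once; stationarity means the same
# table for P(w_i | w_{i-1}) is reused at every position i.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def p_next(prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

def p_sentence(words):
    """Chain rule with the Markov assumption (k=1):
    P(w_1:n) ~= P(w_1) * prod_i P(w_i | w_{i-1})."""
    p = unigrams[words[0]] / len(corpus)
    for prev, word in zip(words, words[1:]):
        p *= p_next(prev, word)
    return p
```

Note the maximum-likelihood estimate assigns probability zero to any unseen bigram, which is exactly the problem smoothing addresses.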
smoothing (reserve probability mass for unseen n-grams)
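A sketch of add-one (Laplace) smoothing over a toy bigram table (corpus and counts are illustrative only): unseen bigrams get a small non-zero probability instead of zero.

```python
from collections import Counter

corpus = "the cat sat on the mat".split()
vocab = set(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def p_smoothed(prev, word):
    """Add-one (Laplace) smoothed P(word | prev): add 1 to every
    bigram count and |vocab| to the denominator, so unseen
    bigrams score 1 / (count(prev) + |vocab|) instead of 0."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))
```

The probabilities still sum to 1 for each history, so the smoothed table remains a valid conditional distribution.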
classification, clustering, input correction, sentiment analysis, information retrieval, question answering, machine translation, speech recognition, driving a car autonomously
P(the), P(der), P(rba) — characteristic word / letter-sequence probabilities per language
Naive Bayes
k-nearest neighbors
support vector machines
logistic regression
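Of these, Naive Bayes fits the word-based probabilistic framing most directly. A hypothetical two-class sketch (the tiny EN/DE training texts are made up for illustration): class-conditional word probabilities, add-one smoothed, combined under an independence assumption.

```python
from collections import Counter
import math

# Hypothetical tiny training sets (illustrative only)
docs = {
    "EN": "the cat sat on the mat".split(),
    "DE": "der Hund lag auf der Matte".split(),
}
counts = {c: Counter(ws) for c, ws in docs.items()}
vocab = set().union(*docs.values())

def log_score(cls, words):
    """Naive Bayes: sum of log P(w | class), add-one smoothed.
    A uniform class prior is assumed, so it is omitted."""
    total = sum(counts[cls].values())
    return sum(math.log((counts[cls][w] + 1) / (total + len(vocab)))
               for w in words)

def classify(words):
    return max(docs, key=lambda c: log_score(c, words))
```

Working in log space avoids underflow when multiplying many small probabilities.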
sort command
gzip command
(echo $(cat new EN | gzip | wc -c) EN; \
 echo $(cat new DE | gzip | wc -c) DE; \
 echo $(cat new AZ | gzip | wc -c) AZ) \
| sort -n | head -1
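The pipeline concatenates the new document with each language's corpus (files EN, DE, AZ), compresses, and picks the smallest byte count: the corpus that shares the most redundancy with the document compresses it best. The same idea in Python with zlib, using made-up in-memory stand-ins for the corpus files:

```python
import zlib

# Illustrative stand-ins for the EN and DE corpus files
corpora = {
    "EN": b"the cat sat on the mat the dog ate the bone " * 20,
    "DE": b"der Hund lag auf der Matte die Katze schlief " * 20,
}

def compressed_size(data):
    return len(zlib.compress(data, 9))

def identify(new):
    """Appending `new` to the corpus it resembles adds the fewest
    extra bytes after compression: shared patterns compress away."""
    return min(corpora,
               key=lambda lang: compressed_size(corpora[lang] + new))
```

This is a crude but surprisingly effective stand-in for a character n-gram language model, since gzip's dictionary matching exploits the same short-sequence statistics.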
S* = argmax P(w1:n) = argmax ∏i P(wi | w1:i-1)
S* = argmax ∏i P(wi)  (unigram approximation)
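A sketch of this argmax over candidate sentences under the unigram approximation (toy corpus and candidates, illustrative only): each candidate is scored by the product of its word probabilities, in log space, and the highest-scoring one wins.

```python
from collections import Counter
import math

corpus = "the cat sat on the mat the cat ate".split()
unigrams = Counter(corpus)
n = len(corpus)

def log_p_unigram(words):
    """log P(w_1:n) under the unigram model: sum of log P(w_i)."""
    return sum(math.log(unigrams[w] / n) for w in words)

# S* = argmax over candidate sentences of P(w_1:n)
candidates = [
    "the cat sat".split(),
    "mat ate on".split(),
]
best = max(candidates, key=log_p_unigram)
```

With a bigram model, log_p_unigram would simply be replaced by a sum of log P(wi | wi-1) terms; the argmax machinery stays the same.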