X,Y
[Pr{X=x1}, Pr{X=x2}]
[0.29, 0.71]
[Pr{X=x1|Y=y1}, Pr{X=x2|Y=y1}]
[0.16, 0.84]
X: x1,x2,…,xk
H(X) = -p1*log(p1) - p2*log(p2) - … - pk*log(pk)
     = -Σ_{i=1..k} pi*log(pi)
H(X) = -Σi pi*log(pi)
H(X) = 0 (minimum): [0, 0, 0, …, 1, 0, …, 0]
H(X) at max (= log k): [1/k, 1/k, …, 1/k]
H(X) > H(X|Y=y1)
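As a quick check of the definitions above, here is a minimal Python sketch that computes the entropy of the two example distributions and of the two extreme cases. The notes do not fix a logarithm base, so base 2 (bits) is an assumption, and the function name `entropy` is only illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; terms with p = 0 contribute 0 by convention."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

prior = [0.29, 0.71]       # [Pr{X=x1}, Pr{X=x2}]
posterior = [0.16, 0.84]   # [Pr{X=x1|Y=y1}, Pr{X=x2|Y=y1}]

h_prior = entropy(prior)           # about 0.869 bits
h_posterior = entropy(posterior)   # about 0.634 bits
print(h_prior > h_posterior)       # True for these numbers

k = 4                              # assumed number of outcomes for the extreme cases
h_onehot = entropy([0, 0, 1, 0])   # 0: the outcome is certain
h_uniform = entropy([1 / k] * k)   # log2(k) = 2 bits, the maximum for k outcomes
```

The inequality holds for this particular observation y1. A single observed value can also raise the entropy; it is the average H(X|Y) over all values of Y that never exceeds H(X).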
Information gain: H(X) - H(X|Y)
X; candidates V in {A, B, C, …}: arg min_V H(X|V), i.e. the V with the largest H(X) - H(X|V)
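Reading the line above as choosing, among candidate variables A, B, C, …, the one that most reduces the entropy of X, the selection could be sketched as follows. The joint probability tables are hypothetical values made up for illustration, and `conditional_entropy` is an illustrative helper, not something from the notes. It uses H(X|V) = Σv Pr{V=v} * H(X|V=v).

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(X|V) from a joint table where joint[v][x] = Pr{V=v, X=x}."""
    h = 0.0
    for row in joint:
        p_v = sum(row)                      # Pr{V=v}
        if p_v > 0:
            h += p_v * entropy([p / p_v for p in row])
    return h

# Hypothetical joint tables; in each one the marginal of X is [0.5, 0.5]
joints = {
    "A": [[0.25, 0.25], [0.25, 0.25]],   # independent of X
    "B": [[0.40, 0.10], [0.10, 0.40]],   # partly informative about X
    "C": [[0.50, 0.00], [0.00, 0.50]],   # determines X completely
}

h_x = entropy([0.5, 0.5])
gains = {name: h_x - conditional_entropy(j) for name, j in joints.items()}
best = max(gains, key=gains.get)   # same choice as minimizing H(X|V)
print(best, gains)
```

Because H(X) is the same for every candidate, maximizing H(X) - H(X|V) and minimizing H(X|V) pick the same variable.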
Covariance between intertweet time and mention distance
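import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Each entry v of nearestMentionToTimeDiff holds the mention distance at v[0]
# and the intertweet time at v[1]. scikit-learn expects a 2-D feature array,
# so each distance is wrapped in its own list.
mentionDists = [[v[0]] for v in nearestMentionToTimeDiff]
intertweetTimes = [v[1] for v in nearestMentionToTimeDiff]

clf = LinearRegression()
clf.fit(mentionDists, intertweetTimes)
clf.coef_  # slope of intertweet time against mention distance

# Flat list of distances for plotting the first six points
mentionDists = [v[0] for v in nearestMentionToTimeDiff]
plt.scatter(mentionDists[0:6], intertweetTimes[0:6])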
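The regression above reports a slope rather than the covariance itself. For a single feature with an intercept, the least-squares slope equals cov(distance, time) / var(distance), so the two are directly related. Here is a minimal standard-library sketch of that relation; the `pairs` values are hypothetical stand-ins for `nearestMentionToTimeDiff`, which the notes do not define, and `covariance` is an illustrative helper.

```python
def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical (mention distance, intertweet time) pairs, for illustration only
pairs = [(1, 30.0), (2, 45.0), (3, 50.0), (4, 80.0), (5, 95.0)]
dists = [float(p[0]) for p in pairs]
times = [p[1] for p in pairs]

cov_dt = covariance(dists, times)   # covariance between distance and time
var_d = covariance(dists, dists)    # variance of distance
slope = cov_dt / var_d              # least-squares slope for one feature with intercept
print(cov_dt, var_d, slope)
```

On real data, `slope` computed this way should agree with `clf.coef_[0]` from the fit above, up to floating-point error.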