Identify features with an influencing relationship

X,Y
[Pr{x=x1},Pr{x=x2}]
[0.29, 0.71]
[Pr{x=x1|y=y1}, Pr{x=x2|y=y1}]
[0.16, 0.84]

X: x1, x2, …, xk  with  pi = Pr{x=xi}
H(X) = -p1*log(p1) - p2*log(p2) - … - pk*log(pk)
     = -Σ(i=1..k) pi*log(pi)

H(X) = -Σi pi*log(pi)
H(X) = 0 when all the mass is on one outcome: [0, 0, 0, …, 1, 0, …, 0]
H(X) is at its max, log(k), for the uniform distribution: [1/k, 1/k, …, 1/k]

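H(X) > H(X|Y=y1)   (here: H of [0.29, 0.71] > H of [0.16, 0.84], so knowing Y=y1 reduces the uncertainty about X)
H(X) - H(X|Y)   (information gained about X from knowing Y, i.e. the mutual information I(X;Y))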
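
A minimal sketch of the entropy calculation, using the two distributions from these notes. It assumes the natural log (the notes do not fix a base) and treats 0*log(0) as 0; the function name `entropy` and the choice k = 4 are only for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; terms with p = 0 are skipped (0*log(0) := 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

k = 4                                  # illustrative number of outcomes
h_point = entropy([0, 0, 1, 0])        # all mass on one outcome -> zero entropy
h_uniform = entropy([1 / k] * k)       # uniform distribution -> log(k), the maximum

h_x = entropy([0.29, 0.71])            # [Pr{x=x1}, Pr{x=x2}], about 0.60 nats
h_x_given_y1 = entropy([0.16, 0.84])   # [Pr{x=x1|y=y1}, Pr{x=x2|y=y1}], about 0.44 nats

print(h_x > h_x_given_y1)              # conditioning on Y=y1 lowers the entropy here
```

Note that only the average H(X|Y) is guaranteed not to exceed H(X); for a single value such as Y=y1 the inequality holds for these numbers but is not guaranteed in general.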

Target X, candidate features v in {A, B, C, …}: pick arg max over v of [H(X) - H(X|v)], i.e. arg min over v of H(X|v)
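
The selection rule above could be sketched as follows. This is an assumption-laden illustration, not the original analysis: probabilities are estimated from raw counts, the helper names (`entropy_of`, `conditional_entropy`, `best_feature`) are made up here, and the toy data is hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy_of(values):
    """Empirical Shannon entropy (nats) of a list of discrete observations."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def conditional_entropy(x, v):
    """H(X|V) = sum over values of V of Pr{V=v} * H(X|V=v), estimated from counts."""
    groups = defaultdict(list)
    for xi, vi in zip(x, v):
        groups[vi].append(xi)
    n = len(x)
    return sum(len(g) / n * entropy_of(g) for g in groups.values())

def best_feature(x, features):
    """Return the feature name with the largest H(X) - H(X|v), plus all gains."""
    h_x = entropy_of(x)
    gains = {name: h_x - conditional_entropy(x, col) for name, col in features.items()}
    return max(gains, key=gains.get), gains

# Hypothetical toy data: A determines X exactly, B carries no information about X.
x = ["x1", "x1", "x2", "x2", "x2", "x2"]
features = {
    "A": ["a1", "a1", "a2", "a2", "a2", "a2"],
    "B": ["b1", "b2", "b1", "b2", "b1", "b2"],
}
best, gains = best_feature(x, features)
print(best, gains)
```

Maximising the gain and minimising H(X|v) pick the same feature, because H(X) is the same for every candidate.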

Covariance between intertweet time and mention distance

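# Each entry of nearestMentionToTimeDiff is a (mention distance, intertweet time) pair.
# scikit-learn expects a 2-D feature array, so wrap each distance in its own row.
mentionDists = [[v[0]] for v in nearestMentionToTimeDiff]
intertweetTimes = [v[1] for v in nearestMentionToTimeDiff]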

from sklearn.linear_model import LinearRegression

clf = LinearRegression()

clf.fit(mentionDists, intertweetTimes)
clf.coef_
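import matplotlib.pyplot as plt

# Flatten the distances back to 1-D for plotting, then plot the first six points.
mentionDists = [v[0] for v in nearestMentionToTimeDiff]
plt.scatter(mentionDists[0:6], intertweetTimes[0:6])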
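
The code above fits a regression line, not a covariance. A possible way to get the covariance itself is sketched below with NumPy. The `pairs` list is hypothetical stand-in data for `nearestMentionToTimeDiff`, and the variable names are chosen here for illustration.

```python
import numpy as np

# Hypothetical (mention distance, intertweet time) pairs, not real measurements.
pairs = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
mention = np.array([p[0] for p in pairs])
intertweet = np.array([p[1] for p in pairs])

cov_matrix = np.cov(mention, intertweet)           # 2x2 sample covariance matrix
cov_xy = cov_matrix[0, 1]                          # covariance of the two variables
corr_xy = np.corrcoef(mention, intertweet)[0, 1]   # Pearson correlation, scale-free

# For a one-feature least-squares fit with intercept, slope = cov(x, y) / var(x),
# which is the quantity that clf.coef_ reports in the regression above.
slope = cov_xy / cov_matrix[0, 0]
print(cov_xy, corr_xy, slope)
```

The correlation is usually easier to interpret than the raw covariance, since it does not depend on the units of distance or time.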