PCA vs ICA

- blind source separation (BSS): +ICA, -PCA (see the sketch after this list)
- directional: +ICA, -PCA (ICA cares which way the data is fed in; PCA does not)
- faces: PCA finds global features (brightness, average face); ICA finds local parts (noses, eyes, mouths)
- natural scenes -> ICA -> edges; documents -> ICA -> topics
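
A rough illustration of the BSS row (a sketch, not the lecture's code), assuming NumPy and scikit-learn are available; the sources, mixing matrix, and sizes are made up for the example. FastICA looks for statistically independent components and recovers the sources (up to order/sign/scale); PCA looks for orthogonal max-variance directions and generally does not separate them.

```python
# Sketch only: blind source separation with ICA vs. PCA.
import numpy as np
from sklearn.decomposition import FastICA, PCA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                  # independent source 1: sine wave
s2 = np.sign(np.sin(3 * t))         # independent source 2: square wave
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5],           # arbitrary mixing matrix ("microphones")
              [0.5, 1.0]])
X = S @ A.T                         # observed mixtures

S_ica = FastICA(n_components=2, random_state=0).fit_transform(X)  # independent components
S_pca = PCA(n_components=2).fit_transform(X)                      # max-variance directions

def abs_corr(est, true):
    """|correlation| between each estimated component and each true source."""
    return np.array([[abs(np.corrcoef(est[:, i], true[:, j])[0, 1])
                      for j in range(true.shape[1])]
                     for i in range(est.shape[1])])

print("ICA vs sources:\n", abs_corr(S_ica, S).round(2))  # ~identity up to permutation
print("PCA vs sources:\n", abs_corr(S_pca, S).round(2))  # typically mixes both sources
```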

information theory
x1, x2, x3 -> learner -> y
(which inputs xi actually carry information about y, and which are redundant? mutual information gives a way to ask this)

example bit stream: 010101
variable-length (prefix) code: A = 0, B = 110, C = 111, D = 10 (more frequent symbols get shorter codewords)
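
A small sketch of why such a code helps, assuming an example symbol distribution (P(A)=1/2, P(D)=1/4, P(B)=P(C)=1/8) under which this code is optimal: because it is prefix-free, a bit stream decodes unambiguously, and the expected code length matches the entropy of that distribution (1.75 bits/symbol).

```python
# Sketch: the variable-length prefix code above; the probabilities are an assumed example.
import math

code = {"A": "0", "B": "110", "C": "111", "D": "10"}
probs = {"A": 0.5, "B": 0.125, "C": 0.125, "D": 0.25}

def decode(bits):
    """Prefix-freeness means we can decode greedily, left to right."""
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

msg = "ABAD"
encoded = "".join(code[s] for s in msg)
print(encoded, "->", decode(encoded))                     # 0110010 -> ABAD

# Expected bits/symbol under the assumed distribution equals its entropy (1.75 bits).
expected_len = sum(probs[s] * len(code[s]) for s in code)
entropy = -sum(p * math.log2(p) for p in probs.values())
print(expected_len, entropy)
```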

Joint Entropy
H(X,Y) = -Σ P(x,y) log P(x,y)

Conditional Entropy
H(Y|X) = -Σ P(x,y) log P(y|x)

Mutual Information
I(X;Y) = H(Y) - H(Y|X)
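
A numpy sketch tying these formulas together on an assumed 2x2 joint distribution P(x,y): compute H(X,Y), compute H(Y|X) directly from -Σ P(x,y) log P(y|x), and check I(X;Y) = H(Y) - H(Y|X).

```python
# Sketch: joint entropy, conditional entropy, mutual information.
# P is an assumed 2x2 joint distribution; rows index x, columns index y.
import numpy as np

P = np.array([[0.4, 0.1],
              [0.1, 0.4]])

def H(table):
    """Entropy of a probability table (zero entries ignored)."""
    p = table[table > 0]
    return -np.sum(p * np.log2(p))

Px, Py = P.sum(axis=1), P.sum(axis=0)                 # marginals P(x), P(y)

H_xy = H(P)                                           # H(X,Y) = -sum P(x,y) log P(x,y)
H_y_given_x = -np.sum(P * np.log2(P / Px[:, None]))   # -sum P(x,y) log P(y|x)
I_xy = H(Py) - H_y_given_x                            # I(X;Y) = H(Y) - H(Y|X)

print(round(H_xy, 3), round(H_y_given_x, 3), round(I_xy, 3))  # 1.722 0.722 0.278
```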

KL Divergence
D(p||q) = ∫ p(x) log( p(x) / q(x) ) dx    (a sum over x in the discrete case)
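
A quick sketch of D(p||q) on two made-up discrete distributions: it is 0 when p = q and not symmetric in general, so it is not a true distance.

```python
# Sketch: KL divergence between assumed discrete distributions p and q.
import numpy as np

def kl(p, q):
    """D(p||q) = sum_x p(x) log2(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / q[m])))

p = [0.5, 0.25, 0.25]
q = [0.4, 0.4, 0.2]

print(kl(p, p))            # 0.0  (identical distributions)
print(kl(p, q), kl(q, p))  # nonzero and unequal: D is not symmetric
```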

k-means clustering

-pick k centers (at random)
-each center "claims" its closest points
-recompute the centers by averaging the clustered points
-repeat until convergence

P^t(x): partition / cluster of object x (at iteration t)
C_i^t: set of all points in cluster i = {x s.t. P^t(x) = i}
center_i^t = Σ_{y ∈ C_i^t} y / |C_i^t|

P^t(x) = argmin_i ||x - center_i^{t-1}||_2^2
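
A minimal numpy sketch of the algorithm in this notation (the data and k below are invented): P holds the current partition P^t, and the claim/average steps alternate until the partition stops changing.

```python
# Sketch: k-means (Lloyd's algorithm) following the notation above.
import numpy as np

def kmeans(X, k, seed=0, max_iter=100):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # pick k centers at random
    P = np.full(len(X), -1)
    for _ in range(max_iter):
        # each center "claims" its closest points: P^t(x) = argmin_i ||x - center_i^{t-1}||^2
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_P = d2.argmin(axis=1)
        if np.array_equal(new_P, P):      # converged: partition unchanged
            break
        P = new_P
        # recompute centers by averaging: center_i^t = sum_{y in C_i^t} y / |C_i^t|
        for i in range(k):
            if np.any(P == i):
                centers[i] = X[P == i].mean(axis=0)
    return P, centers

# toy usage: two obvious blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
P, centers = kmeans(X, k=2)
print(centers.round(2))   # roughly one center near (0, 0) and one near (3, 3)
```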

K-means as optimization
configurations: (P, center)
scores: E(P, center) = Σ_x ||center_{P(x)} - x||_2^2
neighborhood of (P, center): {(P', center)} ∪ {(P, center')}

Properties of k-means clustering
-each iteration is polynomial: O(kn)
-finite (but exponential) number of iterations: O(k^n) possible partitions
-error decreases each iteration (if ties are broken consistently) [with one exception]
-can get stuck in local optima! (random restarts help; see the sketch below)
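
A sketch of two of these properties using scikit-learn's KMeans (data and settings invented): inertia_ is the score E(P, center); with a fixed initialization it does not increase as the iteration budget grows, and different random initializations can settle into different local optima, which is why random restarts (n_init) are used in practice.

```python
# Sketch: k-means properties via scikit-learn (inertia_ == E(P, center)).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.4, size=(50, 2)) for c in [(0, 0), (5, 0), (0, 5)]])

# error decreases: fixed initialization, growing iteration budget
init = X[:3]
for it in (1, 2, 5, 20):
    km = KMeans(n_clusters=3, init=init, n_init=1, max_iter=it).fit(X)
    print(f"max_iter={it:2d}  E={km.inertia_:.2f}")

# can get stuck: different seeds may end in different local optima
finals = sorted({round(KMeans(n_clusters=3, init="random", n_init=1,
                              random_state=s).fit(X).inertia_, 1) for s in range(10)})
print("distinct final errors over 10 restarts:", finals)
```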

Practical Matters

-MIMIC does well with structure
-representing P^θt(x)
-local optima
-probability theory
-time complexity

Clustering & EM
unsupervised learning
- supervised learning: use labeled training data to generalize labels to new instances
- unsupervised learning: make sense of unlabeled data

Basic clustering problem
given: set of objects X
       inter-object distances D(x,y) = D(y,x), for all x, y ∈ X
output: partition P_D such that P_D(x) = P_D(y) if x and y are in the same cluster