Machine Learning is everywhere in Silicon Valley
- speech recognition
- self-driving cars
- the age of big data
- optimizing products
Normalize Distribution
x1, x2, x3, x4, x5
0.04, 0.12, 0.12, 0.04, 0.04 Σ=0.36
1/9, 1/3, 1/3, 1/9, 1/9 Σ=1
p(xi|Z): divide each entry by the sum of the probabilities to get the posterior
# measurement update over a five-cell world (prior is uniform)
p = [0.2, 0.2, 0.2, 0.2, 0.2]
pHit = 0.6
pMiss = 0.2
p[0] = p[0] * pMiss
p[1] = p[1] * pHit
p[2] = p[2] * pHit
p[3] = p[3] * pMiss
p[4] = p[4] * pMiss
print(p)        # un-normalized: [0.04, 0.12, 0.12, 0.04, 0.04]
print(sum(p))   # 0.36
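A minimal sketch of the normalization step itself, assuming the five-cell p from the snippet above (divide every entry by the total mass so the result sums to 1):

s = sum(p)                       # total probability mass, 0.36 in the worked example
p = [q / s for q in p]           # divide each entry by the sum
print(p)                         # -> [1/9, 1/3, 1/3, 1/9, 1/9]
print(sum(p))                    # -> 1.0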
PCA vs ICA
blind source separation (BSS): +ICA, -PCA
directional: +ICA, -PCA
faces
natural scenes -> ICA -> edges
documents->topics
information theory
x1, x2, x3 -> learner -> y
example bit sequence: 010101
variable-length code: A = 0, B = 110, C = 111, D = 10
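A quick check of why that code is a good one, assuming the usual example probabilities (A: 1/2, D: 1/4, B: 1/8, C: 1/8 — my assumption, the notes only list the code words): the expected code length equals the entropy.

import math

probs = {'A': 0.5, 'D': 0.25, 'B': 0.125, 'C': 0.125}   # assumed symbol probabilities
code  = {'A': '0', 'D': '10', 'B': '110', 'C': '111'}   # code from the notes

entropy = -sum(p * math.log2(p) for p in probs.values())
expected_len = sum(probs[s] * len(code[s]) for s in probs)

print(entropy)        # 1.75 bits per symbol
print(expected_len)   # 1.75 bits per symbol: the code length matches the entropy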
Joint Entropy
H(x,y)= -ΣP(x,y)logP(x,y)
Conditional Entropy
H(y|x) = -Σ P(x,y) log P(y|x)
Mutual Information
I(x,y) = H(y) - H(y|x)
KL Divergence
D(p||q) = Σ p(x) log( p(x) / q(x) )
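A small numeric sketch of these quantities on a made-up 2x2 joint distribution (the numbers are only illustrative):

import math

def log2(v):
    return math.log2(v) if v > 0 else 0.0

P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}   # made-up joint P(x, y)
Px = {x: sum(p for (a, _), p in P.items() if a == x) for x in (0, 1)}
Py = {y: sum(p for (_, b), p in P.items() if b == y) for y in (0, 1)}

H_xy = -sum(p * log2(p) for p in P.values())                        # joint entropy H(x,y)
H_y_given_x = -sum(p * log2(p / Px[x]) for (x, y), p in P.items())  # conditional entropy H(y|x)
H_y = -sum(p * log2(p) for p in Py.values())
I_xy = H_y - H_y_given_x                                            # mutual information I(x,y)

Q = {0: 0.5, 1: 0.5}                                                # reference distribution q
D_pq = sum(Px[x] * log2(Px[x] / Q[x]) for x in Px)                  # D(p||q); 0 here since Px is uniform

print(H_xy, H_y_given_x, I_xy, D_pq)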
Independent Components Analysis
- PCA: correlation, maximizing variance
- ICA: independence
x1 x2 x3 -> y1 y2 .. yi
I(yi;yj) = 0  (the new features are mutually independent)
I(y;x) is maximized  (the new features keep the information in x)
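A hedged sketch of ICA used for blind source separation, using scikit-learn's FastICA (assuming scikit-learn is available; the two source signals are synthetic):

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.RandomState(0)
t = np.linspace(0, 8, 2000)

s1 = np.sin(2 * t)                                    # independent source 1
s2 = np.sign(np.sin(3 * t))                           # independent source 2
S = np.c_[s1, s2] + 0.05 * rng.normal(size=(2000, 2))

A = np.array([[1.0, 0.5], [0.5, 2.0]])                # unknown mixing matrix
X = S @ A.T                                           # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                          # recovered sources, up to permutation/scale
print(S_est.shape)                                    # (2000, 2)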
Feature Transformation
- information retrieval
features? words
-> polysemy: car (one word, many meanings)
-> synonymy: car (many words for the same meaning)
data mining, LISP
Principal Components Analysis
- maximizes variance
- a global algorithm
- best reconstruction: minimizes error when reducing from N to m dimensions
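A minimal numpy sketch of that view of PCA: project mean-centered data onto the top m directions of maximal variance and measure the reconstruction error (the data here is synthetic):

import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # synthetic correlated data, N = 5
Xc = X - X.mean(axis=0)                                   # PCA works on mean-centered data

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)         # principal components = rows of Vt

m = 2
W = Vt[:m]                          # top m components
Z = Xc @ W.T                        # data in the reduced m-dimensional space
X_hat = Z @ W                       # best rank-m reconstruction
print(np.mean((Xc - X_hat) ** 2))   # the error PCA minimizes when going from N to m dimensions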
Expectation Maximization
E[z_ij] = P(x = x_i | μ = μ_j) / Σ_{l=1..k} P(x = x_i | μ = μ_l)
μ_j = Σ_i E[z_ij] x_i / Σ_i E[z_ij]
Expectation (define z from μ), Maximization (define μ from z)
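A small sketch of those two steps for a 1-D mixture of k unit-variance Gaussians with equal weights, which is the setting the E[z_ij] formula above assumes; the data and initial means are made up:

import numpy as np

rng = np.random.RandomState(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 100)])   # synthetic data
mu = np.array([-1.0, 1.0])                                           # initial means

for _ in range(50):
    # E-step: E[z_ij] = P(x_i | mu_j) / sum_l P(x_i | mu_l)
    lik = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2)
    z = lik / lik.sum(axis=1, keepdims=True)
    # M-step: mu_j = sum_i E[z_ij] x_i / sum_i E[z_ij]
    mu = (z * x[:, None]).sum(axis=0) / z.sum(axis=0)

print(mu)   # ends up near the true means 0 and 5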
Clustering properties
P_D <- clustering scheme (the partition produced from the distances D)
- Richness
- Scale-invariance
- Consistency
K-means clustering
- pick k centers (at random)
- each center "claims" its closest points
- recompute the centers by averaging the clustered points
- repeat until convergence
P^t(x): partition / cluster of object x at iteration t
C^t_i: set of all points in cluster i = {x s.t. P(x) = i}
center^t_i = Σ_{y ∈ C_i} y / |C_i|
P^{t+1}(x) = argmin_i ||x - center^t_i||^2
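A compact numpy sketch of exactly these steps (random centers, each center claims its closest points, re-average, repeat); the 2-D data is synthetic and the rare empty-cluster case is ignored:

import numpy as np

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])   # synthetic points
k = 2
centers = X[rng.choice(len(X), k, replace=False)]                       # pick k centers at random

for _ in range(100):
    # each center claims its closest points: P(x) = argmin_i ||x - center_i||^2
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    # recompute the centers by averaging the clustered points
    new_centers = np.array([X[labels == i].mean(axis=0) for i in range(k)])
    if np.allclose(new_centers, centers):                               # repeat until convergence
        break
    centers = new_centers

print(centers)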
K-means as optimization
configurations: the centers and the partition P
scores: E(P, center) = Σ_x ||center_{P(x)} - x||^2
neighborhood of (P, center) = {(P', center)} ∪ {(P, center')}
Properties of k-means clustering
- each iteration is polynomial: O(kn)
- finite (but exponential) number of iterations: O(k^n)
-error decreases(if ties broken consistently)[with one exception]
-can get stuck!
Single Linkage Clustering
-consider each object a cluster (n objects)
-define intercluster distance as the distance between the closest two points in the two clusters
-merge two closest clusters
- repeat n-k times to end up with k clusters
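A hedged sketch using scipy's hierarchical clustering (assuming scipy is available), where method='single' is exactly the closest-pair intercluster distance above:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(4, 0.5, (20, 2))])   # synthetic points

Z = linkage(X, method='single')                    # merge the two closest clusters repeatedly
labels = fcluster(Z, t=2, criterion='maxclust')    # stop once 2 clusters remain
print(labels)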
Practical Matters
- MIMIC does well with structure
- representing P^θt(x)
-local optima
-probability theory
-time complexity
Clustering & EM
unsupervised learning
- supervised learning: use labeled training data to generalize labels to new instances
- unsupervised learning: make sense of unlabeled data
Basic clustering problem
given: a set of objects X
inter-object distances D(x,y) = D(y,x), x, y ∈ X
output: a partition P_D such that P_D(x) = P_D(y) if x and y are in the same cluster
Finding Dependency Trees
D_KL(P || P̂_π) = Σ P [lg P - lg P̂_π]
= -h(P) + Σ_i h(x_i | π(x_i))
J_π = Σ_i h(x_i | π(x_i))  -> minimize
J'_π = Σ_i I(x_i ; π(x_i))  -> maximize (minimizing the conditional entropies is the same as maximizing the mutual information between each x_i and its parent π(x_i))
Generate samples from P^θt(x)
Set θ_{t+1} to the n-th percentile of the sample scores
Retain only those samples s.t. f(x) >= θ_{t+1}
Estimate P^θt+1(x)
Repeat
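A simplified sketch of that loop, using an independent product of Bernoullis for P^θ(x) instead of a dependency tree and a made-up fitness f (count of ones), just to show the sample / retain / re-estimate structure:

import numpy as np

rng = np.random.RandomState(0)
n_bits, n_samples, percentile = 20, 200, 50
f = lambda x: x.sum(axis=1)          # made-up fitness: number of ones
probs = np.full(n_bits, 0.5)         # start from the uniform distribution

for t in range(30):
    samples = (rng.rand(n_samples, n_bits) < probs).astype(int)   # generate samples from P_theta_t
    scores = f(samples)
    theta = np.percentile(scores, percentile)                     # set theta_{t+1} to the n-th percentile
    keep = samples[scores >= theta]                               # retain samples s.t. f(x) >= theta_{t+1}
    probs = keep.mean(axis=0).clip(0.05, 0.95)                    # estimate P_theta_{t+1}(x)

print(probs.round(2))   # the bit probabilities drift toward the all-ones optimum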