Supervised Learning

Supervised learning
x11, x12, x13 … x1n -> y1
x21, x22, x23 … x2n -> y2
…
xm1, xm2, xm3 … xmn -> ym

goal: find f such that f(xm) = ym
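A minimal sketch of this setup in Python; the toy data and the linear hypothesis class are assumed purely for illustration and are not part of the notes:

```python
import numpy as np

# Toy training set: m examples, each a feature vector x_i1 ... x_in with a target y_i.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([3.0, 3.0, 7.0, 7.0])

# One possible hypothesis class: linear functions f(x) = w . x.
# Least squares picks the w that best fits the training pairs.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

x_new = np.array([2.0, 2.0])
print("f(x_new) =", x_new @ w)   # prediction for an unseen input
```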

OCCAM’S RAZOR
everything else being equal, choose the less complex hypothesis
trade-off: fit <-> low complexity
generalization error and overfitting

SPAM: offer is secret, click secret link, secret sports link
HAM: play sports today, went play sports, secret sport event, sport is today, sport costs money

P(spam) = 3/8

Maximum likelihood
data: s s s h h h h h,  p(s) = π
p(yi) = π if yi = s, 1-π if yi = h
p(yi) = π^yi · (1-π)^(1-yi)
p(data) = ∏i p(yi) = π^count(yi=1) · (1-π)^count(yi=0) = π^3 · (1-π)^5
maximizing over π gives π = 3/8

ML solutions for the word probabilities:
P(“secret” | spam) = 1/3
P(“secret” | ham) = 1/15
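A quick sketch that reproduces these maximum-likelihood counts from the eight messages (plain Python, standard library only):

```python
from collections import Counter

# The eight training messages from the notes, split into spam and ham.
spam = ["offer is secret", "click secret link", "secret sports link"]
ham  = ["play sports today", "went play sports", "secret sport event",
        "sport is today", "sport costs money"]

p_spam = len(spam) / (len(spam) + len(ham))                        # 3/8

spam_words = Counter(w for m in spam for w in m.split())
ham_words  = Counter(w for m in ham  for w in m.split())

# Maximum-likelihood word probabilities: count of the word / total words in that class.
p_secret_spam = spam_words["secret"] / sum(spam_words.values())    # 3/9 = 1/3
p_secret_ham  = ham_words["secret"]  / sum(ham_words.values())     # 1/15
print(p_spam, p_secret_spam, p_secret_ham)
```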

Machine Learning

Machine learning
-> Bayes networks = reason with known models
-> Machine learning = learn models from data

Supervised Learning
Unsupervised Learning

Famous users of machine learning
- Google: web mining
- Netflix: DVD recommendations
- Amazon: product placement

Machine Learning
what?
->parameters, structure, hidden concepts
what form?
->supervised unsupervised reinforcement
what for?
-> prediction, diagnostics, summarization …
How?
-> passive, active, online, offline
output?
-> classification, regression
Details
-> generative, discriminative

Approximate Inference Sampling

Approximate Inference Sampling
P(B|+a)
Likelihood weighting: fix the evidence variables, sample the rest
Unweighted this is inconsistent; weighting each sample by the likelihood of the evidence makes it consistent
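A minimal likelihood-weighting sketch; the two-node network A -> B and its numbers are borrowed from the Bayes-rule block below and are used here only as an assumed toy model:

```python
import random

# Assumed toy network A -> B with P(A)=0.5, P(B|A)=0.2, P(B|¬A)=0.8.
# Query P(A | B=true): B is evidence, so it is fixed rather than sampled,
# and each sample is weighted by the probability of the evidence given its parent.
P_A = 0.5
P_B_GIVEN_A = {True: 0.2, False: 0.8}

def estimate_p_a_given_b(n_samples=100_000):
    num = den = 0.0
    for _ in range(n_samples):
        a = random.random() < P_A     # sample the non-evidence variable A from its prior
        w = P_B_GIVEN_A[a]            # weight = P(B=true | A=a)
        num += w * a                  # total weight of samples where A is true
        den += w                      # total weight of all samples
    return num / den

print(estimate_p_a_given_b())         # converges to 0.2, the exact answer computed below
```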

GIBBS Sampling
Markov chain Monte Carlo (MCMC)
+c +s -r -w
+c -s -r -w
+c -s +r -w

P(A) = 0.5, P(B|A)=0.2, P(B|¬A)=0.8
P(¬A)=1-P(A)=0.5
P(A|B)=0.2
P(A|B) = P(B|A)P(A)/P(B) = P(B|A)P(A) / (P(B|A)P(A) + P(B|¬A)P(¬A))
= 0.1/(0.1+0.4) = 0.2
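The same calculation as a short Python check:

```python
# Exact computation of P(A|B) with the numbers above.
P_A = 0.5
P_B_given_A, P_B_given_notA = 0.2, 0.8

P_B = P_B_given_A * P_A + P_B_given_notA * (1 - P_A)   # total probability: 0.5
P_A_given_B = P_B_given_A * P_A / P_B                   # Bayes rule: 0.1 / 0.5 = 0.2
print(P_B, P_A_given_B)
```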

Simple Bayes Net
P(A)=0.5, ∀i: P(Xi|A)=0.2, P(Xi|¬A)=0.6
P(A|X1,X2,¬X3) ∝ P(¬X3|A) P(X2|A) P(X1|A) P(A)
P(¬A|X1,X2,¬X3) ∝ P(¬X3|¬A) P(X2|¬A) P(X1|¬A) P(¬A)
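A numeric check of the two proportional expressions, normalized at the end (the ≈0.18 value is computed here, not stated in the notes):

```python
# Naive-Bayes-style computation for the simple network above.
P_A = 0.5
P_X_given_A, P_X_given_notA = 0.2, 0.6

# Evidence: X1=true, X2=true, X3=false.
p_A    = P_X_given_A * P_X_given_A * (1 - P_X_given_A) * P_A                 # P'(A | X1,X2,¬X3)
p_notA = P_X_given_notA * P_X_given_notA * (1 - P_X_given_notA) * (1 - P_A)  # P'(¬A | X1,X2,¬X3)

print(p_A / (p_A + p_notA))   # P(A | X1, X2, ¬X3) = 0.016 / 0.088 ≈ 0.1818
```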

Probabilistic Inference

Probabilistic Inference
-Probability theory
-Bayes network
-Independence
-Inference

Evidence -> Hidden -> Query
P(Q1,Q2…|E1=e1, E2=e2)

Enumeration
P(+b|+j,+m)=P(+b,+j,+m)/P(+j,+m)
P(+b,+j,+m)=ΣeΣaP(+b,+j,+m,e,a)

Pulling out terms
P(+b,+j,+m) = P(+b) ΣeΣa P(e) P(a|+b,e) P(+j|a) P(+m|a)
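A sketch of inference by enumeration for this query; the alarm-network CPT values are not in the notes, so the standard textbook numbers are assumed below just to make the example run:

```python
from itertools import product

# Assumed CPTs for the burglary/earthquake/alarm network (textbook values, not from the notes).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(+a | b, e)
P_J = {True: 0.90, False: 0.05}                      # P(+j | a)
P_M = {True: 0.70, False: 0.01}                      # P(+m | a)

def joint(b):
    """P(b, +j, +m) = P(b) * sum_e sum_a P(e) P(a|b,e) P(+j|a) P(+m|a)"""
    pb = P_B if b else 1 - P_B
    total = 0.0
    for e, a in product([True, False], repeat=2):
        pe = P_E if e else 1 - P_E
        pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
        total += pe * pa * P_J[a] * P_M[a]
    return pb * total

print(joint(True) / (joint(True) + joint(False)))    # P(+b | +j, +m) ≈ 0.284 with these CPTs
```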

Maximize independence
O(n) vs O(2^n)

Variable Elimination

R->T->L
P(+l) = ΣrΣt P(r) P(t|r) P(+l|t)
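A small variable-elimination sketch for this chain; the CPT values are assumed for illustration since the notes do not list them:

```python
# Chain R -> T -> L: compute P(+l) by eliminating R, then T.
P_r = 0.1                                  # assumed P(+r)
P_t_given_r = {True: 0.8, False: 0.1}      # assumed P(+t | r)
P_l_given_t = {True: 0.3, False: 0.1}      # assumed P(+l | t)

# Step 1: join R and T, sum out R  ->  a factor over T alone.
P_t = sum((P_r if r else 1 - P_r) * P_t_given_r[r] for r in (True, False))

# Step 2: join T and L, sum out T  ->  P(+l).
P_l = sum((P_t if t else 1 - P_t) * P_l_given_t[t] for t in (True, False))
print(P_t, P_l)
```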

Different types of Bayes networks

Different types of Bayes networks
P(S)= 0.7
P(R)= 0.01
P(H|S,R) = 1
P(H|¬S,R) = 0.9
P(H|S,¬R) = 0.7
P(H|¬S,¬R) = 0.1
P(R|S)=0.01
P(R|H,S) = 0.0142
= P(H|R,S)P(R|S)/P(H|S) = P(H|R,S)P(R) / (P(H|R,S)P(R) + P(H|¬R,S)P(¬R)) = 0.01/(0.01 + 0.7*0.99) ≈ 0.0142
P(R|H) = P(H|R)P(R)/P(H)= 0.97*0.01/0.5245 = 0.0185
P(R|H,¬S) = P(H|R,¬S)P(R|¬S)/P(H|¬S) = 0.009/(0.9*0.01 + 0.1*0.99) ≈ 0.0833
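A numeric check of the three queries above, using the CPTs given for this network:

```python
# Happiness network: S and R are independent causes of H.
P_S, P_R = 0.7, 0.01
P_H = {(True, True): 1.0, (False, True): 0.9,    # P(h | s, r)
       (True, False): 0.7, (False, False): 0.1}

def p_r_given_h_and_s(s):
    """P(R | H, S=s) by Bayes rule with S held fixed."""
    num = P_H[(s, True)] * P_R
    return num / (num + P_H[(s, False)] * (1 - P_R))

# For P(R | H), marginalize S out of the likelihood and the evidence probability.
P_h_given_r = P_H[(True, True)] * P_S + P_H[(False, True)] * (1 - P_S)      # 0.97
P_h = sum(P_H[(s, r)] * (P_S if s else 1 - P_S) * (P_R if r else 1 - P_R)
          for s in (True, False) for r in (True, False))                    # 0.5245

print(p_r_given_h_and_s(True))    # P(R | H, S)  ≈ 0.0142
print(P_h_given_r * P_R / P_h)    # P(R | H)     ≈ 0.0185
print(p_r_given_h_and_s(False))   # P(R | H, ¬S) ≈ 0.0833
```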

Computing Bayes Rule

Bayes rule
P(A|B) = P(B|A)P(A)/P(B)
P(¬A|B) = P(B|¬A)P(¬A)/P(B)
P(A|B)+P(¬A|B) = 1

P'(A|B) = P(B|A)P(A)
P'(¬A|B) = P(B|¬A)P(¬A)
normalize: P(A|B) = P'(A|B) / (P'(A|B) + P'(¬A|B))

P(C)=0.01
P(+|c)=0.9
P(-|¬C)=0.8
P(¬C)=0.99
P(-|C)=0.1
P(+|¬C)=0.2
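The normalization trick applied to these cancer-test numbers:

```python
# Unnormalized posteriors, then normalize (the P' trick above).
P_C = 0.01
P_pos = {True: 0.9, False: 0.2}        # P(+ | C), P(+ | ¬C)

pC    = P_pos[True] * P_C              # P'(C | +)
pnotC = P_pos[False] * (1 - P_C)       # P'(¬C | +)

print(pC / (pC + pnotC))               # P(C | +) = 0.009 / 0.207 ≈ 0.043
```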

Conditionally Independent
P(T2|C, T1) = P(T2|C)

P(T2=+ | T1=+)
= P(+2|+1,C) P(C|+1) + P(+2|+1,¬C) P(¬C|+1)
= P(+2|C)*0.043 + P(+2|¬C)*0.957
= 0.2301
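The same two-step calculation in code, using conditional independence of the tests given C:

```python
# Two cancer tests, conditionally independent given C.
P_C = 0.01
P_pos = {True: 0.9, False: 0.2}        # P(+ | C), P(+ | ¬C)

# Posterior over C after the first positive test.
P_C_given_1 = P_pos[True] * P_C / (P_pos[True] * P_C + P_pos[False] * (1 - P_C))   # ≈ 0.043

# P(+2 | +1) = Σ_c P(+2 | c) P(c | +1), since T2 is independent of T1 given C.
P_2_given_1 = P_pos[True] * P_C_given_1 + P_pos[False] * (1 - P_C_given_1)
print(P_2_given_1)                     # ≈ 0.23
```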

Weather

P(D1=sunny) = 0.9
P(D2=sunny | D1=sunny) = 0.8
P(D2 = rainy|D1=sunny) =  0.2
P(D2 = sunny|D1 = rainy) = 0.6
P(D2 = rainy|D1 = rainy) = 0.4
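Propagating the chain forward in code, assuming the same transition probabilities apply on every day:

```python
# Two-state weather chain: update P(D_k = sunny) one day at a time.
P_sunny = 0.9                             # P(D1 = sunny)
P_s_given_s, P_s_given_r = 0.8, 0.6       # assumed stationary transition probabilities

for day in range(2, 5):
    P_sunny = P_s_given_s * P_sunny + P_s_given_r * (1 - P_sunny)
    print(f"P(D{day} = sunny) = {P_sunny:.4f}")   # 0.78, 0.756, 0.7512, ... -> 0.75
```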

Bayes Rule
P(A|B) = P(B|A)* P(A)/ P(B)
Posterior = likelihood * prior / marginal likelihood

P(c|+) = P(+|c)P(c)/P(+) = 0.9*0.01 / (0.9*0.01 + 0.2*0.99) = 0.009/0.207 ≈ 0.043

Bayes rule
A: not observable, P(A)
B: observable, P(B|A), P(B|¬A)
Diagnostic reasoning: P(A|B), P(A|¬B)

Probability in AI

Bayes network

alternator broken, fan belt broken ->
battery dead -> battery flat -> car won’t start
-battery meter, battery age
lights, oil light, gas gauge
no oil, no gas, fuel line blocked, starter broken

Binary events
Probability
Simple bayes networks
Conditional independence
Bayes networks
D-separation
Parameter counts

Bayes networks -> diagnostics, prediction, machine learning
Finance, Google, Robotics
particle filters, HMMs, MDPs + POMDPs, Kalman filters …

Probability is used to handle uncertainty in AI
P(head) = 1/2, P(Tail) = 1/2
P(h, h, h) = 1/8, P(h) = 1/2
P(x1=x2=x3=x4)=0.125,
P({x1,x2,x3,x4} contains >= 3 h) = 5 / 16
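A brute-force check of these fair-coin probabilities over all 16 four-flip sequences:

```python
from itertools import product

flips = list(product("ht", repeat=4))                       # 16 equally likely sequences
all_equal  = sum(len(set(f)) == 1 for f in flips) / 16      # P(x1=x2=x3=x4) = 2/16 = 0.125
at_least_3 = sum(f.count("h") >= 3 for f in flips) / 16     # 5/16 = 0.3125
print(all_equal, at_least_3)
```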