## Supervised Learning

Supervised learning
x11, x12, x13, … x1n -> y1
x21, x22, x23, … x2n -> y2
…
xm1, xm2, xm3, … xmn -> ym

Goal: learn a function f such that f(xm) = ym

OCCAM’S RAZOR
Everything else being equal, choose the less complex hypothesis.
In practice there is a trade-off: fit <-> low complexity.
Overly complex hypotheses overfit and increase generalization error.

SPAM, HAM
SPAM: offer is secret / click secret link / secret sports link
HAM: play sports today / went play sports / secret sport event / sport is today / sport costs money

P(spam) = 3/8

Maximum likelihood
Label sequence: s s s h h h h h, with p(s) = π
p(yi) = π if yi = s, 1-π if yi = h
p(yi) = π^yi ・(1-π)^(1-yi)   (coding s as 1, h as 0)
p(data) = ∏i p(yi) = π^count(yi=1) ・(1-π)^count(yi=0) = π³・(1-π)⁵
Maximizing over π gives π = 3/8.

ML solutions for the word likelihoods:
P(“secret” | spam) = 1/3
P(“secret” | ham) = 1/15
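These maximum-likelihood estimates can be reproduced directly from the eight messages; a minimal sketch, with the SPAM/HAM split implied by P(spam) = 3/8:

```python
# ML estimates for the spam/ham example; messages as listed in the notes.
spam = ["offer is secret", "click secret link", "secret sports link"]
ham = ["play sports today", "went play sports", "secret sport event",
       "sport is today", "sport costs money"]

# Prior: pi = count(spam) / count(all messages)
pi = len(spam) / (len(spam) + len(ham))          # 3/8 = 0.375

# Word likelihoods: P(w | class) = count of w / total words in that class
spam_words = " ".join(spam).split()
ham_words = " ".join(ham).split()
p_secret_spam = spam_words.count("secret") / len(spam_words)  # 3/9 = 1/3
p_secret_ham = ham_words.count("secret") / len(ham_words)     # 1/15
print(pi, p_secret_spam, p_secret_ham)
```

Maximizing π³(1-π)⁵ over π gives exactly this count ratio, which is why ML estimation here reduces to counting.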

## Machine Learning

Machine learning
-> Bayes networks = reasoning with known models
-> Machine learning = learning models from data

Supervised Learning
Unsupervised Learning

Famous uses of machine learning:
-Netflix DVD recommendations
-Amazon product placement

Machine Learning
What?
-> parameters, structure, hidden concepts
What from?
-> supervised, unsupervised, reinforcement
What for?
-> prediction, diagnostics, summarization …
How?
-> passive, active, online, offline
Output?
-> classification, regression
Details?
-> generative, discriminative

## Approximate Inference Sampling

Approximate Inference Sampling
P(B|+a)
Likelihood weighting: fix the evidence variables and weight each sample by the likelihood of that evidence; fixing evidence without weighting would make the estimate inconsistent.
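Likelihood weighting can be sketched on a minimal two-node network B -> A; the CPT numbers here (P(B)=0.3, P(A|B)=0.8, P(A|¬B)=0.1) are illustrative assumptions, not values from the notes:

```python
# Likelihood-weighting sketch for estimating P(B | A=true) in B -> A.
# CPT values are assumed for illustration.
import random

random.seed(0)
P_B = 0.3                            # assumed prior P(B=true)
P_A_given = {True: 0.8, False: 0.1}  # assumed P(A=true | B)

weights = {True: 0.0, False: 0.0}
for _ in range(100_000):
    b = random.random() < P_B   # sample the non-evidence variable from its prior
    w = P_A_given[b]            # weight by likelihood of the evidence A=true
    weights[b] += w

estimate = weights[True] / (weights[True] + weights[False])
# Exact answer: 0.8*0.3 / (0.8*0.3 + 0.1*0.7) = 0.24/0.31 ≈ 0.774
print(round(estimate, 3))
```

The weights are what keep the estimator consistent: unlikely evidence still contributes, just with small weight.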

GIBBS Sampling
Markov chain Monte Carlo (MCMC): resample one non-evidence variable at a time, conditioned on all the others.
Sample sequence over (C, S, R, W):
+c +s -r -w
+c -s -r -w
+c -s +r -w

P(A) = 0.5, P(B|A) = 0.2, P(B|¬A) = 0.8
P(¬A) = 1 - P(A) = 0.5
P(A|B) = P(B|A)P(A)/P(B) = P(B|A)P(A) / (P(B|A)P(A) + P(B|¬A)P(¬A))
       = 0.1 / (0.1 + 0.4) = 0.2

Simple Bayes Net
P(A) = 0.5, ∀i: P(Xi|A) = 0.2, P(Xi|¬A) = 0.6
P(A|X1,X2,¬X3) ∝ P(¬X3|A) P(X2|A) P(X1|A) P(A)
P(¬A|X1,X2,¬X3) ∝ P(¬X3|¬A) P(X2|¬A) P(X1|¬A) P(¬A)
Normalize the two products to get the posterior.
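The normalization step can be checked numerically; a short sketch using only the numbers given above:

```python
# Posterior for the simple Bayes net: P(A)=0.5, P(Xi|A)=0.2, P(Xi|~A)=0.6,
# evidence X1, X2, ~X3.
p_a = 0.5
p_x_a, p_x_na = 0.2, 0.6

# Unnormalized scores, proportional to likelihood * prior
score_a  = p_x_a * p_x_a * (1 - p_x_a) * p_a           # 0.2*0.2*0.8*0.5 = 0.016
score_na = p_x_na * p_x_na * (1 - p_x_na) * (1 - p_a)  # 0.6*0.6*0.4*0.5 = 0.072

posterior_a = score_a / (score_a + score_na)
print(round(posterior_a, 4))  # 0.1818
```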

## Probabilistic Inference

Probabilistic Inference
-Probability theory
-Bayes network
-Independence
-Inference

Evidence -> Hidden -> Query
P(Q1,Q2…|E1=e1, E2=e2)

Enumeration
P(+b|+j,+m)=P(+b,+j,+m)/P(+j,+m)
P(+b,+j,+m)=ΣeΣaP(+b,+j,+m,e,a)
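Enumeration can be carried out literally: sum the joint over the hidden variables e and a, then normalize. The network structure and query come from the notes; the CPT numbers below are the classic textbook values for this burglary/alarm network, assumed here since the notes do not list them:

```python
# Inference by enumeration for P(+b | +j, +m) in B -> A <- E, A -> J, A -> M.
# CPT numbers are assumed (standard textbook values).
def p_a(a, b, e):  # P(A=a | b, e)
    p = {(True, True): 0.95, (True, False): 0.94,
         (False, True): 0.29, (False, False): 0.001}[(b, e)]
    return p if a else 1 - p

def bern(p, v):  # P(var = v) for a Bernoulli parameter p
    return p if v else 1 - p

def joint(b, e, a, j, m):
    return (bern(0.001, b) * bern(0.002, e) * p_a(a, b, e)
            * bern(0.9 if a else 0.05, j) * bern(0.7 if a else 0.01, m))

TF = (True, False)
num = sum(joint(True, e, a, True, True) for e in TF for a in TF)
den = num + sum(joint(False, e, a, True, True) for e in TF for a in TF)
print(round(num / den, 3))  # ≈ 0.284
```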

Pulling out terms
P(+b) Σe P(e) Σa P(a|+b,e) P(+j|a) P(+m|a)

Maximize independence
More independence means cheaper inference: a linear chain is O(n), a fully connected network is O(2^n).

Variable Elimination

R -> T -> L
P(+l) = Σr Σt P(r) P(t|r) P(+l|t)
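The elimination order matters in larger networks, but on this chain it is just two marginalization steps. A sketch with assumed CPT numbers (the notes give only the structure R -> T -> L):

```python
# Variable elimination on R -> T -> L, computing P(+l). CPTs are assumed.
P_r = {True: 0.1, False: 0.9}
P_t_r = {True: {True: 0.8, False: 0.2},
         False: {True: 0.1, False: 0.9}}  # P(t | r)
P_l_t = {True: 0.3, False: 0.1}           # P(+l | t)

# Step 1: eliminate R -> new factor f(t) = sum_r P(r) P(t|r)
f_t = {t: sum(P_r[r] * P_t_r[r][t] for r in (True, False))
       for t in (True, False)}

# Step 2: eliminate T -> P(+l) = sum_t f(t) P(+l|t)
p_l = sum(f_t[t] * P_l_t[t] for t in (True, False))
print(round(p_l, 4))  # 0.134
```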

## General Bayes Network

P(A),P(B)
P(C|A,B)
P(D|C),P(E|C)
P(A,B,C,D,E) = P(A)P(B)P(C|A,B)P(D|C)P(E|C)

Bayes Networks
-Graph structure
-Compact representation
-Conditional independence
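The factorization above can be checked directly: multiplying the five local tables yields a proper joint distribution. The CPT values below are made-up placeholders; only the structure comes from the notes:

```python
# Joint from the factorization P(A,B,C,D,E) = P(A)P(B)P(C|A,B)P(D|C)P(E|C).
# CPT values are placeholders.
import itertools

P_A = 0.6
P_B = 0.3
P_C = {(True, True): 0.9, (True, False): 0.5,
       (False, True): 0.4, (False, False): 0.1}  # P(C=true | A, B)
P_D = {True: 0.7, False: 0.2}                    # P(D=true | C)
P_E = {True: 0.6, False: 0.1}                    # P(E=true | C)

def bern(p, v):  # P(var = v) for a Bernoulli parameter p
    return p if v else 1 - p

def joint(a, b, c, d, e):
    return (bern(P_A, a) * bern(P_B, b) * bern(P_C[(a, b)], c)
            * bern(P_D[c], d) * bern(P_E[c], e))

# Sanity check: the joint sums to 1 over all 2^5 assignments.
total = sum(joint(*vals) for vals in itertools.product([True, False], repeat=5))
print(round(total, 10))  # 1.0
```

This is the compactness point: the factored form needs 1+1+4+2+2 = 10 parameters, versus 2⁵-1 = 31 for an unstructured joint over five binary variables.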

## Different Types of Bayes Networks

Different types of Bayes networks
Happiness network: S (sunny) and R (raise) are parents of H (happy).
P(S) = 0.7
P(R) = 0.01
P(H|S,R) = 1
P(H|¬S,R) = 0.9
P(H|S,¬R) = 0.7
P(H|¬S,¬R) = 0.1
P(R|S) = P(R) = 0.01   (R and S are independent)
P(R|H,S) = P(H|R,S) P(R|S) / P(H|S)
         = P(H|R,S) P(R) / (P(H|R,S) P(R) + P(H|¬R,S) P(¬R))
         = 0.01 / (0.01 + 0.7·0.99) = 0.0142
P(R|H) = P(H|R) P(R) / P(H) = 0.97·0.01 / 0.5245 = 0.0185
P(R|H,¬S) = P(H|R,¬S) P(R|¬S) / P(H|¬S)
          = 0.9·0.01 / (0.9·0.01 + 0.1·0.99) = 0.009 / 0.108 = 0.0833
Explaining away: observing S accounts for H, so it lowers P(R|H); observing ¬S raises it.
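The three posteriors above can be computed mechanically from the tables; every number here comes from the notes:

```python
# Posteriors in the happiness network. P_H is keyed by (S, R).
P_S, P_R = 0.7, 0.01
P_H = {(True, True): 1.0, (False, True): 0.9,
       (True, False): 0.7, (False, False): 0.1}

# P(R | H, S): condition on S, apply Bayes rule over R
num = P_H[(True, True)] * P_R
p_r_hs = num / (num + P_H[(True, False)] * (1 - P_R))
print(round(p_r_hs, 4))  # 0.0142

# Marginalize over S to get P(H|R), P(H|~R), and P(H)
p_h_r = P_H[(True, True)] * P_S + P_H[(False, True)] * (1 - P_S)    # 0.97
p_h_nr = P_H[(True, False)] * P_S + P_H[(False, False)] * (1 - P_S)  # 0.52
p_h = p_h_r * P_R + p_h_nr * (1 - P_R)                               # 0.5245

# P(R | H)
p_r_h = p_h_r * P_R / p_h
print(round(p_r_h, 4))  # 0.0185

# P(R | H, ~S)
num = P_H[(False, True)] * P_R
p_r_hns = num / (num + P_H[(False, False)] * (1 - P_R))
print(round(p_r_hns, 4))  # 0.0833
```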

## Computing Bayes Rule

Bayes rule
P(A|B) = P(B|A) P(A) / P(B)
P(¬A|B) = P(B|¬A) P(¬A) / P(B)
P(A|B) + P(¬A|B) = 1

Trick: compute the unnormalized terms, then normalize (P(B) is never needed):
P'(A|B) = P(B|A) P(A)
P'(¬A|B) = P(B|¬A) P(¬A)
P(A|B) = P'(A|B) / (P'(A|B) + P'(¬A|B))

Cancer test example:
P(C) = 0.01      P(¬C) = 0.99
P(+|C) = 0.9     P(-|C) = 0.1
P(+|¬C) = 0.2    P(-|¬C) = 0.8

Conditionally Independent
P(T2|C,T1) = P(T2|C)

P(T2 = + | T1 = +)
= P(+2|+1,C) P(C|+1) + P(+2|+1,¬C) P(¬C|+1)
= P(+2|C)·0.043 + P(+2|¬C)·0.957
= 0.9·0.043 + 0.2·0.957 = 0.2301
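The two-test chain above can be computed end to end from the test tables; all numbers come from the notes:

```python
# Two conditionally independent cancer tests: P(T2=+ | T1=+).
p_c = 0.01
p_pos = {True: 0.9, False: 0.2}  # P(test=+ | C)

# Posterior after the first positive test
num = p_pos[True] * p_c
p_c_pos = num / (num + p_pos[False] * (1 - p_c))
print(round(p_c_pos, 3))  # 0.043

# Second test, using conditional independence P(T2 | C, T1) = P(T2 | C)
p_pos2 = p_pos[True] * p_c_pos + p_pos[False] * (1 - p_c_pos)
print(round(p_pos2, 4))  # 0.2304 (0.2301 when the intermediate 0.043 is rounded)
```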

## Weather

P(D1 = sunny) = 0.9
P(D2 = sunny | D1 = sunny) = 0.8
P(D2 = rainy | D1 = sunny) = 0.2
P(D2 = sunny | D1 = rainy) = 0.6
P(D2 = rainy | D1 = rainy) = 0.4
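Later-day marginals follow by total probability over the previous day, using only the numbers above:

```python
# Marginal chance of sun on days 2 and 3, by total probability.
p_sunny = 0.9              # P(D1 = sunny)
p_s_s, p_s_r = 0.8, 0.6    # P(sunny | prev sunny), P(sunny | prev rainy)

for day in (2, 3):
    p_sunny = p_s_s * p_sunny + p_s_r * (1 - p_sunny)
    print(day, round(p_sunny, 3))  # D2: 0.78, D3: 0.756
```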

Bayes Rule
P(A|B) = P(B|A)* P(A)/ P(B)
Posterior = likelihood × prior / marginal likelihood

P(c|+) = P(+|c) P(c) / P(+) = 0.9·0.01 / (0.9·0.01 + 0.2·0.99) ≈ 0.043

Bayes rule
A: not observable, P(A)
B: observable, P(B|A), P(B|¬A)
Diagnostic reasoning: P(A|B), P(A|¬B)

## Dependence

Dependent coin flips:
P(x1 = H) = 1/2
P(x2 = H | x1 = H) = 0.9
P(x2 = T | x1 = T) = 0.8
P(x2 = H) = P(x2=H|x1=H) P(x1=H) + P(x2=H|x1=T) P(x1=T)
          = 0.9·0.5 + 0.2·0.5 = 0.55

P(Y) = Σi P(Y|X=i) P(X=i)   (total probability)
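The 0.55 above is exactly this total-probability sum:

```python
# Total probability for the dependent coin flips:
# P(x2=H) = P(x2=H|x1=H) P(x1=H) + P(x2=H|x1=T) P(x1=T)
p_x1_h = 0.5
p_h_given_h = 0.9
p_h_given_t = 1 - 0.8   # from P(x2=T | x1=T) = 0.8

p_x2_h = p_h_given_h * p_x1_h + p_h_given_t * (1 - p_x1_h)
print(round(p_x2_h, 2))  # 0.55
```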

## Probability in AI

Bayes network

alternator broken, fanbelt broken ->
battery dead -> battery flat -> car won’t start
-battery meter, battery age
-lights, oil light, gas gauge
-no oil, no gas, fuel line blocked, starter broken

Binary events
Probability
Simple bayes networks
Conditional independence
Bayes networks
D-separation
Parameter counts

Bayes networks -> diagnostics, prediction, machine learning