Planning under uncertainty

Planning under uncertainty and learning
MDP, POMDPs

deterministic, stochastic

fully observable: A*, breadth-first, depth-first; MDP
partially observable: POMDP

Markov decision process(MDP)
states, actions, state transitions
T(s, a, s')
reward function R(s)

MDP gridworld
policy π(s)->A
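
The notes stop at the definition of a policy; one standard way to actually compute π(s) for a fully observable MDP is value iteration. Below is a minimal Python sketch; the two-state MDP at the bottom is a made-up example purely to exercise the functions.

```python
# Minimal value-iteration sketch for an MDP with transition model T(s, a, s')
# and reward R(s); the tiny two-state MDP below is a hypothetical example.
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    V = {s: 0.0 for s in states}
    while True:
        V_new = {s: R(s) + gamma * max(sum(T(s, a, s2) * V[s2] for s2 in states)
                                       for a in actions)
                 for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < eps:
            return V_new
        V = V_new

def greedy_policy(states, actions, T, V):
    # policy pi(s) -> a: pick the action with the best expected successor value
    return {s: max(actions, key=lambda a: sum(T(s, a, s2) * V[s2] for s2 in states))
            for s in states}

def T(s, a, s2):                      # "go" flips the state with prob 0.8
    if a == "go":
        return 0.8 if s2 != s else 0.2
    return 1.0 if s2 == s else 0.0    # "stay" keeps the state

states, actions = [0, 1], ["stay", "go"]
R = lambda s: 1.0 if s == 1 else 0.0  # reward only in state 1
V = value_iteration(states, actions, T, R)
print(greedy_policy(states, actions, T, V))   # {0: 'go', 1: 'stay'}
```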

Why a conventional search tree breaks down here:
– tree too deep
– branching factor large
– many states visited more than once

Planning and Execution

– stochastic
– multi-agent
– partial observability   [A, S, F, B]
– unknown
– hierarchical

[S, R, S]   [S, while A: R, S]   (fixed plan vs. plan with a loop)

[A, S, F]   Result(Result(A, A→S), S→F) ∈ Goals
s' = Result(s, a)
b' = Update(Predict(b, a), o)

classical planning
state space: k boolean variables → 2^k states
world state: complete assignment
belief state: complete assignment, partial assignment, arbitrary formula

Action(Fly(p, x, y))
  precond: Plane(p) ∧ Airport(x) ∧ Airport(y) ∧ At(p, x)
  effect: ¬At(p, x) ∧ At(p, y)

At(D, SFO)   At(C, SFO)   Load(C, D1, SFO)

Regression vs Progression
Action(Buy(b), pre: ISBN(b), effect: Own(b))
Goal(Own(0136042597))

situation calculus
actions: objects   Fly(p, x, y)
situations: objects
successor-state axioms   At(p, x, s)
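
As a concrete illustration of progression (applying an action schema to a world state), here is a small sketch; the state is a set of true fluents, and the Fly instance below is a hypothetical grounding of the schema above.

```python
# Progression: apply a ground action (precondition, add list, delete list)
# to a world state given as a set of true fluents. Hypothetical grounding
# of Action(Fly(p, x, y)) with p = P1, x = SFO, y = JFK.
def applicable(state, precond):
    return precond <= state            # every precondition fluent is true

def apply_action(state, precond, add, delete):
    if not applicable(state, precond):
        raise ValueError("preconditions not satisfied")
    return (state - delete) | add      # effect: delete then add fluents

state   = {"Plane(P1)", "Airport(SFO)", "Airport(JFK)", "At(P1, SFO)"}
precond = {"Plane(P1)", "Airport(SFO)", "Airport(JFK)", "At(P1, SFO)"}
add, delete = {"At(P1, JFK)"}, {"At(P1, SFO)"}
print(apply_action(state, precond, add, delete))
```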

Representation with logic

propositional logic
(E ∨ B) ⇒ A
A ⇒ (J ∧ M)
J ⇔ M
J ⇔ ¬M
{B: true, E: false}

Truth Table

O   P   O ⇒ P
(E ∨ B) ⇒ A
A ⇒ (J ∧ M)
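
A quick way to check such sentences is to enumerate the whole truth table over all models, as in this small sketch (using (E ∨ B) ⇒ A, A ⇒ (J ∧ M) and J ⇔ M from above purely for illustration):

```python
# Truth-table method: enumerate every model over {E, B, A, J, M} and count
# the models in which all three sentences hold.
from itertools import product

implies = lambda p, q: (not p) or q     # p => q
iff = lambda p, q: p == q               # p <=> q

satisfying = []
for E, B, A, J, M in product([False, True], repeat=5):
    if implies(E or B, A) and implies(A, J and M) and iff(J, M):
        satisfying.append((E, B, A, J, M))
print(len(satisfying), "of 32 models satisfy all three sentences")
```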

first-order logic: relations, objects, functions; values T/F/unknown
propositional logic: facts; values T/F/unknown
probability theory: facts; degrees of belief [0..1]

atomic → problem solving (search)
factored → propositional logic, probability
structured → first-order logic

{P: T, Q: F}

Syntax
-sentences: Vowel(A), Above(A, B), 2 = 2
-operators: ∧ ∨ ¬ ⇒ ⇔ ( )
-terms: A, B, 2, x, y, NumberOf(A)
-quantifiers: ∀x Vowel(x) ⇒ NumberOf(x) = 1
 ∃x NumberOf(x) = 2

Maximum likelihood

Maximum likelihood
3, 4, 5, 6, 7
m = 5
μ = 5
σ² = 2

3, 9, 9, 3
μ = 6
σ² = 9
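
A short check of these two maximum-likelihood fits (sample mean and the divide-by-M variance):

```python
# Maximum-likelihood estimates for a Gaussian: sample mean and the
# biased (divide-by-M) variance used in the lecture.
def ml_gaussian(data):
    m = len(data)
    mu = sum(data) / m
    sigma2 = sum((x - mu) ** 2 for x in data) / m
    return mu, sigma2

print(ml_gaussian([3, 4, 5, 6, 7]))  # (5.0, 2.0)
print(ml_gaussian([3, 9, 9, 3]))     # (6.0, 9.0)
```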

Gaussians
– functional form
– fit from data
– multivariate gaussians

Expectation maximization
p(x) = Σi=1..k p(C=i) · p(x | C=i)
parameters: πi, μi, Σi

EM versus k-means

minimize: −Σj log p(xj | π1..k, μ1..k, Σ1..k) + cost · k
– guess initial k
– run EM
– remove unneeded clusters
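
A minimal sketch of one EM iteration for a one-dimensional mixture of k Gaussians, i.e. the update equations for πi, μi, σi²; the toy data and starting values are made up:

```python
import math

# Density of a single 1-D Gaussian component
def gauss(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(xs, pi, mu, var):
    k, m = len(pi), len(xs)
    # E-step: responsibilities r[j][i] = p(C=i | x_j)
    r = []
    for x in xs:
        p = [pi[i] * gauss(x, mu[i], var[i]) for i in range(k)]
        z = sum(p)
        r.append([pij / z for pij in p])
    # M-step: re-estimate mixing weights, means and variances
    for i in range(k):
        ni = sum(r[j][i] for j in range(m))
        pi[i] = ni / m
        mu[i] = sum(r[j][i] * xs[j] for j in range(m)) / ni
        var[i] = sum(r[j][i] * (xs[j] - mu[i]) ** 2 for j in range(m)) / ni
    return pi, mu, var

xs = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]               # hypothetical data
pi, mu, var = [0.5, 0.5], [0.0, 6.0], [1.0, 1.0]  # initial guess
for _ in range(20):
    pi, mu, var = em_step(xs, pi, mu, var)
print(mu)   # roughly [1.0, 5.1]
```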

clustering
– k-means, EM

Unsupervised learning

Unsupervised learning
-find structure in unlabeled data

density estimation
-clustering
-dimensionality reduction
-blind source separation

K Means Clustering
– need to know k
– local minima
– high dimensionality
– lack of mathematical foundation
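
A minimal 1-D k-means sketch; k has to be chosen up front, and the result depends on the initial centers, which is exactly the local-minimum problem noted above. The data and initial centers are hypothetical:

```python
# k-means: alternately assign points to the nearest center and move each
# center to the mean of its assigned points, until the centers stop moving.
def kmeans(xs, centers, iters=100):
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in xs:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        new = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return centers

print(kmeans([1.0, 1.2, 0.8, 5.0, 5.3, 4.9], centers=[0.0, 10.0]))  # approx. [1.0, 5.07]
```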

Gaussian Learning
parameters of a Gaussian
f(x | μ, σ²) = 1/√(2πσ²) · exp(−(x − μ)² / (2σ²))

μ = (1/M) Σj=1..M xj

Data x1…xM:  p(x1…xM | μ, σ²) = Πi f(xi | μ, σ²) = (1/(2πσ²))^(M/2) · exp(−Σi (xi − μ)² / (2σ²))
log-likelihood: (M/2) log(1/(2πσ²)) − (1/(2σ²)) Σi=1..M (xi − μ)²
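
Completing the derivation by setting the derivatives of the log-likelihood to zero:
∂/∂μ = 0  ⇒  μ = (1/M) Σi xi
∂/∂σ² = 0  ⇒  σ² = (1/M) Σi (xi − μ)²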

minimize more complicated loss function

Gradient
L = Σj (yj − w1·xj − w0)² → min
∂L/∂w1 = −2 Σj (yj − w1·xj − w0) · xj
∂L/∂w0 = −2 Σj (yj − w1·xj − w0)
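
A minimal gradient-descent sketch using exactly these two gradients; the learning rate and the three data points (which lie on y = 3 − x) are hypothetical:

```python
# Gradient descent on L = sum_j (y_j - w1*x_j - w0)^2 with the gradients above.
def gradient_descent(xs, ys, alpha=0.01, iters=5000):
    w0, w1 = 0.0, 0.0
    for _ in range(iters):
        d_w1 = -2 * sum((y - w1 * x - w0) * x for x, y in zip(xs, ys))
        d_w0 = -2 * sum((y - w1 * x - w0) for x, y in zip(xs, ys))
        w1 -= alpha * d_w1
        w0 -= alpha * d_w0
    return w0, w1

print(gradient_descent([0, 1, 2], [3, 2, 1]))  # approaches (3.0, -1.0)
```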

Perceptron algorithm
Linear separator
f(x) = 1 if w1·x + w0 ≥ 0
       0 if w1·x + w0 < 0

Linear function

Linear methods
-regression vs classification
-exact solution vs iterative solution
-smoothing
-non-linear problems

Supervised learning → parametric
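
A sketch of the perceptron update for this 1-D linear separator; the toy data and learning rate are made up:

```python
# Perceptron: predict with the separator f(x) = 1 if w1*x + w0 >= 0 else 0,
# and nudge (w0, w1) toward every misclassified example.
def perceptron(xs, ys, epochs=100, alpha=1.0):
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = 1 if w1 * x + w0 >= 0 else 0
            w1 += alpha * (y - pred) * x
            w0 += alpha * (y - pred)
    return w0, w1

print(perceptron([1, 2, 6, 7], [0, 0, 1, 1]))  # (-3.0, 1.0): separates at x = 3
```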

KNN definition
learning: memorize all data

Problems of KNN
-very large data sets → kd-trees
-very large feature spaces
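
A minimal k-nearest-neighbour sketch matching the definition above (learning just memorizes the data); the 1-D toy data is hypothetical:

```python
from collections import Counter

# k-NN: store all labelled points; classify a query by majority vote among
# the k points with the smallest distance (plain 1-D absolute difference here).
def knn_predict(data, query, k=3):
    neighbours = sorted(data, key=lambda point: abs(point[0] - query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

data = [(1, "ham"), (2, "ham"), (6, "spam"), (7, "spam"), (8, "spam")]
print(knn_predict(data, 5.5))  # "spam"
```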

Quadratic Loss

Minimize quadratic loss
min L = Σi (yi − w1·xi − w0)²
∂L/∂w0 = 0  ⇒  w0 = (1/M) Σ yi − (w1/M) Σ xi
∂L/∂w1 = 0  ⇒  Σ xi·yi − (1/M) Σ yi Σ xi + (w1/M) (Σ xi)² = w1 Σ xi²

f(x) = w1·x + w0
w0 = 3
w1 = −1
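
Solving these two equations directly; the three data points (0,3), (1,2), (2,1) are a made-up set chosen to lie on y = 3 − x, so the fit reproduces w0 = 3, w1 = −1:

```python
# Closed-form least-squares fit of f(x) = w1*x + w0 from the two equations above.
def fit_line(xs, ys):
    m = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    w1 = (sxy - sx * sy / m) / (sxx - sx * sx / m)
    w0 = sy / m - w1 * sx / m
    return w0, w1

print(fit_line([0, 1, 2], [3, 2, 1]))  # (3.0, -1.0)
```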

Regularization
loss = loss(data)+loss(parameters)
Σj (yj − Σi wi·xij − w0)² + Σi |wi|^p   (p = 1: L1, p = 2: L2)
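
A small sketch of that combined loss; lam is a hypothetical trade-off weight between the data term and the parameter penalty:

```python
# Regularized loss = data loss + penalty on the weights (p=1: L1/sparse, p=2: L2).
def regularized_loss(xs, ys, w, w0, p=2, lam=0.1):
    data_loss = sum((y - sum(wi * xi for wi, xi in zip(w, x)) - w0) ** 2
                    for x, y in zip(xs, ys))
    penalty = lam * sum(abs(wi) ** p for wi in w)
    return data_loss + penalty

# hypothetical 2-feature example
print(regularized_loss(xs=[[1, 0], [0, 1]], ys=[1.0, 0.0], w=[0.5, -0.5], w0=0.2))
```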

Advanced spam filters

advanced spam filters
-known spamming IP?
-have you emailed this person before?
-have other people received the same message?
-is the email header consistent?
-all caps?
-do inline URLs point to where they say?
-are you addressed by name?

Digit recognition
-input vector = pixel values
16 x 16

overfitting prevention
-Occam's razor
-how to choose k? → cross-validation
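
A sketch of k-fold cross-validation for picking such a parameter; train and score are placeholders for whatever learner and metric are being tuned:

```python
# k-fold cross-validation: hold out each fold once, train on the rest,
# and average the held-out scores.
def cross_validate(data, train, score, folds=10):
    n, total = len(data), 0.0
    for i in range(folds):
        lo, hi = i * n // folds, (i + 1) * n // folds
        held_out = data[lo:hi]
        model = train(data[:lo] + data[hi:])
        total += score(model, held_out)
    return total / folds
```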

supervised learning
->classification: yi ∈ {0, 1}
->regression: yi continuous (yi ∈ ℝ)

f(x) = w1X + w0
w0 = 3, w1= -1

Linear Regression
Data   f(x) = w1·x + w0 (1-D),  f(x) = w·x + w0 (vector form)
y = f(x)
Loss = Σj (yj − w1·xj − w0)²

Relationship to bayes network

Bayes Network
-offer is secret click sports
p(“secret”|spam) = 1/3

dictionary has 12 words; number of parameters:
p(spam): 1
p(wi|spam): 11
p(wi|ham): 11
(23 total)

message m=”sports”
p(spam|m) = p(m|spam)·p(spam) / [p(m|spam)·p(spam) + p(m|ham)·p(ham)]
          = 3/18 ≈ 0.1667

m = “secret is secret”
p(spam | m) = 25 /26
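
A sketch that reproduces both posteriors. The word likelihoods below are an assumption: they are reconstructed from the lecture's eight-message training set (3 spam messages with 9 words, 5 ham messages with 15 words) and are consistent with p("secret"|spam) = 1/3 quoted above.

```python
# Naive Bayes posterior with maximum-likelihood word probabilities.
p_spam = 3 / 8                 # 3 of 8 training messages are spam
p_word_spam = {"secret": 3/9, "sports": 1/9, "is": 1/9}     # 9 spam words total
p_word_ham  = {"secret": 1/15, "sports": 5/15, "is": 1/15}  # 15 ham words total

def posterior_spam(message):
    like_spam, like_ham = p_spam, 1 - p_spam
    for w in message.split():          # naive assumption: words are independent
        like_spam *= p_word_spam[w]
        like_ham  *= p_word_ham[w]
    return like_spam / (like_spam + like_ham)

print(posterior_spam("sports"))            # 3/18 ≈ 0.1667
print(posterior_spam("secret is secret"))  # 25/26 ≈ 0.9615
```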

Laplace smoothing
ML: p(x) = count(x) / N

LS(k): p(x) = (count(x) + k) / (N + k·|x|)
1 message 1 spam p(spam) = 2/3
10 message 6 spam p(spam) = 7/12
100 message 60 spam p(spam) = 61/102
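
A one-line implementation of LS(k) that reproduces the three class-prior examples (|x| = 2 values, spam and ham):

```python
# Laplace smoothing: p(x) = (count(x) + k) / (N + k * |x|)
def laplace(count, n, num_values, k=1):
    return (count + k) / (n + k * num_values)

print(laplace(1, 1, 2))     # 1 message, 1 spam      -> 2/3
print(laplace(6, 10, 2))    # 10 messages, 6 spam    -> 7/12
print(laplace(60, 100, 2))  # 100 messages, 60 spam  -> 61/102
```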

k = 1, p(spam) = 2/5 p(ham) = 3/5 p(“today”|spam) = 1/21 p(“today”|ham) = 3/27

m = "today is secret"   p(spam|m) ≈ 0.4858
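
Putting the smoothed pieces together for m = "today is secret". The word counts below are an assumption, reconstructed so that they reproduce the smoothed probabilities quoted above (9 spam words, 15 ham words, a 12-word vocabulary, 3 spam out of 8 messages):

```python
# Laplace-smoothed (k = 1) naive Bayes for m = "today is secret".
k, vocab = 1, 12
spam_counts = {"today": 0, "is": 1, "secret": 3}   # out of 9 spam words
ham_counts  = {"today": 2, "is": 1, "secret": 1}   # out of 15 ham words
p_spam = (3 + k) / (8 + 2 * k)                     # 2/5, as above
p_ham = 1 - p_spam                                 # 3/5

def smoothed(count, total_words):
    return (count + k) / (total_words + k * vocab)

num_spam, num_ham = p_spam, p_ham
for w in "today is secret".split():
    num_spam *= smoothed(spam_counts[w], 9)
    num_ham  *= smoothed(ham_counts[w], 15)

print(num_spam / (num_spam + num_ham))             # ≈ 0.4858
```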

summary naive bayes