Technologies

MDPs
POMDPs, Belief Space
Reinforcement Learning
A*; h function; Monte Carlo

chess, go, robot soccer, poker, hide-and-go-seek, card solitaire, minesweeper

states s, players p, actions(s, p), result(s, a), terminal(s), utility u(s, p)

deterministic, two-player, zero-sum

def maxValue(s):
    m = -∞
    for (a, s') in successors(s):
        v = value(s')
        m = max(m, v)
    return m
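A runnable version of the full minimax recursion, sketched on a toy pile-of-stones game (the game itself and all function names here are illustrative assumptions, not from the lecture):

# Toy deterministic, two-player, zero-sum game: a pile of stones,
# each move removes 1 or 2 stones, whoever takes the last stone wins.
# States are (stones, player); player +1 is MAX, -1 is MIN.

def successors(s):
    stones, player = s
    for take in (1, 2):
        if take <= stones:
            yield take, (stones - take, -player)

def terminal(s):
    return s[0] == 0

def utility(s):
    # The previous mover took the last stone, so the player now
    # to move has lost; utility is from MAX's point of view.
    return -s[1]

def value(s):
    if terminal(s):
        return utility(s)
    vals = [value(s2) for a, s2 in successors(s)]
    return max(vals) if s[1] == 1 else min(vals)

print(value((4, 1)))  # +1: MAX wins with 4 stones (take 1, leave 3)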

complexity
O(b^m)    (b = branching factor, m = depth)

HMMs and Filters

Hidden Markov Models (HMMs)
– analyze time sequences
– predict time sequences

Applications
– robotics
– medical
– finance
– speech
– language technology

HMMs are Bayes networks:
S1 → S2 → S3 → … → Sn    (hidden Markov chain)
|     |     |        |
Z1    Z2    Z3       Zn    (measurements)

Kalman filter, particle filter

localization problem
laser range finder

speech recognition → Markov model
transition probability from "I" to "a"

Markov chain example (rain R / sun S, with P(R at t+1 | R at t) = 0.6, P(R at t+1 | S at t) = 0.2):
P(R0) = 1
P(S0) = 0
P(R1) = 0.6
P(R2) = 0.44
P(R3) = 0.376
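These numbers follow from iterating the transition model; a quick sketch to reproduce them (using the 0.6/0.2 transitions assumed above):

# Two-state Markov chain: P(rain tomorrow | rain today) = 0.6,
# P(rain tomorrow | sun today) = 0.2.
p_rain = 1.0  # P(R0) = 1
for t in range(1, 4):
    p_rain = 0.6 * p_rain + 0.2 * (1.0 - p_rain)
    print("P(R%d) = %.3f" % (t, p_rain))  # 0.600, 0.440, 0.376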

P(A1000) = ?
P(A∞) = lim(t→∞) P(At)

stationary distribution
P(At) = P(At−1)
P(At) = P(At | At−1)·P(At−1) + P(At | Bt−1)·P(Bt−1)
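Worked for the rain chain above (assuming the same 0.6/0.2 transitions): set P(At) = P(At−1) = x and solve:
x = 0.6·x + 0.2·(1 − x)
0.6·x = 0.2
x = 1/3, so P(A∞) = 1/3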

Planning under uncertainty

Planning under uncertainty and learning
MDP, POMDPs

                      deterministic                      stochastic
fully observable      A*, depth-first, breadth-first     MDP
partially observable                                     POMDP

Markov decision process (MDP)
states, actions, state transitions
T(s, a, s')
reward function R(s)

MDP gridworld
policy π(s) → a
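A minimal value-iteration sketch for computing such a policy on a 1×4 gridworld (the layout, rewards, and discount are illustrative assumptions, not the lecture's exact example):

# States 0..3 in a row; state 3 is terminal with value +1,
# every other step costs 0.04; actions move left (-1) or right (+1).
GAMMA = 1.0
STATES = range(4)

def step(s, a):
    return min(max(s + a, 0), 3)

V = [0.0] * 4
for _ in range(50):  # iterate until (effectively) converged
    for s in STATES:
        if s == 3:
            V[s] = 1.0
        else:
            V[s] = -0.04 + GAMMA * max(V[step(s, a)] for a in (-1, 1))

policy = {s: max((-1, 1), key=lambda a: V[step(s, a)]) for s in STATES if s != 3}
print(V)       # [0.88, 0.92, 0.96, 1.0]
print(policy)  # every state moves right: {0: 1, 1: 1, 2: 1}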

problems with plain tree search here:
– tree too deep
– stochastic outcomes → branching factor large
– many states visited more than once

Planning and Execution

Stochastic
Multi-agent
Partial observability [A, S, F, B]
– unknown
– hierarchical

[S, R, S]    [S, while A: R, S]    (straight-line plan vs. plan with a loop)

[A, S, F]: result(result(A, A→S), S→F) ⊆ goals
deterministic world: s' = result(s, a)
partially observable: b' = update(predict(b, a), o)

classical planning
state space: k boolean variables → 2^k world states
world state: complete assignment
belief state: complete assignment, partial assignment, or arbitrary formula

Action(Fly(p, x, y))
  precond: Plane(p) ∧ Airport(x) ∧ Airport(y) ∧ At(p, x)
  effect: ¬At(p, x) ∧ At(p, y)

At(D, SFO), At(C, SFO), Load(C, D1, SFO)

Regression vs. Progression (searching backward from the goal vs. forward from the start)

Action(Buy(b), precond: ISBN(b), effect: Own(b))
goal: Own(0136042597)

situation calculus
actions are objects: Fly(p, x, y)
situations are objects
successor-state axioms: At(p, x, s)
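A small sketch of the b' = update(predict(b, a), o) step, treating a belief state as a set of possible world states; the two-cell vacuum world and every name here are illustrative assumptions:

# Belief state = set of world states the agent might be in.
def predict(belief, action, results):
    # results(s, a) returns the set of possible successor states.
    return {s2 for s in belief for s2 in results(s, action)}

def update(belief, obs, perceive):
    # Keep only states consistent with the observation.
    return {s for s in belief if perceive(s) == obs}

# Two-cell vacuum world: state = (robot position, frozenset of dirty cells).
def results(s, a):
    pos, dirt = s
    return {{"Left": ("A", dirt),
             "Right": ("B", dirt),
             "Suck": (pos, dirt - {pos})}[a]}

def perceive(s):
    pos, dirt = s
    return (pos, pos in dirt)  # sense position and local dirt

b = {("A", frozenset("AB")), ("B", frozenset("B"))}
b2 = update(predict(b, "Suck", results), ("A", False), perceive)
print(b2)  # {('A', frozenset({'B'}))}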

Representation with logic

propositional logic
(E ∨ B) ⇒ A
A ⇒ (J ∧ M)
J ⇔ M
J ⇔ ¬M
model: {B = true, E = false}

Truth Table

O    P    O ⇒ P
F    F      T
F    T      T
T    F      F
T    T      T

(E ∨ B) ⇒ A
A ⇒ (J ∧ M)
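A tiny enumeration sketch that checks every assignment against the two example sentences (pure illustration, not from the lecture):

from itertools import product

def implies(p, q):
    return (not p) or q

# Print every model satisfying (E v B) => A and A => (J ^ M).
for E, B, A, J, M in product((False, True), repeat=5):
    if implies(E or B, A) and implies(A, J and M):
        print(dict(E=E, B=B, A=A, J=J, M=M))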

first-order logic: relations, objects, functions; true / false / unknown
propositional logic: facts; true / false / unknown
probability theory: facts; degree of belief in [0..1]

representations:
atomic → problem solving (search)
factored → e.g. {P: T, Q: F}
structured → first-order logic

Syntax
– sentences: Vowel(A), Above(A, B), 2 = 2
– operators: ∧ ∨ ¬ ⇒ ⇔ ( )
– terms: constants A, B, 2; variables x, y; functions NumberOf(A)
– quantifiers: ∀x Vowel(x) ⇒ NumberOf(x) = 1
               ∃x NumberOf(x) = 2

Maximum likelihood

Maximum likelihood
data: 3, 4, 5, 6, 7
m = 5
μ = 5
σ² = 2

data: 3, 9, 9, 3
μ = 6
σ² = 9
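A quick check of these estimates (maximum likelihood divides by m, not m − 1):

def ml_fit(xs):
    m = len(xs)
    mu = sum(xs) / m
    var = sum((x - mu) ** 2 for x in xs) / m  # ML variance
    return mu, var

print(ml_fit([3, 4, 5, 6, 7]))  # (5.0, 2.0)
print(ml_fit([3, 9, 9, 3]))     # (6.0, 9.0)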

Gaussians
– functional form
– fit from data
– multivariate Gaussians

Expectation maximization
P(x) = Σ(i=1..k) P(C = i)·p(x | C = i)
parameters: πi, μi, Σi
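A compact EM sketch for a 1-D mixture of k Gaussians (initialization, data, and iteration count are illustrative assumptions):

import math, random

def em_gmm(xs, k, iters=50):
    pi = [1.0 / k] * k                 # mixing weights π_i
    mu = random.sample(xs, k)          # means μ_i (init from data)
    var = [1.0] * k                    # variances σ²_i
    for _ in range(iters):
        # E-step: responsibilities r[j][i] = P(C = i | x_j)
        r = []
        for x in xs:
            p = [pi[i] / math.sqrt(2 * math.pi * var[i])
                 * math.exp(-(x - mu[i]) ** 2 / (2 * var[i])) for i in range(k)]
            z = sum(p)
            r.append([pj / z for pj in p])
        # M-step: re-estimate π, μ, σ² from the responsibilities
        for i in range(k):
            ni = sum(rj[i] for rj in r)
            pi[i] = ni / len(xs)
            mu[i] = sum(rj[i] * x for rj, x in zip(r, xs)) / ni
            var[i] = sum(rj[i] * (x - mu[i]) ** 2 for rj, x in zip(r, xs)) / ni + 1e-6
    return pi, mu, var

print(em_gmm([1.0, 1.2, 0.8, 5.0, 5.3, 4.7], k=2))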

EM versus k-means

choosing k — minimize: −Σj log p(xj | π, μ, Σ, k) + cost·k
– guess k
– run EM
– remove unneeded clusters

clustering
– k-means, EM

Unsupervised learning

Unsupervised learning
– find structure in data

density estimation
– clustering
– dimensionality reduction
– blind source separation

K-means clustering (see the sketch below)
– need to know k
– local minima
– high dimensionality
– lack of a mathematical basis
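A minimal k-means sketch on 1-D points, as referenced above (data and initialization are illustrative):

import random

def kmeans(xs, k, iters=20):
    centers = random.sample(xs, k)   # initialize centers from the data
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: (x - centers[i]) ** 2)
            clusters[nearest].append(x)
        # update step: move each center to its cluster mean
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

print(kmeans([1.0, 1.1, 0.9, 8.0, 8.2, 7.8], k=2))  # centers near 1.0 and 8.0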

Gaussian Learning
parameters of a Gaussian:
f(x | μ, σ²) = 1/√(2πσ²) · exp(−(x − μ)²/(2σ²))

μ = (1/m) Σ(j=1..m) xj
σ² = (1/m) Σ(j=1..m) (xj − μ)²

data x1 … xm:
p(x1 … xm | μ, σ²) = Πi f(xi | μ, σ²) = (2πσ²)^(−m/2) · exp(−Σi (xi − μ)²/(2σ²))

log-likelihood: −(m/2) log(2πσ²) − (1/(2σ²)) Σ(i=1..m) (xi − μ)²
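Setting the derivatives of this log-likelihood to zero recovers the estimates above (a standard derivation, sketched):
∂/∂μ: (1/σ²) Σi (xi − μ) = 0 ⇒ μ = (1/m) Σi xi
∂/∂σ²: −m/(2σ²) + (1/(2σ⁴)) Σi (xi − μ)² = 0 ⇒ σ² = (1/m) Σi (xi − μ)²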