Planning under uncertainty and learning
MDPs, POMDPs
deterministic vs. stochastic
fully observable: A*, breadth-first search, depth-first search; MDP (if stochastic)
partially observable: POMDP
Markov decision process (MDP)
states, actions, state transitions
T(s, a, s')
reward function R(s)
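These components can be written down concretely. A minimal sketch in Python, with illustrative state/action names that are assumptions, not from the notes — T maps (s, a) to a probability distribution over successor states, and R assigns a reward to each state:

```python
# Components of a tiny MDP (names are illustrative placeholders).
states = ["s0", "s1", "s2"]
actions = ["left", "right"]

# T[(s, a)] -> list of (s', probability): stochastic, so one action
# can have several possible outcomes.
T = {
    ("s0", "right"): [("s1", 0.8), ("s0", 0.2)],
    ("s0", "left"):  [("s0", 1.0)],
    ("s1", "right"): [("s2", 0.9), ("s1", 0.1)],
    ("s1", "left"):  [("s0", 1.0)],
    ("s2", "right"): [("s2", 1.0)],
    ("s2", "left"):  [("s1", 1.0)],
}

# R(s): reward for being in a state.
R = {"s0": 0.0, "s1": 0.0, "s2": 1.0}

# Sanity check: each T entry is a valid probability distribution.
for (s, a), outcomes in T.items():
    assert abs(sum(p for _, p in outcomes) - 1.0) < 1e-9
```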
MDP gridworld
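A hedged sketch of such a gridworld — the layout, rewards, and the 0.8/0.1/0.1 slip model here are common textbook assumptions, not taken from the notes:

```python
# Assumed 3x4 gridworld: a wall cell and two terminal cells; a move
# succeeds with prob 0.8 and slips to either side with prob 0.1 each.
ROWS, COLS = 3, 4
WALL = {(1, 1)}
TERMINAL = {(0, 3): 100.0, (1, 3): -100.0}  # goal and pit (assumed values)

MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
SLIP = {"N": ("W", "E"), "S": ("E", "W"), "E": ("N", "S"), "W": ("S", "N")}

def step(state, direction):
    """Deterministic move; bumping a wall or the edge leaves state unchanged."""
    r, c = state
    dr, dc = MOVES[direction]
    nr, nc = r + dr, c + dc
    if 0 <= nr < ROWS and 0 <= nc < COLS and (nr, nc) not in WALL:
        return (nr, nc)
    return state

def transitions(state, action):
    """T(s, a, .) as a list of (s', p) pairs with slip noise."""
    left, right = SLIP[action]
    return [(step(state, action), 0.8),
            (step(state, left), 0.1),
            (step(state, right), 0.1)]
```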
policy π: S → A (maps each state to an action)
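Concretely, a policy can be represented as a lookup table from states to actions. A minimal illustration (state and action names are placeholders):

```python
# A policy π: S -> A as a plain dict (placeholder names).
policy = {"s0": "right", "s1": "right", "s2": "left"}

def act(policy, state):
    """Return the action the policy prescribes in this state."""
    return policy[state]
```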
problems with conventional tree search:
tree too deep
stochastic outcomes make the branching factor large
many states visited more than once
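The notes list the problems but not a fix; the standard remedy (an assumption here, not stated in the source) is dynamic programming over states, e.g. value iteration: values are stored per state, so nothing is revisited within a sweep and no search tree is built. A sketch on a tiny hand-made MDP:

```python
# Value iteration sketch (illustrative MDP, names are placeholders).
states = ["s0", "s1", "s2"]
actions = ["left", "right"]
T = {
    ("s0", "right"): [("s1", 0.8), ("s0", 0.2)],
    ("s0", "left"):  [("s0", 1.0)],
    ("s1", "right"): [("s2", 0.9), ("s1", 0.1)],
    ("s1", "left"):  [("s0", 1.0)],
    ("s2", "right"): [("s2", 1.0)],
    ("s2", "left"):  [("s1", 1.0)],
}
R = {"s0": 0.0, "s1": 0.0, "s2": 1.0}
GAMMA = 0.9  # discount factor (assumed value)

# Bellman backup per state: depth, branching, and revisits are all
# handled by updating one value per state per sweep.
V = {s: 0.0 for s in states}
for _ in range(100):
    V = {s: R[s] + GAMMA * max(
            sum(p * V[s2] for s2, p in T[(s, a)]) for a in actions)
         for s in states}

# Greedy policy extracted from the converged values.
policy = {s: max(actions,
                 key=lambda a: sum(p * V[s2] for s2, p in T[(s, a)]))
          for s in states}
```

Here every state heads toward the rewarding state, so the extracted policy is "right" everywhere.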