ソフトウェアエンジニアの技術ブログ:Software engineer tech blog
supervised: $(x_1, y_1), (x_2, y_2), \ldots$; learn $y = f(x)$
unsupervised: $x_1, x_2, \ldots$; learn $P(X = x)$
reinforcement: $s, a, s, a, \ldots$ (a sequence of states and actions)
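Which kind of learning (supervised / unsupervised / reinforcement) is each example?
speech recognition: supervised
star data: unsupervised
lever pressing: reinforcement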
MDP Review – Markov Decision Processes
$s \in S$ (states)
$a \in \text{Actions}(s)$ (actions available in state $s$)
$R(s)$: $+100$ and $-100$ in the two terminal states, $-3$ in every other state
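Goal: $\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t R_t\right] \to \max$ (maximize the expected discounted sum of rewards)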
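As a rough illustration, the quantity inside the expectation can be computed for one finite reward sequence, for example an episode that ends in a terminal state. The following is a minimal Python sketch. The function name `discounted_return` and the example episode are hypothetical. Averaging this sum over many sampled episodes would estimate the expectation.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * R_t for a finite reward sequence R_0, R_1, ..."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r  # reward at step t is weighted by gamma**t
    return total


# Hypothetical episode: two -3 steps, then the +100 terminal reward.
print(discounted_return([-3, -3, 100], 1.0))  # 94.0
print(discounted_return([-3, -3, 100], 0.5))  # 20.5
```

With $\gamma < 1$, rewards further in the future count for less. This also keeps the infinite sum finite when rewards are bounded.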
value iteration
$V(a3, E) = 0.8 \times 100 - 3 = 77$
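The derivation below writes the same number out term by term. It assumes the usual stochastic move model for this example: the intended move succeeds with probability 0.8 and slips to each perpendicular direction with probability 0.1. It also assumes $\gamma = 1$ and that all non-terminal values are still 0. The cell names are assumptions about the grid layout. a4 is taken to be the +100 cell east of a3, and b3 the cell south of it. A slip north bumps into the wall and leaves the agent in a3.

```latex
\begin{aligned}
V(a3, E) &= \gamma \left[ 0.8\, V(a4) + 0.1\, V(a3) + 0.1\, V(b3) \right] + R(a3) \\
         &= 1 \cdot \left[ 0.8 \times 100 + 0.1 \times 0 + 0.1 \times 0 \right] - 3 \\
         &= 77
\end{aligned}
```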
$V(s) \leftarrow \left[\max_a \gamma \sum_{s'} P(s' \mid s, a)\, V(s')\right] + R(s)$
Back-up: applying this update repeatedly converges to the true values $V(s)$.
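Below is a minimal sketch of the whole loop. It assumes the grid used in this example has rows a to c (a on top) and columns 1 to 4, with an obstacle at b2, +100 at a4 and -100 at b4. Every other state has $R(s) = -3$, $\gamma = 1$, and each move succeeds with probability 0.8 and slips sideways with probability 0.1 each way. The layout, the helper names (`backup`, `value_iteration`, `greedy_policy`) and the stopping tolerance are illustrative assumptions, not taken from the notes. Terminal states are held at their reward. Each sweep applies the update above to every other state until the largest change drops below the tolerance.

```python
# Value iteration on an assumed 3x4 grid world (layout is hypothetical).
ROWS = "abc"                  # row a is the top row
COLS = (1, 2, 3, 4)
OBSTACLE = ("b", 2)           # assumed blocked cell
TERMINAL = {("a", 4): 100.0, ("b", 4): -100.0}
STEP_REWARD = -3.0            # R(s) for every non-terminal state
GAMMA = 1.0

MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
SIDES = {"N": ("E", "W"), "S": ("E", "W"), "E": ("N", "S"), "W": ("N", "S")}
STATES = [(r, c) for r in ROWS for c in COLS if (r, c) != OBSTACLE]


def step(s, d):
    """Deterministic move; walls and the obstacle leave the agent in place."""
    r = ROWS.index(s[0]) + MOVES[d][0]
    c = s[1] + MOVES[d][1]
    if 0 <= r < len(ROWS) and c in COLS and (ROWS[r], c) != OBSTACLE:
        return (ROWS[r], c)
    return s


def transitions(s, a):
    """P(s' | s, a): 0.8 intended direction, 0.1 for each perpendicular one."""
    left, right = SIDES[a]
    return [(0.8, step(s, a)), (0.1, step(s, left)), (0.1, step(s, right))]


def expected_next(s, a, V):
    """sum over s' of P(s' | s, a) * V(s')."""
    return sum(p * V[s2] for p, s2 in transitions(s, a))


def initial_values():
    """Terminals start at their reward, everything else at 0."""
    return {s: TERMINAL.get(s, 0.0) for s in STATES}


def backup(V):
    """One synchronous sweep of V(s) <- [max_a gamma * sum P V] + R(s)."""
    new_v = {}
    for s in STATES:
        if s in TERMINAL:
            new_v[s] = TERMINAL[s]  # terminal values stay fixed
        else:
            best = max(expected_next(s, a, V) for a in MOVES)
            new_v[s] = GAMMA * best + STEP_REWARD
    return new_v


def value_iteration(tol=1e-6, max_sweeps=10_000):
    """Repeat back-ups until the largest change is below tol."""
    V = initial_values()
    for _ in range(max_sweeps):
        new_v = backup(V)
        delta = max(abs(new_v[s] - V[s]) for s in STATES)
        V = new_v
        if delta < tol:
            break
    return V


def greedy_policy(V):
    """Pick, in each non-terminal state, the action with the best expected next value."""
    return {s: max(MOVES, key=lambda a: expected_next(s, a, V))
            for s in STATES if s not in TERMINAL}


print(backup(initial_values())[("a", 3)])  # 77.0
V = value_iteration()
print(greedy_policy(V)[("a", 3)])          # E
```

The first sweep reproduces the 77 computed by hand for a3, because each sweep reads only the previous sweep's values (synchronous updates). An in-place variant that reuses freshly updated values within a sweep is also common. It gives different intermediate numbers.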