MDP and Costs

R(s) -> +100, -100, -3

$E\left[\sum_{t=0}^{\infty} \gamma^t R_t\right] \to \max$

Value iteration

$V(a3, E) = 0.8 \times 100 - 3 = 77$

$V(s) \leftarrow \left[\max_a \gamma \sum_{s'} P(s' \mid s, a)\, V(s')\right] + R(s)$

Back-up. Theorem: converges.
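The back-up rule can be tried on a toy problem. The following is a minimal sketch, not the notes' actual grid world: the three-state MDP (`a3`, `goal`, `pit`), the action `S`, the 0.8/0.2 transition split, and gamma = 1 are all assumptions, chosen so that the first back-up of `a3` under action `E` reproduces 0.8 × 100 - 3 = 77. The helper names `backup` and `value_iteration` are hypothetical.

```python
def backup(V, s, actions, P, R, gamma=1.0):
    """One back-up: V(s) <- [max_a gamma * sum_s' P(s'|s,a) V(s')] + R(s)."""
    best = max(
        sum(p * V[s2] for s2, p in P[(s, a)].items())
        for a in actions
    )
    return gamma * best + R[s]


def value_iteration(states, terminals, actions, P, R,
                    gamma=1.0, tol=1e-9, max_sweeps=10_000):
    """Repeat the back-up over all non-terminal states until values stop changing."""
    # Terminal states keep their reward as value; all others start at 0.
    V = {s: (float(R[s]) if s in terminals else 0.0) for s in states}
    for _ in range(max_sweeps):
        delta = 0.0
        for s in states:
            if s in terminals:
                continue
            new_v = backup(V, s, actions, P, R, gamma)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break
    return V


# Hypothetical toy MDP (assumption, not taken from the notes).
states = ["a3", "goal", "pit"]
terminals = {"goal", "pit"}
actions = ["E", "S"]
R = {"a3": -3, "goal": 100, "pit": -100}
P = {
    ("a3", "E"): {"goal": 0.8, "a3": 0.2},  # mostly reaches +100
    ("a3", "S"): {"pit": 0.8, "a3": 0.2},   # mostly reaches -100
}

V0 = {"a3": 0.0, "goal": 100.0, "pit": -100.0}
first = backup(V0, "a3", actions, P, R)
print(first)  # 77.0

V = value_iteration(states, terminals, actions, P, R)
print(round(V["a3"], 2))  # 96.25
```

In this toy setup the first back-up gives 77, and repeated back-ups settle at the fixed point of V = 0.8 × 100 + 0.2 × V - 3, which is 96.25. This illustrates the convergence the notes refer to, under the assumptions stated above.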