Reinforcement Learning
Published:
Finite Markov Decision Process
Setup:
- finite state space $\mathcal{S}$
- finite action space $\mathcal{A}$
- Transition model $\mathbb{P}\in \mathbb{R}^{SA\times S}$, with $S=\vert\mathcal{S}\vert$ and $A=\vert\mathcal{A}\vert$, where $\mathbb{P}(s'\vert s,a)$ is the probability of transitioning into state $s'$ upon taking action $a$ in state $s$.
- Reward function: $r: \mathcal{S}\times\mathcal{A}\to[-1,1]$.
- Discount factor $\gamma\in [0,1)$.
- Initial state distribution $\phi$ over $\mathcal{S}$.
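
As a rough illustration of the setup above, here is a minimal NumPy sketch of a tabular instance. The sizes, the random transition matrix and rewards, the uniform $\phi$, and the helper `rollout` are all assumptions made for the example. The row index `s * A + a` is one possible way to lay out the $SA\times S$ transition matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

S, A = 4, 2      # |S| and |A| (illustrative sizes, not from the text)
gamma = 0.9      # discount factor in [0, 1)

# Transition model as an (S*A) x S row-stochastic matrix:
# row s*A + a holds the distribution P(. | s, a) over next states.
P = rng.random((S * A, S))
P /= P.sum(axis=1, keepdims=True)

# Reward function r(s, a) in [-1, 1], stored as an S x A table.
r = rng.uniform(-1.0, 1.0, size=(S, A))

# Initial state distribution phi (assumed uniform here).
phi = np.full(S, 1.0 / S)


def rollout(policy, horizon=50):
    """Sample one trajectory and return its truncated discounted return."""
    s = rng.choice(S, p=phi)
    total = 0.0
    for t in range(horizon):
        a = rng.choice(A, p=policy[s])        # a ~ pi(. | s)
        total += gamma**t * r[s, a]
        s = rng.choice(S, p=P[s * A + a])     # s' ~ P(. | s, a)
    return total


# A uniformly random policy, stored as an S x A table of probabilities.
uniform_policy = np.full((S, A), 1.0 / A)
G = rollout(uniform_policy)
```

Because rewards lie in $[-1,1]$ and $\gamma<1$, any discounted return is bounded in absolute value by $1/(1-\gamma)$, which is a handy sanity check on `G`.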
Proximal Policy Optimization


RL as Supervised Learning
The BAIR blog post “Reinforcement learning is supervised learning on optimized data” is a good reference. It discusses how three views of RL relate to one another: dynamic programming (Q-learning, TD learning), optimization (policy gradient methods), and supervised learning (jointly optimizing the policy and the data).
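
To make the supervised-learning view concrete, here is a toy sketch of one simple instance, filtered imitation. It alternates between keeping the highest-return trajectories (the "optimized data") and fitting a tabular policy to them by maximum likelihood. This is an illustration under assumed settings, not code from the blog post. The random environment, the 20% filter, the add-one smoothing, and the helper `collect` are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, horizon = 4, 2, 0.9, 30

# Hypothetical random tabular environment; row s*A + a of P is P(. | s, a).
P = rng.random((S * A, S))
P /= P.sum(axis=1, keepdims=True)
r = rng.uniform(-1.0, 1.0, size=(S, A))


def collect(policy, n):
    """Sample n trajectories; return (state-action pairs, discounted return) for each."""
    data = []
    for _ in range(n):
        s = rng.integers(S)                   # uniform initial state (assumption)
        pairs, G = [], 0.0
        for t in range(horizon):
            a = rng.choice(A, p=policy[s])
            pairs.append((s, a))
            G += gamma**t * r[s, a]
            s = rng.choice(S, p=P[s * A + a])
        data.append((pairs, G))
    return data


policy = np.full((S, A), 1.0 / A)             # start from a uniform policy
for _ in range(5):
    data = collect(policy, 200)
    # "Optimize the data": keep the top 20% of trajectories by return.
    cutoff = np.quantile([G for _, G in data], 0.8)
    counts = np.ones((S, A))                  # add-one smoothing keeps all actions possible
    for pairs, G in data:
        if G >= cutoff:
            for s, a in pairs:
                counts[s, a] += 1
    # Supervised step: maximum-likelihood (count-based) fit to the kept data.
    policy = counts / counts.sum(axis=1, keepdims=True)
```

The policy update here is plain supervised learning (counting state-action pairs). The only RL-specific ingredient is the choice of which trajectories to train on.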
