Reinforcement Learning


Published: June 28, 2023

Finite Markov Game

Setup:

Finite state space $\mathcal{S}$, with $S=\vert\mathcal{S}\vert$.
Finite action space $\mathcal{A}$, with $A=\vert\mathcal{A}\vert$.
Transition model $\mathbb{P}\in \mathbb{R}^{S\cdot A\times S}$, where $\mathbb{P}(s'\vert s,a)$ is the probability of transitioning into state $s'$ upon taking action $a$ in state $s$.
Reward function $r: \mathcal{S}\times\mathcal{A}\to[-1,1]$.
Discount factor $\gamma\in [0,1)$.
Initial state distribution $\phi$.
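
To make the setup concrete, here is a minimal NumPy sketch of these objects. The sizes, the random transition model, the uniform policy `pi`, and the row layout (row `s * A + a` holds the distribution over next states) are all assumptions chosen for illustration, not part of the setup itself. The last few lines use the standard closed-form policy evaluation $V = (I - \gamma P_\pi)^{-1} r_\pi$ only to show how the pieces fit together.

```python
import numpy as np

rng = np.random.default_rng(0)

S, A = 3, 2      # hypothetical sizes of the state and action spaces
gamma = 0.9      # discount factor in [0, 1)

# Transition model as an (S*A) x S matrix: row s*A + a holds P(. | s, a).
P = rng.random((S * A, S))
P /= P.sum(axis=1, keepdims=True)   # normalise each row to a distribution

# Reward function r: S x A -> [-1, 1].
r = rng.uniform(-1.0, 1.0, size=(S, A))

# Initial state distribution phi (uniform here, as an assumption).
phi = np.full(S, 1.0 / S)

# Hypothetical policy pi(a | s): uniform over actions in every state.
pi = np.full((S, A), 1.0 / A)

# State-to-state transitions and expected rewards under pi.
P_pi = np.einsum("sa,sat->st", pi, P.reshape(S, A, S))
r_pi = (pi * r).sum(axis=1)

# Closed-form policy evaluation: solve (I - gamma * P_pi) V = r_pi.
V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

# Expected discounted return when the initial state is drawn from phi.
J = phi @ V
print("V =", V, "J =", J)
```

Because rewards lie in $[-1,1]$ and $\gamma<1$, every entry of `V` is bounded in absolute value by $1/(1-\gamma)$.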

Proximal Policy Optimization

RL as Supervised learning

The BAIR blog post “Reinforcement learning is supervised learning on optimized data” is a good reference. It discusses the connection between three views of RL: dynamic programming (Q-learning, TD learning), optimization (policy gradient methods), and supervised learning (jointly optimizing the policy and the data it is trained on).
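
One way to read the "supervised learning on optimized data" view is reward-weighted imitation: collect data, reweight it toward high-reward samples, then fit the policy by ordinary weighted maximum likelihood. The sketch below is an illustration under stated assumptions, not code from the blog post: the three-armed bandit, its mean rewards, the noise level, and the `temperature` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

A = 3                                       # hypothetical number of actions
true_reward = np.array([-0.5, 0.2, 0.9])    # hypothetical mean rewards in [-1, 1]
temperature = 0.2                           # hypothetical weighting temperature
n = 5000                                    # number of logged samples

# Step 1: collect data from a uniform behaviour policy.
actions = rng.integers(0, A, size=n)
noise = 0.1 * rng.standard_normal(n)
rewards = np.clip(true_reward[actions] + noise, -1.0, 1.0)

# Step 2: "optimise the data" by up-weighting high-reward samples.
weights = np.exp(rewards / temperature)

# Step 3: supervised learning. For a tabular categorical policy, weighted
# maximum likelihood reduces to normalised weighted action counts.
weighted_counts = np.bincount(actions, weights=weights, minlength=A)
policy = weighted_counts / weighted_counts.sum()

print("fitted policy:", policy)
```

The fitted policy concentrates on the highest-reward action even though the learning step itself is plain weighted classification; a lower `temperature` sharpens that concentration.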


Hanpu Shen
