Question

I recently started learning reinforcement learning and learned that RL algorithms are formulated under the assumption of an MDP or a POMDP. However, reading the A3C paper and recent vision-based deep RL papers, it seems that some of them do not assume an MDP and instead use RNNs or LSTMs to make the problem look as if it were an MDP.
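
To make concrete what I mean by the recurrent trick, here is a toy sketch of my understanding. It is not taken from A3C or any paper: the names `rnn_step` and `summarize` and the fixed weights are made up for illustration, and a real agent would learn the weights and use vector-valued states. The idea is that a recurrent hidden state folds the observation history into one value, so two histories ending in the same observation can still be told apart.

```python
import math

def rnn_step(h, obs, w_h=0.5, w_o=1.0):
    """One step of a scalar tanh RNN: h_t = tanh(w_h * h_{t-1} + w_o * o_t).

    The weights are hypothetical constants; in practice they are learned.
    """
    return math.tanh(w_h * h + w_o * obs)

def summarize(observations, h0=0.0):
    """Fold a sequence of observations into a single hidden state."""
    h = h0
    for obs in observations:
        h = rnn_step(h, obs)
    return h

# Two histories that end in the same observation (0.0).
history_a = [1.0, 0.0]
history_b = [-1.0, 0.0]

h_a = summarize(history_a)
h_b = summarize(history_b)

# A memoryless policy sees only the last observation, which is identical.
same_last_obs = history_a[-1] == history_b[-1]
# A recurrent policy conditions on the hidden state, which differs.
different_hidden = h_a != h_b

print(same_last_obs, different_hidden)
```

If this picture is right, the policy is effectively acting on the hidden state rather than on the raw observation.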

So my question is: how do reinforcement learning algorithms work without the assumption of a (PO)MDP?

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange