Does policy optimization learn policies to make better actions with higher probability? [closed]
01-11-2019
Question
When I talk about policy optimization, I am referring to the following picture, in which it is connected to both DFO/Evolution and Policy Gradients.
I would like to know whether it is correct to say that policy optimization learns policies that select better actions with higher probability.
Also, where does Proximal Policy Optimization (PPO) fit in the picture?
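To make the statement in the question concrete, here is a minimal sketch of one policy-gradient method, vanilla REINFORCE, on a toy three-armed bandit. The reward values and the helper name `reinforce_bandit` are hypothetical and chosen only for illustration. Each update nudges the log-probability of the sampled action up or down in proportion to how much its reward beat a running baseline, so actions with higher reward tend to end up with higher probability.

```python
import math
import random


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """Vanilla REINFORCE on a bandit with fixed (hypothetical) per-action rewards.

    Returns the final action probabilities of the softmax policy.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)  # start from a uniform policy
    baseline = 0.0                 # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(len(rewards)), weights=probs)[0]
        r = rewards[a]
        advantage = r - baseline
        # For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k).
        for k in range(len(logits)):
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_pi
        baseline += 0.05 * (r - baseline)
    return softmax(logits)


probs = reinforce_bandit([0.2, 1.0, 0.5])
print([round(p, 3) for p in probs])
```

Starting from a uniform policy, probability mass shifts toward the action with the highest reward (index 1 here). The baseline does not change the expected update; it only reduces its variance.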
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange