Does policy optimization learn policies that take better actions with higher probability? [closed]

Question

By "policy optimization" I mean the branch shown in the following picture, which connects to both DFO/Evolution and Policy Gradients.

Is it correct to say that policy optimization learns policies that select better actions with higher probability?
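To make the claim concrete, here is a minimal sketch of a vanilla policy-gradient (REINFORCE) update with a running-average baseline on a hypothetical three-armed bandit. The reward values, learning rate, step count, and function names are illustrative assumptions, not taken from the picture. Each step nudges the softmax logits along the gradient of log π(a), scaled by how much the sampled action's reward beat the baseline, so actions with higher reward end up with higher probability.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of logits."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a bandit with deterministic per-action rewards.

    `rewards` is a hypothetical list of fixed rewards, one per action.
    Returns the final action probabilities of the softmax policy.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)  # policy parameters: one logit per action
    baseline = 0.0                # running mean reward, used only to reduce variance
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        a = rng.choices(range(len(prefs)), weights=probs)[0]  # sample a ~ pi
        r = rewards[a]
        advantage = r - baseline
        # For a softmax policy: d log pi(a) / d theta_k = 1[k == a] - pi(k)
        for k in range(len(prefs)):
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            prefs[k] += lr * advantage * grad_log
        baseline += (r - baseline) / t  # incremental average of observed rewards
    return softmax(prefs)


if __name__ == "__main__":
    final = reinforce_bandit([0.2, 1.0, 0.5])
    print([round(p, 3) for p in final])
```

Starting from a uniform policy (probability 1/3 each), the probability mass should shift toward the action with the highest reward. The baseline does not change the expected gradient; it only reduces the variance of the updates.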

Also, where does Proximal Policy Optimization (PPO) fit in the picture?
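For reference when placing PPO, the PPO-Clip variant maximizes a clipped version of the policy-gradient surrogate, L^CLIP = E_t[min(r_t A_t, clip(r_t, 1 - ε, 1 + ε) A_t)], where r_t is the probability ratio π_new(a|s) / π_old(a|s) and A_t is an advantage estimate. The sketch below computes this batch objective from log-probabilities. It assumes the advantages are already computed elsewhere, and the function name is hypothetical.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Batch-average PPO-Clip surrogate objective (to be maximized).

    logp_new / logp_old: log pi(a|s) under the current and the data-collecting policy.
    advantages: precomputed advantage estimates A_t (assumed given).
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                        # r_t = pi_new / pi_old
        clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)  # clip(r_t, 1-eps, 1+eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the objective stops rewarding ratios above 1 + ε. With a negative advantage, it stops rewarding ratios below 1 - ε. This limits how far one update can move the policy away from the one that collected the data.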


Licensed under: CC-BY-SA with attribution