Question

I'm currently reading Hands-On Machine Learning with Scikit-Learn & TensorFlow, and I'm wondering why Q-learning requires both an actor model and a critic model to learn.

On page 465, it states:

As we will see, the training algorithm we will use requires two DQNs with the same architecture (but different parameters): one will be used to drive Ms. Pac-Man during training (the actor), and the other will watch the actor and learn from its trials and errors (the critic).
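
To make the quoted setup concrete, here is a minimal sketch of how I picture the two-model scheme. It is not the book's code. Two plain NumPy Q-tables stand in for the two DQNs, and the five-state chain environment, the function name `train_two_table_q`, and all hyperparameters are made up for illustration. I am also assuming that the critic's parameters are periodically copied to the actor and that the actor's frozen estimates supply the bootstrap target, which is how target-network variants of DQN commonly work; the quote itself does not spell this out.

```python
import numpy as np

def train_two_table_q(n_states=5, n_actions=2, steps=20000, copy_every=50,
                      gamma=0.9, lr=0.1, epsilon=0.5, seed=0):
    """Q-learning with two tables standing in for the actor and critic DQNs."""
    rng = np.random.default_rng(seed)
    critic = np.zeros((n_states, n_actions))  # updated at every step
    actor = critic.copy()                     # frozen between periodic copies
    state = 0
    for step in range(1, steps + 1):
        # The actor drives behaviour with an epsilon-greedy policy.
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(actor[state]))

        # Hypothetical toy chain: action 1 moves right, action 0 moves left.
        # Reaching the right-most state gives reward 1 and ends the episode.
        if action == 1:
            next_state = min(state + 1, n_states - 1)
        else:
            next_state = max(state - 1, 0)
        done = next_state == n_states - 1
        reward = 1.0 if done else 0.0

        # Assumption: the bootstrap target comes from the frozen actor.
        target = reward if done else reward + gamma * actor[next_state].max()

        # The critic learns from the actor's experience.
        critic[state, action] += lr * (target - critic[state, action])

        # Assumption: the critic is copied into the actor at regular intervals.
        if step % copy_every == 0:
            actor = critic.copy()

        state = 0 if done else next_state
    return actor, critic

if __name__ == "__main__":
    actor, critic = train_two_table_q()
    print(np.round(critic, 3))
```

Setting `copy_every=1` effectively collapses this into ordinary single-table Q-learning, which is why I am unsure whether the second model is part of Q-learning itself or an addition made for training stability.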

Is this a typical Q-learning implementation? If not, what is?

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange