Question

I've been reading Google DeepMind's Atari paper and I'm trying to understand the concept of "experience replay". Experience replay comes up in a lot of other reinforcement learning papers (in particular, the AlphaGo paper), so I want to understand how it works. Below are some excerpts.

First, we used a biologically inspired mechanism termed experience replay that randomizes over the data, thereby removing correlations in the observation sequence and smoothing over changes in the data distribution.

The paper then elaborates as follows:

While other stable methods exist for training neural networks in the reinforcement learning setting, such as neural fitted Q-iteration, these methods involve the repeated training of networks de novo hundreds of iterations. Consequently, these methods, unlike our algorithm, are too inefficient to be used successfully with large neural networks. We parameterize an approximate value function $Q(s, a; \theta_i)$ using the deep convolutional neural network shown in Fig. 1, in which $\theta_i$ are the parameters (that is, weights) of the Q-network at iteration $i$. To perform experience replay, we store the agent's experiences $e_t = (s_t, a_t, r_t, s_{t+1})$ at each time-step $t$ in a data set $D_t = \{e_1, \dots, e_t \}$. During learning, we apply Q-learning updates, on samples (or mini-batches) of experience $(s, a, r, s') \sim U(D)$, drawn uniformly at random from the pool of stored samples. The Q-learning update at iteration $i$ uses the following loss function:


$$ L_i(\theta_i) = \mathbb{E}_{(s, a, r, s') \sim U(D)} \left[ \left(r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i)\right)^2 \right] $$
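To check that I'm picturing the mechanism correctly, here is a minimal sketch of how I imagine it works. The names `ReplayBuffer` and `td_loss`, and the placeholder Q functions, are my own for illustration and are not from the paper; the idea is just that the agent keeps pushing transitions $(s_t, a_t, r_t, s_{t+1})$ into a fixed-size store, and the learner repeatedly draws uniform mini-batches from it and minimizes the squared TD error against a separate target network $\theta_i^-$:

```python
import random
from collections import deque, namedtuple

# One experience e_t = (s_t, a_t, r_t, s_{t+1}), as described in the excerpt.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])

class ReplayBuffer:
    """Fixed-capacity store of past transitions; the oldest fall out first."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, *transition):
        self.memory.append(Transition(*transition))

    def sample(self, batch_size):
        # (s, a, r, s') ~ U(D): uniform sampling mixes transitions from many
        # points in time instead of using the last few consecutive frames.
        return random.sample(self.memory, batch_size)

def td_loss(batch, q, q_target, actions, gamma=0.99):
    """Mean squared TD error over a mini-batch.

    q(s, a)        -- stand-in for the online network Q(s, a; theta_i)
    q_target(s, a) -- stand-in for the frozen target network Q(s, a; theta_i^-)
    """
    total = 0.0
    for s, a, r, s_next in batch:
        target = r + gamma * max(q_target(s_next, a2) for a2 in actions)
        total += (target - q(s, a)) ** 2
    return total / len(batch)

# Toy usage: fill the buffer with made-up transitions and score a mini-batch.
buffer = ReplayBuffer(capacity=10_000)
for t in range(1_000):
    buffer.push(t % 5, random.choice([0, 1]), random.random(), (t + 1) % 5)

q = lambda s, a: 0.0          # placeholder online network
q_target = lambda s, a: 0.0   # placeholder target network
print(td_loss(buffer.sample(32), q, q_target, actions=[0, 1]))
```

If I read the excerpt right, the point is that each stored transition can be reused in many updates, and because a mini-batch is drawn from across many episodes rather than from the most recent consecutive frames, the updates are less correlated. But I'd like this confirmed and explained more intuitively.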

What is experience replay, and what are its benefits, in layman's terms?


Licensed under: CC-BY-SA with attribution