Question

I am trying to train an RNN on text from Wikipedia, but I am having trouble getting it to converge. I have tried increasing the batch size, but that doesn't seem to help. All data is one-hot encoded before being used (the encoding is sketched at the end of this post), and I am using the Adam optimizer, implemented like this:

    import numpy as np

    for k in M.keys():  # loop over every weight array in the model
        M[k] = beta1 * M[k] + (1 - beta1) * grad[k]       # first-moment (mean) estimate
        R[k] = beta2 * R[k] + (1 - beta2) * grad[k] ** 2  # second-moment estimate
        m_k = M[k] / (1 - beta1 ** n)  # bias correction; n is the update step count
        r_k = R[k] / (1 - beta2 ** n)
        model[k] = model[k] - alpha * m_k / np.sqrt(r_k + 1e-8)  # parameter update
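
For context, the bias correction relies on `M` and `R` starting at zero and on `n` counting update steps from 1. A minimal sketch of how I drive the update; `get_batch`, `backward_pass`, `adam_update`, and `num_iterations` are stand-in names, not my exact code:

    import numpy as np

    # Moment accumulators start at zero, one array per weight array
    M = {k: np.zeros_like(v) for k, v in model.items()}
    R = {k: np.zeros_like(v) for k, v in model.items()}
    beta1, beta2, alpha = 0.9, 0.999, 0.001

    for n in range(1, num_iterations + 1):    # n feeds the bias-correction terms
        inputs, targets = get_batch()         # one-hot encoded input/target batch
        cost, grad = backward_pass(model, inputs, targets)
        adam_update(model, grad, M, R, n)     # runs the loop shown above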

beta1 is set to 0.9, beta2 to 0.999, and alpha to 0.001. When I train for 50,000 iterations, the cost fluctuates wildly and never decreases significantly; it only dips occasionally because of the fluctuations, and I keep the weights from the iteration with the lowest cost. Plotting the cost against iterations gives a graph like this:

[image: cost over training iterations]
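
The "keep the weights with the lowest cost" part is just a running minimum over iterations, roughly like this (`best_model` and `best_cost` are illustrative names):

    best_cost = float('inf')
    best_model = None

    # inside the training loop, after `cost` is computed for this iteration:
    if cost < best_cost:
        best_cost = cost
        best_model = {k: v.copy() for k, v in model.items()}  # snapshot the weights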

The cost seems to be increasing on average, only appearing to decrease because of the large fluctuations. What can I change to have more success and get it to converge?
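
For completeness, the one-hot encoding is essentially the following; it is character-level, and `char_to_ix` (a dict mapping each character to its vocabulary index) is an illustrative name:

    import numpy as np

    def one_hot(ch, vocab_size, char_to_ix):
        # Column vector with a single 1 at the character's vocabulary index
        x = np.zeros((vocab_size, 1))
        x[char_to_ix[ch]] = 1.0
        return x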

Thanks for any help
