Question

I am currently trying to create a tic-tac-toe Q-learning neural network to introduce myself to reinforcement learning. However, it didn't work, so I decided to try a simpler project that requires a network to train against static data rather than against another neural network. This led me to follow the guidelines from this website - http://outlace.com/rlpart3.html

However, after programming this, the simple version (the one without experience replay) only works about half the time: on some runs of the program the game is played correctly, while on others the agent just moves back and forth during test runs. When I try to implement experience replay to complete the harder version, the program constantly gets itself into a loop of moving back and forth when testing.

I have a limit of 100 batches, where a batch is the data the neural network is trained on. I am wondering whether this is an appropriate amount, or whether there are any common mistakes in implementing experience replay that I may have made.

My current understanding of experience replay is:

1. Run the program.
2. After each turn, the data used to train the network is saved into a batch.
3. Once you have reached x (100) batches, pick one out and train on it.
4. Overwrite the oldest batch with the new batches that come in.
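
To make steps 2-4 concrete, here is a rough sketch of the replay buffer as I understand it should work. The buffer and minibatch sizes are placeholder numbers rather than my actual settings, and the network calls only appear as comments since they depend on the specific library:

```python
import random
from collections import deque

import numpy as np

# Placeholder sizes, not my actual settings.
BUFFER_SIZE = 100
BATCH_SIZE = 32

# The buffer stores individual transitions; deque(maxlen=...) automatically
# overwrites the oldest entry once the buffer is full.
replay_buffer = deque(maxlen=BUFFER_SIZE)

def store_transition(state, action, reward, next_state, done):
    """Save one (s, a, r, s', done) tuple after every move."""
    replay_buffer.append((state, action, reward, next_state, done))

def sample_minibatch():
    """Pick a random minibatch of past transitions to train the network on."""
    batch = random.sample(list(replay_buffer), BATCH_SIZE)
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
    return states, actions, rewards, next_states, dones

# In the game loop (the network calls below are placeholders, not real code):
#   store_transition(s, a, r, s_next, done)
#   if len(replay_buffer) >= BATCH_SIZE:
#       states, actions, rewards, next_states, dones = sample_minibatch()
#       compute Q-learning targets from rewards and next_states,
#       then fit the network on the whole minibatch
```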

If anyone could let me know where I have gone wrong, or offer any feedback about the experience replay or the quality of the question, I would be very grateful.

EDIT: Another question I have about training a neural network against a neural network: do you train it against a completely separate network that trains itself, or do you train it against a previous version of itself? And when training against the other neural network, do you turn its epsilon-greedy exploration down so that the opposing network does not make any random moves?
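
This is roughly the self-play setup I have in mind. The linear Q-function and the game loop here are only stand-ins so the sketch runs on its own; the real version would use the actual network and include win/draw checks and the Q-learning update:

```python
import copy
import random

import numpy as np

BOARD_CELLS = 9
SNAPSHOT_EVERY = 50     # how often to copy the learner into the opponent (illustrative)
LEARNER_EPSILON = 0.1   # exploration for the agent being trained
OPPONENT_EPSILON = 0.0  # opponent plays greedily, i.e. no random moves

class Agent:
    def __init__(self):
        # Stand-in for the Q-network: one weight vector per action.
        self.weights = np.random.randn(BOARD_CELLS, BOARD_CELLS) * 0.01

    def q_values(self, board):
        return self.weights @ board

    def select_action(self, board, epsilon):
        legal = [i for i in range(BOARD_CELLS) if board[i] == 0]
        if random.random() < epsilon:
            return random.choice(legal)
        q = self.q_values(board)
        return max(legal, key=lambda a: q[a])

def play_game(learner, opponent):
    board = np.zeros(BOARD_CELLS)
    players = [(learner, 1, LEARNER_EPSILON), (opponent, -1, OPPONENT_EPSILON)]
    for turn in range(BOARD_CELLS):
        agent, mark, eps = players[turn % 2]
        board[agent.select_action(board, eps)] = mark
        # ...win/draw checks and the Q-learning update would go here...

learner = Agent()
opponent = copy.deepcopy(learner)        # frozen previous version of the learner

for game in range(1000):
    play_game(learner, opponent)
    if (game + 1) % SNAPSHOT_EVERY == 0:
        opponent = copy.deepcopy(learner)  # refresh the opponent snapshot
```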

No correct solution
