Question

I am trying to build a simple evolution simulation of agents controlled by neural networks. In the current version each agent has a feed-forward neural net with one hidden layer. The environment contains a fixed amount of food, each item represented as a red dot. When an agent moves it loses energy, and when it is near food it gains energy. An agent with 0 energy dies. The inputs of the neural net are the agent's current heading angle and a vector to the closest food item. Every time step, the movement angle of each agent is changed by the output of its neural net. The aim, of course, is to see food-seeking behavior evolve after some time. However, nothing happens.
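
Schematically, each agent's update is something like the following sketch (layer sizes, names, and the tanh activations are illustrative, not the exact code):

    import numpy as np

    class Agent:
        def __init__(self, n_hidden=6):
            # Inputs: current heading angle plus the (dx, dy) vector to the closest food.
            self.w1 = np.random.randn(3, n_hidden)   # input -> hidden weights
            self.w2 = np.random.randn(n_hidden, 1)   # hidden -> output weights
            self.angle = np.random.uniform(0, 2 * np.pi)
            self.energy = 100.0

        def steer(self, dx, dy):
            x = np.array([self.angle, dx, dy])
            hidden = np.tanh(x @ self.w1)       # hidden layer activations
            out = np.tanh(hidden @ self.w2)     # single output node
            self.angle += float(out[0])         # network output adjusts the heading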

I don't know whether the problem is the structure of the neural net (too simple?) or the reproduction mechanism. To prevent a population explosion, the initial population is about 20 agents, and as the population approaches 50, the reproduction chance approaches zero. When reproduction does occur, the parent is chosen by walking the list of agents from beginning to end and, for each agent, checking whether a random number between 0 and 1 is less than the ratio of that agent's energy to the total energy of all agents. If so, the search stops and that agent becomes the parent: a copy of it is added to the environment, with some probability of mutation in one or more of the weights of its neural network.
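
Written out, that selection-and-mutation step looks roughly like this (a sketch; it assumes agents with an energy attribute and weight matrices w1/w2 as in the snippet above, and the mutation parameters are illustrative):

    import copy
    import random

    import numpy as np

    def choose_parent(agents):
        # Walk the list and accept the first agent whose energy share wins a draw.
        # (Agents earlier in the list get their chance first, and the loop may
        # return None if nobody is accepted.)
        total_energy = sum(a.energy for a in agents)
        for agent in agents:
            if random.random() < agent.energy / total_energy:
                return agent
        return None

    def reproduce(parent, mutation_prob=0.1, sigma=0.5):
        # Copy the parent and jitter a random subset of its network weights.
        child = copy.deepcopy(parent)
        for w in (child.w1, child.w2):
            mask = np.random.rand(*w.shape) < mutation_prob
            w[mask] += sigma * np.random.randn(int(mask.sum()))
        return child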

Thanks in advance!


Solution

If the environment is benign enough (e.g. it's easy to find food), then just moving randomly may be a perfectly viable strategy, and reproductive success may be influenced far more by luck than by anything else. Also consider unintended consequences: e.g. if offspring are co-sited with their parent, then both are immediately competing with each other in the local area, and this might be disadvantageous enough to lead to the death of both in the longer term.

To test your system, introduce an individual with a "premade" neural network set up to steer it directly towards the nearest food (your model is such that such a network exists and is reasonably easy to write down, right? If not, it's unreasonable to expect it to evolve!). Introduce that individual into your simulation amongst the dumb masses. If it doesn't quickly dominate, that suggests your simulation isn't set up to reinforce such behaviour. But if it enjoys reproductive success and it and its descendants take over, then your simulation is doing something right and you need to look elsewhere for the reason such behaviour isn't evolving.
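
For example, the "premade" steering rule can be hard-wired without any network at all (a sketch; the coordinate names are illustrative):

    import math

    def premade_heading(agent_x, agent_y, food_x, food_y):
        # Steer straight at the nearest food item.
        return math.atan2(food_y - agent_y, food_x - agent_x)

Drop one agent using this rule into the initial population and track whether its descendants take over.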

Update in response to comment:

It seems to me this mixing of angles and vectors is dubious. Whether individuals can evolve towards the "move straight towards the nearest food" behaviour then depends largely on how well an atan-like function can be approximated by your network (I'm sceptical). Again, this suggests more testing:

  • set aside all the ecological simulation and just test perturbing a population of your style of random networks to see if they can evolve towards the expected function.
  • (simpler, better) Have the network output a vector instead of an angle: the direction the individual should move in (this means two output nodes instead of one). The "move straight towards food" strategy is then just a straight pass-through of the "direction towards food" vector components, and the interesting thing is to see whether your random networks evolve towards this simple "identity function" (this also allows introducing a ready-made optimised individual as described above; see the sketch after this list).
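
A minimal sketch of that vector-in/vector-out formulation (ignoring the hidden layer for clarity; the ready-made individual is just the identity matrix):

    import numpy as np

    def move_direction(weights, food_dx, food_dy):
        # The network maps the "vector towards food" to a movement vector.
        return weights @ np.array([food_dx, food_dy])

    optimal = np.eye(2)                  # pass-through: always move straight at the food
    random_net = np.random.randn(2, 2)   # a typical random starting individual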

I'm dubious about the "fixed amount of food" too (I assume you mean that as soon as a red dot is consumed, another one is introduced). A more "realistic" model would be to introduce food at a constant rate and not impose any artificial population limit: the population limit is then determined by the food supply. For example, if you introduce 100 units of food per minute and individuals need 1 unit of food per minute to survive, your simulation should tend towards a long-term average population of around 100 individuals without any need for a clamp to avoid a "population explosion" (although boom-and-bust, feast-or-famine dynamics may emerge depending on the details).
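
A sketch of that constant-rate food model, using the illustrative numbers above (time in minutes; the world size and energy values are assumptions):

    import random

    FOOD_PER_MINUTE = 100      # food items introduced per simulated minute
    UPKEEP_PER_MINUTE = 1.0    # energy each individual burns per minute
    WORLD_SIZE = 500.0         # illustrative world dimensions

    def step_environment(food, agents, dt):
        # Introduce food at a constant rate rather than topping up to a fixed count.
        for _ in range(int(FOOD_PER_MINUTE * dt)):
            food.append((random.uniform(0, WORLD_SIZE), random.uniform(0, WORLD_SIZE)))
        # Each agent pays its upkeep; the food supply, not an artificial cap,
        # ends up limiting the long-run population (around 100 individuals here).
        for agent in agents:
            agent.energy -= UPKEEP_PER_MINUTE * dt
        agents[:] = [a for a in agents if a.energy > 0]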

OTHER TIPS

This sounds like a problem for Reinforcement Learning; there is a good online textbook too (e.g. Sutton and Barto's Reinforcement Learning: An Introduction, which is freely available).
