Question

I am thinking of implementing a learning strategy for different types of agents in my model. To be honest, I still do not know what questions I should ask first or where to start.

I have two types of agents that I want to learn from experience. They have a pool of actions, each of which has a different reward depending on the specific situation. I am new to reinforcement learning methods, so any suggestions on what kinds of questions I should ask myself are welcome :)

Here is how I am going about formulating my problem:

  1. Agents have a lifetime, and they keep track of a few indicators that matter to them. These indicators differ between agents; for example, one agent wants to increase A, while another wants B more than A.
  2. States are points in an agent's lifetime at which it has more than one option. (I do not have a clear definition of states, since they might occur a few times or not at all: agents move around, and they might never face a given situation.)
  3. The reward is an increase or decrease in an indicator that an agent can get from an action in a specific state, and the agent does not know what the gain would have been if it had chosen another action.
  4. The gain is not constant, the states are not well defined, and there is no formal transition from one state into another.
  5. For example, an agent can decide to share with one of the co-located agents (Action 1) or with all of the agents at the same location (Action 2). If certain conditions hold, Action 1 will be more rewarding for that agent, while under other conditions Action 2 will have the higher reward (see the sketch after this list). My problem is that I have not seen any example with unknown rewards: sharing in this scenario also depends on the other agent's characteristics (which affect the conditions of the reward system), and it will be different in different states.
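
To make item 5 concrete, here is a minimal, purely illustrative Python sketch of what such a condition-dependent reward could look like; the action names, condition variables, and threshold values are all made up for the example.

```python
import random

# Illustrative only: two sharing actions whose payoff depends on conditions
# the agent can observe (all names and numbers here are hypothetical).
ACTIONS = ["share_with_one", "share_with_all"]

def reward(action, conditions):
    """Reward depends on the action *and* on the current conditions."""
    if action == "share_with_all":
        # Sharing with everyone only pays off when many agents are co-located.
        return 2.0 if conditions["n_colocated"] > 3 else -1.0
    # Sharing with a single partner pays off when that partner is trusted.
    return 1.0 if conditions["partner_trust"] > 0.5 else 0.0

# An agent that has not learned anything yet just picks an action at random.
conditions = {"n_colocated": 5, "partner_trust": 0.2}
action = random.choice(ACTIONS)
print(action, reward(action, conditions))
```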

In my model there is no relationship between the action and the following state, and that makes me wonder if it is OK to think about RL in this situation at all.

What I am looking to optimize here is my agents' ability to reason about the current situation in a better way, rather than only responding to the need triggered by their internal states. They have a few personality types that define their long-term goals and affect their decision making in different situations, but I want them to remember which action in a given situation helped them advance their preferred long-term goal.


Solution

In my model there is no relationship between the action and the following state, and that makes me wonder if it is OK to think about RL in this situation at all.

This seems strange. What do actions do if not change state? Note that agents don't necessarily have to know how their actions will change their state. Similarly, actions could change the state imperfectly (a robot's treads could skid so it doesn't actually move when it tries to). In fact, some algorithms are specifically designed for this kind of uncertainty.

In any case, even if the agents are moving around the state space without having any control, they can still learn the rewards for the different states. Indeed, many RL algorithms involve moving around the state space semi-randomly to figure out what the rewards are.
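
For instance, an epsilon-greedy rule is one common way to do that semi-random exploration. This is just a generic sketch with placeholder names, not something specific to your model:

```python
import random
from collections import defaultdict

q_values = defaultdict(float)  # (state, action) -> estimated value, default 0

def choose_action(state, actions, epsilon=0.1):
    """Mostly pick the best-known action, but explore randomly now and then."""
    if random.random() < epsilon:
        return random.choice(actions)                        # explore
    return max(actions, key=lambda a: q_values[(state, a)])  # exploit

print(choose_action("at_market", ["share_with_one", "share_with_all"]))
```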

I do not have a clear definition of states, since they might occur a few times or not at all: agents move around, and they might never face a given situation.

You might consider expanding what goes into what you consider to be a "state". For instance, the position seems like it should definitely go into the variables identifying a state. Not all states need to have rewards (although good RL algorithms typically infer a measure of goodness of neutral states).

I would recommend clearly defining the variables that determine an agent's state. For instance, the state space could be current-patch X internal-variable-value X other-agents-present. In the simplest case, the agent can observe all of the variables that make up their state. However, there are algorithms that don't require this. An agent should always be in a state, even if the state has no reward value.
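
As a sketch of what such a state bundle could look like in Python (the field names here are placeholders, not anything taken from your model):

```python
from collections import namedtuple

# A state is just a bundle of the variables the agent can observe; the agent
# is always in exactly one state, even if that state carries no reward.
State = namedtuple("State", ["current_patch", "internal_variable", "other_agents_present"])

def observe(agent):
    """Build the agent's current state from what it can observe (hypothetical fields)."""
    return State(
        current_patch=agent["patch"],
        internal_variable=agent["need_level"],
        other_agents_present=agent["n_neighbors"] > 0,
    )

print(observe({"patch": (3, 7), "need_level": "high", "n_neighbors": 2}))
```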

Now, concerning unknown rewards: that's actually totally okay. The reward can be a random variable. In that case, a simple way to apply standard RL algorithms would be to use the expected value of the reward when making decisions. If the distribution is unknown, then the algorithm can just use the mean of the rewards observed so far.
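
A minimal sketch of that sample-average idea (the state and action names are illustrative):

```python
from collections import defaultdict

counts = defaultdict(int)       # (state, action) -> number of times tried
estimates = defaultdict(float)  # (state, action) -> mean reward seen so far

def update_estimate(state, action, observed_reward):
    """Keep a running mean of the rewards observed for (state, action)."""
    key = (state, action)
    counts[key] += 1
    # Incremental mean: new_mean = old_mean + (sample - old_mean) / n
    estimates[key] += (observed_reward - estimates[key]) / counts[key]

update_estimate("at_market", "share_with_all", 2.0)
update_estimate("at_market", "share_with_all", -1.0)
print(estimates[("at_market", "share_with_all")])  # -> 0.5
```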

Alternatively, you could include the variables that determine the reward in the definition of the state. That way, if the reward changes, the agent is literally in a different state. For example, suppose a robot is on top of a building and needs to get to the top of the building in front of it. If it just moves forward, it falls to the ground, so that state has a very low reward. However, if it first places a plank that spans from one building to the other and then moves forward, the reward changes. To represent this, we could include plank-in-place as a state variable, so that putting the board in place actually changes the robot's current state and the state that would result from moving forward. Thus, the reward itself has not changed; the robot is just in a different state.
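
Here is a toy sketch of that rooftop example with plank-in-place folded into the state (everything in it is purely illustrative):

```python
def step(state, action):
    """Return (next_state, reward) for the toy rooftop example."""
    location, plank_in_place = state
    if action == "place_plank":
        return (location, True), 0.0            # no reward, but the state changed
    if action == "move_forward":
        if plank_in_place:
            return ("other_roof", True), 10.0   # crosses safely
        return ("ground", False), -100.0        # falls off the building
    return state, 0.0

print(step(("roof", False), "move_forward"))  # falls: low reward
print(step(("roof", True), "move_forward"))   # crosses: high reward
```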

Hopefully this helps!

UPDATE 2/7/2018: A recent upvote reminded me of the existence of this question. In the years since it was asked, I've actually dived into RL in NetLogo to a much greater extent. In particular, I've made a Python extension for NetLogo, primarily to make it easier to integrate machine learning algorithms into models. One of the demos of the extension trains a collection of agents using deep Q-learning as the model runs.

Licensed under: CC-BY-SA with attribution