Question

I am currently working on the famous Mountain Car problem from reinforcement learning. This problem is continuous: I have two variables, position, ranging from -1.2 to 0.5, and velocity, ranging from -0.07 to 0.07. There are 3 possible actions: reverse acceleration, forward acceleration and neutral; each action changes the position in the corresponding direction. Because of how acceleration is calculated, my position variable is continuous, so I can't use a lookup table. Instead, I tried to divide the position-velocity plane into rectangular sectors, using position buckets of width 0.05 and velocity buckets of width 0.005, and assigning each sector an index. I did it like this:

public int discretiseObservation(Observation observation) {
    double position = observation.getDouble(0);
    double velocity = observation.getDouble(1);

    boolean positionNegativeFlag = position < 0;
    boolean velocityNegativeFlag = velocity < 0;

    double absolutePosition = Math.abs(position);
    double absoluteVelocity = Math.abs(velocity);

    double discretePosition = Math.floor(absolutePosition / 0.05);
    double discreteVelocity = Math.floor(absoluteVelocity / 0.005);

    if(velocityNegativeFlag) {
        discreteVelocity += 14;
    }

    if(positionNegativeFlag) {
        discretePosition += 10;
    }

    return (int)discretePosition * 28 + (int)discreteVelocity;
}

But this scheme results in some sectors having the same index number. Do you have any idea how I can discretize these two continuous variables?

Upd: Sorry, I forgot to mention that when position or velocity exceeds its maximum or minimum value, I set it back to that maximum or minimum value.


Solution

You are overcomplicating things a bit with all those sign checks. Also, you should avoid magic constants; give them meaningful names. The discretization code should look like this:

double normalize(double value, double min, double max) {
    return (value - min) / (max - min);
}

int clamp(int value, int min, int max) {
    if (value < min) value = min;
    if (value > max) value = max;
    return value;
}

int discretize(double value, double min, double max, int binCount) {
    int discreteValue = (int) (binCount * normalize(value, min, max));
    return clamp(discreteValue, 0, binCount - 1);
}

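// minPosition, maxPosition, minVelocity, maxVelocity, positionBinCount and
// velocityBinCount are assumed to be fields of the enclosing class.
public int discretizeObservation(Observation observation) {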
    int position = discretize(observation.getDouble(0), minPosition, maxPosition, positionBinCount);
    int velocity = discretize(observation.getDouble(1), minVelocity, maxVelocity, velocityBinCount);
    return position * velocityBinCount + velocity;
}
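For reference, here is a self-contained variant of the code above that compiles and runs on its own. It is a sketch under assumptions that are not in the answer: the ranges are the ones from the question, the bin counts are derived from the question's bucket widths ((0.5 - (-1.2)) / 0.05 = 34 position bins and (0.07 - (-0.07)) / 0.005 = 28 velocity bins), and the class name `MountainCarDiscretizer` and the two-`double` signature (used instead of `Observation`) are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical standalone version of the answer's discretization.
class MountainCarDiscretizer {
    static final double MIN_POSITION = -1.2;
    static final double MAX_POSITION = 0.5;
    static final double MIN_VELOCITY = -0.07;
    static final double MAX_VELOCITY = 0.07;
    // Assumed bin counts: range width divided by the question's bucket width.
    static final int POSITION_BIN_COUNT = 34; // 1.7 / 0.05
    static final int VELOCITY_BIN_COUNT = 28; // 0.14 / 0.005
    static final int STATE_COUNT = POSITION_BIN_COUNT * VELOCITY_BIN_COUNT;

    // Maps value from [min, max] to [0, 1].
    static double normalize(double value, double min, double max) {
        return (value - min) / (max - min);
    }

    static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }

    // The (int) cast truncates; clamp keeps value == max (and anything
    // outside the range) inside [0, binCount - 1].
    static int discretize(double value, double min, double max, int binCount) {
        int discreteValue = (int) (binCount * normalize(value, min, max));
        return clamp(discreteValue, 0, binCount - 1);
    }

    static int discretizeObservation(double position, double velocity) {
        int p = discretize(position, MIN_POSITION, MAX_POSITION, POSITION_BIN_COUNT);
        int v = discretize(velocity, MIN_VELOCITY, MAX_VELOCITY, VELOCITY_BIN_COUNT);
        return p * VELOCITY_BIN_COUNT + v;
    }

    public static void main(String[] args) {
        System.out.println(discretizeObservation(0.5, 0.0));    // 938
        System.out.println(discretizeObservation(-0.01, 0.0));  // 658
        System.out.println(discretizeObservation(-1.2, -0.07)); // 0
        System.out.println(discretizeObservation(0.5, 0.07));   // 951 (STATE_COUNT - 1)

        // Visit the centre of every sector and count the distinct indices.
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < POSITION_BIN_COUNT; i++) {
            for (int j = 0; j < VELOCITY_BIN_COUNT; j++) {
                double p = MIN_POSITION + (i + 0.5) * 0.05;
                double v = MIN_VELOCITY + (j + 0.5) * 0.005;
                seen.add(discretizeObservation(p, v));
            }
        }
        System.out.println(seen.size()); // 952
    }
}
```

Because each index is `positionBin * VELOCITY_BIN_COUNT + velocityBin` with `velocityBin` always in `[0, VELOCITY_BIN_COUNT)`, two different bin pairs can never share an index, and a lookup table of size `STATE_COUNT` covers every state.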

OTHER TIPS

You're not limiting your position and velocity. When they are too large (no matter what sign), they'll overflow the hardcoded offset values (14 and 10). You must limit the values before you combine them.
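The sign offsets only work if every unsigned bucket stays strictly below them. With the question's ranges, however, the largest magnitudes land exactly on them: 0.5 / 0.05 floors to bucket 10 and 0.07 / 0.005 floors to bucket 14. So even clamped observations can collide, and a velocity of -0.07 (bucket 14 + 14 = 28) spills into the next position row. The sketch below copies the question's index formula onto two plain `double` parameters to show this; the class name `OriginalSchemeCollision` and that signature are hypothetical.

```java
// Hypothetical copy of the question's discretiseObservation logic,
// taking raw doubles instead of an Observation.
class OriginalSchemeCollision {
    static int originalIndex(double position, double velocity) {
        double discretePosition = Math.floor(Math.abs(position) / 0.05);
        double discreteVelocity = Math.floor(Math.abs(velocity) / 0.005);
        if (velocity < 0) {
            discreteVelocity += 14; // offset for negative velocities
        }
        if (position < 0) {
            discretePosition += 10; // offset for negative positions
        }
        return (int) discretePosition * 28 + (int) discreteVelocity;
    }

    public static void main(String[] args) {
        // 0.5 / 0.05 floors to 10, the offset used for negative positions.
        System.out.println(originalIndex(0.5, 0.0));    // 280
        System.out.println(originalIndex(-0.01, 0.0));  // 280
        // 0.07 / 0.005 floors to 14, the offset used for negative velocities.
        System.out.println(originalIndex(0.0, 0.07));   // 14
        System.out.println(originalIndex(0.0, -0.001)); // 14
        // -0.07 gives 14 + 14 = 28, i.e. the first cell of the next position row.
        System.out.println(originalIndex(0.0, -0.07));  // 28
        System.out.println(originalIndex(0.05, 0.0));   // 28
    }
}
```

Each consecutive pair of printed indices is equal, which is exactly the duplicate-sector symptom described in the question.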

Licensed under: CC-BY-SA with attribution