Question

I´m trying to implement a method that fits a line to a set of points in 2D. I wrote the following code that reads the data from two Array (X, Y coordinate) and should calculate the parameters of the best fitting line with the least squares method. I used the formulas given here: mathworld.wolfram

- (void) linearRegressionOfUserAcceleration 
{
     double avgX = [[_accelBufferX valueForKeyPath:@"@avg.doubleValue"] doubleValue];
     double avgY = [[_accelBufferY valueForKeyPath:@"@avg.doubleValue"] doubleValue];
     int n = _accelBufferX.count;

     double ssX, ssY, ssXY;
     ssX = ssY = ssXY = 0;
     int i;

     // Sum of squares X, Y & X*Y
     for (i = 0; i < n; i++)
     {
         ssX += pow([[_accelBufferX objectAtIndex:i] doubleValue],2);
         ssY += pow([[_accelBufferY objectAtIndex:i] doubleValue],2);
         ssXY += [[_accelBufferX objectAtIndex:i] doubleValue] * [[_accelBufferY objectAtIndex:i] doubleValue];
     }

     ssX = ssX - n * pow(avgX,2);
     ssY = ssY - n * pow(avgY,2);
     ssXY = ssXY - n * avgX * avgY;

     // Best fit of line y_i = a + b * x_i 
     b = ssXY / ssX;
     a = avgY - b * avgX;

     // Correlationcoefficent gives the quality of the estimate: 1 = perfect to 0 = no fit
     corCoeff = pow(ssXY,2) / ssX * ssY;

     NSLog(@"n: %d, a: %f --- b: %f --- cor: %f --- avgX: %f --- avgY: %f --- ssX: %f - ssY: %f - ssXY: %f", n, a, b, corCoeff, avgX, avgY, ssX, ssY, ssXY);
}

I get outputs like this:

  n: 15, a: -0.095204 --- b: 0.929245 --- cor: 3.567163   --- avgX: -0.017827 -- avgY: -0.111770 --- ssX: 2.176048 - ssY: 1.898429 - ssXY: 2.022081

The resulting line does not fit the data at all and although the corelationCoefficient is sometimes bigger than one, which IMHO should never happen if everything works correctly.

Does anyone see any errors in my implementation?

- EDIT -

This is the corrected code, following the tip from CRD. I used this to extract the direction vector of the sampled userAcceleration in the horizontal plane between two steps, to get the step direction.

This worked for me:

- (void) linearRegressionOfUserAcceleration
{
    NSUInteger n = _accelBufferX.count;
    double ax, ay, sX, sY, ssX, ssY, ssXY, avgX, avgY;

    // Sum of squares X, Y & X*Y
    for (NSUInteger i = 0; i < n; i++)
    {
        @synchronized(self) {
            ax = [[_accelBufferX objectAtIndex:i] doubleValue];
            ay = [[_accelBufferY objectAtIndex:i] doubleValue];
        }
        sX += ax;
        sY += ay;
        ssX += ax * ax;
        ssY += ay * ay;
        ssXY += ax * ay;
    }

    avgX = sX / n;
    avgY = sY / n;
    radius = hypot(avgX, avgY);
    ssX = ssX - n * (avgX * avgX);
    ssY = ssY - n * (avgY * avgY);
    ssXY = ssXY - n * avgX * avgY;

    // Best fit of line y_i = a + b * x_i
    b = ssXY / ssX;
    a = (avgY - b * avgX);
    theta = atan2(1, b);


    // Correlationcoefficent gives the quality of the estimate: 1 = perfect to 0 = no fit
    corCoeff = (ssXY * ssXY) / (ssX * ssY);

    NSLog(@"n: %d, a: %f --- b: %f --- cor: %f   --- avgX: %f -- avgY: %f --- ssX: %f - ssY: %f - ssXY: %f", n, a, b, corCoeff, avgX, avgY, ssX, ssY, ssXY);
}
Was it helpful?

Solution

Put in some known data you can check by hand, e.g. {1,1}, {2,2}, {3,3}. Are the averages correct? If so move on to the sums, etc. The error will reveal itself.

On the code itself you can make it clearer, and incidentally more efficient, by dropping the calls to @"avg.doubleValue" and producing all your sums in a single loop:

// Sum of X, Y, X^2, Y^2 & X*Y
for (NSUInteger i = 0; i < n; i++)
{
   double x = [[_accelBufferX objectAtIndex:i] doubleValue];
   double y = [[_accelBufferY objectAtIndex:i] doubleValue];
   sX += x;
   sY += y;
   ssX += x * x;
   ssY += y * y;
   ssXY += x * y;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top