Polynomial regression in octave - too small coefficients

https://stackoverflow.com/questions/23597034

20-07-2023
|

Question

I wanted to understand how regression works by implementing it in octave. To check if my function is correct I compare its result with builtin function polyfit. Here is my code:

    p = [0, 1.5 * pi];

    function y = f(x)
        y = (exp(-4 * sin(4*x)));
    end

    function c = regress1(x, y, degree)
        L = @(j)(arrayfun( @(k)(j^k), (0:degree)));

        x_data = [];
        for i = x
            x_data = [x_data; L(i)];
        end

        mx = x_data;
        my = y';

        c = fliplr((pinv(mx' * mx) * mx' * my)');
    end

    % number of known points
    N = 50;
    % polynomial degree
    D = 20;

    % known points
    X = linspace(p(1), p(2), N);
    Y = arrayfun(@f, X);

    % my function
    R = regress1(X, Y, D);
    % reference function
    C = polyfit(X, Y, D);

    % test points
    W = linspace(p(1), p(2), 1000);
    V1 = arrayfun(@(x)(polyval(R,x)), W);
    Z = arrayfun(@(x)(polyval(C,x)), W);


    clf;
    hold on;
    fplot(@f, p);
    plot(X, Y, '*');
    plot(W, V1, '1-');
    plot(W, Z, '2-');
    hold off;

The problem is that it works for small degree (variable D in my program) but for bigger degrees it producecs far too small coefficients. Plot of orginal function and two approximations: orginal - blue, my - red, polyfit - green

I suspect that something in this part (pinv(mx' * mx) * mx' * my)' may be wrong. Also, if i change from pinv to inv I get different result, but even worse.

I did some research, found some other solutions e.g , but even though those programs are similar, I can't find bugs in mine.

Could someone explain me what is wrong, please?

Solution

Your matrix becomes ill-conditioned for large D (degree of the approximation polynomial). You can try to use different kind of polynomial base to do the regression. For e.g. used Chebyshev polynomials instead of x, x^2, x^3, ... , x^D.

You can obtain a slightly better result if you center and scale your data. Basically if you try to fit a function to a set (x, y), you can redefine the problem using:

xn = (x - mean(x))/std(x)

after that fit the set on (xn, y).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow