scipy.polyfit(x, y, 100) should give a 100th-order polynomial, but matplotlib.pyplot.legend displays 53?

StackOverflow https://stackoverflow.com/questions/20838970

Question

I'm having a hard time figuring out why my plt.legend displays the wrong polynomial degree. It says 53 instead of 100. My code looks like this:

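import numpy as np
import matplotlib.pyplot as plt
from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen

url = 'https://raw.github.com/luispedro/BuildingMachineLearningSystemsWithPython/master/ch01/data/web_traffic.tsv'
src = urlopen(url)
data = np.genfromtxt(src)

x = data[:, 0]
y = data[:, 1]
# Drop the rows where y is missing (NaN)
x = x[~np.isnan(y)]
y = y[~np.isnan(y)]

def error(f, a, b):
    """Sum of squared residuals of model f on the points (a, b)."""
    return np.sum((f(a) - b) ** 2)

# scipy.polyfit / scipy.poly1d were re-exports of these NumPy functions
fp100 = np.polyfit(x, y, 100)
f100 = np.poly1d(fp100)
plt.plot(x, f100(x), linewidth=4)
# legend() expects a list of labels, one per plotted line
plt.legend(["d={num}".format(num=f100.order)], loc=2)
plt.show()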

Solution

I can reproduce this with your data:

>>> np.__version__
1.8.0
>>> fp100 = sp.polyfit(x, y, 100)
polynomial.py:587: RankWarning: Polyfit may be poorly conditioned
  warnings.warn(msg, RankWarning)
>>> f100 = sp.poly1d(fp100)
>>> f100.order
53

Note the warning, and consult the docs:

polyfit issues a RankWarning when the least-squares fit is badly conditioned. This implies that the best fit is not well-defined due to numerical error. The results may be improved by lowering the polynomial degree or by replacing x by x - x.mean()
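One way to see why the docs' suggestion of replacing x by x - x.mean() helps is to compare the condition number of the Vandermonde matrix that a degree-4 fit solves, with and without centering. This is only an illustrative sketch: x = 1..743 is a hypothetical stand-in for the question's data, and `scaled_cond` is a helper made up for this example. It normalizes every column to unit length, which mirrors the column scaling found in NumPy's polyfit source.

```python
import numpy as np

# Hypothetical stand-in for the question's x: 743 evenly spaced points.
x = np.arange(1, 744, dtype=float)

def scaled_cond(x, deg):
    """Condition number of the degree-`deg` Vandermonde matrix of x after
    scaling every column to unit Euclidean norm."""
    v = np.vander(x, deg + 1)
    v = v / np.linalg.norm(v, axis=0)
    return np.linalg.cond(v)

cond_raw = scaled_cond(x, 4)
cond_centered = scaled_cond(x - x.mean(), 4)
print(f"uncentered: {cond_raw:.3g}, centered: {cond_centered:.3g}")
```

On a grid that is symmetric around zero, even powers are orthogonal to odd powers, so the centered matrix should come out noticeably better conditioned.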

Your y has low variance:

>>> y.mean()
1961.7438692098092
>>> y.std()
860.64491521872196

So one shouldn't expect a high-degree polynomial to fit it well. Note that after replacing x with x - x.mean(), as the docs propose, a lower-degree polynomial approximates the data no worse than a higher-degree one:

>>> xp=x-x.mean()
>>> f100 = sp.poly1d(sp.polyfit(xp, y,100))
>>> max(abs(f100(xp)-y)/y)
2.1173504721727299
>>> abs((f100(xp)-y)/y).mean()
0.18100985148093593

>>> f4 = sp.poly1d(sp.polyfit(xp, y, 4))
>>> max(abs(f4(xp)-y)/y)
2.1228866902203842
>>> abs((f4(xp)-y)/y).mean()
0.20139219654066282

>>> print f4
           4             3             2
8.827e-08 x + 3.161e-05 x + 0.0003102 x + 0.06247 x + 1621

In fact, the most significant component seems to have degree 2. So it is normal that the best-approximating polynomial of degree at most 100 for your data actually has degree 53: all higher monomials are degenerate. Below is a picture of the approximation; the red line corresponds to the polynomial of degree 4, the green one to the polynomial of degree 53:

[Figure: plotted data with the degree-4 (red) and degree-53 (green) polynomial approximations]
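How the higher monomials end up degenerate can be checked directly. A minimal sketch, under the hypothetical assumption that x runs over the integers 1 to 743 (y is arbitrary here). NumPy's polyfit source scales each column of its Vandermonde matrix by the column's Euclidean norm; this is an implementation detail, not a documented guarantee. For high powers, x**k * x**k overflows float64 to inf, so that scale is inf and the fitted coefficient becomes exactly 0. np.poly1d strips leading zero coefficients, so .order reports the highest power whose coefficient survived. With x up to 743 the overflow affects every power above 53, which would be consistent with the 53 in the question.

```python
import warnings
import numpy as np

# Hypothetical stand-in for the question's data: x = 1..743, arbitrary smooth y.
x = np.arange(1, 744, dtype=float)
y = 1500 + 0.002 * x**2

with warnings.catch_warnings(), np.errstate(over="ignore"):
    warnings.simplefilter("ignore")  # silence RankWarning and overflow warnings
    coeffs = np.polyfit(x, y, 100)   # 101 coefficients, highest power first

    # Recompute the column norms that polyfit uses to scale its Vandermonde matrix.
    col_norm = np.sqrt((np.vander(x, 101) ** 2).sum(axis=0))

# Number of leading (highest-power) columns whose norm overflowed to inf.
n_inf = int(np.argmax(np.isfinite(col_norm)))
f = np.poly1d(coeffs)  # poly1d drops leading zero coefficients

print(len(coeffs))                         # 101
print(n_inf, np.all(coeffs[:n_inf] == 0))  # inf-norm columns get zero coefficients
print(f.order)                             # at most 100 - n_inf
```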

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow