I can reproduce with your data:
>>> np.__version__
1.8.0
>>> fp100 = sp.polyfit(x, y, 100)
polynomial.py:587: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
>>> f100 = sp.poly1d(fp100)
>>> f100.order
53
Note the warning and consult the docs:
polyfit issues a RankWarning when the least-squares fit is badly conditioned. This implies that the best fit is not well-defined due to numerical error. The results may be improved by lowering the polynomial degree or by replacing x by x - x.mean().
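To see what the docs mean, here is a small self-contained sketch (using synthetic stand-in data, since the original x and y are not shown): fitting against large, uncentred abscissae triggers the RankWarning, while fitting against x - x.mean() does not.

```python
import warnings
import numpy as np

# Synthetic stand-in for the question's data (the real x, y are not shown).
rng = np.random.RandomState(0)
x = np.linspace(2000.0, 2010.0, 200)            # large, uncentred abscissae
y = 0.5 * (x - 2005.0) ** 2 + rng.normal(scale=0.1, size=x.size)

# Powers of numbers near 2000 give almost collinear columns in the
# Vandermonde matrix, so the fit is numerically rank-deficient.
with warnings.catch_warnings(record=True) as caught_raw:
    warnings.simplefilter("always")
    np.polyfit(x, y, 10)

# Centring x keeps the columns well separated and the warning goes away.
with warnings.catch_warnings(record=True) as caught_centered:
    warnings.simplefilter("always")
    np.polyfit(x - x.mean(), y, 10)

raw_warned = len(caught_raw) > 0
centered_warned = len(caught_centered) > 0
print(raw_warned, centered_warned)      # True False
```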
Your y has low variance:
>>> y.mean()
1961.7438692098092
>>> y.std()
860.64491521872196
So one shouldn't expect a high-degree polynomial to fit it well. Note that after replacing x with x - x.mean(), as proposed by the docs, a polynomial of lower degree approximates the data no worse than one of higher degree:
>>> xp=x-x.mean()
>>> f100 = sp.poly1d(sp.polyfit(xp, y,100))
>>> max(abs(f100(xp)-y)/y)
2.1173504721727299
>>> abs((f100(xp)-y)/y).mean()
0.18100985148093593
>>> f4 = sp.poly1d(sp.polyfit(xp, y, 4))
>>> max(abs(f4(xp)-y)/y)
2.1228866902203842
>>> abs((f4(xp)-y)/y).mean()
0.20139219654066282
>>> print f4
           4             3             2
8.827e-08 x + 3.161e-05 x + 0.0003102 x + 0.06247 x + 1621
In fact, the most significant component seems to have degree 2. So it is normal that the polynomial of degree at most 100 that best approximates your data in fact has degree 53: all higher monomials are degenerate. Below is a picture of the approximation; the red line corresponds to the polynomial of degree 4, the green one to the polynomial of degree 53:
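A sketch of how such a picture can be produced with matplotlib (again on synthetic stand-in data, not your real x, y; the degree-100 fit will emit the same RankWarning):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                   # headless backend; just writes a file
import matplotlib.pyplot as plt

# Synthetic stand-in data; the real x, y come from the question.
rng = np.random.RandomState(0)
x = np.linspace(0.0, 10.0, 300)
y = 0.5 * x**2 + 3.0 * x + 10.0 + rng.normal(scale=1.0, size=x.size)

xp = x - x.mean()
f4 = np.poly1d(np.polyfit(xp, y, 4))
f100 = np.poly1d(np.polyfit(xp, y, 100))    # emits a RankWarning, as above

plt.plot(x, y, "b.", label="data")
plt.plot(x, f4(xp), "r-", label="degree 4")
plt.plot(x, f100(xp), "g-", label="degree 100")
plt.legend()
plt.savefig("fit.png")
```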