Question

Not sure what I'm doing wrong, but when I try to fit a polynomial with np.polyfit to scatter-plot data (year, rating) and plot it, I get a whole bunch of lines instead of one single line. It looks like this:

[screenshot not available: the red fit appears as a whole bunch of lines instead of one curve]

My code is below:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# movies is my existing DataFrame of films
data = movies[['year', 'rtAllCriticsRating']].copy()
data['year'] = data['year'].astype(float).fillna(0.0)
data = data.apply(pd.to_numeric, errors='coerce')  # replaces the removed convert_objects(convert_numeric=True)
data = data[data.rtAllCriticsRating > 0]
# print(data) gives (abbreviated):
# 1995   5.4
# 1950   2.3
# ....

#############issues start HERE########################
fig = plt.figure(figsize=(15, 15), dpi=100)
fig.add_subplot(212, facecolor='lightgrey')  # facecolor replaces the old axisbg argument

# fit with np.polyfit
p = np.polyfit(data.year, data.rtAllCriticsRating, 3)
print(p)
plt.plot(data.year, data.rtAllCriticsRating, 'bo')
plt.plot(data.year, np.polyval(p, data.year), 'r-')  # A red solid line
plt.xlim(1900, 2020)
plt.ylim(0, 11)
plt.grid()
plt.xlabel('X Axis is by year')
plt.ylabel('Y Axis is by AllCriticRating')

What is going on, and how do I fix this? My main goal is to overlay on this scatter plot a red line graph showing how the average movie rating (the mean of rtAllCriticsRating across all movies in a year) has changed over time.


Solution

It looks like your data.year array is not in any particular order. In a scatter plot that doesn't matter, because each point is drawn on its own. A line plot, however, connects the points in the order they appear in the arrays, so unsorted years make the red line jump back and forth across the plot. To draw a single fitted curve, the x values need to be in numerical (in this case chronological) order. Try the following:

plt.plot(np.sort(data.year), np.polyval(p, np.sort(data.year)), 'r-')

This connects the points in order of increasing year, so the fit is drawn as one single curve.
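A self-contained sketch of the same fix, under some assumptions: the plot_ratings helper and the sample numbers below are hypothetical, made up for illustration rather than taken from the question, and the Agg backend is used so it runs without a display. Besides drawing the cubic fit on sorted years, it also draws the per-year mean rating that the question asks for. pandas groupby sorts its keys by default, so that line needs no extra sort.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_ratings(data):
    """Hypothetical helper: scatter the ratings, overlay a cubic fit and the per-year mean.

    Assumes `data` has numeric 'year' and 'rtAllCriticsRating' columns,
    like the cleaned frame in the question.
    """
    fig, ax = plt.subplots(figsize=(8, 5))
    # Scatter: point order is irrelevant because no lines are drawn.
    ax.plot(data['year'], data['rtAllCriticsRating'], 'bo')

    # Cubic fit, evaluated on sorted years so the line runs left to right.
    p = np.polyfit(data['year'], data['rtAllCriticsRating'], 3)
    x = np.sort(data['year'].to_numpy())
    (fit_line,) = ax.plot(x, np.polyval(p, x), 'r-', label='cubic fit')

    # Mean rating per year; groupby returns the year keys in sorted order.
    yearly = data.groupby('year')['rtAllCriticsRating'].mean()
    (mean_line,) = ax.plot(yearly.index, yearly.values, 'r--', label='mean per year')

    ax.legend()
    return fig, fit_line, mean_line


# Hypothetical, deliberately unsorted sample data (not from the question).
data = pd.DataFrame({
    'year': [1995.0, 1950.0, 2005.0, 1950.0, 1980.0, 1995.0, 2010.0, 1965.0],
    'rtAllCriticsRating': [5.4, 2.3, 6.1, 3.0, 4.8, 6.0, 7.2, 3.9],
})
fig, fit_line, mean_line = plot_ratings(data)
```

Plotting the mean per year instead of (or alongside) a polynomial fit shows the raw yearly averages directly, while the cubic fit smooths over them.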

Licensed under: CC-BY-SA with attribution