Question

I cannot figure out how to use the "names" in matplotlib when plotting data returned by the numpy.genfromtxt command. Scenario:

1. I have a file with column headers and rows of values.
2. I don't know the column headers beforehand; they are generated programmatically and may change during the program run.
3. I need to read the data AND the column headers, plot them, and produce a corresponding legend.

I can read the data columns with their names with:

import numpy

dataArray = numpy.genfromtxt('myData.csv', delimiter=',', names=True)

and then plot them with

import matplotlib.pyplot

matplotlib.pyplot.plot(dataArray)
matplotlib.pyplot.show()

but how do I produce a suitable legend? I thought calling legend with no arguments (matplotlib.pyplot.legend()) would suffice, but it does not. I get this warning instead:

/usr/lib/python2.7/site-packages/matplotlib/axes.py:4601: UserWarning: No labeled objects found. Use label='...' kwarg on individual plots. warnings.warn("No labeled objects found. "

In other words: where do those "names" go, and how can I retrieve them? Multiple searches on Google, the matplotlib site, and the numpy site produced no results.


Solution

You have to pass a label=... keyword argument to plot for each line you want to appear in the legend, because matplotlib does not automatically pick up the field names of a numpy structured array. (pandas does this for you; see below.)
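
To see why legend() only warned, here is a minimal sketch (assuming a reasonably recent matplotlib; the Agg backend is selected only so it runs without a display). A line plotted without label=... receives an automatic label that begins with an underscore, and legend() ignores every such artist. Axes.get_legend_handles_labels() reports the entries legend() would actually use.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, only so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# No label: matplotlib assigns an internal name starting with "_",
# so legend() has nothing to show for this line.
ax.plot([1, 2, 3])
_, labels_before = ax.get_legend_handles_labels()

# With label=..., the line becomes a legend entry.
ax.plot([2, 3, 4], label="b")
_, labels_after = ax.get_legend_handles_labels()

print(labels_before)  # []
print(labels_after)   # ['b']
```
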

Say, for example, your data look like this:

from io import StringIO  # on Python 2 use: from StringIO import StringIO

myDatacsv = StringIO("""a, b, c
1, 2, 3
2, 3, 4
3, 4, 5""")

Reading them with numpy.genfromtxt produces a structured array:

>>> import numpy as np
>>> dataArray = np.genfromtxt(myDatacsv, delimiter = ',', names = True)
>>> dataArray
array([(1.0, 2.0, 3.0), (2.0, 3.0, 4.0), (3.0, 4.0, 5.0)], 
      dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])

(In your case you would pass "myData.csv" instead of myDatacsv, of course; the StringIO object just keeps the example self-contained.)
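
As for where the names go: with names=True, genfromtxt stores the header fields in the dtype of the returned structured array, so dataArray.dtype.names gives them back as a tuple, and each name indexes one column. A short sketch with the same sample data (written for Python 3, hence io.StringIO):

```python
from io import StringIO

import numpy as np

myDatacsv = StringIO("""a, b, c
1, 2, 3
2, 3, 4
3, 4, 5""")
dataArray = np.genfromtxt(myDatacsv, delimiter=",", names=True)

# names=True puts the header fields into the array's dtype
print(dataArray.dtype.names)  # ('a', 'b', 'c')

# each name selects one column as an ordinary 1-D float array
print(dataArray["b"])  # [2. 3. 4.]
```
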
Now you can loop over the column names and plot each of them:

import matplotlib.pyplot as plt

plt.figure()
for col_name in dataArray.dtype.names:
    plt.plot(dataArray[col_name], label=col_name)

plt.legend()
plt.show()

This produces one line per column (a, b and c, plotted against the row index) and a legend listing the three column names.
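
If you prefer a single plot call, as in the question, one alternative is to convert the structured array to an ordinary 2-D array, plot all columns at once, and pass the returned lines together with dtype.names to legend() explicitly. This is a sketch, assuming NumPy 1.16 or later for numpy.lib.recfunctions.structured_to_unstructured; the Agg backend is used only so it runs without a display.

```python
from io import StringIO

import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import numpy.lib.recfunctions as rfn

myDatacsv = StringIO("""a, b, c
1, 2, 3
2, 3, 4
3, 4, 5""")
dataArray = np.genfromtxt(myDatacsv, delimiter=",", names=True)

# Plain (rows x columns) float array; plot() draws one line per column.
values = rfn.structured_to_unstructured(dataArray)

fig, ax = plt.subplots()
lines = ax.plot(values)

# legend(handles, labels): pair each Line2D with its column name, in order.
ax.legend(lines, dataArray.dtype.names)

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)  # ['a', 'b', 'c']
```

legend(handles, labels) matches the two sequences by position, so this relies on the converted array keeping the field order of dtype.names, which structured_to_unstructured does.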

With pandas you get the same figure with less code, because DataFrame.plot() plots every column and adds the column names to the legend automatically:

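import pandas as pd
import matplotlib.pyplot as plt

# Either of the following works: read the CSV with pandas, or convert
# the existing numpy structured array to a DataFrame.
# Rewind the StringIO first, because np.genfromtxt has already consumed it;
# skipinitialspace=True drops the blank after each comma in the header,
# so the columns are named 'a', 'b', 'c' rather than 'a', ' b', ' c'.
myDatacsv.seek(0)
data_df = pd.read_csv(myDatacsv, skipinitialspace=True)
data_df = pd.DataFrame(dataArray)

data_df.plot()  # one line per column; legend labels come from the column names
plt.show()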
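
Because the headers in your file may change between runs, it can help to wrap the pandas route in a function that assumes nothing about the column names. plot_csv below is a hypothetical helper, not part of pandas, and the headers alpha and beta are made up purely for the demonstration; the Agg backend is selected only so the sketch runs headless.

```python
from io import StringIO

import matplotlib
matplotlib.use("Agg")  # off-screen backend, only so the sketch runs headless
import pandas as pd


def plot_csv(source):
    """Hypothetical helper: plot every column of a CSV against the row index.

    `source` may be a path or a file-like object. No column names are
    hard-coded, so the header can change from one run to the next.
    """
    df = pd.read_csv(source, skipinitialspace=True)
    return df.plot()  # returns the matplotlib Axes; legend built from df.columns


ax = plot_csv(StringIO("""alpha, beta
1, 2
3, 4
5, 6"""))
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)  # ['alpha', 'beta']
```
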

For more information about pandas, see: http://pandas.pydata.org/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow