Question

OK, so I have a DataFrame containing time series data with a multi-level index (MultiIndex) on the columns. Here is a sample of what the data looks like; it is in CSV format. Loading the data is not an issue here.

[Image: sample of the data]

What I want is to create a boxplot of this data grouped by the categories in a specific level of the MultiIndex. For example, if I group by 'SPECIES', I would have the groups 'aq', 'gr', 'mix', 'sed', and a box for each group at a specific time in the time series.

I've tried this:

grouped = data['2013-08-17'].groupby(axis=1, level='SPECIES')
grouped.boxplot()

but it gives me a boxplot (flat line) for each point in the group rather than for the grouped set. Is there an easy way to do this? I don't have any problems grouping as I can aggregate the groups any which way I want, but I can't get them to boxplot.


Solution 2

I think I figured it out, maybe this will be helpful to someone:

grouped = data['2013-08-17'].groupby(axis=1, level='SPECIES').T
grouped.boxplot()

Basically groupby output needed to be transposed so that the boxplot showed the right grouping:

[Image: boxplot grouped by SPECIES]
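The transposed groupby above depends on the pandas version in use; as far as I know, `groupby(axis=1)` is deprecated in recent releases. A minimal sketch of the same idea without it, assuming made-up data in place of the original CSV (the `SITE` level, the site labels, and the random values are all hypothetical): pick one timestamp, then unstack `SPECIES` into the columns so that each species gets its own box.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: a daily time series whose
# columns carry a two-level MultiIndex (SITE is an assumed level name).
columns = pd.MultiIndex.from_product(
    [["s1", "s2", "s3"], ["aq", "gr", "mix", "sed"]],
    names=["SITE", "SPECIES"],
)
index = pd.date_range("2013-08-15", periods=5, freq="D")
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(5, 12)), index=index, columns=columns)

# Select one timestamp: a Series indexed by the (SITE, SPECIES) MultiIndex.
snapshot = data.loc[pd.Timestamp("2013-08-17")]

# Move SPECIES into the columns: rows are sites, one column per species.
by_species = snapshot.unstack("SPECIES")

# One box per species, each summarising the values across sites on that day.
ax = by_species.boxplot()
ax.set_title("2013-08-17")
```

Each box here is built from the values that share a species label at the chosen time, which is the grouping the question asks for.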

OTHER TIPS

This code:

data['2013-08-17'].boxplot(by='SPECIES')

will not work, as boxplot is a method of DataFrame and not of Series.

In pandas > 0.18.1, however, the boxplot function has the argument column, which defines which column the data is taken from.

So

data.boxplot(column='2013-08-17',by='SPECIES')

should return the desired result.
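Note that `by='SPECIES'` expects `SPECIES` to be an ordinary column. If it is instead a level of the column MultiIndex, as in the question, one option is to reshape to long form first. The following is only a sketch under that assumption; the `SITE` level, the `time` index name, and the random values are hypothetical stand-ins for the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: daily rows, columns labelled by (SITE, SPECIES).
columns = pd.MultiIndex.from_product(
    [["s1", "s2", "s3"], ["aq", "gr", "mix", "sed"]],
    names=["SITE", "SPECIES"],
)
index = pd.date_range("2013-08-15", periods=5, freq="D", name="time")
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(5, 12)), index=index, columns=columns)

# Long form: one row per (time, SITE, SPECIES), measurement in "value".
long_form = data.melt(ignore_index=False).reset_index()

# Keep a single day, then let boxplot() do the grouping via `by`.
snapshot = long_form[long_form["time"] == pd.Timestamp("2013-08-17")]
ax = snapshot.boxplot(column="value", by="SPECIES")
plt.suptitle("")  # clear the automatic "Boxplot grouped by ..." title
```

In long form the date is a filter on rows rather than a column name, so `column` names the measurement and `by` names the grouping variable.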

An example with the Iris dataset:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/io/data/csv/iris.csv')
fig, ax = plt.subplots(figsize=(10, 8))
data.boxplot(column=['SepalLength'], by='Name', ax=ax)
plt.suptitle('')  # call after boxplot(), which sets the automatic title while plotting

creates:

[Image: boxplot of the Iris dataset with pandas]

plt.suptitle('') 

turns off the annoying automatic figure title. And of course the column argument accepts a list of columns, so

data.boxplot(column=['SepalLength', 'SepalWidth'], by='Name', ax=ax)

also works.
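For trying the multi-column form without downloading anything, here is a small sketch that uses randomly generated numbers under the same column names. The values are synthetic, not the real Iris measurements, and no `ax` is passed so that pandas creates one subplot per requested column itself.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the Iris data (values are random, not real measurements).
rng = np.random.default_rng(1)
data = pd.DataFrame({
    "SepalLength": rng.normal(5.8, 0.8, size=30),
    "SepalWidth": rng.normal(3.0, 0.4, size=30),
    "Name": np.repeat(["setosa", "versicolor", "virginica"], 10),
})

# Two columns requested, so pandas draws two subplots, each grouped by Name.
axes = data.boxplot(column=["SepalLength", "SepalWidth"], by="Name", figsize=(10, 4))
plt.suptitle("")  # clear the automatic title after plotting

n_subplots = len(np.ravel(axes))
```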

This should work in pandas version 0.16:

data['2013-08-17'].boxplot(by='SPECIES')
Licensed under: CC-BY-SA with attribution