EDIT: this is the right answer now that direct support has been added, somewhere between pandas versions 0.15 and 0.18.
tl;dr: for recent pandas, use the positions argument to boxplot.
Adding a separate answer, which perhaps could be another question - feedback appreciated.
I wanted to add a custom column order within a groupby, which posed many problems for me. In the end, I had to avoid trying to use boxplot from a groupby object, and instead go through each subplot myself to provide explicit positions.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame()
df['GroupBy'] = ['g1', 'g2', 'g3', 'g4'] * 6
df['PlotBy'] = [chr(ord('A') + i) for i in range(24)]
df['SortBy'] = list(reversed(range(24)))
df['Data'] = [i * 10 for i in range(24)]
# Note that this has no effect on the boxplot
df = df.sort_values(['GroupBy', 'SortBy'])
for group, info in df.groupby('GroupBy'):
    print('Group: %r\n%s\n' % (group, info))
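# The one-line groupby boxplot below does not give the order I want, because I cannot:
# - sort the data beforehand (the order is not preserved, and can't be accessed in the groupby)
# - use a categorical (not all categories are present in every chart)
# - use positions (lengths and sort orders differ per group)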
# df.groupby('GroupBy').boxplot(layout=(1, 5), column=['Data'], by=['PlotBy'])
fig, axes = plt.subplots(1, df.GroupBy.nunique(), sharey=True)
for ax, (g, d) in zip(axes, df.groupby('GroupBy')):
    d.boxplot(column=['Data'], by=['PlotBy'], ax=ax, positions=d.index.values)
plt.show()
In my final code, determining the positions was slightly more involved, because I had multiple data points for each sortby value. I ended up doing the following:
# data, sort_col, group_col, col, plot_by and axes come from the surrounding code
to_plot = data.sort_values([sort_col]).groupby(group_col)
for ax, (group, group_data) in zip(axes, to_plot):
    # Use existing sorting
    ordering = enumerate(group_data[sort_col].unique())
    positions = [ind for val, ind in sorted((v, i) for (i, v) in ordering)]
    ax = group_data.boxplot(column=[col], by=[plot_by], ax=ax, positions=positions)
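For anyone who wants to work out the positions separately from the plotting, here is a minimal pure-Python sketch of the idea. It is not the code I used above: box_positions is a hypothetical helper name, and it assumes that boxplot draws the by groups in sorted label order, so the returned list is given in that order. It also assumes each label maps to a single sort key (the smallest is used if there are several).

```python
def box_positions(labels, sort_keys):
    """Return one x position per distinct label, listed in sorted label order.

    labels and sort_keys are parallel sequences, one entry per data row.
    Hypothetical helper for illustration only.
    """
    # Smallest sort key seen for each label.
    key_for = {}
    for label, key in zip(labels, sort_keys):
        if label not in key_for or key < key_for[label]:
            key_for[label] = key

    # Labels in the order the boxes should appear, left to right.
    desired = sorted(key_for, key=lambda label: (key_for[label], label))
    slot = {label: i for i, label in enumerate(desired)}

    # Positions listed in sorted label order (the assumed draw order).
    return [slot[label] for label in sorted(key_for)]


# Two rows per label; a larger sort key should end up further right.
labels = ['A', 'E', 'I', 'A', 'E', 'I']
sort_keys = [23, 19, 15, 23, 19, 15]
print(box_positions(labels, sort_keys))  # prints [2, 1, 0]
```

The resulting list could then be passed as positions to the per-group boxplot call, with labels taken from the plot-by column and sort_keys from the sort column of that group.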