Question

Let's say that I have a value that I've measured every day for the past 90 days. I would like to plot a histogram of the values, but I want to make it easy for the viewer to see where the measurements from certain non-overlapping subsets of those 90 days have accumulated. I want to do this by "subdividing" each bar of the histogram into chunks: one chunk for the earliest observations, one for the more recent ones, and one for the most recent.

This sounds like a job for df.plot(kind='bar', stacked=True), but I'm having trouble getting the details right.

Here's what I have so far:

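import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn

np.random.seed(0)

data = pd.DataFrame({'values': np.random.randn(90)})
# Integer bin codes 0-14 instead of interval labels
data['bin'] = pd.cut(data['values'], 15, labels=False)
# Count the observations per bin within each period
# ('bin' becomes the index after groupby, so count the 'values' column)
forhist = pd.DataFrame({'first70': data[:70].groupby('bin').count()['values'],
                        'next15': data[70:85].groupby('bin').count()['values'],
                        'last5': data[85:].groupby('bin').count()['values']})

forhist.plot(kind='bar', stacked=True)
plt.show()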

And that gives me:

[Image: stacked bar chart produced by the code above (poor result)]

This graph has some shortcomings:

  • The bars are stacked in the wrong order. last5 should be on top and next15 in the middle. I.e. they should be stacked in the order of the columns in forhist.
  • There is horizontal space between the bars.
  • The x-axis is labeled with integers rather than something indicative of the values the bins represent. My "first choice" would be to have the x-axis labelled exactly as it would be if I just ran data['values'].hist(). My "second choice" would be to have the x-axis labelled with the "bin names" that I would get if I did pd.cut(data['values'], 15). In my code, I used labels=False because if I didn't do that, it would have used the bin edge labels (as strings) as the bar labels, and it would have put these in alphabetical order, making the graph basically useless.

What's the best way to approach this? I feel like I'm using very clumsy functions so far.


Solution

OK, here's one way to attack it, using features of matplotlib's own hist function. Passing a list of arrays with stacked=True stacks them in list order, with the first array at the bottom:

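import matplotlib.pyplot as plt

# data is the DataFrame built in the question
fig, ax = plt.subplots(1, 1, figsize=(9, 5))
# Positional slices exclude their end point, so the three periods don't overlap
# (the old label-based .ix[low:high] included both ends and has been removed from pandas)
ax.hist([data['values'].to_numpy()[low:high] for low, high in [(0, 70), (70, 85), (85, 90)]],
        bins=15,
        stacked=True,
        rwidth=1.0,
        label=['first70', 'next15', 'last5'])
ax.legend()
plt.show()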

Which gives:

[Image: stacked histogram produced by the code above (better)]
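If you would rather stay with the DataFrame.plot(kind='bar', stacked=True) route from the question, here is a minimal sketch of how each shortcoming might be handled, assuming a pandas version recent enough to support the observed argument of groupby. It keeps pd.cut's default labels: in current pandas these form an ordered Categorical of Interval bins, so grouping by them keeps the bins in numeric rather than alphabetical order, and the tick labels become the bin names (the "second choice"). Selecting the columns explicitly fixes the stacking order, and width=1.0 closes the gaps between bars. The periods dict is just an illustrative name.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and add plt.show() for a window
import matplotlib.pyplot as plt

np.random.seed(0)
data = pd.DataFrame({'values': np.random.randn(90)})

# Default labels: an ordered Categorical of Interval bins, sorted by numeric edges.
data['bin'] = pd.cut(data['values'], 15)

# Illustrative mapping of period name -> non-overlapping positional slice.
periods = {'first70': slice(0, 70), 'next15': slice(70, 85), 'last5': slice(85, 90)}

# observed=False keeps empty bins, so every column has exactly one row per bin.
forhist = pd.DataFrame({
    name: data.iloc[rows].groupby('bin', observed=False).size()
    for name, rows in periods.items()
})

# Explicit column order: the first column is drawn at the bottom of each stack.
forhist = forhist[['first70', 'next15', 'last5']]

# width=1.0 makes adjacent bars touch, as in a histogram.
ax = forhist.plot(kind='bar', stacked=True, width=1.0)
plt.tight_layout()
```

Note that the x-axis here is categorical (one tick per bin, labelled with the interval), unlike the numeric axis produced by ax.hist above; for the "first choice" labelling, the hist-based answer remains the simpler fit.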

Licensed under: CC-BY-SA with attribution