This is confusing because you're squishing four histograms onto one plot. To do this, matplotlib narrows the bars and puts a gap between them. In a standard histogram, the total area of the bars is 1 if the histogram is normalized (normed, which current matplotlib calls density); otherwise the bar heights sum to N, the number of points. Here's a simple example:
import numpy as np
import matplotlib.pyplot as plt

a = np.random.rand(10)
bins = np.array([0, 0.5, 1.0])  # just two bins
plt.hist(a, bins, density=True)  # older matplotlib spelled this normed=True
First, note that each bar covers the entire range of its bin: the first bar ranges from 0 to 0.5, and its height is determined by the number of points in that range.
Next, you can see that the total area of the two bars is 1 because the histogram is normalized (normed = True, or density=True in current matplotlib): the width of each bar is 0.5 and the heights are 1.2 and 0.8, so 0.5 × 1.2 + 0.5 × 0.8 = 1.
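If you want to check the area arithmetic without reading numbers off the plot, here is a minimal sketch using np.histogram, which does the same kind of binning. The seed and the variable names are my own choices for illustration, so your heights will differ from the 1.2 and 0.8 above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only so the run is repeatable
a = rng.random(10)
bins = np.array([0, 0.5, 1.0])  # the same two bins as above

# Raw counts per bin, and the normalized heights
counts, edges = np.histogram(a, bins=bins)
heights, _ = np.histogram(a, bins=bins, density=True)

widths = np.diff(edges)  # width of each bin (0.5 and 0.5)

# Normalized height = count / (N * bin width), so the areas sum to 1
total_area = np.sum(heights * widths)
print(counts, heights, total_area)
```

Without normalization the heights are just the counts, so they sum to the number of points rather than the area summing to 1.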
Let's plot the same thing again with another distribution so you can see the effect:
b = np.random.rand(10)
plt.hist([a, b], bins, density=True)  # older matplotlib spelled this normed=True
Recall that the blue bars represent exactly the same data as in the first plot, but they're less than half the width now because they must make room for the green bars. You can see that two bars plus some whitespace now cover the range of each bin. So when calculating the bin range and the bar area, we must pretend that each bar is as wide as the whole bin, that is, the width of all the bars plus the width of the whitespace gap.
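To see the difference between the drawn bar width and the bin width in numbers, here is a rough sketch. It assumes a reasonably recent matplotlib (one that accepts density=True), and the Agg backend, seed, and variable names are my own choices so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to see the plot
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for repeatability
a, b = rng.random(10), rng.random(10)
bins = np.array([0, 0.5, 1.0])

fig, ax = plt.subplots()
heights, edges, bars = ax.hist([a, b], bins, density=True)

bin_width = np.diff(edges)            # true width of each bin
drawn_width = bars[0][0].get_width()  # width of the first drawn bar for a

# Use the bin width, not the drawn width, when computing areas:
# each dataset is normalized on its own, so each row sums to 1
areas = (np.asarray(heights) * bin_width).sum(axis=1)
print(drawn_width, bin_width, areas)
```

The drawn width comes out smaller than half the bin width, yet the areas only add up to 1 when you multiply the heights by the full bin width.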
Finally, notice that the xticks don't line up with the bin edges anywhere. If you wish, you can align them manually with:
plt.xticks(bins)
If you didn't create bins manually first, you can grab it from the return value of plt.hist:
counts, bins, bars = plt.hist(...)
plt.xticks(bins)
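For completeness, a small runnable version of that last snippet, as a sketch only: the data, the seed, and the use of the Axes methods (ax.hist, ax.set_xticks) instead of the plt functions are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to see the plot
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for repeatability
a = rng.random(10)

fig, ax = plt.subplots()
counts, bins, bars = ax.hist(a)  # no bins passed, so matplotlib chooses them
ax.set_xticks(bins)              # put one tick on every bin edge

ticks = ax.get_xticks()
print(bins)
```

There is always one more bin edge than there are bars, so every bar ends up with a tick on each side.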