Creating a Python Histogram without Pylab [closed]

Question 1

Let's assume you have a NumPy array that holds your random numbers:

    rnd_numb = array([ 0.48942231,  0.48536864,  0.48614467, ...,  0.47264172,
                       0.48309697,  0.48439782])

In order to create a histogram you only need to bin your data. So let's create an array that defines the bin edges:

    bin_array = linspace(0, 1, 100)

This creates 100 linearly spaced bin edges between 0 and 1, which define 99 bins. (Use linspace(0, 1, 101) if you want exactly 100 bins.)

Now, in order to create the histogram you can simply do:

    my_histogram = []
    for i in range(len(bin_array) - 1):
        mask = (rnd_numb >= bin_array[i]) & (rnd_numb < bin_array[i + 1])
        my_histogram.append(len(rnd_numb[mask]))

This creates a list that contains the counts in each bin. Lastly, if you want to visualize your histogram, you can plot the counts against the bin centers:

    plot((bin_array[1:] + bin_array[:-1]) / 2., my_histogram)

You can also try step or bar.
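
Putting the steps above together, here is a minimal self-contained sketch. The sample data, the seed, and the choice of 10 bins are assumptions made for illustration; the final line cross-checks the hand-rolled counts against np.histogram.

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed, only for reproducibility
rnd_numb = rng.random(1000)           # stand-in data: uniform samples in [0, 1)

bin_array = np.linspace(0, 1, 11)     # 11 edges -> 10 bins of width 0.1

# Count the samples in each half-open interval [left, right).
my_histogram = []
for i in range(len(bin_array) - 1):
    mask = (rnd_numb >= bin_array[i]) & (rnd_numb < bin_array[i + 1])
    my_histogram.append(int(mask.sum()))

# Midpoint of each bin, useful as the x coordinate when plotting.
bin_centers = (bin_array[1:] + bin_array[:-1]) / 2.0

# Reference counts from NumPy for the same edges.
reference, _ = np.histogram(rnd_numb, bins=bin_array)
```

Note that the mask treats every bin as half-open, so a value exactly equal to the last edge would not be counted, whereas np.histogram includes it in the last bin. The stand-in data here never reaches 1.0, so the two agree.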

Question 2

A fast way to compute a histogram is to walk through the list one element at a time, sort out what bin it should be in and then count the number of entries in each bin.

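import numpy as np

hist_vals = np.zeros(nbins, dtype=int)
for d in data:
    # map d linearly onto a bin index
    bin_number = int(nbins * ((d - min_val) / (max_val - min_val)))
    # d == max_val would give index nbins, so fold it into the last bin
    hist_vals[min(bin_number, nbins - 1)] += 1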

Note that this runs in O(len(data)) time with a small prefactor.

A smarter way to write this is to vectorize the hash function:

bin_number = (nbins * ((data - min_val) / (max_val - min_val))).astype(int)
bin_number = np.minimum(bin_number, nbins - 1)  # data == max_val goes in the last bin

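and accumulate the counts with an unbuffered add. (A plain hist_vals[bin_number] += 1 does not work here: when an index appears several times, the fancy-indexed assignment increments that bin only once.)

np.add.at(hist_vals, bin_number, 1)  # counts every occurrence of each index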
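
A quick way to check how summation over an index array behaves when indices repeat is to compare a plain fancy-indexed `+= 1` with `np.add.at` and `np.bincount` on a tiny example. The index values below are made up for illustration.

```python
import numpy as np

nbins = 3
bin_number = np.array([0, 0, 1, 2, 2, 2])   # bin indices with repeats

# Buffered fancy-index increment: each distinct index is incremented once.
naive = np.zeros(nbins, dtype=int)
naive[bin_number] += 1

# Unbuffered accumulation: every occurrence of an index counts.
counted = np.zeros(nbins, dtype=int)
np.add.at(counted, bin_number, 1)

# np.bincount produces the same totals in one call.
also_counted = np.bincount(bin_number, minlength=nbins)

print(naive)         # [1 1 1]
print(counted)       # [2 1 3]
print(also_counted)  # [2 1 3]
```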

If you are concerned about speed, you can use the NumPy functions that essentially do this but run the loops at the C level (here bins is the array of bin edges):

bin_nums = np.digitize(data, bins) - 1
hist_vals = np.bincount(bin_nums)
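
As a worked example of the digitize/bincount approach, the sketch below uses a small made-up data set and four equal-width bins, and assumes every value lies within the outermost edges. The np.clip step is an addition of this sketch: np.digitize places a value equal to the last edge one index past the final bin, so it is folded back in.

```python
import numpy as np

data = np.array([0.05, 0.15, 0.15, 0.55, 0.95, 1.0])
nbins = 4
bins = np.linspace(0.0, 1.0, nbins + 1)   # edges: 0, 0.25, 0.5, 0.75, 1

# digitize returns i such that bins[i-1] <= x < bins[i]; shift to 0-based bins.
bin_nums = np.digitize(data, bins) - 1

# A value equal to bins[-1] gets index nbins; fold it into the last bin.
bin_nums = np.clip(bin_nums, 0, nbins - 1)

# minlength keeps trailing empty bins in the result.
hist_vals = np.bincount(bin_nums, minlength=nbins)

print(hist_vals)  # [3 0 1 2]
```

With the clipping in place, the result matches np.histogram(data, bins=bins)[0] for this data.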

Question 3

Here is a version that builds on @tacaswell's solution but that doesn't use numpy.

def histogram(data, nbins, min_val=None, max_val=None):
    hist_vals = [0] * nbins
    if min_val is None:
        min_val = min(data)
    if max_val is None:
        max_val = max(data)
    bin_width = (max_val - min_val) / float(nbins)

    for d in data:
        # clamp so that d == max_val lands in the last bin, not past it
        bin_number = min(int((d - min_val) / bin_width), nbins - 1)
        hist_vals[bin_number] += 1
    bin_lower_bounds = [min_val + i * bin_width for i in range(nbins)]
    return hist_vals, bin_lower_bounds
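
If the bins are not all the same width, the linear index formula no longer applies. One NumPy-free alternative, sketched here as a different technique rather than a variant of the function above, is to look up each value in a sorted list of edges with the standard library's bisect module. The helper name histogram_from_edges and the sample values are hypothetical.

```python
from bisect import bisect_right

def histogram_from_edges(data, edges):
    """Count values per bin, given a sorted list of bin edges."""
    nbins = len(edges) - 1
    counts = [0] * nbins
    for d in data:
        if d < edges[0] or d > edges[-1]:
            continue  # ignore values outside the outermost edges
        # index of the bin whose lower edge is the largest edge <= d
        i = bisect_right(edges, d) - 1
        # a value equal to the last edge is counted in the last bin
        counts[min(i, nbins - 1)] += 1
    return counts

edges = [0, 1, 10, 100]
data = [0, 0.5, 1, 5, 10, 99, 100, 250]
counts = histogram_from_edges(data, edges)
print(counts)  # [2, 2, 3]
```

Each lookup costs O(log(number of edges)), so this is slower per element than the direct index formula, but it works for arbitrary sorted edges.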