Binning Python tuples. Error due to empty bins

Question 1

If you have SciPy, you could call scipy.stats.binned_statistic:

import scipy.stats as stats
statistic, bin_edges, binnumber = stats.binned_statistic(
    x=X, values=Y, statistic='median', bins=bins)
statistic = statistic[np.isfinite(statistic)]
print(statistic)

yields

[ 15.  90.  50.  55.  40.  60.]

Without SciPy, I think you would need a list comprehension. As you suggested, you could avoid the RuntimeWarning by filtering out those bins which are empty. You can do that with an if-condition inside a list comprehension:

masks = [(digitized == j) for j in range(1, len(bins))]
bin_medians = [np.median(Y[mask]) for mask in masks if mask.any()]
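To try the filtering above in isolation, here is a minimal sketch. The X, Y and bins values are made up for illustration (they are not the data from the question), and digitized is assumed to come straight from np.digitize, so bin numbers start at 1:

```python
import numpy as np

# Hypothetical sample data, chosen so that the two middle bins are empty.
X = np.array([0.05, 0.10, 0.15, 0.85, 0.90])
Y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

bins = np.linspace(0.0, 1.0, 5)    # edges: 0, 0.25, 0.5, 0.75, 1
digitized = np.digitize(X, bins)   # bin numbers start at 1

# One boolean mask per bin; a mask with no True entries marks an empty bin.
masks = [(digitized == j) for j in range(1, len(bins))]

# Only non-empty bins reach np.median, so no RuntimeWarning is triggered.
bin_medians = [float(np.median(Y[mask])) for mask in masks if mask.any()]
print(bin_medians)  # [20.0, 45.0]
```

Note that the result no longer lines up with the bins one-to-one, so any matching bin_centers must be filtered with the same masks.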

Also note that the message you are seeing is a warning, not an exception. Alternatively, you could suppress it with

import warnings
warnings.filterwarnings("ignore", 'Mean of empty slice.')
warnings.filterwarnings("ignore", 'invalid value encountered in double_scalar')
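If you would rather not change the warning filters for the whole program, one option is to suppress them only around the call that produces them. This is a sketch using the standard library's warnings.catch_warnings context manager; ignoring every RuntimeWarning inside the block is my assumption here, and it is broader than the two message filters above:

```python
import warnings
import numpy as np

empty = np.array([], dtype=float)

with warnings.catch_warnings():
    # The "ignore" filter applies only inside this with-block;
    # the previous filters are restored on exit.
    warnings.simplefilter("ignore", category=RuntimeWarning)
    result = np.median(empty)

print(result)  # nan
```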

The bin_centers can also be computed more quickly. This loop:

bin_centers = []
for j in range(len(bins) - 1):
    bin_centers.append((bins[j] + bins[j + 1]) / 2.)

could be simplified, assuming the bins are evenly spaced (as np.linspace produces), to

bin_centers = bins[:-1] + (bins[1]-bins[0])/2
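As a quick check, the following sketch compares the loop with the one-line version on some made-up, evenly spaced edges. It also shows a variant, (bins[:-1] + bins[1:]) / 2, which is my addition and does not rely on equal spacing:

```python
import numpy as np

bins = np.linspace(0.0, 1.0, 5)   # hypothetical, evenly spaced edges

# Loop version: midpoint of each pair of neighbouring edges.
loop_centers = []
for j in range(len(bins) - 1):
    loop_centers.append((bins[j] + bins[j + 1]) / 2.)

# Vectorized shortcut: valid only when all bins have the same width.
shortcut = bins[:-1] + (bins[1] - bins[0]) / 2

# Vectorized midpoints: also correct for unevenly spaced edges.
general = (bins[:-1] + bins[1:]) / 2

print(np.allclose(loop_centers, shortcut))  # True
print(np.allclose(loop_centers, general))   # True
```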

So, for example,

import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore", 'Mean of empty slice.')
warnings.filterwarnings("ignore", 'invalid value encountered in double_scalar')

np.random.seed(123)

X = np.random.random(10)
bins = np.linspace(min(X), max(X), 10)
digitized = np.digitize(X, bins)-1
bin_centers = bins + (bins[1]-bins[0])/2

Y = range(0, 100, 10)
Y = np.asarray(Y, dtype='float')
bin_medians = [np.median(Y[digitized == j]) for j in range(len(bins))]
print(bin_medians)

plt.scatter(bin_centers, bin_medians)
plt.show()

yields

[15.0, 90.0, 50.0, 55.0, nan, 40.0, nan, nan, nan, 60.0]

If your purpose is only to make the scatter plot, then it is not necessary to remove the nans since matplotlib will ignore them anyway.

If you really want to remove the nans, then you could use

no_nans = np.isfinite(bin_medians)
# bin_medians is a list, so convert it to an array before boolean indexing
bin_medians = np.asarray(bin_medians)[no_nans]
bin_centers = bin_centers[no_nans]

In the above, I opted for using warnings.filterwarnings to just suppress the warnings. If you don't wish to suppress warnings, and would rather filter the nans from bin_medians and from the corresponding locations from bin_centers, then:

bin_centers = bins + (bins[1]-bins[0])/2
masks = [(digitized == j) for j in range(len(bins))]
bin_centers, bin_medians = zip(*[(center, np.median(Y[mask]))
                                 for center, mask in zip(bin_centers, masks)
                                 if mask.any()])
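One caveat with the zip(*...) idiom: it returns tuples, and the unpacking fails if every bin is empty. A sketch of the same idea that ends with NumPy arrays is below; the sample data is made up, and the centers here are interval midpoints (so there is one fewer than there are edges), which differs slightly from the code above:

```python
import numpy as np

# Hypothetical sample data with two empty bins in the middle.
X = np.array([0.05, 0.10, 0.15, 0.85, 0.90])
Y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

bins = np.linspace(0.0, 1.0, 5)
digitized = np.digitize(X, bins) - 1        # bin indices start at 0
centers = (bins[:-1] + bins[1:]) / 2

masks = [(digitized == j) for j in range(len(centers))]

# Keep (center, median) pairs for non-empty bins only.
pairs = [(c, np.median(Y[m])) for c, m in zip(centers, masks) if m.any()]

# zip(*pairs) needs at least one pair; guard against the all-empty case.
if pairs:
    kept_centers, kept_medians = (np.array(t) for t in zip(*pairs))
else:
    kept_centers, kept_medians = np.array([]), np.array([])

print(kept_centers)  # [0.125 0.875]
print(kept_medians)  # [20. 45.]
```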

Question 2

I don't quite understand the question, but here's something to maybe get you started:

In [3]: X = [1,2,3,4,5,6,7,8,9,10]

In [4]: Y = [chr(96+x) for x in X]

In [8]: Z = list(zip(X, Y))    # Pair up X and Y (list() is needed on Python 3); sort first if they're not already in the 'order' you want for your correspondence

In [9]: Z
Out[9]:
[(1, 'a'),
 (2, 'b'),
 (3, 'c'),
 (4, 'd'),
 (5, 'e'),
 (6, 'f'),
 (7, 'g'),
 (8, 'h'),
 (9, 'i'),
 (10, 'j')]

At this point you can do something like sorted(Z, key=lambda el: -ord(el[1])) or whatever to sort based on your criteria. Obviously it'd be more meaningful than the example.
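For instance, sorting the pairs by their letter in descending order might look like the sketch below. Using reverse=True instead of negating ord() is my substitution; it gives the same order here:

```python
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [chr(96 + x) for x in X]    # 'a' through 'j'
Z = list(zip(X, Y))

# Sort by the second element of each pair, largest letter first.
by_letter_desc = sorted(Z, key=lambda el: el[1], reverse=True)
print(by_letter_desc[:3])  # [(10, 'j'), (9, 'i'), (8, 'h')]
```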

Finally, to chunk into equal-length parts, which I think you might also want, take a look at the wide variety of possibilities given as answers here.
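One simple way to do the chunking, as a sketch: the chunks helper below is a hypothetical name of my own, and it leaves the last part shorter when the length does not divide evenly.

```python
def chunks(seq, size):
    """Split seq into consecutive slices of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

Z = list(zip(range(1, 11), "abcdefghij"))
parts = chunks(Z, 4)

print([len(p) for p in parts])  # [4, 4, 2]
print(parts[-1])                # [(9, 'i'), (10, 'j')]
```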

If that's not what you were looking for, apologies.