If you have SciPy, you could call scipy.stats.binned_statistic:
import numpy as np
import scipy.stats as stats
statistic, bin_edges, binnumber = stats.binned_statistic(
    x=X, values=Y, statistic='median', bins=bins)
statistic = statistic[np.isfinite(statistic)]
print(statistic)
yields
[ 15. 90. 50. 55. 40. 60.]
Without SciPy, I think you would need a list comprehension.
As you suggested, you could avoid the RuntimeWarning by filtering out the empty bins. You can do that with an if-condition inside a list comprehension:
masks = [(digitized == j) for j in range(1, len(bins))]
bin_medians = [np.median(Y[mask]) for mask in masks if mask.any()]
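For reference, here is a self-contained sketch of that filtering. The data mirrors the example further down; the np.clip call is my own addition, assuming you want the point equal to the last bin edge counted in the last bin (np.digitize would otherwise assign it the index len(bins), which the range above never visits).

```python
import numpy as np

np.random.seed(123)
X = np.random.random(10)
Y = np.arange(0, 100, 10, dtype=float)
bins = np.linspace(X.min(), X.max(), 10)

# np.digitize returns len(bins) for X.max(), which equals the last edge.
# Clipping folds that point into the last bin (an assumption made here so
# that every point lands in one of the len(bins) - 1 bins).
digitized = np.clip(np.digitize(X, bins), 1, len(bins) - 1)

masks = [(digitized == j) for j in range(1, len(bins))]
# Skip empty bins so np.median never sees an empty slice.
bin_medians = [np.median(Y[mask]) for mask in masks if mask.any()]
print(bin_medians)
```

With this seed the medians come out as 15, 90, 50, 55, 40 and 60, and no RuntimeWarning is raised.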
Also note that the message you are seeing is a warning, not an exception. Alternatively, you could suppress it with
import warnings
warnings.filterwarnings("ignore", 'Mean of empty slice.')
warnings.filterwarnings("ignore", 'invalid value encountered in double_scalar')
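If you would rather not silence these warnings for the whole program, one option is to suppress them only around the call that triggers them. The sketch below uses warnings.catch_warnings from the standard library and filters by category instead of by message text, since the exact wording may differ between NumPy versions. The helper name median_or_nan is made up for illustration.

```python
import warnings
import numpy as np


def median_or_nan(values):
    """Return np.median(values), silencing RuntimeWarnings only inside this call.

    Hypothetical helper for illustration; an empty input yields nan.
    """
    with warnings.catch_warnings():
        # The filter is restored automatically when the with-block exits.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.median(values)


empty_result = median_or_nan(np.array([]))
normal_result = median_or_nan(np.array([1.0, 3.0]))
print(empty_result, normal_result)
```

Warnings raised elsewhere in the program are left untouched, which makes it harder to hide an unrelated problem by accident.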
There is also a quicker way to compute bin_centers. The loop
bin_centers = []
for j in range(len(bins) - 1):
    bin_centers.append((bins[j] + bins[j + 1]) / 2.)
could be simplified (since np.linspace produces evenly spaced bin edges) to
bin_centers = bins[:-1] + (bins[1]-bins[0])/2
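That shortcut relies on every bin having the same width. If the edges might be unevenly spaced, a sketch that averages adjacent edges should give the same result as the loop. The edge values below are made up for illustration.

```python
import numpy as np

# Hypothetical, unevenly spaced bin edges (illustration only).
bins = np.array([0.0, 1.0, 3.0, 7.0])

# Loop version: midpoint of each pair of adjacent edges.
loop_centers = [(bins[j] + bins[j + 1]) / 2. for j in range(len(bins) - 1)]

# Vectorized version: average the left and right edges elementwise.
bin_centers = (bins[:-1] + bins[1:]) / 2

print(bin_centers)
```

Both versions give centers at 0.5, 2.0 and 5.0 for these edges.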
So, for example,
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore", 'Mean of empty slice.')
warnings.filterwarnings("ignore", 'invalid value encountered in double_scalar')
np.random.seed(123)
X = np.random.random(10)
bins = np.linspace(min(X), max(X), 10)
digitized = np.digitize(X, bins)-1
bin_centers = bins + (bins[1]-bins[0])/2
Y = range(0, 100, 10)
Y = np.asarray(Y, dtype='float')
bin_medians = [np.median(Y[digitized == j]) for j in range(len(bins))]
print(bin_medians)
plt.scatter(bin_centers, bin_medians)
plt.show()
yields
[15.0, 90.0, 50.0, 55.0, nan, 40.0, nan, nan, nan, 60.0]
If your purpose is only to make the scatter plot, then it is not necessary to remove the NaNs since matplotlib will ignore them anyway.
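If you really want to remove the NaNs, then you could use
# bin_medians is a list; convert it so it can be indexed with a boolean mask
bin_medians = np.asarray(bin_medians)
no_nans = np.isfinite(bin_medians)
bin_medians = bin_medians[no_nans]
bin_centers = bin_centers[no_nans]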
In the above, I opted for using warnings.filterwarnings to just suppress the warnings. If you don't wish to suppress warnings, and would rather filter the NaNs from bin_medians and from the corresponding locations in bin_centers, then:
bin_centers = bins + (bins[1]-bins[0])/2
masks = [(digitized == j) for j in range(len(bins))]
bin_centers, bin_medians = zip(*[(center, np.median(Y[mask]))
                                 for center, mask in zip(bin_centers, masks)
                                 if mask.any()])