rv_continuous
is a base class for all of the probability distributions implemented in scipy.stats
. You would not call methods on rv_continuous
yourself.
Your question is not entirely clear about what you want to do, so I will assume that you have an array of 16383 data points drawn from some unknown probability distribution. From the raw data, you will need to estimate the cumulative distribution, find the values of that cumulative distribution at the sup
and inf
values and subtract to find the probability that a value drawn from the unknown distribution.
There are lots of ways to estimate the unknown distribution from the data depending on how much modelling you want to do and how many assumptions you want to make. At the more complicated end of the spectrum, you could try to fit one of the standard parametric probability distributions to the data. For example, if you had a suspicion that your data were lognormally distributed, you could use scipy.stats.lognorm.fit(data, floc=0)
to find the parameters of the lognormal distribution that fit your data. Then you could use scipy.stats.lognorm.cdf(sup, *params) - scipy.stats.lognorm.cdf(inf, *params)
to estimate the probability of the value being between those values.
In the middle are the non-parametric forms of distribution estimation like histograms and kernel density estimates. For example, scipy.stats.gaussian_kde(data).integrate_box_1d(inf, sup)
is an easy way to make this estimate using a Gaussian kernel density estimate of the unknown distribution. However, kernel density estimates aren't always appropriate and require some tweaking to get right.
The simplest thing you could do is just count the number of data points that fall between inf
and sup
and divide by the total number of data points that you have. This only works well with a largish number of points (which you have) and with bounds that aren't too far in the tails of the data.