Вопрос

Is there a java library that allows me to update rather than recompute quantiles of a large sample set of data with addition/removal of data points ? My guess is that an efficient algorithms should take a constant time for the update (not a function of the number of points already existing).

Known algorithms are listed but dont' have a way of removing points from the sample set :

Here is a sample problem: Say I want to compute say an arbitrary but constant percentile fan speed of a set of windmills (as an estimate of the wind speed). The fan speeds are updated asynchronously every few milliseconds. This library should allow me to update the wind speeds of one windmill at a time without having to recalculate the median.

Это было полезно?

Решение

If you maintain an updatable sorted representation of the data, getting the quantiles is easy and efficient just by using the length of your array. For example, if you have N elements then the median will be at position N/2, and so on. When you insert a new element into your data structure, this will still hold. The efficiency is then just dependent on inserting a new element.

Другие советы

You could have multiple batches of data. You can combine the percentiles/quartiles of these batches to estimate an aggregate. The benefit is that you can discard a number of batches efficiently without having to re-calculate the other batches.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top