Question

Imagine that I have a set of measurements of x that are taken by many processes x0 ... xN at times t0 ... tN. Let's assume that at time t I want to make an estimate of the current value of x, based on the assumption that there is no long term trend I know about and that x can be predicted from an algorithm such as exponential smoothing. As we have many processes, and N can get very large, I can't store more than a few values (e.g. the previous state).

One approach here would be to adapt the normal exponential smoothing algorithm. If samples are taken regularly, I would maintain an estimator yn such that:

yn = α.yn-1 + ( 1 - α ). xn

This approach is not great where the sampling is irregular as many samples together would have a disproportionate influence. Therefore this formula could be adapted to:

yn = αn.yn-1 + ( 1 - αn ). xn

where

αn = e-k.(tn - tn-1)

IE dynamically adjusting the smoothing constant depending on the interval between the previous two samples. I'm happy with this approach and it seems to work. It's the first answer given here, and a good summary of these sorts of techniques are given by Eckner in this 2012 paper (PDF).

Now, my question is as follows. I want to adapt the above to estimate the rate of an occurrence. Occasionally an event will occur. Using a similar exponential technique, I want to get an estimate of the rate that event occurs.

Two obvious strategies would be:

  • To use the first or the second technique using the delay between the last two events as the data series xn.
  • To use the first or the second technique using the reciprocal of the delay between the last two events (i.e. the rate) as the data series xn.

Neither of these turn out to be good strategies as far as I can tell. Firstly, take an event that occurs every 500ms (on the one hand) and an event that occurs with a 200ms delay and an 800ms delay on the other. Clearly these both occur twice a second, so the rate estimate given should be the same. Ignoring the time from the last sample seems foolhardy, so I'll concentrate on the second strategy. Using the delay (rather than the reciprocal) does not turn out to be a good predictor because the simulating the 200ms/800ms sample stream produces an estimate of about 1.5 (on the basis the average of reciprocals is not the reciprocal of the average).

But, far more significantly, neither strategy copes with what is actually happening in practice, which is that suddenly all the events stop for a long while. The 'latest' value of y is thus the value at the last event, and the estimate of the rate thus never gets calculated. Hence the rate appears constant. Of course if I was analysing the data retrospectively, this wouldn't be an issue, but I'm analysing it in real time.

I realise another way to do this would be to run some thread periodically (e.g. every 10 seconds) and count the number of occurrences in this 10 second interval. This is very resource heavy my end as the statistics are not needed often, and I am loathe to run a thread that polls everything due to mutex issues. Therefore I'd like to (somehow) use an algorithm which adjusts the state read by (e.g.) the time since the last sample was taken. This seems a reasonable approach as if the performance is measured at times chosen independently of the samples, the measurement time will on average be half way through the period between samples, so a very crude unsmoothed estimate of the rate would be half the reciprocal of the time since the last sample. To complicate things further, my measurement time will not be independent of the samples.

I have a feeling this has a simple answer but it's eluding me. I have a feeling that the correct route is to assume the the events are Poisson distributed, and derive an estimate for λ based on the interval since the last sample and some form of moving average, but my statistics is too rusty to make this work.

There is a near dupe of this question here but the answer doesn't seem to be very satisfactory (I hope I explained why). I'd add that a Kalman filter seems way to heavyweight given I have one variable to estimate and know nothing about it. There are a number of other near dupes, most of which either suggest keeping large bins of values (not realistic here from a memory point of view) or do not address the two issues above.

Was it helpful?

Solution

First, if you assume that the occurrence rate of the events itself is constant (or that you're only interested in its long-term average), then you can simply estimate it as:

        λ* = N / (tt0)

where t is the current time, t0 is the start of observations, N is the number of events observed since t0 and λ* is the estimate of the true frequency λ.

At this point, it's useful to note that the estimation formula given above may be reformulated as the integral:

        λ* = integral( δevent(τ) dτ ) / integral( 1 dτ )

where the variable of integration τ ranges from t0 to t, and δevent(τ) = sum( δ(τ − ti), i = 1 .. N ) is a sum N of Dirac delta functions, with a single delta-peak at the occurrence time ti of each event i.

Of course, this would be a completely useless way to calculate λ*, but it turns out to be a conceptually useful formulation. Basically, the way to view this formula is that the function δevent(τ) measures the instantaneous rate at which the number of events increases at time τ, while the second integrand, which is just the constant 1, measures the rate at which time increases over time (which, of course, is simply one second per second).


OK, but what if the frequency λ itself may change over time, and you want to estimate its current value, or at least its average over a recent period?

Using the ratio-of-integrals formulation given above, we can obtain such an estimate simply by weighing both integrands by some weighing function w(τ) which is biased towards recent times:

        λ*recent = integral( δevent(τ) w(τ) dτ ) / integral( w(τ) dτ )

Now, all that remains is to pick a reasonable w(τ) such that these integrals simplify to something easy to calculate. As it turns out, if we choose an exponentially decaying weighing function of the form w(τ) = exp(k(τ − t)) for some decay rate k, the integrals simplify to:

        λ*recent = sum( exp(k(tit)), i = 0 .. N ) k / ( 1 − exp(k(t0t)) )

In the limit as t0 → −∞ (i.e., in practice, when the total observation time (tt0) is much larger than the weight decay timescale 1/k), this further simplifies to just:

        λ*recent = k sum( exp(k(tit)), i = 0 .. N )

Alas, naïvely applying this formula would still require us to remember all the event times ti. However, we can use the same trick as for calculating usual exponentially weighted averages — given the weighted average event rate λ*recent(t') at some earlier time t', and assuming that no new events have occurred between t' and t, we can calculate the current weighted average event rate λ*recent(t) simply as:

        λ*recent(t) = exp( k(t't) ) λ*recent(t')

Further, if we now observe a new event occurring at exactly time t, the weighted average event rate just after the event becomes:

        λ*recent(t) = k + exp( k(t't) ) λ*recent(t')


Thus, we get a very simple rule: all we need to store is the time tlast of the previous observed event, and the estimated recent event rate λ*last just after said event. (We may initialize these e.g. to tlast = t0 and λ*last = 0; in fact, with λ*last = 0, the value of tlast makes no difference, although for non-zero λ*last it does.)

Whenever a new event occurs (at time tnew), we update these values as:

        λ*lastk + exp( k(tlasttnew) ) λ*last
        tlasttnew

and whenever we wish to know the recent event rate average at the current time t, we simply calculate it as:

        λ*(t) = exp( k(tlastt) ) λ*last


Ps. To correct for the initial bias towards the (arbitrary) initial value of tlast, we can add back the 1 / ( 1 − exp(k(t0t)) ) correction term that we simplified out earlier when we assumed that tt0. To do that, simply start from tlast = 0 at t = t0, update tlast as above, but calculate the estimated recent event rate average at time t as:

        λ*corr(t) = exp( k(tlastt) ) λ*last / ( 1 − exp(k(t0t)) )

(Here, t0 denotes the time at which you start measuring events, not the occurrence of the first event.)

This will eliminate the initial bias towards zero, at the cost of increasing the early variance. Here's an example plot showing the effects of the correction, for k = 0.1 and a true mean event rate of 2:

Plot of λ* over time, with or without initial bias correction
The red line shows λ*(t) without the initial bias correction (starting from λ*(t0) = 0), while the green line shows the bias-corrected estimate λ*corr(t).


Pps. As the plot above shows, λ*, as calculated above, will not a be continuous function of time: it jumps up by k whenever an event occurs, and decays exponentially towards zero when events do not occur.

If you'd prefer a smoother estimate, you can calculate an exponentially decaying average of λ* itself:

        λ**(t) = integral( λ*(τ) exp(k2(τ − t)) dτ ) / integral( exp(k2(τ − t)) dτ )

where λ* is the exponentially decaying average event rate as calculated above, k2 is the decay rate for the second average, and the integrals are over −∞ < τ ≤ t.

This integral can also be calculated by a step-wise update rule as above:

        λ**lastWt) λ*last + exp( −k2 Δt ) λ**last
        λ*lastk1 + exp( −k1 Δt ) λ*last
        tlasttnew

where k1 and k2 are the decay rates for the first and second averages, Δt = tnewtlast is the elapsed time between the events, and:

        Wt) = k2 ( exp( −k2 Δt ) − exp( −k1 Δt ) ) / (k1k2)

if k1k2, or

        Wt) = k Δt exp( −k Δt )

if k1 = k2 = k (the latter expression arising from the former as the limit when (k1k2) → 0).

To calculate the second average for an arbitrary point in time t, use the same formula:

        λ**(t) = Wt) λ*last + exp( −k2 Δt ) λ**last

except with Δt = ttlast.


As above, this estimate can also be bias-corrected by applying a suitable time-dependent scaling factor:

        λ**corr(t) = λ**(t) / (1 - S(tt0))

where:

        St) = ( k1 exp( −k2 Δt ) − k2 exp( −k1 Δt ) ) / (k1k2)

if k1k2, or

        St) = (1 + k Δt) exp( −k Δt )

if k1 = k2 = k.

The plot below shows the effects of this smoothing. The red and green lines show λ*(t) and λ*corr(t) as above, while the yellow and blue lines show λ**(t) and λ**corr(t), as calculated with k1 = 0.1 (as above) and k2 = 0.2:

Plot of &lambda;* and &lambda;** over time, with or without initial bias correction

OTHER TIPS

You could try this:

Keep an estimator zn so that at each event:

zn = (zn-1+κ).e-κ.(tn-tn-1)

This will converge towards the event rate in s-1. A sligtly better estimator is then (as there is still an error/noise related if you compute the estimate just before or just after an event) :

wn = zn.e-κ/(2.zn)

In your example it will converge to 2s-1 (the inverse of 500ms)

The constant κ is responsible for the smoothing and is in s-1. Small values will smooth more. If your event rate is roughly of seconds, a value of 0.01s-1 for κ is a good start.

This method has a starting bias, and z0 could be set to an estimate of the value for faster convergence. Small values of κ will keep the bias longer.

There are much more powerful ways of analyzing poisson-like distributions, but they often require large buffers. Frequency analysis such as Fourier transform is one.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top