Question

OK, so you have some historic data in the form of [say] an array of integers. This, for example, could represent free-space on a server HDD over a two-year period, with each array element representing a daily sample.

The data (free space in this example) has a downward trend, but also has periodic positive spikes where files have been removed, compressed, etc.

How would you go about identifying the overall trend for the two-year period, i.e. ironing out the peaks and troughs in the data?

Now, I did A-level statistics and then a stats module in my degree, but I've slept over 7,000 times since then, and well, it's leaked out of my brain.

I'm not after a bit of code as such, more of a description of how you'd approach this problem...

Thanks in advance!

Solution 2

If I were doing this to produce a line through points for me to look at, I would probably use some variant of Loess, described at http://en.wikipedia.org/wiki/Local_regression and http://stat.ethz.ch/R-manual/R-patched/library/stats/html/loess.html. Basically, you find the smoothed value at any particular point by doing a weighted regression on the data points near that point, with the nearest points given the most weight.
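
To make the idea concrete, here is a minimal sketch of a Loess-style smoother in plain Python. It is not R's loess: it assumes equally spaced daily samples, fits a straight line in each window, uses tricube weights, and skips the robustness iterations real implementations offer. The function name `loess_smooth` and the `frac` parameter (the fraction of the data used for each local fit) are made up for this illustration.

```python
def loess_smooth(ys, frac=0.3):
    """Smooth equally spaced samples with a locally weighted linear fit."""
    n = len(ys)
    k = min(n, max(2, int(frac * n)))  # number of neighbours used per local fit
    smoothed = []
    for i in range(n):
        # Window of k consecutive samples roughly centred on i (clamped at the ends).
        lo = max(0, min(i - k // 2, n - k))
        xs = range(lo, lo + k)
        # Bandwidth slightly wider than the farthest neighbour, so no weight is zero.
        h = max(i - lo, lo + k - 1 - i) + 1
        # Tricube weights: the nearest points get the most weight.
        ws = [(1 - (abs(x - i) / h) ** 3) ** 3 for x in xs]
        # Weighted least-squares straight line through the window.
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * ys[x] for w, x in zip(ws, xs)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mx) * (ys[x] - my) for w, x in zip(ws, xs))
        slope = sxy / sxx if sxx else 0.0
        # The smoothed value is the local line evaluated at i.
        smoothed.append(my + slope * (i - mx))
    return smoothed
```

A larger `frac` irons out more of the spikes at the cost of reacting more slowly to genuine changes in the trend, so it is worth trying a few values against a plot of the raw data.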

Other suggestions

You'll get many different answers, and the one you choose really depends on more specific requirements you may have. Examples:

  1. Apply a low-pass filter, or any other spectral analysis technique, and use the low frequencies to determine the trend.

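  2. Linear regression of value against time. The slope of the fitted line gives the overall trend, and "r" (the correlation between time and the value) tells you how well a straight line describes the data.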

  3. Moving average of the last "n" samples. This is my favorite: with a large enough "n" it is often sufficient, and it is very easy to code. It's a sort of approximation to #1 above.
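
As a rough illustration of options 2 and 3, the sketch below uses plain Python and treats the sample index (the day number) as the time axis. The function names `moving_average` and `linear_trend` are invented for this example, and the moving average is the trailing kind, so the first few outputs average fewer than "n" samples.

```python
def moving_average(ys, n):
    """Trailing mean of the last n samples (fewer at the very start)."""
    out = []
    total = 0.0
    for i, y in enumerate(ys):
        total += y
        if i >= n:
            total -= ys[i - n]  # drop the sample that just left the window
        out.append(total / min(i + 1, n))
    return out


def linear_trend(ys):
    """Least-squares slope, intercept and correlation r of value against index.

    Needs at least two samples.
    """
    n = len(ys)
    mx = (n - 1) / 2  # mean of the indices 0 .. n-1
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in enumerate(ys))
    slope = sxy / sxx
    r = sxy / (sxx * syy) ** 0.5 if syy else 0.0
    return slope, my - slope * mx, r
```

For the free-space example, `slope` would be the average change in free space per day, and dividing the current smoothed value by `-slope` gives a crude estimate of how many days remain before the disk fills up.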

I'm sure there'll be others.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow