I have practically completed one, but wanted to compare mine with a well researched an possibly academic algorithm. There may be a library of statistical objects which either directly or in-combination solve my particular need.
My system (which I intend to OpenSource) has a stream of NetFlow data. Rather than store in database and using SQL functions, I prefer to have a database-free system and maintain a set of statistics, updated for each new flow, and scrolled per-second (or higher).
My solution involves an single array of uint, to effectively create a jagged array of sizes [60, 59, 23, 6, ...], representing seconds, minutes, hours, days, weeks, etc....
Each slot contains the total amount of Bytes for that time. So after 60 seconds a single minute statistic is created as Avg(seconds). This of course continues relatively up the time scale.
Rather than simply having thousands of second increments, it is due to:
- Memory constraints and the potential to have more statistical nodes; AND
- Ideal presentation to users
...that I roll up time scales.
Given that a flow may be applied to several nodes in a heirarchy of statistics (WAN Link, IP Address, Destination Address, SourcePort-DestinationPort), I calculate the delta once (GenerateDelta) and then simply apply at every node which is both active and which matches the flow meta-data.
A statistic on a given node would be "scrolled", in the following potential cases:
- When being read/displayed (via HTTP\JSON AJAX Request)
- When a delta is being applied (due to relevant flow)
- Simply every n-seconds (n is typically 1)
Overall there may be a well established algorithm for keeping running totals over time (with seconds, minutes...). But failing that, there may also be a suitable algorithms for comparison on smaller sub-sections of my code:
- GenerateDelta - not likely as this is specific for breaking down and averaging a flow with duration over slots in the statistics array.
- Scroll - if there were only seconds, then this would of course be simple, however my solution requires the 60 seconds to be combined into a new minute total every 60 seconds, and so on.
I do not wish responders to suggest any of their own algorithms, I have already (almost) completed all of my own without any problems and with many performance considerations. And others will likely be able to have a look at my algorithm when I have finished and published as Open Source.
What I do wish to see is any "well established" algorithms for comparison. Perhaps mine will be better, perhaps mine will be worse. Google isn't good at this sort of question, I need your help.
Thanks!