Question

I have a lot of netflow data (i.e. src_ip, dest_ip, beg_time, end_time, data_size, etc.), and some of the flows occur periodically. I want to find out which ones.

Suppose I have n netflows (maybe around 10^6) and m of them are periodic. How can I find which ones are periodic?

I can write code for this, but it will be at least O(n^3 log n), which will take forever once there are more than about 10^4 netflows.

I have searched for this but couldn't find anything.

Note: You can assume the data is sorted by start time, and that the start time is a 32-bit unsigned int (uint32 in C++).

Correction: src_ip is unique and dest_ip is not unique. The period is unknown: it may be 5 minutes or it may be 5 days. You can forget about src_ip, dest_ip, end_time, data_size and the other attributes of a flow. I'm only looking for events whose beginning times are periodic, and you can assume I have already eliminated unrelated events, such as those with different src_ips.

Any help will be appreciated,

Thanks

Solution

I'd try computing the FFT of signals built from your data.

For example, I'd transform the chunk beg_time=1, end_time=5, data_size=100 into a square pulse from 1 to 5 units of time with amplitude 100.

If you want to analyze everything together, superimpose all the pulses you've got.

If it doesn't make sense to put everything together, superimpose only the pulses from the same src_ip or from the same pair of src_ip and dest_ip.
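To make the superposition step concrete, here is a minimal Python sketch. The `Flow` record, the `superimpose` function and the `bin_width` parameter are names made up for this illustration, and it assumes times are integers and that a pulse covers beg_time through end_time inclusive.

```python
from collections import namedtuple

# Hypothetical record type; the field names follow the question's description.
Flow = namedtuple("Flow", ["beg_time", "end_time", "data_size"])

def superimpose(flows, bin_width=1):
    """Sum the square pulses of all flows into one sampled signal.

    Each flow adds data_size to every time bin from beg_time to end_time
    (inclusive). A difference array keeps the cost at
    O(number_of_flows + number_of_bins) instead of touching every bin per flow.
    """
    if not flows:
        return []
    t0 = min(f.beg_time for f in flows)
    t1 = max(f.end_time for f in flows)
    n_bins = (t1 - t0) // bin_width + 1
    diff = [0] * (n_bins + 1)
    for f in flows:
        diff[(f.beg_time - t0) // bin_width] += f.data_size      # pulse starts
        diff[(f.end_time - t0) // bin_width + 1] -= f.data_size  # pulse ends
    signal, running = [], 0
    for d in diff[:n_bins]:
        running += d
        signal.append(running)
    return signal

flows = [Flow(1, 5, 100), Flow(4, 7, 50)]
signal = superimpose(flows)
print(signal)  # [100, 100, 100, 150, 150, 50, 50]
```

To analyze one src_ip or one (src_ip, dest_ip) pair at a time, group the flows by that key first and call the function once per group. A larger `bin_width` shortens the signal at the cost of time resolution.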

Then run the FFT on the signals obtained through superposition and see whether there are any noticeable peaks in the frequency domain, or whether it all looks random, with no outstanding peaks.

The FFT runs in O(n*log(n)) time, where n is the number of signal samples (which depends on the time span and the sampling resolution, not on the number of flows).
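As a rough sketch of the peak check, the following Python uses a small textbook radix-2 FFT so that it runs with the standard library alone; in practice a library routine such as `numpy.fft.rfft` would be the usual choice. The function names and the `threshold` value are assumptions for illustration, not a tuned detector.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def dominant_period(signal, threshold=5.0):
    """Return the period (in samples) of the strongest spectral peak,
    or None if no peak reaches `threshold` times the mean magnitude.

    `threshold` is an arbitrary illustrative value.
    """
    if len(signal) < 4:
        return None
    n = 1
    while n < len(signal):
        n *= 2
    mean = sum(signal) / len(signal)
    # Remove the mean so the DC term does not dominate, then zero-pad.
    padded = [s - mean for s in signal] + [0.0] * (n - len(signal))
    mags = [abs(c) for c in fft(padded)[1 : n // 2]]  # positive frequencies
    avg = sum(mags) / len(mags)
    k = max(range(len(mags)), key=mags.__getitem__) + 1  # frequency bin index
    if avg == 0 or mags[k - 1] < threshold * avg:
        return None
    return n / k

# Synthetic signal: a pulse of width 4 repeating every 16 samples.
periodic = [100 if t % 16 < 4 else 0 for t in range(256)]
print(dominant_period(periodic))       # 16.0
print(dominant_period([7] * 256))      # None (flat signal, no peak)
```

Sharp, narrow pulses put a lot of energy into harmonics, so the strongest bin can be a multiple of the true frequency; checking a few of the largest peaks rather than only the top one is a reasonable precaution.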

I'm sure there must be better ways to do it, but it may be worth a try.

Licensed under: CC-BY-SA with attribution