This is just algebra, not an equation or a theory.
If you already know bytes/second, and clock speed (cycles/second), then
(bytes/second) / (cycles/second) => bytes/cycle
1 / (bytes/cycle) => cycles/byte
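For example, if your algorithm sustains 3.0e9 bytes/second on a core clocked at 1.5e9 cycles/second, that's 3.0e9 / 1.5e9 = 2 bytes/cycle, or 1/2 = 0.5 cycles/byte.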
If you don't know bytes per second, you can measure it like this (a code sketch follows the list):
- get a high-resolution timestamp T1 suitable for this kind of measurement
- run your algorithm N times over B bytes
- get another timestamp T2
- subtract one timestamp from the other to get the elapsed time E = T2 - T1
- you have now processed (N * B) bytes in E time units
- repeat several times
- if your measurements are unstable, your duration E is uncomfortably close to zero, or it is suspiciously close to the system timer granularity, increase N and/or B and try again. Do this a few times anyway, to confirm you get a linear relationship between bytes processed and time taken
- scale your time units (nanoseconds, microseconds, whatever they are) into seconds, if that's how you want to display the result
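A minimal sketch of that procedure in C might look like the following, assuming a hypothetical process_buffer() standing in for your algorithm, CLOCK_MONOTONIC as the timer (the CLOCK_THREAD_CPUTIME_ID clock mentioned below would also work), and an assumed 3 GHz clock for the cycles/second figure:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the algorithm you want to measure. */
static void process_buffer(const uint8_t *buf, size_t len)
{
    (void)buf;
    (void)len; /* ... your algorithm here ... */
}

int main(void)
{
    const size_t B = 1 << 20;   /* bytes per run  */
    const size_t N = 1000;      /* number of runs */
    uint8_t *buf = malloc(B);
    if (!buf) return 1;

    struct timespec t1, t2;
    clock_gettime(CLOCK_MONOTONIC, &t1);    /* timestamp T1 */
    for (size_t i = 0; i < N; i++)
        process_buffer(buf, B);             /* N runs over B bytes */
    clock_gettime(CLOCK_MONOTONIC, &t2);    /* timestamp T2 */

    /* Elapsed time E = T2 - T1, scaled to seconds. */
    double e = (t2.tv_sec - t1.tv_sec) + (t2.tv_nsec - t1.tv_nsec) / 1e9;

    double bytes_per_second  = (double)(N * B) / e;
    double cycles_per_second = 3.0e9;       /* assumption: 3 GHz clock */
    double cycles_per_byte   = cycles_per_second / bytes_per_second;

    printf("%.2f bytes/s, %.3f cycles/byte\n", bytes_per_second, cycles_per_byte);
    free(buf);
    return 0;
}
```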
Note that if your "timestamp" above is actually a cycle counter, you can skip the cycles/second stage. Otherwise, you can just read off the CPU frequency from the system/hardware information tool for your platform.
For POSIX, a sensible timer might be clock_gettime(CLOCK_THREAD_CPUTIME_ID, ...), for example. You should be able to find example code for rdtsc, documentation for the best Windows timing function, and so on by searching.
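If you do want raw cycle counts, a sketch using the __rdtsc() intrinsic (x86 only; available via <x86intrin.h> on GCC/Clang, <intrin.h> on MSVC) might look like this. The process_buffer() dummy is hypothetical, and note that on modern CPUs the TSC typically ticks at a fixed base rate rather than the current core frequency, so treat the result accordingly:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() on GCC/Clang; MSVC has it in <intrin.h> */

/* Hypothetical stand-in for the algorithm being measured. */
static void process_buffer(volatile uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= (uint8_t)i;            /* dummy work */
}

int main(void)
{
    enum { B = 1 << 16, N = 1000 };
    static uint8_t buf[B];

    uint64_t start = __rdtsc();          /* read the time-stamp counter */
    for (size_t i = 0; i < N; i++)
        process_buffer(buf, B);
    uint64_t end = __rdtsc();

    double cycles_per_byte = (double)(end - start) / ((double)N * B);
    printf("%.3f TSC ticks/byte\n", cycles_per_byte);
    return 0;
}
```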
As for actually taking the measurements, there are good suggestions in the comments. You need to:
- take a large (enough) number of samples for it to be reliable
- ideally with nothing else contending for resources; failing that, with FIFO/realtime scheduling to minimise interference
- either make sure any CPU clock scaling is turned off, or discard the first samples taken while the CPU was still ramping up (see the sketch after this list)
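A sketch of the sampling side, with a hypothetical run_once() standing in for one timed pass of your workload: take many samples, throw away the warm-up runs, and report the minimum and median rather than relying on a single run or a plain mean:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in: times one pass of the workload, returns elapsed seconds. */
static double run_once(void)
{
    struct timespec t1, t2;
    volatile unsigned sink = 0;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t1);
    for (unsigned i = 0; i < 1000000; i++)   /* dummy work in place of the real algorithm */
        sink += i;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t2);
    return (t2.tv_sec - t1.tv_sec) + (t2.tv_nsec - t1.tv_nsec) / 1e9;
}

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    enum { SAMPLES = 101, WARMUP = 5 };
    double s[SAMPLES];

    for (int i = 0; i < WARMUP; i++)         /* discard warm-up samples (clock scaling, cold caches) */
        (void)run_once();
    for (int i = 0; i < SAMPLES; i++)
        s[i] = run_once();

    qsort(s, SAMPLES, sizeof s[0], cmp_double);
    printf("min %.6f s  median %.6f s  max %.6f s\n",
           s[0], s[SAMPLES / 2], s[SAMPLES - 1]);
    return 0;
}
```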