Getting cycles per byte for my algorithm?

Question

This is just algebra, not an equation or a theory.

If you already know bytes/second and the clock speed (cycles/second), then

(bytes/second) / (cycles/second) => bytes/cycle
1 / (bytes/cycle) => cycles/byte

If you don't know bytes per second, you can calculate it by:

get a high-resolution timestamp T₁ suitable for this kind of measurement
run your algorithm N times over B bytes
get another timestamp T₂
subtract the first timestamp from the second to get the elapsed time E = T₂ - T₁
you have now processed N × B bytes in E time units
repeat several times
if your measurements are unstable, or your duration E is uncomfortably close to zero or suspiciously close to some system timer granularity, increase N and/or B and try again. Actually, do this a few times anyway, to confirm that the time taken grows linearly with the number of bytes processed
scale your time units (nanoseconds, microseconds, whatever they are) into seconds, if that's how you want to display the result

Note that if your "timestamp" above is actually a cycle counter, you can skip the cycles/second stage. Otherwise, you can just read off the CPU frequency from the system/hardware information tool for your platform.

On POSIX systems, a sensible timer might be clock_gettime(CLOCK_THREAD_CPUTIME_ID, ...). Example code for rdtsc, documentation for the best Windows timing function, and so on should be easy to find by searching.

As for actually taking the measurements, there are good suggestions in the comments. You need to:

take a large (enough) number of samples for it to be reliable
ideally with nothing else contending for resources, or even under FIFO/realtime scheduling
either make sure any CPU clock scaling is turned off, or discard the first samples, taken while the CPU was still warming up