Question

I'm optimizing a matrix numerical hotspot.

Currently, I'm doing blocking and loop unrolling to improve performance. However, I deliberately avoid peeling the borders: instead, I let the blocking steps overflow past the logical matrix size, so the algorithm ends up touching uninitialized values.

However, the matrix is generously over-allocated to absorb the overflow, so I am never actually accessing memory I don't own.
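Roughly, the pattern looks like this (a stripped-down sketch with placeholder names and a placeholder kernel, not my actual code):

    // The matrix is over-allocated so the blocked loops can run past the
    // logical size n without leaving owned memory; the overflowed elements
    // are simply never initialized.
    #include <cstddef>
    #include <vector>

    constexpr std::size_t BLOCK = 8;

    void scale(std::vector<double>& a, std::size_t n, std::size_t stride, double k)
    {
        // Bounds rounded up to a multiple of BLOCK: the last blocks in each
        // direction read and write the uninitialized padding (stride >= n_up).
        const std::size_t n_up = (n + BLOCK - 1) / BLOCK * BLOCK;
        for (std::size_t i = 0; i < n_up; i += BLOCK)
            for (std::size_t j = 0; j < n_up; j += BLOCK)
                for (std::size_t bi = 0; bi < BLOCK; ++bi)
                    for (std::size_t bj = 0; bj < BLOCK; ++bj)
                        a[(i + bi) * stride + (j + bj)] *= k;   // may touch padding
    }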

I don't do peeling for several reasons:

  • Laziness
  • Performance hit due to the very bad locality of the peeling border case.
  • To avoid complex border peeling code.

However, I am wondering: do these overflowing accesses to uninitialized values actually cause a performance hit?

I know exactly where the uninitialized accesses happen, and they are also reported by Valgrind. I have also profiled the code with Intel's VTune and could not see any sign of degraded performance due to this.


Solution

Just to get pedantic stuff out of the way:

According to the standard, bad things can happen if you read uninitialized data: it is undefined behavior, and the standard allows for "trap" representations that could trigger exceptions. But for all practical purposes on mainstream hardware, this probably doesn't apply here.


If you're dealing with integers, accessing and operating on uninitialized data will have no effect on performance (aside from division, integer operations usually have a fixed latency regardless of the operand values).

For floating-point, there are two problems:

  1. Signalling NaNs
  2. Denormalized Values

Depending on the environment, signalling NaNs may trigger a hardware exception. So this would actually be a correctness issue, not just a performance issue.
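If you want to check quickly whether your environment would actually trap, you can unmask the invalid-operation exception and feed it a signalling NaN. This is a glibc-specific sketch (feenableexcept is a GNU extension; other platforms have their own controls):

    #include <fenv.h>    // feenableexcept (glibc extension)
    #include <cstdio>
    #include <limits>

    int main()
    {
        feenableexcept(FE_INVALID);   // trap on invalid floating-point operations
        volatile double snan = std::numeric_limits<double>::signaling_NaN();
        volatile double x = snan + 1.0;   // raises SIGFPE here if trapping is enabled
        std::printf("%f\n", x);           // only reached while FE_INVALID stays masked
    }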

It may seem counter-intuitive that denormalized floats have anything to do with this. However, arbitrary uninitialized data has a high probability of being denormalized when interpreted as floating-point.

And you really don't want to be messing with denormalized floating-point.

So if you're unlucky enough for the uninitialized values to have even one denormalized value, you can expect a nasty 100+ cycle penalty at the end of each loop iteration. Now depending on how large the loops are, this may or may not matter.
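If you want to see the effect on your own hardware, a rough micro-benchmark along these lines will show it (numbers vary by microarchitecture; compile without -ffast-math, which may enable flush-to-zero and hide the penalty):

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Scale every element in place; with denormal inputs each multiply takes
    // the slow microcode-assisted path on many x86 CPUs.
    static void scale(std::vector<double>& v, double k)
    {
        for (double& x : v) x *= k;
    }

    int main()
    {
        const std::size_t n = 1 << 22;
        std::vector<double> normal(n, 1e-300);   // tiny, but still a normal double
        std::vector<double> denorm(n, 1e-310);   // below DBL_MIN -> denormalized

        for (auto* v : { &normal, &denorm }) {
            auto t0 = std::chrono::steady_clock::now();
            scale(*v, 0.5);
            auto t1 = std::chrono::steady_clock::now();
            std::printf("first=%g  %.2f ms\n", (*v)[0],
                std::chrono::duration<double, std::milli>(t1 - t0).count());
        }
    }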

That said, why is uninitialized data prone to be denormalized? A double is denormalized when its exponent bits are all zero (and the mantissa is nonzero). So if the memory used to hold a small integer, or a typical 64-bit pointer, its high bits are zero, and reinterpreted as a floating-point value it lands exactly in the denormal range.
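If you want to convince yourself, reinterpret a few representative bit patterns as doubles and classify them (illustration only; the constants are arbitrary examples):

    #include <cmath>     // std::fpclassify
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    static const char* classify(std::uint64_t bits)
    {
        double d;
        std::memcpy(&d, &bits, sizeof d);   // well-defined way to reinterpret the bits
        switch (std::fpclassify(d)) {
            case FP_SUBNORMAL: return "denormal";
            case FP_ZERO:      return "zero";
            case FP_NORMAL:    return "normal";
            case FP_NAN:       return "nan";
            default:           return "other";
        }
    }

    int main()
    {
        std::printf("small integer 42     -> %s\n", classify(42));
        std::printf("typical user pointer -> %s\n", classify(0x00007f3a12345678ULL));
        std::printf("bit pattern of 1.0   -> %s\n", classify(0x3ff0000000000000ULL));
    }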


Suggestions:

  • Zero-initialize the data. If that's too expensive, at least zero-initialize the padded end-points.
  • Avoid accessing the uninitialized data at all by putting in that cleanup code. Something like Duff's Device might be appropriate, though I generally prefer a set of binary-reducing if-statements (see the sketch below).
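For the second bullet, here's a sketch of what I mean by binary-reducing if-statements (unroll factor 4, placeholder kernel; adapt to your block size):

    // Handle the remainder explicitly instead of letting the unrolled loop
    // overflow into padding.
    #include <cstddef>

    void scale(double* a, std::size_t n, double k)
    {
        std::size_t i = 0;

        // Main loop, unrolled by 4, covers all full groups.
        for (; i + 4 <= n; i += 4) {
            a[i]     *= k;
            a[i + 1] *= k;
            a[i + 2] *= k;
            a[i + 3] *= k;
        }

        // Binary-reducing cleanup: one test per power of two below the
        // unroll factor, no general remainder loop.
        if (n & 2) { a[i] *= k; a[i + 1] *= k; i += 2; }
        if (n & 1) { a[i] *= k; }
    }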
Licensed under: CC-BY-SA with attribution