Question

What is the fastest way to set a single memory cell to zero in x86? Typically the way I do it is this:

C745D800000000  MOV [ebp-28], 0

As you can see this has a pretty chunky encoding since it is using all 4 bytes for the constant. With a plain register I can use MVZE which is more compact, but MVZE does not work with memory.

I was thinking maybe I could clear a register and then MOV the register value to memory. That would be two instructions, but only 5 bytes total instead of the single 7-byte instruction above. Following the rule "if it's shorter, it's usually faster", this might be preferable.
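For reference, here are the two encodings spelled out as byte arrays. This assumes 32-bit code, EAX as the scratch register, and the same [ebp-28] operand as above (the displacement byte D8 is -0x28 as a signed 8-bit value). The array names are just illustrative; the static_asserts check the 7-byte versus 5-byte arithmetic at compile time.

```c
#include <assert.h>   /* static_assert (C11) */

/* mov dword ptr [ebp-28h], 0
   C7 = MOV r/m32, imm32; 45 = ModRM for [ebp+disp8] with reg field /0;
   D8 = disp8 (-0x28); then four bytes of immediate zero. */
static const unsigned char mov_mem_imm[] = {
    0xC7, 0x45, 0xD8, 0x00, 0x00, 0x00, 0x00
};

/* xor eax, eax                  : 31 = XOR r/m32, r32; C0 = ModRM (eax, eax)
   mov dword ptr [ebp-28h], eax  : 89 = MOV r/m32, r32; 45 = [ebp+disp8], reg = eax */
static const unsigned char xor_then_store[] = {
    0x31, 0xC0,
    0x89, 0x45, 0xD8
};

static_assert(sizeof mov_mem_imm == 7, "MOV with imm32 is 7 bytes");
static_assert(sizeof xor_then_store == 5, "XOR + MOV from register is 5 bytes");
```

The two bytes saved come entirely from dropping the 4-byte immediate, at the cost of the 2-byte XOR.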


Solution

Unfortunately, what you have written here is the only way to zero a memory cell "directly" with a single MOV. Of course, XORing a register and then moving it to the memory location would also work, but I don't know whether that would actually be any faster.

If you happen to have a register that you know for sure holds zero, then by all means use it. Otherwise, just stick with mov [ebp-28], 0. Keep in mind that the mem, imm operand form is generally considered one of the slower ones: if you profile your code and find that this is a bottleneck, try zeroing a register at the beginning of your function (or loop) and then using it throughout the code as a sort of predefined constant.
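A minimal sketch of both approaches, assuming GCC or Clang extended inline assembly (AT&T syntax) on x86 or x86-64; zero_direct and zero_three_via_eax are hypothetical helper names. The second helper clears EAX once and reuses it for several stores, which is the "predefined constant" idea. Because the compiler picks the memory addressing modes here, the exact byte counts depend on the operands it chooses.

```c
#include <assert.h>

/* Direct form: "mov dword [mem], 0"; the zero is an imm32 inside the instruction. */
static void zero_direct(int *p)
{
    assert(p);  /* the asm dereferences p */
    __asm__ volatile ("movl $0, %0" : "=m"(*p));
}

/* Register form: clear EAX once, then store it to several cells.
   EAX and the flags are declared clobbered so the compiler does not
   keep anything (including the cell addresses) in them. */
static void zero_three_via_eax(int *a, int *b, int *c)
{
    assert(a && b && c);  /* the asm dereferences all three */
    __asm__ volatile ("xorl %%eax, %%eax\n\t"
                      "movl %%eax, %0\n\t"
                      "movl %%eax, %1\n\t"
                      "movl %%eax, %2"
                      : "=m"(*a), "=m"(*b), "=m"(*c)
                      :
                      : "eax", "cc");
}
```

The compiler does not guarantee that a register stays zero across separate asm statements, which is why the reuse happens inside a single statement here; in hand-written assembly you would simply keep the register reserved for the duration of the function.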

OTHER TIPS

If you expect your data to be out of the cache, and you don't expect to access it again soon, MASKMOVDQU might be the fastest way. It lets you write one or more bytes without affecting the surrounding bytes and without waiting for a request-for-ownership (RFO) to bring the associated cache line into the cache.

Essentially, the write is sent down to memory, instead of the cache line being brought up to the core. Since the CPU interacts with memory in cache-line-sized chunks, what happens under the covers is that the cache line containing the write is sent down along with a mask indicating which bytes are actually being updated. The bytes to be written are then merged with the bytes that should be left alone, either at the memory controller, in the L3 cache, or in the memory itself.
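A rough sketch of the MASKMOVDQU approach in C, using the SSE2 intrinsic _mm_maskmoveu_si128 (which compiles to MASKMOVDQU) and assuming an x86 compiler with SSE2 enabled. zero_bytes_nt is a hypothetical helper: it clears the first n bytes (0 to 16) at dst and leaves the rest of the 16-byte window untouched. It assumes the whole 16-byte window starting at dst is valid memory, and it issues an SFENCE afterwards because non-temporal stores are weakly ordered.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in _mm_sfence */

/* Zero the first n bytes at dst with a byte-masked, non-temporal store.
   Assumption: dst .. dst+15 is accessible memory, even though only the
   first n bytes are written. */
static void zero_bytes_nt(void *dst, size_t n)
{
    assert(n <= 16);
    /* Mask byte i is 0xFF when i < n, else 0x00; MASKMOVDQU only looks at
       the high bit of each mask byte. */
    const __m128i idx  = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                                       8, 9, 10, 11, 12, 13, 14, 15);
    const __m128i mask = _mm_cmplt_epi8(idx, _mm_set1_epi8((char)n));
    _mm_maskmoveu_si128(_mm_setzero_si128(), mask, (char *)dst);
    /* Order the non-temporal store before later stores. */
    _mm_sfence();
}
```

For a single 4-byte cell, zero_bytes_nt(&cell, 4) would write only those four bytes.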

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow