Most Efficient way to set Register to 1 or (-1)

https://stackoverflow.com/questions/2826872

26-09-2019

Question

I am taking an assembly course now, and the guy who checks our home assignments is a very pedantic old-school optimization freak. For example he deducts 10% if he sees:

mov ax, 0

instead of:

xor ax,ax

even if it's only used once.

I am not a complete beginner in assembly programming, but I'm not an optimization expert, so I need your help with something (it might be a very stupid question, but I'll ask anyway): if I need to set a register value to 1 or (-1), is it better to use:

mov ax, 1

or do something like:

xor ax,ax
inc ax

I really need a good grade, so I'm trying to get it as optimized as possible. (I need to optimize both time and code size.)

Solution

A quick Google search for "8086 instruction timings size" turned up http://8086.tk/ which seems to have all the timings and sizes for the 8086 (and more) instruction sets.

No doubt you could find official Intel documentation on the web with similar information.

For your specific question:

xor ax,ax
inc ax

takes 3+3=6 clock cycles and 2+1=3 bytes while

mov ax,1

takes 4 clock cycles and 3 bytes.

So the latter is better in that case.
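To make the arithmetic explicit, here is a minimal Python sketch that just adds up the per-instruction figures quoted above. The `COSTS` table and the `total_cost` helper are names made up for illustration, and the numbers are the ones given in this answer, not independently measured timings.

```python
# Per-instruction (clock cycles, bytes) exactly as quoted in the answer above.
# Assumption: these are the answer's figures, not independently verified ones.
COSTS = {
    "xor ax,ax": (3, 2),
    "inc ax": (3, 1),
    "mov ax,1": (4, 3),
}


def total_cost(sequence):
    """Sum cycles and bytes over a list of instruction strings."""
    cycles = sum(COSTS[insn][0] for insn in sequence)
    size = sum(COSTS[insn][1] for insn in sequence)
    return cycles, size


xor_inc = total_cost(["xor ax,ax", "inc ax"])
mov_imm = total_cost(["mov ax,1"])
print(xor_inc, mov_imm)  # prints: (6, 3) (4, 3)
```

Both sequences come out at 3 bytes, so with these figures only the cycle count separates them.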

But you need to talk to your educational institute about this guy. 10% for a simple thing like that beggars belief.

You should ask what should be done in the case where you have two possibilities, one faster and one shorter.

Then, once they've admitted that there are different ways to code depending on what you're trying to achieve, tell them that what you're trying to achieve is readability and maintainability, and that you seriously couldn't give a flying leap about a wasted cycle or byte here or there (*a).

Optimisation is something you generally do if and when you have a performance problem, after a piece of code is in a near-complete state - it's almost always wasted effort when the code is still subject to a not-insignificant likelihood of change.

For what it's worth, sub ax,ax appears to be on par with xor ax,ax in terms of clock cycles and bytes, so maybe you could throw that into the mix next time to cause him some more work.

(*a) No, don't really, but it's fun to vent occasionally :-)

OTHER TIPS

You're better off with

mov AX,1

on the 8086. If you're tracking register contents, you can possibly do better if you know that, for example, BX already has a 1 in it:

mov AX,BX

or if you know that AH is 0:

mov AL,1

etc.

Depending upon your circumstances, you may be able to get away with ...

sbb ax, ax

The result will either be 0 if the carry flag is not set or -1 if the carry flag is set.
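To see why this works for any starting value of ax, here is a rough Python model of a 16-bit subtract-with-borrow. The helper names `sbb16` and `to_signed16` are hypothetical and exist only for this sketch; it models the result and the carry out, and ignores the other flags.

```python
MASK16 = 0xFFFF


def sbb16(dest, src, carry_in):
    """Model 16-bit SBB: dest - src - CF. Returns (result, carry_out)."""
    raw = dest - src - carry_in
    return raw & MASK16, 1 if raw < 0 else 0


def to_signed16(value):
    """Interpret a 16-bit pattern as a two's-complement signed number."""
    return value - 0x10000 if value & 0x8000 else value


# sbb ax,ax: the register's old contents cancel out, leaving only -CF.
for ax in (0x0000, 0x1234, 0xFFFF):
    for cf in (0, 1):
        result, _ = sbb16(ax, ax, cf)
        print(f"ax={ax:#06x} CF={cf} -> {to_signed16(result)}")
```

Because dest and src are the same register, the result depends only on the carry flag going in, so this is only useful when you already know (or control) what CF holds at that point.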

However, if the above example is not applicable to your situation, I would recommend the

xor  ax, ax
inc  ax

method. It should satisfy your professor for size. However, if your processor employs any pipelining, I would expect some dependency delay between the two instructions, since inc reads the register that xor has just written (I could very well be wrong on that). If such a delay exists, the speed could be improved slightly by placing another instruction, one that does not use ax, between them.

Hope this helps.

I would use mov [e]ax, 1 under any circumstances. Its encoding is no longer than the hackier xor sequence, and I'm pretty sure it's faster just about anywhere. 8086 is just weird enough to be the exception, and as that thing is so slow, a micro-optimization like this would make most difference. But anywhere else: executing 2 "easy" instructions will always be slower than executing 1, especially if you consider data hazards and long pipelines. You're trying to read a register in the very next instruction after you modify it, so unless your CPU can bypass the result from stage N of the pipeline (where the xor is executing) to stage N-1 (where the inc is trying to load the register, never mind adding 1 to its value), you're going to have stalls.

Other things to consider: instruction fetch bandwidth (moot for 16-bit code, both are 3 bytes); mov avoids changing flags (more likely to be useful than forcing them all to zero); depending on what values other registers might hold, you could perhaps do lea ax,[bx+1] (also 3 bytes, even in 32-bit code, no effect on flags); as others have said, sbb ax,ax could work too in some circumstances - it's also shorter at 2 bytes.
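The byte counts mentioned in this thread can be checked against the 16-bit machine-code encodings. The table below is a small Python sketch: `ENCODINGS` and `sequence_size` are illustrative names, and each entry is one valid encoding of that instruction (an assembler may emit an equivalent alternative form for the register-to-register ones).

```python
# One valid 16-bit encoding per instruction; assemblers may choose an
# equivalent alternative opcode for the register-to-register forms.
ENCODINGS = {
    "mov ax,1":      bytes([0xB8, 0x01, 0x00]),
    "xor ax,ax":     bytes([0x31, 0xC0]),
    "inc ax":        bytes([0x40]),
    "sub ax,ax":     bytes([0x29, 0xC0]),
    "sbb ax,ax":     bytes([0x19, 0xC0]),
    "lea ax,[bx+1]": bytes([0x8D, 0x47, 0x01]),
    "mov ax,bx":     bytes([0x89, 0xD8]),
    "mov al,1":      bytes([0xB0, 0x01]),
}


def sequence_size(*instructions):
    """Total encoded size in bytes of a sequence of instructions."""
    return sum(len(ENCODINGS[insn]) for insn in instructions)


for text, code in ENCODINGS.items():
    hex_bytes = " ".join(f"{b:02x}" for b in code)
    print(f"{text:<14} {hex_bytes:<9} {len(code)} byte(s)")

print("xor+inc:", sequence_size("xor ax,ax", "inc ax"), "bytes")
print("mov imm:", sequence_size("mov ax,1"), "bytes")
```

This only covers size. It says nothing about timing, which is where measuring on the actual target matters.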

When faced with these sorts of micro-optimizations you really should measure the alternatives instead of blindly relying even on processor manuals.

P.S. New homework: is xor bx,bx any faster than xor bx,cx (on any processor)?

Licensed under: CC-BY-SA with attribution
