Question

I'm trying to optimize this code in a way that is correct. What I mean by correct is... I would imagine there is a general approach to performing these optimizations such that if someone else looked at the code, they would be able to remove the optimizations.

C sample of code for readability...

int a = 1; // mapped to %l0
int b = 5; // mapped to %l1
int c = 0; // mapped to %l2
int d;     // mapped to %l3

while( a < b ) {
  c += a * b;
  ++a;
}

d = c * b;

SPARC assembly version...

  mov    %g0, %l2

  cmp    %l0, %l1
  bge    END_LOOP
  nop

LOOP:
  mov    %l0, %o0
  call   .mul
  mov    %l1, %o1       ! Fill delay slot with second argument
  add    %l2, %o0, %l2
  inc    %l0

  cmp    %l0, %l1
  bl     LOOP
  nop

END_LOOP:
  mov    %l2, %o0
  call   .mul           ! Fill delay sot with second argument
  mov    %l1, %o1

  mov    %o0, %l3

I can optimize the first part (not sure if correctly), but I'm not sure how to optimize the second part.

  mov    %g0, %l2

  cmp    %l0, %l1
  bge,a  END_LOOP       ! Annul branch to execute if branch is taken
  mov    %l2, %o0       ! Instruction at target

LOOP:
  mov    %l0, %o0
  call   .mul
  mov    %l1, %o1       ! Fill delay slot with second argument
  add    %l2, %o0, %l2
  inc    %l0

  cmp    %l0, %l1
  bl     LOOP
  nop

  mov    %l2, %o0       ! Move the instruction to above the target

END_LOOP:
  call   .mul           ! Fill delay sot with second argument
  mov    %l1, %o1

  mov    %o0, %l3

Any help with how to perform these optimizations would be very much appreciated.

Was it helpful?

Solution

In general, your approach is correct. Assuming you do not have a data dependency or transfer of control instruction immediately following the target, you can generally follow the following convention.

You did this, probably without realizing:

Go to the target of the branch, copy the instruction into the delay slot and annul the branch. As you stated, you annul the branch to prevent the instruction from executing if the branch isn't taken. Then move the instruction at the target above the label.

In your code, following the above steps, you would do the following:

I removed your comments so you can explicitly see what I changed.

  mov    %g0, %l2

  cmp    %l0, %l1
  bge,a  END_LOOP
  mov    %l2, %o0

  mov    %l0, %o0       ! 3. Move the instruction formerly after the loop
                        ! above the label
LOOP:
  [ mov    %l0, %o0 ]   ! Instruction was here

  call   .mul
  mov    %l1, %o1
  add    %l2, %o0, %l2
  inc    %l0

  cmp    %l0, %l1
  bl,a   LOOP           ! 1. Go to the target and copy that instruction into
                        ! they delay slot.
                        ! 2. Annul the branch
  mov    %l0, %o0       ! Instruction formerly after LOOP:

  mov    %l2, %o0

END_LOOP:
  call   .mul
  mov    %l1, %o1

  mov    %o0, %l3

If you examine the code carefully, you will see that the logic still holds and there is a systematic way to unwind the optimizations.

Regardless of whether or not you enter the loop, your code will still correctly execute the code following the loop.

This is the general approach to optimizing the code and is similar to what the compiler will do. The key is always to make sure a data dependency does not exist.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top