I see just 3 fairly simple problems there:
BE _next ; if statement by "branch"-cmd
...
sub R0, R0, #1 ; loop counting
BLPL _for_loop ; pl = if positive or zero
- BEQ, not BE - condition codes are always 2 letters.
- SUB alone won't update the flags - you need the suffix to say so, i.e. SUBS.
- BLPL would branch and link, thus overwriting your return address - you want BPL. Actually, BLPL wouldn't assemble here anyway, since in Thumb a conditional BL would need an IT to set it up (unless of course your assembler is clever enough to insert one automatically).
Edit: there's also of course a more general issue with the use of R4 in both the original code and my examples below - if you're interfacing with C code, the original value must be preserved across the function call and restored afterwards (R0-R3 are designated argument/scratch registers and can be freely modified). If you're in pure assembly, however, you don't necessarily need to follow a standard calling convention, so you can be more flexible.
Now, that's a very literal representation of the C code, and doesn't make best use of the instruction set - in particular the indexed addressing modes. One of the attractions of assembly programming is having complete control of the instructions, so how can we make it worth our while?
First, let's make the C code look a little more like the assembly we want:
int main_compare (int nbytes, char *pmem1, char *pmem2){
    while(nbytes-- > 0) {
        if(*pmem1++ != *pmem2++) {
            return 0;
        }
    }
    return 1;
}
Now that that shows our intent more clearly, let's play compiler:
byte_cmp_loop PROC
    ; assuming: r0 = nbytes, r1 = pmem1, r2 = pmem2
_loop:
    SUBS R0, R0, #1     ; Decrement nbytes and set flags based on the result
    BMI _finished       ; If nbytes is now negative, it was 0, so we're done
    LDRB R3, [R1], #1   ; Load from the address in R1, then add 1 to R1
    LDRB R4, [R2], #1   ; ditto for R2
    CMP R3, R4          ; If they match...
    BEQ _loop           ; then continue round the loop
    MOV R0, #0          ; else give up and return zero
    BX LR
_finished:
    MOV R0, #1          ; Success!
    BX LR
    ENDP
And that's nearly 25% fewer instructions! Now if we pull in another instruction set feature - conditional execution - and relax the requirements slightly, without breaking C semantics, it gets smaller still:
byte_cmp_loop PROC
    ; assuming: r0 = nbytes, r1 = pmem1, r2 = pmem2
_loop:
    SUBS R0, R0, #1     ; In C zero is false and any nonzero value is true, so
                        ; when R0 becomes -1 to trigger this branch, we can just
                        ; return that to indicate success
    IT MI               ; Make the following instruction conditional on 'minus'
    BXMI LR
    LDRB R3, [R1], #1
    LDRB R4, [R2], #1
    CMP R3, R4
    BEQ _loop
    MOVS R0, #0         ; Using MOVS rather than MOV to get a 16-bit encoding,
                        ; since updating the flags won't matter at this point
    BX LR
    ENDP
assembling to a meagre 22 bytes, that's nearly 40% less code than we started with :D
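To see what "relaxing the requirements" means for a C caller, here is a rough C model of that last routine next to the C version from earlier. This is only a sketch: byte_cmp_relaxed, agree and self_check are names I've made up for illustration, and the model mirrors the instruction sequence by hand rather than being anything a compiler produced.

```c
#include <assert.h>

/* The C version from above: returns exactly 1 on a match, 0 on a mismatch. */
static int main_compare(int nbytes, char *pmem1, char *pmem2)
{
    while (nbytes-- > 0) {
        if (*pmem1++ != *pmem2++) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical C model of the final assembly: on success it hands back
   the decremented counter, which is negative and therefore nonzero. */
static int byte_cmp_relaxed(int nbytes, char *pmem1, char *pmem2)
{
    for (;;) {
        nbytes -= 1;                    /* SUBS R0, R0, #1 */
        if (nbytes < 0) {
            return nbytes;              /* IT MI / BXMI LR */
        }
        if (*pmem1++ != *pmem2++) {     /* LDRB, LDRB, CMP, BEQ */
            return 0;                   /* MOVS R0, #0 / BX LR */
        }
    }
}

/* Returns 1 when both versions agree on match/mismatch for the inputs. */
static int agree(int nbytes, char *a, char *b)
{
    return (main_compare(nbytes, a, b) != 0)
        == (byte_cmp_relaxed(nbytes, a, b) != 0);
}

/* A few spot checks; call this from main() to run them. */
static void self_check(void)
{
    char x[] = "thumb";
    char y[] = "thumb";
    char z[] = "thumB";

    assert(agree(5, x, y));                   /* equal buffers        */
    assert(agree(5, x, z));                   /* differ in last byte  */
    assert(agree(0, x, z));                   /* nothing to compare   */
    assert(byte_cmp_relaxed(5, x, y) == -1);  /* success is -1, not 1 */
    assert(byte_cmp_relaxed(5, x, z) == 0);
}
```

So a caller writing if (byte_cmp_loop(...)) behaves the same either way, but one testing the result with == 1 would break with the shorter routine - that's the requirement being relaxed.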