at&t asm inline c++ problem

https://stackoverflow.com/questions/1956379

21-09-2019
|

Question

My Code

const int howmany = 5046;
char buffer[howmany];
    asm("lea     buffer,%esi"); //Get the address of buffer
    asm("mov     howmany,%ebx");         //Set the loop number
    asm("buf_loop:");                      //Lable for beginning of loop
    asm("movb     (%esi),%al");             //Copy buffer[x] to al
    asm("inc     %esi");                   //Increment buffer address
    asm("dec     %ebx");                   //Decrement loop count
    asm("jnz     buf_loop");              //jump to buf_loop if(ebx>0)

My Problem

I am using the gcc compiler. For some reason my buffer/howmany variables are undefined in the eyes of my asm. I'm not sure why. I just want to move the beginning address of my buffer array into the esi register, loop it 'howmany' times while copying each element to the al register.

Solution

Are you using the inline assembler in gcc? (If not, in what other C++ compiler, exactly?)

If gcc, see the details here, and in particular this example:

    asm ("leal (%1,%1,4), %0"
         : "=r" (five_times_x)
         : "r" (x) 
         );

%0 and %1 are referring to the C-level variables, and they're listed specifically as the second (for outputs) and third (for inputs) parameters to asm. In your example you have only "inputs" so you'd have an empty second operand (traditionally one uses a comment after that colon, such as /* no output registers */, to indicate that more explicitly).

OTHER TIPS

The part that declares an array like that

int howmany = 5046;
char buffer[howmany];

is not valid C++. In C++ it is impossible to declare an array that has "variable" or run-time size. In C++ array declarations the size is always a compile-time constant.

If your compiler allows this array declaration, it means that it implements it as an extension. In that case you have to do your own research to figure out how it implements such a run-time sized array internally. I would guess that internally buffer will be implemented as a pointer, not as a true array. If my guess is correct and it is really a pointer, then the proper way to load the address of the array into esi might be

mov buffer,%esi

and not a lea, as in your code. lea will only work with "normal" compile-time sized arrays, but not with run-time sized arrays.

Another question is whether you really need a run-time sized array in your code. Could it be that you just made it so by mistake? If you simply change the howmany declaration to

const int howmany = 5046;

the array will turn into an "normal" C++ array and your code might start working as is (i.e. with lea).

All of those asm instructions need to be in the same asm statement if you want to be sure they're contiguous (without compiler-generated code between them), and you need to declare input / output / clobber operands or you will step on the compiler's registers.

You can't use lea or mov to/from a C variable name (except for global / static symbols which are actually defined in the compiler's asm output, but even then you usually shouldn't).

Instead of using mov instructions to set up inputs, ask the compiler to do it for you using input operand constraints. If the first or last instruction of a GNU C inline asm statement, usually that means you're doing it wrong and writing inefficient code.

And BTW, GNU C++ allows C99-style variable-length arrays, so howmany is allowed to be non-const and even set in a way that doesn't optimize away to a constant. Any compiler that can compile GNU-style inline asm will also support variable-length arrays.

How to write your loop properly

If this looks over-complicated, then https://gcc.gnu.org/wiki/DontUseInlineAsm. Write a stand-alone function in asm so you can just learn asm instead of also having to learn about gcc and its complex but powerful inline-asm interface. You basically have to know asm and understand compilers to use it correctly (with the right constraints to prevent breakage when optimization is enabled).

Note the use of named operands like %[ptr] instead of %2 or %%ebx. Letting the compiler choose which registers to use is normally a good thing, but for x86 there are letters other than "r" you can use, like "=a" for rax/eax/ax/al specifically. See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html, and also other links in the inline-assembly tag wiki.

I also used buf_loop%=: to append a unique number to the label, so if the optimizer clones the function or inlines it multiple places, the file will still assemble.

Source + compiler asm output on the Godbolt compiler explorer.

void ext(char *);

int foo(void) 
{
    int howmany = 5046;   // could be a function arg
    char buffer[howmany];
    //ext(buffer);

    const char *bufptr = buffer;  // copy the pointer to a C var we can use as a read-write operand
    unsigned char result;
    asm("buf_loop%=:  \n\t"                 // do {
        "   movb     (%[ptr]), %%al \n\t"   // Copy buffer[x] to al
        "   inc     %[ptr]        \n\t"
        "   dec     %[count]      \n\t"
        "   jnz     buf_loop      \n\t"      // } while(ebx>0)
       :   [res]"=a"(result)      // al = write-only output
         , [count] "+r" (howmany) // input/output operand, any register
         , [ptr] "+r" (bufptr)
       : // no input-only operands
       : "memory"   // we read memory that isn't an input operand, only pointed to by inputs
    );
    return result;
}

I used %%al as an example of how to write register names explicitly: Extended Asm (with operands) needs a double % to get a literal % in the asm output. You could also use %[res] or %0 and let the compiler substitute %al in its asm output. (And then you'd have no reason to use a specific-register constraint unless you wanted to take advantage of cbw or lodsb or something like that.) result is unsigned char, so the compiler will pick a byte register for it. If you want the low byte of a wider operand, you could use %b[count] for example.

This uses a "memory" clobber, which is inefficient. You don't need the compiler to spill everything to memory, only to make sure that the contents of buffer[] in memory matches the C abstract machine state. (This is not guaranteed by passing a pointer to it in a register).

gcc7.2 -O3 output:

    pushq   %rbp
    movl    $5046, %edx
    movq    %rsp, %rbp
    subq    $5056, %rsp
    movq    %rsp, %rcx         # compiler-emitted to satisfy our "+r" constraint for bufptr
    # start of the inline-asm block
    buf_loop18:  
       movb     (%rcx), %al 
       inc     %rcx        
       dec     %edx      
       jnz     buf_loop      
    # end of the inline-asm block

    movzbl  %al, %eax
    leave
    ret

Without a memory clobber or input constraint, leave appears before the inline asm block, releasing that stack memory before the inline asm uses the now-stale pointer. A signal-handler running at the wrong time would clobber it.

A more efficient way is to use a dummy memory operand which tells the compiler that the entire array is a read-only memory input to the asm statement. See get string length in inline GNU Assembler for more about this flexible-array-member trick for telling the compiler you read all of an array without specifying the length explicitly.

In C you can define a new type inside a cast, but you can't in C++, hence the using instead of a really complicated input operand.

int bar(unsigned howmany)
{
    //int howmany = 5046;
    char buffer[howmany];
    //ext(buffer);
    buffer[0] = 1;
    buffer[100] = 100;   // test whether we got the input constraints right

    //using input_t = const struct {char a[howmany];};  // requires a constant size
    using flexarray_t = const struct {char a; char x[];};
    const char *dummy;
    unsigned char result;
    asm("buf_loop%=:  \n\t"                 // do {
        "   movb     (%[ptr]), %%al \n\t"   // Copy buffer[x] to al
        "   inc     %[ptr]        \n\t"
        "   dec     %[count]      \n\t"
        "   jnz     buf_loop      \n\t"      // } while(ebx>0)
       : [res]"=a"(result)        // al = write-only output
         , [count] "+r" (howmany) // input/output operand, any register
         , "=r" (dummy)           // output operand in the same register as buffer input, so we can modify the register
       : [ptr] "2" (buffer)     // matching constraint for the dummy output
         , "m" (*(flexarray_t *) buffer)  // whole buffer as an input operand

           //, "m" (*buffer)        // just the first element: doesn't stop the buffer[100]=100 store from sinking past the inline asm, even if you used asm volatile
       : // no clobbers
    );
    buffer[100] = 101;
    return result;
}

I also used a matching constraint so buffer could be an input directly, and the output operand in the same register means we can modify that register. We got the same effect in foo() by using const char *bufptr = buffer; and then using a read-write constraint to tell the compiler that the new value of that C variable is what we leave in the register. Either way we leave a value in a dead C variable that goes out of scope without being read, but the matching constraint way can be useful for macros where you don't want to modify the value of your input (and don't need the type of your input: int dummy would work fine, too.)

The buffer[100] = 100; and buffer[100] = 101; assignments are there to show that they both appear in the asm, instead of being merged across the inline-asm (which does happen if you leave out the "m" input operand). IDK why the buffer[100] = 101; isn't optimized away; it's dead so it should be. Also note that asm volatile doesn't block this reordering, so it's not an alternative to a "memory" clobber or using the right constraints.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow