Question

I want to use inline asm for ARMv7 with clang 3.4, in order to write low-level code that accesses the CPU's control registers. As a test, I wrote a program that reads from a register, conditionally fiddles with some bits, and writes back the new value.

However, when I look at the generated machine code, the entire bit-fiddling has been optimized away. Apparently I have not used the right asm constraints to tell clang that the result of writing to the register depends on what is being written. (I only used a simple "volatile" modifier.)

How should I write the inline asm code so that clang generates correct asm? Here is the code, test.c:

typedef unsigned int uint32_t;

// code that reads and writes the ID_PFR1 register

uint32_t read_ID_PFR1() {
  uint32_t a;
  asm volatile ("mrc     p15, 0, %0, c0, c1, 1" : : "r"(a) : );
  return a;
}

void write_ID_PFR1(uint32_t a) {
  asm volatile ("mcr     p15, 0, %0, c0, c1, 1" :"=r"(a) :  :  );
}

// regular c code that modifies the register

uint32_t foo(uint32_t b) {
  uint32_t a;
  a = read_ID_PFR1();
  write_ID_PFR1(b);
  return a+b;
}

void bit_fiddle() {
  uint32_t a;
  a = read_ID_PFR1();
  if ((a & 0x3) == 1) {
    a |= 1<<2;
  }
  a |= 1<<3;
  write_ID_PFR1(a);
}

I compiled it with

clang-3.4 -target armv7a-none-eabi test.c -o test -O3

This is the resulting machine code

$ arm-linux-gnueabi-objdump -S test

test:     file format elf32-littlearm


Disassembly of section .text:

00000000 <read_ID_PFR1>:
   0:   ee100f31    mrc 15, 0, r0, cr0, cr1, {1}
   4:   e12fff1e    bx  lr

00000008 <write_ID_PFR1>:
   8:   ee000f31    mcr 15, 0, r0, cr0, cr1, {1}
   c:   e12fff1e    bx  lr

00000010 <foo>:
  10:   ee100f31    mrc 15, 0, r0, cr0, cr1, {1}
  14:   ee000f31    mcr 15, 0, r0, cr0, cr1, {1}
  18:   e12fff1e    bx  lr

0000001c <bit_fiddle>:
  1c:   ee100f31    mrc 15, 0, r0, cr0, cr1, {1}
  20:   ee000f31    mcr 15, 0, r0, cr0, cr1, {1}
  24:   e12fff1e    bx  lr

As you can see in <bit_fiddle>, nothing is left between the mrc and mcr instructions. Note also that foo fails to add a+b in the generated machine code.


Solution

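You are close. The "=" modifier means that the operand is write-only, so it belongs on an output operand (the value read by mrc), not on the value that mcr writes: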

While using constraints, for more precise control over the effects of constraints, GCC provides us with constraint modifiers. Mostly used constraint modifiers are

"=" : Means that this operand is write-only for this instruction; the previous value is discarded and replaced by output data.

"&" : Means that this operand is an earlyclobber operand, which is modified before the instruction is finished using the input operands. Therefore, this operand may not lie in a register that is used as an input operand or as part of any memory address. An input operand can be tied to an earlyclobber operand if its only use as an input occurs before the early result is written.

Whether an operand is an input or an output is determined by its position in the colon-separated lists:

  asm ( assembler template 
       : output operands                  /* optional */
       : input operands                   /* optional */
       : list of clobbered registers      /* optional */
       );

See also:

  1. GCC Inline Assembly HOWTO.
  2. Copy content of C variable into a register (GCC)
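
The operand order can be tried on any target by using an empty template. What follows is only a sketch: pass_through is a hypothetical helper that does not appear in the question, and the "0" matching constraint is an assumption added here for illustration. It asks the compiler to place the input in the same register as output operand 0, so the value should come back unchanged even though no instruction is emitted. __asm__ is used instead of asm so that the snippet also builds under strict -std=c99.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Outputs are listed after the first colon, inputs after the second. */
static uint32_t pass_through(uint32_t in) {
  uint32_t out;
  __asm__ volatile (""           /* assembler template (empty: no instruction) */
                    : "=r"(out)  /* output operand 0: write-only register      */
                    : "0"(in)    /* input operand: same register as operand 0  */
                    );
  return out;
}
```

Swapping the two lists, as in the question, tells the compiler that the read produces nothing and that the write consumes nothing, which is why the surrounding C code was dropped.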

Other tips

I was using the "r" and "=r" constraints the wrong way around. The write should have an input constraint, and the read should have an output constraint.

This is the way to do it:

uint32_t read_ID_PFR1() {
  uint32_t a;
  asm volatile ("mrc     p15, 0, %0, c0, c1, 1" : "=r"(a) : : );
  return a;
}

void write_ID_PFR1(uint32_t a) {
  asm volatile ("mcr     p15, 0, %0, c0, c1, 1" : : "r"(a) :  );
}

Here is the code produced for bit_fiddle:

00000020 <bit_fiddle>:
  20:   ee100f31    mrc 15, 0, r0, cr0, cr1, {1}
  24:   e2001003    and r1, r0, #3
  28:   e3510001    cmp r1, #1
  2c:   03800004    orreq   r0, r0, #4
  30:   e3800008    orr r0, r0, #8
  34:   ee000f31    mcr 15, 0, r0, cr0, cr1, {1}
  38:   e12fff1e    bx  lr

Pretty nice...
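
One possible way to check the bit logic without ARM hardware is to keep it in a plain C function, separate from the register accessors. This is a sketch under stated assumptions: read_reg, write_reg, fiddle_bits and fake_reg are hypothetical names, and on non-ARM hosts a plain variable stands in for the coprocessor register.

```c
#include <stdint.h>

#if defined(__arm__)
/* Real coprocessor access, using the corrected constraints shown above. */
static inline uint32_t read_reg(void) {
  uint32_t a;
  __asm__ volatile ("mrc p15, 0, %0, c0, c1, 1" : "=r"(a));
  return a;
}
static inline void write_reg(uint32_t a) {
  __asm__ volatile ("mcr p15, 0, %0, c0, c1, 1" : : "r"(a));
}
#else
/* Host fallback (assumption): a plain variable simulates the register. */
static uint32_t fake_reg;
static inline uint32_t read_reg(void) { return fake_reg; }
static inline void write_reg(uint32_t a) { fake_reg = a; }
#endif

/* Pure bit manipulation, same logic as the body of bit_fiddle. */
static uint32_t fiddle_bits(uint32_t a) {
  if ((a & 0x3) == 1) {
    a |= 1u << 2;
  }
  a |= 1u << 3;
  return a;
}

void bit_fiddle(void) {
  write_reg(fiddle_bits(read_reg()));
}
```

With this split, fiddle_bits can be exercised on a development machine, while the two accessors stay as thin wrappers around the mrc and mcr instructions.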

Licensed under: CC-BY-SA with attribution