GCC inline assembly for SPARC: How to handle integer doubleword pairs?

https://stackoverflow.com/questions/9879193

26-05-2021

Question

From what I understand, in SPARC, 32-bit integer quantities are stored in single registers and 64-bit integer quantities are stored in adjacent register pairs, with the even register containing the high 32 bits and the odd register containing the low 32 bits.

I need to write a few specialized SPARC inline assembly macros (inline assembly functions would be fine too) that deal with 64-bit integer doubleword pairs, and I can't figure out how to refer generically (using GCC extended inline assembly) to the two halves of the pair in my inline assembly. Though my assembly macros will be a little more complex than the MULTIPLY() macro shown below, the multiplication example, if it worked, would demonstrate how to deal with the two halves of a 64-bit doubleword pair. Can anyone tell me how to fix my MULTIPLY() macro?

In case it matters, I'm on a...

bash-2.03$ uname -a
SunOS [...] 5.8 Generic_117350-39 sun4u sparc SUNW,Ultra-80

Here is my trivial example program (in C):

#include <stdio.h>
//#include <stdint.h>
#define uint32 unsigned long int
#define uint64 unsigned long long int


#define MULTIPLY(r, a, b)  /* (r = a * b) */   \
   asm("umul %1, %2, %0;"  /* unsigned mul */  \
       : /* regs out */  "=h"(r)               \
       : /* regs in  */  "r"(a),   "r"(b));
#if 0
       : /* clobbers */  "%y" );
#endif


int main(int argc, char** argv)
{
   uint64 r;
   uint32 a=0xdeadbeef, b=0xc0deba5e;

   // loses the top 32 bits of the multiplication because the result is
   // truncated at 32 bits which then gets assigned to the 64-bit 'r'...
   r = a * b;
   printf("u64=u32*u32  ---->  r=a*b           "
          "---->  0x%016llx = 0x%x * 0x%x\n",
          r, a, b);

   // force promotion of 'a' to uint64 to get 64-bit multiplication
   // (could cast either a or b as uint64, which one doesn't matter,
   // as one explicit cast causes the other to be promoted as well)...
   r = ((uint64)a) * b;
   printf("u64=u64*u32  ---->  r=((u64)a)*b    "
          "---->  0x%016llx = 0x%x * 0x%x\n",
          r, a, b);

   MULTIPLY(r, a, b);
   printf("u64=u64*u32  ---->  MULTIPLY(r,a,b) "
          "---->  0x%016llx = 0x%x * 0x%x\n",
          r, a, b);

   return 0;
}

Which, when compiled with gcc-3.2-sun4u/bin/gcc -o mult -mcpu=ultrasparc mult.c, produces this output:

u64=u32*u32  ---->  r=a*b           ---->  0x00000000d3c7c1c2 = 0xdeadbeef * 0xc0deba5e  
u64=u64*u32  ---->  r=((u64)a)*b    ---->  0xa7c40bfad3c7c1c2 = 0xdeadbeef * 0xc0deba5e  
u64=u64*u32  ---->  MULTIPLY(r,a,b) ---->  0xd3c7c1c2deadbeef = 0xdeadbeef * 0xc0deba5e

I looked at the -S -fverbose-asm output of gcc, and it's doing some strange shifting of the result register (which is even) & writing into the adjacent odd register. My problem is that I don't know how to generically refer to the adjacent odd register in the extended asm syntax. I thought perhaps the 'h' asm constraint in "=h"(r) might have something to do with it, but I can't find any examples of how to use it.

Solution 2

First of all, thanks very much to Chris Dodd, torek, and gbulmer for your efforts & help. I managed to figure out how to do this with some comments I found here, reproduced in part (and slightly edited for form but not content) below:

Thread: RFE: "h" and "U" asm constraints and "H" and "L" modifiers.
[...]the following two constraints (quoted from gcc.info) for some v8+ ABI inline asm:
'h' 64-bit global or out register for the SPARC-V8+ architecture.
'U' Even register
The "U" is needed to allocate register(s) for ldd/std (it allocates an even+odd pair for a uint64_t). For instance:
    void atomic64_set(volatile uint64_t *p, uint64_t v) {
        asm volatile ( "std %1, %0" : "=m"(*p) : "U"(v) );
    }
With or without "U" as a constraint, one can use "H" and "L" as modifiers in the template to get the High and Low registers of the pair used for a 64-bit value. The "h" constraint allocates a register of which, according to the v8+ ABI, one may safely use all 64 bits (Global or Output regs only). The following (artificial) example demonstrates the "h" constraint and the "H" and "L" modifiers:
    void ex_store64(uint64_t *p, uint64_t v) {  
       register int tmp; // Don't say uint64_t or GCC thinks we want 2 regs  
       asm volatile (  
          "sllx %H2,32,%1 \n\t" // tmp = HI32(v) << 32  
          "or %1,%L2,%1 \n\t" // tmp |= LO32(v)  
          "stx %1, %0" // store 64-bit tmp (register first, then memory operand)  
          :  "=m"(*p),  "=&h"(tmp)  :  "r"(v));  
      }
Disclaimer: these examples were written on the spot and may not be correct with respect to early-clobber and similar issues.
-Paul
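
To make the "H" and "L" modifiers more concrete, here is a small portable C sketch of what the sllx/or sequence in ex_store64 computes. The helper names hi32, lo32 and combine64 are made up for illustration; they are not part of GCC or of the quoted thread, and the code only models the register pair rather than touching real registers.

```c
#include <stdint.h>

/* Hypothetical helpers (illustrative names only): they model the two
 * 32-bit registers that %H and %L select for a 64-bit operand when
 * GCC keeps the value in an even/odd register pair. */
static inline uint32_t hi32(uint64_t v)
{
    return (uint32_t)(v >> 32);   /* what %H would hold: high word, even register */
}

static inline uint32_t lo32(uint64_t v)
{
    return (uint32_t)v;           /* what %L would hold: low word, odd register */
}

/* Mirrors "sllx %H2,32,%1" followed by "or %1,%L2,%1":
 * shift the high word up and merge in the low word. */
static inline uint64_t combine64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}
```

For any 64-bit v, combine64(hi32(v), lo32(v)) gives back v, which is the value ex_store64 then writes with a single stx.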

Based on that, I was able to figure out how to rewrite my own 'MULTIPLY' macro from my problem statement:

#define MULTIPLY(r, a, b)     /* r = a * b          */\
   asm("umul %1, %2, %L0;"    /* umul a,b,r         */\
       "srlx %L0, 32, %H0;"                           \
       : /* regs out */   "=r"(r)                     \
       : /* regs in  */   "r"(a),   "r"(b));
       /* re: clobbers "none": I tried specifying :"%y"
        *     in various ways but GCC kept telling me
        *     there was no y, %y, or %%y register. */

My results are now:

u64=u32*u32  ---->  r=a*b           ---->  0x00000000d3c7c1c2 = 0xdeadbeef * 0xc0deba5e  
u64=u64*u32  ---->  r=((u64)a)*b    ---->  0xa7c40bfad3c7c1c2 = 0xdeadbeef * 0xc0deba5e  
u64=u64*u32  ---->  MULTIPLY(r,a,b) ---->  0xa7c40bfad3c7c1c2 = 0xdeadbeef * 0xc0deba5e
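
Since inline functions were also acceptable, the same two instructions could be wrapped in one. The following is only a sketch under stated assumptions: multiply_u32 and the opt-in macro USE_SPARC_UMUL_ASM are names I made up, the asm path is meant for a 32-bit (v8plus) build like the one above and has only been checked by reading, and on any other target the function falls back to plain C. Like the macro, it declares no clobber for %y even though umul writes that register.

```c
#include <stdint.h>

/* Sketch: 32x32 -> 64 unsigned multiply as an inline function.
 * USE_SPARC_UMUL_ASM is a hypothetical opt-in macro; define it only when
 * building 32-bit code for a V9-capable CPU (e.g. -mcpu=ultrasparc).
 * __sparc__ and __arch64__ are macros GCC predefines on SPARC targets. */
static inline uint64_t multiply_u32(uint32_t a, uint32_t b)
{
#if defined(USE_SPARC_UMUL_ASM) && defined(__sparc__) && !defined(__arch64__)
    uint64_t r;
    /* Same instructions as the MULTIPLY macro: umul leaves the product in
     * the low register of the pair, srlx copies its top half into the
     * high register. Note: %y is also written but not declared here. */
    __asm__("umul %1, %2, %L0\n\t"
            "srlx %L0, 32, %H0"
            : "=r"(r)
            : "r"(a), "r"(b));
    return r;
#else
    /* Portable fallback: widen one operand so the multiply is 64-bit. */
    return (uint64_t)a * b;
#endif
}
```

Calling multiply_u32(0xdeadbeef, 0xc0deba5e) should then give the same 0xa7c40bfad3c7c1c2 shown in the results above, on either path.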

OTHER TIPS

The umul instruction multiplies two 32-bit (unsigned int) values in the lower halves of two registers, and puts the lower half of the 64-bit result in the destination register. The upper half of the result is written to the Y register. (On a 64-bit V9 processor the destination register in fact receives the full 64-bit product rather than having its upper half cleared, which is what the srlx-based macro relies on; reading %y works on V8 as well.) So what you probably want in order to use it is something like:

#define MULTIPLY(u, r, a, b) /* (u,r = a * b) */     \
asm("umul %2, %3, %0;"   /* unsigned mul */          \
    "rd %%y, %1;"        /* get hi word of result */ \
    : /* regs out */  "=r"(r), "=r"(u)               \
    : /* regs in  */  "r" (a), "r" (b)               \
    : /* clobbers */  "%y" );

Note, however, that you're almost certainly better off just writing the multiply in C, using uint64_t or unsigned long long operands.
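
As a rough illustration of that advice, the same (u, r) pair can be produced in plain C. This is a minimal sketch; the function name mul_u32_split is hypothetical and simply mirrors the argument order of the macro above.

```c
#include <stdint.h>

/* Hypothetical plain-C counterpart of MULTIPLY(u, r, a, b):
 * *u receives the high 32 bits of a*b (what umul leaves in %y),
 * *r receives the low 32 bits (what ends up in the destination). */
static inline void mul_u32_split(uint32_t *u, uint32_t *r,
                                 uint32_t a, uint32_t b)
{
    /* Cast one operand first; otherwise the product is truncated
     * to 32 bits before it is widened. */
    uint64_t product = (uint64_t)a * b;

    *u = (uint32_t)(product >> 32);
    *r = (uint32_t)product;
}
```

The compiler is then free to pick umul, mulx, or whatever suits the selected CPU, and no clobber list is needed.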

I think you're getting the old umul instruction because you're using -mcpu= instead of -march=. Per the documentation, the former has been changed to be synonymous with -mtune=: generate instructions for "most generic architecture" but optimize them for use on the given architecture. So -mcpu=ultrasparc means "generate for V8 sparc, but optimize for Ultrasparc". Using -march=ultrasparc should get you a raw 64-bit multiply.

Edit: based on all the discussion and other answers, it appears that gcc 3.2 as configured does not work with -m64, which forces one to run in "v8plus" mode on Solaris 2 (32-bit address space and, for the most part, 32-bit registers, except for values stored in the %g and %o registers). A sufficiently newer gcc should allow compiling with -m64, which will make the entire situation more or less moot. (And you can then add -march=niagara2 or whatever, as appropriate for your particular target hardware.) You may need to install a full set of binutils as well, per the following from the gcc 4.7.0 config/sparc/sparc.h:

#if TARGET_CPU_DEFAULT == TARGET_CPU_v9
/* ??? What does Sun's CC pass?  */
#define CPP_CPU64_DEFAULT_SPEC "-D__sparc_v9__"
/* ??? It's not clear how other assemblers will handle this, so by default
   use GAS.  Sun's Solaris assembler recognizes -xarch=v8plus, but this case
   is handled in sol2.h.  */
#define ASM_CPU64_DEFAULT_SPEC "-Av9"
#endif
#if TARGET_CPU_DEFAULT == TARGET_CPU_ultrasparc
#define CPP_CPU64_DEFAULT_SPEC "-D__sparc_v9__"
#define ASM_CPU64_DEFAULT_SPEC "-Av9a"
#endif
...

With all that in place you should just be able to multiply two 64-bit values to get a 64-bit result, in ordinary C code, without resorting to inline assembly.

(Otherwise you'll need something like the code you eventually came up with for gcc 3.2.)

Licensed under: CC-BY-SA with attribution
