GCC inline assembly for SPARC: How to handle integer doubleword pairs?
26-05-2021
Question
From what I understand, in SPARC, 32-bit integer quantities are stored in single registers and 64-bit integer quantities are stored in adjacent register pairs, with the even register containing the high 32 bits and the odd register containing the low 32 bits.
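To make the layout described above concrete, here is a minimal sketch in plain C. It only models the pair; it is not SPARC-specific code. The names `reg_pair`, `pair_from_u64`, and `pair_to_u64` are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical model of an even/odd 32-bit register pair holding one
 * 64-bit integer: `even` is the high word, `odd` is the low word. */
typedef struct {
    uint32_t even; /* high 32 bits */
    uint32_t odd;  /* low 32 bits */
} reg_pair;

/* Split a 64-bit value into its two 32-bit halves. */
static reg_pair pair_from_u64(uint64_t v)
{
    reg_pair p;
    p.even = (uint32_t)(v >> 32);
    p.odd  = (uint32_t)(v & 0xffffffffu);
    return p;
}

/* Reassemble the 64-bit value from the pair. */
static uint64_t pair_to_u64(reg_pair p)
{
    return ((uint64_t)p.even << 32) | p.odd;
}
```

For example, `pair_from_u64(0xa7c40bfad3c7c1c2ULL)` puts `0xa7c40bfa` in the even (high) half and `0xd3c7c1c2` in the odd (low) half.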
I need to write a few specialized SPARC inline assembly macros (inline assembly functions would be fine too) that deal with 64-bit integer doubleword pairs, and I can't figure out how to refer generically (using GCC extended inline assembly) to the two halves of the pair in my inline assembly. Though my assembly macros will be a little more complex than the MULTIPLY() macro shown below, the multiplication example, if it worked, would demonstrate how to deal with the two halves of a 64-bit doubleword pair. Can anyone tell me how to fix my MULTIPLY() macro?
In case it matters, I'm on a...
bash-2.03$ uname -a
SunOS [...] 5.8 Generic_117350-39 sun4u sparc SUNW,Ultra-80
Here is my trivial example program (in C):
#include <stdio.h>
//#include <stdint.h>
#define uint32 unsigned long int
#define uint64 unsigned long long int
#define MULTIPLY(r, a, b) /* (r = a * b) */ \
asm("umul %1, %2, %0;" /* unsigned mul */ \
: /* regs out */ "=h"(r) \
: /* regs in */ "r"(a), "r"(b));
#if 0
: /* clobbers */ "%y" );
#endif
int main(int argc, char** argv)
{
uint64 r;
uint32 a=0xdeadbeef, b=0xc0deba5e;
// loses the top 32 bits of the multiplication because the result is
// truncated at 32 bits which then gets assigned to the 64-bit 'r'...
r = a * b;
printf("u64=u32*u32 ----> r=a*b "
"----> 0x%016llx = 0x%x * 0x%x\n",
r, a, b);
// force promotion of 'a' to uint64 to get 64-bit multiplication
// (could cast either a or b as uint64, which one doesn't matter,
// as one explicit cast causes the other to be promoted as well)...
r = ((uint64)a) * b;
printf("u64=u64*u32 ----> r=((u64)a)*b "
"----> 0x%016llx = 0x%x * 0x%x\n",
r, a, b);
MULTIPLY(r, a, b);
printf("u64=u64*u32 ----> MULTIPLY(r,a,b) "
"----> 0x%016llx = 0x%x * 0x%x\n",
r, a, b);
return 0;
}
Which, when compiled with gcc-3.2-sun4u/bin/gcc -o mult -mcpu=ultrasparc mult.c, produces this output:
u64=u32*u32 ----> r=a*b ----> 0x00000000d3c7c1c2 = 0xdeadbeef * 0xc0deba5e
u64=u64*u32 ----> r=((u64)a)*b ----> 0xa7c40bfad3c7c1c2 = 0xdeadbeef * 0xc0deba5e
u64=u64*u32 ----> MULTIPLY(r,a,b) ----> 0xd3c7c1c2deadbeef = 0xdeadbeef * 0xc0deba5e
I looked at the -S -fverbose-asm output of gcc, and it's doing some strange shifting of the result register (which is even) and writing into the adjacent odd register. My problem is that I don't know how to generically refer to the adjacent odd register in the extended asm syntax. I thought perhaps the 'h' asm constraint in "=h"(r) might have something to do with it, but I can't find any examples of how to use it.
Solution 2
First of all, thanks very much to Chris Dodd, torek, and gbulmer for your efforts & help. I managed to figure out how to do this with some comments I found here, reproduced in part (and slightly edited for form but not content) below:
Thread: RFE: "h" and "U" asm constraints and "H" and "L" modifiers.
[...]the following two constraints (quoted from gcc.info) for some v8+ ABI inline asm:
'h' 64-bit global or out register for the SPARC-V8+ architecture.
'U' Even register
The "U" is needed to allocate register(s) for ldd/std (it allocates an even+odd pair for a uint64_t). For instance:
void atomic64_set(volatile uint64_t *p, uint64_t v)
{
    asm volatile ( "std %1, %0" : "=m"(*p) : "U"(v) );
}
With or without "U" as a constraint, one can use "H" and "L" as modifiers in the template to get the High and Low registers of the pair used for a 64-bit value. The "h" constraint allocates a register of which, according to the v8+ ABI, one may safely use all 64 bits (Global or Output regs only). The following (artificial) example demonstrates the "h" constraint and the "H" and "L" modifiers:
void ex_store64(uint64_t *p, uint64_t v)
{
    register int tmp; // Don't say uint64_t or GCC thinks we want 2 regs
    asm volatile (
        "sllx %H2,32,%1 \n\t" // tmp = HI32(v) << 32
        "or %1,%L2,%1 \n\t" // tmp |= LO32(v)
        "stx %0, %1" // store 64-bit tmp
        : "=m"(*p), "=&h"(tmp) : "r"(v));
}
Disclaimer: these examples were written on the spot and may not be correct with respect to early-clobber and similar issues.
-Paul
Based on that, I was able to figure out how to rewrite my own 'MULTIPLY' macro from my problem statement:
#define MULTIPLY(r, a, b) /* r = a * b */\
asm("umul %1, %2, %L0;" /* umul a,b,r */\
"srlx %L0, 32, %H0;" \
: /* regs out */ "=r"(r) \
: /* regs in */ "r"(a), "r"(b));
/* re: clobbers "none": I tried specifying :"%y"
* in various ways but GCC kept telling me
* there was no y, %y, or %%y register. */
My results are now:
u64=u32*u32 ----> r=a*b ----> 0x00000000d3c7c1c2 = 0xdeadbeef * 0xc0deba5e
u64=u64*u32 ----> r=((u64)a)*b ----> 0xa7c40bfad3c7c1c2 = 0xdeadbeef * 0xc0deba5e
u64=u64*u32 ----> MULTIPLY(r,a,b) ----> 0xa7c40bfad3c7c1c2 = 0xdeadbeef * 0xc0deba5e
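To see why the two-instruction sequence gives the right answer, the following sketch models it in portable C. It assumes the SPARC V9 behaviour in which umul writes the whole 64-bit product into the destination register, so that srlx can recover the upper word from it. The function name `multiply_model` and the variables `reg_lo` and `reg_hi` are hypothetical stand-ins for %L0 and %H0; this is a model of the arithmetic, not a replacement for the macro.

```c
#include <stdint.h>

/* Hypothetical model of the umul + srlx sequence, assuming umul leaves
 * the full 64-bit product in its destination register (SPARC V9). */
static uint64_t multiply_model(uint32_t a, uint32_t b)
{
    /* umul %1, %2, %L0 : the low register of the pair receives the product */
    uint64_t reg_lo = (uint64_t)a * (uint64_t)b;

    /* srlx %L0, 32, %H0 : the high register receives the product >> 32 */
    uint64_t reg_hi = reg_lo >> 32;

    /* The compiler then reads each register of the pair as a 32-bit half. */
    return ((uint64_t)(uint32_t)reg_hi << 32) | (uint32_t)reg_lo;
}
```

With the inputs from the test program, `multiply_model(0xdeadbeef, 0xc0deba5e)` yields `0xa7c40bfad3c7c1c2`, matching the output shown above.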
OTHER TIPS
The umul instruction multiplies two 32-bit (unsigned int) values in the lower halves of two registers and writes the upper 32 bits of the 64-bit product to the Y register. On SPARC V8 the destination register receives the lower 32 bits of the product; on SPARC V9 (which includes v8plus mode) the destination register receives the full 64-bit product. So what you probably want in order to use it is something like:
#define MULTIPLY(u, r, a, b) /* (u,r = a * b) */ \
asm("umul %2, %3, %0;" /* unsigned mul */ \
"rd %%y, %1;" /* get hi word of result */ \
: /* regs out */ "=r"(r), "=r"(u) \
: /* regs in */ "r" (a), "r" (b) \
: /* clobbers */ "%y" );
Note, however, that you're almost certainly better off just writing the multiply in C, using uint64_t or unsigned long long operands.
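A minimal sketch of the plain-C route, using the fixed-width types from <stdint.h>. The helper names `mul_u32_wide`, `product_hi`, and `product_lo` are hypothetical. Fixed-width types are used here on the assumption that the code may also be built for a 64-bit target, where unsigned long is typically 64 bits wide and would no longer serve as a 32-bit type.

```c
#include <stdint.h>

/* Widen one operand before multiplying so the product is computed in
 * 64 bits; the other operand is promoted automatically. */
static uint64_t mul_u32_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Upper word of the product, the role played by `u` in the macro above. */
static uint32_t product_hi(uint32_t a, uint32_t b)
{
    return (uint32_t)(mul_u32_wide(a, b) >> 32);
}

/* Lower word of the product, the role played by `r` in the macro above. */
static uint32_t product_lo(uint32_t a, uint32_t b)
{
    return (uint32_t)mul_u32_wide(a, b);
}
```

The compiler is then free to pick whatever multiply instruction suits the selected target, and no clobber list is needed.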
I think you're getting the old umul instruction because you're using -mcpu= instead of -march=. Per the documentation, the former has been changed to be synonymous with -mtune=: generate instructions for "most generic architecture" but optimize them for use on the given architecture. So -mcpu=ultrasparc means "generate for V8 sparc, but optimize for Ultrasparc". Using -march=ultrasparc should get you a raw 64-bit multiply.
Edit: based on all the discussion and other answers, it appears that gcc 3.2 as configured does not work with -m64, which forces one to run in "v8plus" mode on Solaris 2 (32-bit address space and, for the most part, 32-bit registers, except for values stored in the %g and %o registers). A sufficiently newer gcc should allow compiling with -m64, which will make the entire situation more or less moot. (And you can then add -march=niagara2 or whatever, as appropriate for your particular target hardware.) You may need to install a full set of binutils as well, per the following from the gcc 4.7.0 config/sparc/sparc.h:
#if TARGET_CPU_DEFAULT == TARGET_CPU_v9
/* ??? What does Sun's CC pass? */
#define CPP_CPU64_DEFAULT_SPEC "-D__sparc_v9__"
/* ??? It's not clear how other assemblers will handle this, so by default
use GAS. Sun's Solaris assembler recognizes -xarch=v8plus, but this case
is handled in sol2.h. */
#define ASM_CPU64_DEFAULT_SPEC "-Av9"
#endif
#if TARGET_CPU_DEFAULT == TARGET_CPU_ultrasparc
#define CPP_CPU64_DEFAULT_SPEC "-D__sparc_v9__"
#define ASM_CPU64_DEFAULT_SPEC "-Av9a"
#endif
...
With all that in place you should just be able to multiply two 64-bit values to get a 64-bit result, in ordinary C code, without resorting to inline assembly.
(Otherwise you'll need something like the code you eventually came up with for gcc 3.2.)