How can I multiply two 64-bit numbers using x86 assembly language?

https://stackoverflow.com/questions/87771

x86
assembly

01-07-2019
|

Question

How would I go about...

multiplying two 64-bit numbers
multiplying two 16-digit hexadecimal numbers

...using Assembly Language.

I'm only allowed to use registers %eax, %ebx, %ecx, %edx, and the stack.

EDIT: Oh, I'm using ATT Syntax on the x86
EDIT2: Not allowed to decompile into assembly...

Solution

Use what should probably be your course textbook, Randall Hyde's "The Art of Assembly Language".

See 4.2.4 - Extended Precision Multiplication

Although an 8x8, 16x16, or 32x32 multiply is usually sufficient, there are times when you may want to multiply larger values together. You will use the x86 single operand MUL and IMUL instructions for extended precision multiplication ..

Probably the most important thing to remember when performing an extended precision multiplication is that you must also perform a multiple precision addition at the same time. Adding up all the partial products requires several additions that will produce the result. The following listing demonstrates the proper way to multiply two 64 bit values on a 32 bit processor ..

(See the link for full assembly listing and illustrations.)

OTHER TIPS

If this was 64x86,

function(x, y, *lower, *higher)
movq %rx,%rax     #Store x into %rax
mulq %y           #multiplies %y to %rax
#mulq stores high and low values into rax and rdx.
movq %rax,(%r8)   #Move low into &lower
movq %rdx,(%r9)   #Move high answer into &higher

Since you're on x86 you need 4 mull instructions. Split the 64bit quantities into two 32bit words and multiply the low words to the lowest and 2nd lowest word of the result, then both pairs of low and high word from different numbers (they go to the 2nd and 3rd lowest word of the result) and finally both high words into the 2 highest words of the result. Add them all together not forgetting to deal with carry. You didn't specify the memory layout of the inputs and outputs so it's impossible to write sample code.

This code assumes you want x86 (not x64 code), that you probably only want a 64 bit product, and that you don't care about overflow or signed numbers. (A signed version is similar).

MUL64_MEMORY:
     mov edi, val1high
     mov esi, val1low
     mov ecx, val2high
     mov ebx, val2low
MUL64_EDIESI_ECXEBX:
     mov eax, edi
     mul ebx
     xch eax, ebx  ; partial product top 32 bits
     mul esi
     xch esi, eax ; partial product lower 32 bits
     add ebx, edx
     mul ecx
     add ebx, eax  ; final upper 32 bits
; answer here in EBX:ESI

This doesn't honor the exact register constraints of OP, but the result fits entirely in the registers offered by the x86. (This code is untested, but I think it's right).

[Note: I transferred (my) this answer from another question that got closed, because NONE of the other "answers" here directly answered the question].

It depends what language you are using. From what I remember from learning MIPS assembly, there is a Move From High command and a Move From Lo command, or mflo and mfhi. mfhi stores the top 64bits while mflo stores the lower 64bits of the total number.

ah assembly, been awhile since i've used it. so i'm assuming the real problem here is that the microcontroller (what i used to write code for in assembly anyways) you're working on doesn't have 64 bit registers? if that's the case, you're going to have the break the numbers you're working with apart and perform multiple multiplications with the pieces.

this sounds like it's a homework assignment from the way you've worded it, so i'm not gonna spell it out much further :P

Just do normal long multiplication, as if you were multiplying a pair of 2-digit numbers, except each "digit" is really a 32-bit integer. If you're multiplying two numbers at addresses X and Y and storing the result in Z, then what you want to do (in pseudocode) is:

Z[0..3] = X[0..3] * Y[0..3]
Z[4..7] = X[0..3] * Y[4..7] + X[4..7] * Y[0..3]

Note that we're discarding the upper 64 bits of the result (since a 64-bit number times a 64-bit number is a 128-bit number). Also note that this is assuming little-endian. Also, be careful about a signed versus an unsigned multiply.

Find a C compiler that supports 64bit (GCC does IIRC) compile a program that does just that, then get the disassembly. GCC can spit it out on it's own and you can get it out of object file with the right tools.

OTOH their is a 32bX32b = 64b op on x86

a:b * c:d = e:f
// goes to
e:f = b*d;
x:y = a*d;  e += x;
x:y = b*c;  e += x;

everything else overflows

(untested)

Edit Unsigned only

I'm betting you're a student, so see if you can make this work: Do it word by word, and use bit shifts. Think up the most efficient solution. Beware of the sign bit.

If you want 128 mode try this...

__uint128_t AES::XMULTX(__uint128_t TA,__uint128_t TB)
{
    union
    {
        __uint128_t WHOLE;
        struct
        {
            unsigned long long int LWORDS[2];
        } SPLIT;
    } KEY;
    register unsigned long long int __XRBX,__XRCX,__XRSI,__XRDI;
    __uint128_t RESULT;

    KEY.WHOLE=TA;
    __XRSI=KEY.SPLIT.LWORDS[0];
    __XRDI=KEY.SPLIT.LWORDS[1];
    KEY.WHOLE=TB;
    __XRBX=KEY.SPLIT.LWORDS[0];
    __XRCX=KEY.SPLIT.LWORDS[1];
    __asm__ __volatile__(
                 "movq          %0,             %%rsi           \n\t"       
                 "movq          %1,             %%rdi           \n\t"
                 "movq          %2,             %%rbx           \n\t"
                 "movq          %3,             %%rcx           \n\t"
                 "movq          %%rdi,          %%rax           \n\t"
                 "mulq          %%rbx                           \n\t"
                 "xchgq         %%rbx,          %%rax           \n\t"
                 "mulq          %%rsi                           \n\t"
                 "xchgq         %%rax,          %%rsi           \n\t"
                 "addq          %%rdx,          %%rbx           \n\t"
                 "mulq          %%rcx                           \n\t"
                 "addq          %%rax,          %%rbx           \n\t"
                 "movq          %%rsi,          %0              \n\t"
                 "movq          %%rbx,          %1              \n\t"
                 : "=m" (__XRSI), "=m" (__XRBX)
                 : "m" (__XRSI),  "m" (__XRDI), "m" (__XRBX), "m" (__XRCX)
                 : "rax","rbx","rcx","rdx","rsi","rdi"
                 );
    KEY.SPLIT.LWORDS[0]=__XRSI;
    KEY.SPLIT.LWORDS[1]=__XRBX;
    RESULT=KEY.WHOLE;
    return RESULT;
}

If you want 128bit multiplication then this should work this is in AT&T format.

__uint128_t FASTMUL128(const __uint128_t TA,const __uint128_t TB)
{
    union
    {
        __uint128_t WHOLE;
        struct
        {
            unsigned long long int LWORDS[2];
        } SPLIT;
    } KEY;
    register unsigned long long int __RAX,__RDX,__RSI,__RDI;
    __uint128_t RESULT;

KEY.WHOLE=TA;
__RAX=KEY.SPLIT.LWORDS[0];
__RDX=KEY.SPLIT.LWORDS[1];
KEY.WHOLE=TB;
__RSI=KEY.SPLIT.LWORDS[0];
__RDI=KEY.SPLIT.LWORDS[1];
__asm__ __volatile__(
    "movq           %0,                             %%rax                   \n\t"
    "movq           %1,                             %%rdx                   \n\t"
    "movq           %2,                             %%rsi                   \n\t"
    "movq           %3,                             %%rdi                   \n\t"
    "movq           %%rsi,                          %%rbx                   \n\t"
    "movq           %%rdi,                          %%rcx                   \n\t"
    "movq           %%rax,                          %%rsi                   \n\t"
    "movq           %%rdx,                          %%rdi                   \n\t"
    "xorq           %%rax,                          %%rax                   \n\t"
    "xorq           %%rdx,                          %%rdx                   \n\t"
    "movq           %%rdi,                          %%rax                   \n\t"
    "mulq           %%rbx                                                   \n\t"
    "xchgq          %%rbx,                          %%rax                   \n\t"
    "mulq           %%rsi                                                   \n\t"
    "xchgq          %%rax,                          %%rsi                   \n\t"
    "addq           %%rdx,                          %%rbx                   \n\t"
    "mulq           %%rcx                                                   \n\t"
    "addq           %%rax,                          %%rbx                   \n\t"
    "movq           %%rsi,                          %%rax                   \n\t"
    "movq           %%rbx,                          %%rdx                   \n\t"
    "movq           %%rax,                          %0                      \n\t"
    "movq           %%rdx,                          %1                      \n\t"
    "movq           %%rsi,                          %2                      \n\t"
    "movq           %%rdi,                          %3                      \n\t"
    : "=m"(__RAX),"=m"(__RDX),"=m"(__RSI),"=m"(__RDI)
    :  "m"(__RAX), "m"(__RDX), "m"(__RSI), "m"(__RDI)
    : "rax","rbx","ecx","rdx","rsi","rdi"
);
KEY.SPLIT.LWORDS[0]=__RAX;
KEY.SPLIT.LWORDS[1]=__RDX;
RESULT=KEY.WHOLE;
return RESULT;
}

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow