GAS assembly snippet divides by 0, not sure why

Question 1

division / multiplication instructions in x86 ... there's a few things wrong in this code:

You're using signed operands with the unsigned mul / div operations. The operations that you really perform therefore are:

the signed -566 (0xfffffdca as two's complement 32-bit) is interpreted as unsigned 4294966730
this is multiplied by 4096 resulting in 17592183726080 (0xfff:0xffdca000 in EDX:EAX). Notice the lower 32bit of that convert to -2318336 as you "expect"
The full 64-bit value is divided by 400, but because the upper 32 bits are 0xfff (4095), which is not smaller than the divisor, the result exceeds UINT32_MAX and the exception is raised.
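The three steps above can be reproduced in plain C. This is only a sketch: the helper names unsigned_product and divl_would_overflow are made up for illustration, and the 64-bit integer stands in for the EDX:EAX register pair.

```c
#include <stdint.h>

/* Hypothetical helper: models what `mull` leaves in EDX:EAX.
 * The casts reinterpret the two's-complement bit patterns as unsigned. */
static uint64_t unsigned_product(int32_t number, int32_t numerator)
{
    return (uint64_t)(uint32_t)number * (uint32_t)numerator;
}

/* Hypothetical helper: nonzero when `divl` would fault, i.e. the divisor
 * is zero or the quotient needs more than 32 bits. */
static int divl_would_overflow(uint64_t edx_eax, uint32_t divisor)
{
    return divisor == 0 || (edx_eax / divisor) > UINT32_MAX;
}
```

With number = -566 and numerator = 4096 the product has 0xfff in its upper half, so the check reports an overflow for a divisor of 400.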

If you clear the upper 32bits by inserting an xor %%edx,%%edx before the divl, the operation will succeed but it'll return you something you don't expect - namely, it divides 0xffdca000 (4292648960) by 400 resulting in 0xa3c066 (10731622) in EAX and the remainder, 0xa0 (160) in EDX.
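As a rough model of that xor-then-divl sequence, the following C sketch throws away the high word and divides only the low 32 bits. The function name div_low_word is hypothetical, and the divisor is assumed to be nonzero.

```c
#include <stdint.h>

/* Hypothetical model of `xor %edx,%edx` followed by `divl`:
 * only the low word (EAX) takes part in the division.
 * Assumes divisor != 0. */
static uint32_t div_low_word(uint64_t edx_eax, uint32_t divisor,
                             uint32_t *remainder)
{
    uint32_t eax = (uint32_t)edx_eax;  /* high word is discarded */
    *remainder = eax % divisor;        /* what ends up in EDX */
    return eax / divisor;              /* what ends up in EAX */
}
```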

That's "correct" as far as what you instructed the machine to do, but not what you expect. If you want to use signed numbers, you need imul / idiv instead.
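For comparison, here is a portable C sketch of what an imul / idiv pair computes. The name mul_div_signed is hypothetical, and the sketch assumes a nonzero denominator and a quotient that fits in 32 bits. C division truncates toward zero, which matches idiv.

```c
#include <stdint.h>

/* Hypothetical portable equivalent of imull followed by idivl.
 * Assumes denominator != 0 and that the quotient fits in 32 bits. */
static int32_t mul_div_signed(int32_t number, int32_t numerator,
                              int32_t denominator, int32_t *remainder)
{
    int64_t product = (int64_t)number * numerator;  /* like EDX:EAX after imull */
    *remainder = (int32_t)(product % denominator);   /* like EDX after idivl */
    return (int32_t)(product / denominator);         /* like EAX after idivl */
}
```

For the values discussed here, -566 * 4096 / 400, this gives a quotient of -5795 and a remainder of -336.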

The assembly can ultimately be simplified into the following:

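__asm__ __volatile__ (
    "imull   %3              \n"    /* EDX:EAX = EAX * nNumerator (signed) */
    "idivl   %4              \n"    /* EAX = quotient, EDX = remainder */
    :   "=a"    (nRet),             /* output: quotient in EAX */
        "=&d"   (nMod)              /* output: remainder in EDX, early clobber */
    :   "a"     (nNumber),          /* input: multiplicand in EAX */
        "mr"    (nNumerator),       /* input: memory or register */
        "mr"    (nDenominator)      /* input: memory or register */
    :   "cc"                        /* condition flags are modified */
);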

That works because gcc lets you specify which registers to use for inputs and outputs, so no explicit data moves are needed here. Also, the "m" constraint alone produces suboptimal code on 64-bit, as it forces the arguments onto the stack; give the compiler an alternative ("mr") and the generated code will be better.

Edit: just changed the nMod constraint to "=&d"(nMod); it needs to be what gcc calls an "early clobber". This means that the specified output register is overwritten before all input operands are consumed/used, and tells the compiler not to pass inputs (the (nDenominator), in particular) in EDX. Otherwise, were that to happen, it would cause an "interesting" failure mode. This is not an issue if you only use the "m" for nNumerator/nDenominator but once registers are allowed, one better be careful.

Edit2: Also note that the above code is, of course, not proof against overflow exceptions. You can still call it like MulDivRound(INT32_MAX, 4, 2) to trigger one, and legitimately so, because that is how these instructions are designed: the quotient has to fit in 32 bits. If you must make sure that doesn't happen, you've got to add code which compares the denominator against EDX/RDX before the [i]div and handles the case where it is too small. For the unsigned div, the quotient overflows whenever the denominator is not larger than the high word.
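One way to sketch such a guard is to do the range check in C with 64-bit arithmetic instead of comparing against EDX in assembly. That is a different technique from the register comparison described above, and the name mul_div_checked is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical checked variant: returns false where idivl would fault. */
static bool mul_div_checked(int32_t number, int32_t numerator,
                            int32_t denominator, int32_t *result)
{
    if (denominator == 0)
        return false;                       /* division by zero */
    int64_t quotient = ((int64_t)number * numerator) / denominator;
    if (quotient < INT32_MIN || quotient > INT32_MAX)
        return false;                       /* quotient would not fit in EAX */
    *result = (int32_t)quotient;
    return true;
}
```

With this, a call such as mul_div_checked(INT32_MAX, 4, 2, &r) returns false rather than raising the exception.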

Question 2

You do not get a division by zero error, but an overflow error.

divl divides the 64-bit value in edx:eax (higher word in edx) by its operand and stores the quotient in eax and the remainder in edx.

In your code you end up with edx=4095 and eax=0, so you try to divide 17587891077120 (4095 * 2^32) by 400, which results in 43969727692 remainder 320.

43969727692 is 0xa 3ccc cccc, which does not fit in the 32-bit result register eax, hence the overflow error.

You need to make sure that eax contains your value 4095 and that edx is 0. This gives the proper result, with the quotient in eax and the remainder in edx (shown here as the full 64-bit registers):

rax            0xa      10
rdx            0x5f     95
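The behaviour described in this answer can be modelled in a few lines of C. This is a sketch only: emulate_divl is a made-up name, and it returns false in the cases where the real instruction would raise the divide error.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of 32-bit `divl`: divides edx:eax by divisor.
 * Returns false where the CPU would raise the divide error. */
static bool emulate_divl(uint32_t *edx, uint32_t *eax, uint32_t divisor)
{
    if (divisor == 0 || *edx >= divisor)
        return false;                       /* quotient would exceed 32 bits */
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;
    *eax = (uint32_t)(dividend / divisor);  /* quotient */
    *edx = (uint32_t)(dividend % divisor);  /* remainder */
    return true;
}
```

Starting from edx=4095, eax=0 the model reports the fault; starting from edx=0, eax=4095 it leaves 10 in eax and 95 in edx, matching the register dump above.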