Delphi 64 bits asm compiling error

Question 1

The main problem with this code, for porting to x64, is that it uses the wrong floating point unit. On x64 floating point is done on the SSE unit.

Yes, the x87 unit is still there, but it is slow in comparison. Another problem is that the x64 ABI assumes that you will use the SSE unit. Parameters arrive in SSE registers, and floating point values are returned in an SSE register. It's pointless (not to mention hard work and time consuming) to transfer values between the SSE and x87 units. What's more, the floating point control state, such as the exception masks, is initialised for the SSE unit, but can you be sure that it will be correctly set for the x87 unit?

So, in view of all this, I strongly advise you to make sure that all your floating point code is executed on the SSE unit under x64. I think that the only time a case could be made for using the x87 unit is for an algorithm that requires the 10 byte extended type, which is supported on x87 but not on SSE. That is not the case here.

Now, porting to the SSE unit is not as simple as translating the opcodes to SSE equivalents. That's because the SSE floating point unit has much less capability built in. For instance, there are no trigonometric functions among the SSE opcodes.

So, the right way to deal with this is to switch to using Pascal code. These functions can be replaced by Math.ArcTan2 and Math.ArcSin respectively.
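As a minimal sketch of that replacement, the program below calls the two Math routines directly. The program name and the sample inputs are made up for illustration; on recent Delphi versions the unit may need to be referenced as System.Math rather than Math.

```pascal
program ArcFunctionsSketch;
{$APPTYPE CONSOLE}

uses
  Math; // may need to be System.Math, depending on unit scope settings

var
  X, Y: Double;
begin
  // ArcSin takes the sine value and returns the angle in radians.
  X := 0.5;
  Writeln(ArcSin(X):0:6);      // prints 0.523599 (pi/6)

  // ArcTan2 takes Y first, then X, and returns the angle of the point (X, Y).
  Y := 1.0;
  X := 1.0;
  Writeln(ArcTan2(Y, X):0:6);  // prints 0.785398 (pi/4)
end.
```

The same source compiles for both 32 bit and 64 bit targets, with no conditional code and no asm.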

To elaborate on this, let's look at what is involved in doing the calculation on the x87 unit, under x64. The code for ArcSin goes like this:

function ArcSin(X: Double): Double;
// to be 100% clear, do **not** use this code
asm
  movq [rsp-8], xmm0     // X arrives in xmm0, move it to stack memory
  fld qword ptr [rsp-8]  // now load X into the x87 unit
  fld st(0)              // calculation code exactly as before
  fmul st(0), st(0)
  fld1
  fsubrp st(1), st(0)
  fsqrt
  fpatan
  fwait
  fstp qword ptr [rsp-8] // but now we need to move the return value
  movq xmm0, [rsp-8]     // back into xmm0, again via the stack
end;

Points to note:

The x64 ABI means that the input parameter arrives in xmm0. We cannot load that directly into the x87 unit. So we have to transfer from xmm0 to scratch memory on the stack, and then load from there into the x87 unit.
And we have to do something similar when returning the value. The value is returned in xmm0, as specified by the ABI. So we need to move it out of the x87 unit, to scratch stack memory, and then load it into xmm0.
We've completely ignored the floating point control word: exception masking, precision, rounding control and so on. If you were to do this, you'd need to put together a mechanism to make sure that the x87 unit's control word was handled in a sane manner.
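To give an idea of what such a mechanism might look like, here is a rough sketch that saves the x87 control word, installs a known value, calls a routine, and restores the saved value afterwards. It assumes that Get8087CW and Set8087CW from the System unit are available on your target. The names CallUnderCW, TLegacyFunc, Twice and LegacyCW are hypothetical, and the value $133F (all x87 exceptions masked, 64 bit precision, round to nearest) is only an illustrative choice.

```pascal
program X87ControlWordSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

const
  // Illustrative value only: all x87 exceptions masked,
  // 64 bit precision, round to nearest.
  LegacyCW = $133F;

type
  TLegacyFunc = function(X: Double): Double;

// Stand-in for a routine that runs on the x87 unit.
// Plain Pascal here so that the sketch stays self-contained.
function Twice(X: Double): Double;
begin
  Result := 2.0 * X;
end;

// Hypothetical helper: call F with a known x87 control word,
// then put back whatever control word the caller had.
function CallUnderCW(F: TLegacyFunc; X: Double): Double;
var
  SavedCW: Word;
begin
  SavedCW := Get8087CW;
  Set8087CW(LegacyCW);
  try
    Result := F(X);
  finally
    Set8087CW(SavedCW); // restore even if F raises
  end;
end;

begin
  Writeln(CallUnderCW(Twice, 21.0):0:1); // prints 42.0
end.
```

This only deals with the x87 control word. The SSE unit has its own control register, MXCSR, which is untouched here, and that is one more piece of state you would have to keep consistent if you mixed the two units.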

So, perhaps this can serve as a warning to future visitors who wish to use the x87 to perform floating point arithmetic under x64.

Question 2

x64 still supports the classic x87 floating point unit, but you need to adapt the code to follow the different ABI.

x86/x64 example:

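function PartArcTan(X: double): double;
asm
{$IFDEF CPUX64}
        movq   [rsp-8], xmm0       // x64: X arrives in xmm0, spill it to the stack
        fld    qword ptr [rsp-8]   // then load it into the x87 unit
{$ELSE}
        fld    qword ptr X         // x86: X is passed on the stack
{$ENDIF}
        fld1                       // st(0) = 1, st(1) = X
        fpatan                     // st(0) = arctan(X / 1)
        fwait
{$IFDEF CPUX64}
        fstp   qword ptr [rsp-8]   // x64: the result must be returned in xmm0,
        movq   xmm0, [rsp-8]       // so move it back via the stack
{$ENDIF}
        // x86: the result is returned in st(0), nothing more to do
end;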