Question

What is the fastest way you know to convert a floating-point number to an int on an x86 CPU. Preferrably in C or assembly (that can be in-lined in C) for any combination of the following:

  • 32/64/80-bit float -> 32/64-bit integer

I'm looking for some technique that is faster than to just let the compiler do it.

Was it helpful?

Solution

It depends on if you want a truncating conversion or a rounding one and at what precision. By default, C will perform a truncating conversion when you go from float to int. There are FPU instructions that do it but it's not an ANSI C conversion and there are significant caveats to using it (such as knowing the FPU rounding state). Since the answer to your problem is quite complex and depends on some variables you haven't expressed, I recommend this article on the issue:

http://www.stereopsis.com/FPU.html

OTHER TIPS

Packed conversion using SSE is by far the fastest method, since you can convert multiple values in the same instruction. ffmpeg has a lot of assembly for this (mostly for converting the decoded output of audio to integer samples); check it for some examples.

A commonly used trick for plain x86/x87 code is to force the mantissa part of the float to represent the int. 32 bit version follows.

The 64-bit version is analogical. The Lua version posted above is faster, but relies on the truncation of double to a 32-bit result, therefore it requires the x87 unit to be set to double precision, and cannot be adapted for double to 64-bit int conversion.

The nice thing about this code is it is completely portable for all platforms conforming to IEEE 754, the only assumption made is the floating point rounding mode is set to nearest. Note: Portable in the sense it compiles and works. Platforms other than x86 usually do not benefit much from this technique, if at all.

static const float Snapper=3<<22;

union UFloatInt {
 int i;
 float f;
};

/** by Vlad Kaipetsky
portable assuming FP24 set to nearest rounding mode
efficient on x86 platform
*/
inline int toInt( float fval )
{
  Assert( fabs(fval)<=0x003fffff ); // only 23 bit values handled
  UFloatInt &fi = *(UFloatInt *)&fval;
  fi.f += Snapper;
  return ( (fi.i)&0x007fffff ) - 0x00400000;
}

If you can guarantee the CPU running your code is SSE3 compatible (even Pentium 5 is, JBB), you can allow the compiler to use its FISTTP instruction (i.e. -msse3 for gcc). It seems to do the thing like it should always have been done:

http://software.intel.com/en-us/articles/how-to-implement-the-fisttp-streaming-simd-extensions-3-instruction/

Note that FISTTP is different from FISTP (that has its problems, causing the slowness). It comes as part of SSE3 but is actually (the only) X87-side refinement.

Other then X86 CPU's would probably do the conversion just fine, anyways. :)

Processors with SSE3 support

There is one instruction to convert a floating point to an int in assembly: use the FISTP instruction. It pops the value off the floating-point stack, converts it to an integer, and then stores at at the address specified. I don't think there would be a faster way (unless you use extended instruction sets like MMX or SSE, which I am not familiar with).

Another instruction, FIST, leaves the value on the FP stack but I'm not sure it works with quad-word sized destinations.

The Lua code base has the following snippet to do this (check in src/luaconf.h from www.lua.org). If you find (SO finds) a faster way, I'm sure they'd be thrilled.

Oh, lua_Number means double. :)

/*
@@ lua_number2int is a macro to convert lua_Number to int.
@@ lua_number2integer is a macro to convert lua_Number to lua_Integer.
** CHANGE them if you know a faster way to convert a lua_Number to
** int (with any rounding method and without throwing errors) in your
** system. In Pentium machines, a naive typecast from double to int
** in C is extremely slow, so any alternative is worth trying.
*/

/* On a Pentium, resort to a trick */
#if defined(LUA_NUMBER_DOUBLE) && !defined(LUA_ANSI) && !defined(__SSE2__) && \
    (defined(__i386) || defined (_M_IX86) || defined(__i386__))

/* On a Microsoft compiler, use assembler */
#if defined(_MSC_VER)

#define lua_number2int(i,d)   __asm fld d   __asm fistp i
#define lua_number2integer(i,n)     lua_number2int(i, n)

/* the next trick should work on any Pentium, but sometimes clashes
   with a DirectX idiosyncrasy */
#else

union luai_Cast { double l_d; long l_l; };
#define lua_number2int(i,d) \
  { volatile union luai_Cast u; u.l_d = (d) + 6755399441055744.0; (i) = u.l_l; }
#define lua_number2integer(i,n)     lua_number2int(i, n)

#endif

/* this option always works, but may be slow */
#else
#define lua_number2int(i,d) ((i)=(int)(d))
#define lua_number2integer(i,d) ((i)=(lua_Integer)(d))

#endif

I assume truncation is required, same as if one writes i = (int)f in "C".

If you have SSE3, you can use:

int convert(float x)
{
    int n;
    __asm {
        fld x
        fisttp n // the extra 't' means truncate
    }
    return n;
}

Alternately, with SSE2 (or in x64 where inline assembly might not be available), you can use almost as fast:

#include <xmmintrin.h>
int convert(float x)
{
    return _mm_cvtt_ss2si(_mm_load_ss(&x)); // extra 't' means truncate
}

On older computers there is an option to set the rounding mode manually and perform conversion using the ordinary fistp instruction. That will probably only work for arrays of floats, otherwise care must be taken to not use any constructs that would make the compiler change rounding mode (such as casting). It is done like this:

void Set_Trunc()
{
    // cw is a 16-bit register [_ _ _ ic rc1 rc0 pc1 pc0 iem _ pm um om zm dm im]
    __asm {
        push ax // use stack to store the control word
        fnstcw word ptr [esp]
        fwait // needed to make sure the control word is there
        mov ax, word ptr [esp] // or pop ax ...
        or ax, 0xc00 // set both rc bits (alternately "or ah, 0xc")
        mov word ptr [esp], ax // ... and push ax
        fldcw word ptr [esp]
        pop ax
    }
}

void convertArray(int *dest, const float *src, int n)
{
    Set_Trunc();
    __asm {
        mov eax, src
        mov edx, dest
        mov ecx, n // load loop variables

        cmp ecx, 0
        je bottom // handle zero-length arrays

    top:
        fld dword ptr [eax]
        fistp dword ptr [edx]
        loop top // decrement ecx, jump to top
    bottom:
    }
}

Note that the inline assembly only works with Microsoft's Visual Studio compilers (and maybe Borland), it would have to be rewritten to GNU assembly in order to compile with gcc. The SSE2 solution with intrinsics should be quite portable, however.

Other rounding modes are possible by different SSE2 intrinsics or by manually setting the FPU control word to a different rounding mode.

If you really care about the speed of this make sure your compiler is generating the FIST instruction. In MSVC you can do this with /QIfist, see this MSDN overview

You can also consider using SSE intrinsics to do the work for you, see this article from Intel: http://softwarecommunity.intel.com/articles/eng/2076.htm

Since MS scews us out of inline assembly in X64 and forces us to use intrinsics, I looked up which to use. MSDN doc gives _mm_cvtsd_si64x with an example.

The example works, but is horribly inefficient, using an unaligned load of 2 doubles, where we need just a single load, so getting rid of the additional alignment requirement. Then a lot of needless loads and reloads are produced, but they can be eliminated as follows:

 #include <intrin.h>
 #pragma intrinsic(_mm_cvtsd_si64x)
 long long _inline double2int(const double &d)
 {
     return _mm_cvtsd_si64x(*(__m128d*)&d);
 }

Result:

        i=double2int(d);
000000013F651085  cvtsd2si    rax,mmword ptr [rsp+38h]  
000000013F65108C  mov         qword ptr [rsp+28h],rax  

The rounding mode can be set without inline assembly, e.g.

    _control87(_RC_NEAR,_MCW_RC);

where rounding to nearest is default (anyway).

The question whether to set the rounding mode at each call or to assume it will be restored (third party libs) will have to be answered by experience, I guess. You will have to include float.h for _control87() and related constants.

And, no, this will not work in 32 bits, so keep using the FISTP instruction:

_asm fld d
_asm fistp i

Generally, you can trust the compiler to be efficient and correct. There is usually nothing to be gained by rolling your own functions for something that already exists in the compiler.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top