Question

Suppose I have a program that performs AES operations.

Some newer CPUs support the AES-NI instruction set, and others do not.

Must I compile my program into two executables, A_with_aes_ni.exe and B_without_aes_ni.exe?

Solution

What you want is called a CPU dispatcher. Agner Fog devotes about 10 pages to this in the chapter "Making critical code in multiple versions for different instruction sets" of his Optimizing C++ manual. He discusses doing this with both GCC and ICC.

You only need one executable, but you need to compile two different object files: one with AES enabled and one without. At run time the dispatcher determines which instruction set is available and chooses the code path based on that.
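A minimal sketch of that idea, assuming both versions are exposed as functions with the same signature. The names AesBackend, encrypt_generic, encrypt_aesni, and select_backend are hypothetical, and the two bodies are placeholders that only copy the data. In a real build the AES-NI version would sit in its own source file compiled with AES enabled (for example -maes on GCC).

```cpp
#include <cstddef>
#include <cstdint>

// Signature shared by both code paths (hypothetical, for illustration).
using EncryptFn = void (*)(const std::uint8_t* in, std::uint8_t* out, std::size_t len);

// Placeholder for the portable version. It only copies the input so the
// sketch stays short; the real software AES routine would go here.
void encrypt_generic(const std::uint8_t* in, std::uint8_t* out, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) out[i] = in[i];
}

// Placeholder for the AES-NI version. In a real project this function would
// be the only one built with AES instructions enabled.
void encrypt_aesni(const std::uint8_t* in, std::uint8_t* out, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) out[i] = in[i];
}

// A named code path, so the program can log which one was selected.
struct AesBackend {
    const char* name;
    EncryptFn encrypt;
};

// The dispatcher: map a capability flag to one of the two code paths.
// The flag is expected to come from a run-time CPU check done by the caller.
const AesBackend& select_backend(bool cpu_has_aesni) {
    static const AesBackend generic{"generic", encrypt_generic};
    static const AesBackend aesni{"aes-ni", encrypt_aesni};
    return cpu_has_aesni ? aesni : generic;
}
```

The program would call select_backend once at startup, keep the returned reference, and route every AES call through its encrypt pointer, so the cost of the check is paid only once.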

I tried to write a CPU dispatcher for AVX and SSE with MSVC 2010 (see "cpu dispatcher for visual studio for AVX and SSE") but did not succeed. I suspect I could get it working now, though.

Edit: Agner Fog's vectorclass library includes the files dispatch_example.cpp and instrset_detect.cpp, which should have most of what you need to make a dispatcher. You still need to figure out how to detect whether a CPU has AES, so you need to augment instrset_detect.cpp. According to Wikipedia, when you execute CPUID with EAX=1, bit 25 of register ECX is set if the CPU has AES (bit 23 indicates POPCNT). Wikipedia also has code examples for reading CPUID. Besides instrset_detect.cpp, another good example is the file cpuid.c at https://github.com/Mysticial/Flops
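As a rough sketch of such a check (not the code from instrset_detect.cpp), the following reads CPUID leaf 1 and tests bit 25 of ECX, which is where Intel documents the AES-NI flag. The function names ecx_reports_aes and cpu_has_aesni are made up for this example. It assumes either MSVC with the __cpuid intrinsic from <intrin.h> or GCC/Clang with __get_cpuid from <cpuid.h>; on any other target it simply reports no AES-NI.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>   // __cpuid
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>    // __get_cpuid (GCC and Clang)
#endif

// Pure helper: AES-NI is reported in bit 25 of ECX for CPUID leaf 1.
constexpr bool ecx_reports_aes(std::uint32_t ecx) {
    return ((ecx >> 25) & 1u) != 0;
}

// Returns true if the CPU advertises AES-NI, false otherwise
// (including on targets where this sketch cannot run CPUID).
bool cpu_has_aesni() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};          // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return ecx_reports_aes(static_cast<std::uint32_t>(regs[2]));
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;                    // leaf 1 not available
    }
    return ecx_reports_aes(ecx);
#else
    return false;
#endif
}
```

The result of cpu_has_aesni() is what a dispatcher would consult once at startup to pick between the two compiled versions.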

OTHER TIPS

One way we do this in Solaris is to have hardware capabilities libraries, which are dynamically loaded at runtime by the linker.

Another option is to first install a trap handler for illegal instructions, then execute the machine instructions you want to use. If you hit the trap, you know that you can't use the optimised version and have to load the non-optimised (or less optimised) one.

While I like Andrew's suggestion above, I think it's safer to test for the specific instructions that you need. That way you don't have to keep updating your app for newer CPUID output.
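The trap-based probe could be sketched roughly as follows. This is an illustration under several assumptions: a POSIX system, GCC or Clang inline assembly, and an x86-64 target. The names on_sigill and probe_aesni_by_trap are hypothetical. It installs a temporary SIGILL handler, executes one AESENC instruction, and treats a trap as "not supported". Because it uses a global jump buffer, it is not thread-safe and is meant to be called once during startup.

```cpp
#include <setjmp.h>   // sigsetjmp, siglongjmp (POSIX)
#include <signal.h>   // sigaction, SIGILL (POSIX)

// Where the signal handler jumps back to if the probe instruction traps.
static sigjmp_buf g_probe_env;

extern "C" void on_sigill(int) {
    siglongjmp(g_probe_env, 1);
}

// Returns true if executing an AES-NI instruction did not raise SIGILL.
bool probe_aesni_by_trap() {
#if defined(__x86_64__) && defined(__GNUC__)
    struct sigaction probe_action = {};
    struct sigaction old_action = {};
    probe_action.sa_handler = on_sigill;
    sigemptyset(&probe_action.sa_mask);
    if (sigaction(SIGILL, &probe_action, &old_action) != 0) {
        return false;                       // could not install the handler
    }

    // volatile: the value must survive the jump out of the handler.
    volatile bool supported = false;
    if (sigsetjmp(g_probe_env, 1) == 0) {
        // One AES round on whatever is in xmm0; only the trap matters.
        __asm__ volatile("aesenc %%xmm0, %%xmm0" : : : "xmm0");
        supported = true;                   // reached only if no trap occurred
    }

    sigaction(SIGILL, &old_action, nullptr); // restore the previous handler
    return supported;
#else
    return false;                           // probe not implemented here
#endif
}
```

The previous SIGILL disposition is restored before returning, so the probe does not interfere with handlers the application installs later.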

Edited to add: I realise I should have provided an example. For Solaris' libc on the x64 platform, we provide hardware-optimised versions of the library: three for 32-bit and one for 64-bit. We can see the differences by running elfdump -H on the file of interest:

s11u1:jmcp $ elfdump -H /usr/lib/libc/libc_hwcap1.so.1 

Capabilities Section:  .SUNW_cap

 Object Capabilities:
     index  tag               value
       [0]  CA_SUNW_HW_1     0x86d  [ SSE MMX CMOV SEP CX8 FPU ]

 Symbol Capabilities:
     index  tag               value
       [2]  CA_SUNW_ID       hrt
       [3]  CA_SUNW_HW_1     0x40002  [ TSCP TSC ]

  Symbols:
     index    value      size      type bind oth ver shndx          name
       [1]  0x000f306c 0x00000225  FUNC LOCL  D    0 .text          gettimeofday%hrt
       [2]  0x000f2efc 0x00000165  FUNC LOCL  D    0 .text          gethrtime%hrt

Capabilities Chain Section:  .SUNW_capchain

 Capabilities family: gettimeofday
  chainndx  symndx      name
         1  [702]       gettimeofday
         2  [1]         gettimeofday%hrt

 Capabilities family: gethrtime
  chainndx  symndx      name
         4  [1939]      gethrtime
         5  [2]         gethrtime%hrt

s11u1:jmcp $ elfdump -H /usr/lib/libc/libc_hwcap2.so.1 

Capabilities Section:  .SUNW_cap

 Object Capabilities:
     index  tag               value
       [0]  CA_SUNW_HW_1     0x1875  [ SSE2 SSE MMX CMOV AMD_SYSC CX8 FPU ]

 Symbol Capabilities:
     index  tag               value
       [2]  CA_SUNW_ID       hrt
       [3]  CA_SUNW_HW_1     0x40002  [ TSCP TSC ]

  Symbols:
     index    value      size      type bind oth ver shndx          name
       [1]  0x000f253c 0x00000225  FUNC LOCL  D    0 .text          gettimeofday%hrt
       [2]  0x000f23cc 0x00000165  FUNC LOCL  D    0 .text          gethrtime%hrt

Capabilities Chain Section:  .SUNW_capchain

 Capabilities family: gettimeofday
  chainndx  symndx      name
         1  [702]       gettimeofday
         2  [1]         gettimeofday%hrt

 Capabilities family: gethrtime
  chainndx  symndx      name
         4  [1939]      gethrtime
         5  [2]         gethrtime%hrt

Guess which of the above is for AMD systems, and which for Intel?

The Solaris linker has smarts to load the correct hwcap library at runtime before your process' _init() is called.

Licensed under: CC-BY-SA with attribution