gcc optimization flags for Xeon?

https://stackoverflow.com/questions/943755

09-09-2019

Question

Which gcc compiler flags would you recommend when optimizing for Xeons?

There's no 'xeon' value for -mtune or -march, so which is the closest match?

Solution

Xeon is a marketing term; as such, it covers a long list of processors with very different internals.

If you meant the newer Nehalem processors (Core i7), then this slide indicates that as of 4.3.1 gcc should use -mtune=generic (there is no -march=generic, since "generic" names a tuning target rather than an instruction set), though testing your own app may find other settings that outperform this. The 4.3 series also added -msse4.2, if you want the compiler to emit SSE4.2 instructions.

Here is some discussion comparing tuning in Intel's compiler versus some gcc flags.

OTHER TIPS

An update for recent GCC / Xeon.

Sandy-Bridge-based Xeon (E3-12xx series, E5-14xx/24xx series, E5-16xx/26xx/46xx series).

-march=corei7-avx for GCC < 4.9.0 or -march=sandybridge for GCC >= 4.9.0.

This enables the Advanced Vector Extensions support as well as the AES and PCLMUL instruction sets for Sandy Bridge. Here's the overview from the GCC i386/x86_64 options page:

Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES and PCLMUL instruction set support.

Ivy-Bridge-based Xeon (E3-12xx v2-series, E5-14xx v2/24xx v2-series, E5-16xx v2/26xx v2/46xx v2-series, E7-28xx v2/48xx v2/88xx v2-series).

-march=core-avx-i for GCC < 4.9.0 or -march=ivybridge for GCC >= 4.9.0.

This includes the Sandy Bridge (corei7-avx) options and adds support for the new Ivy Bridge instruction sets: FSGSBASE, RDRND and F16C. From GCC options page:

Intel Core CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C instruction set support.

Haswell-based Xeon (E3-1xxx v3-series, E5-1xxx v3-series, E5-2xxx v3-series).

-march=core-avx2 for GCC 4.8.2/4.8.3 or -march=haswell for GCC >= 4.9.0.

From GCC options page:

Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2 and F16C instruction set support.

Broadwell-based Xeon (E3-12xx v4 series, E5-16xx v4 series)

-march=core-avx2 for GCC 4.8.x or -march=broadwell for GCC >= 4.9.0.

From GCC options page:

Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set support.

Skylake-based Xeon (E3-12xx v5 series) and Kaby Lake-based Xeon (E3-12xx v6 series):

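-march=core-avx2 for GCC 4.8.x, or -march=skylake for GCC releases that recognize that name (GCC 6 and later, as far as I can tell). -march=skylake-avx512 only fits Skylake server parts that implement AVX-512, such as the Xeon Scalable line; the E3-12xx v5/v6 chips are derived from the client core and lack AVX-512, so code built for skylake-avx512 can die with an illegal-instruction fault on them.

AVX-512 is a set of 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions.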

From GCC options page:

Intel Skylake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ and AVX512CD instruction set support.

Coffee Lake-based Xeon (E-21xx).

-march=skylake. Coffee Lake is derived from the client Skylake core and has no AVX-512, so -march=skylake-avx512 is not appropriate here.
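
The 4.9.0 naming cut-over described above for Sandy Bridge, Ivy Bridge and Haswell could be captured in a small shell helper. This is only a sketch: pick_march is a made-up name, it covers just those three microarchitectures, and it assumes any GCC older than 4.9.0 accepts the older name, which the list above only states for particular releases.

```shell
#!/bin/sh
# pick_march (hypothetical helper): print the -march name for a given
# microarchitecture and GCC version, following the 4.9.0 cut-over above.
# Assumption: every GCC older than 4.9.0 accepts the pre-4.9 name.
pick_march() {
    uarch=$1
    version=$2
    major=${version%%.*}
    case $version in
        *.*) rest=${version#*.}; minor=${rest%%.*} ;;
        *)   minor=0 ;;   # newer GCCs may report only the major number
    esac

    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 9 ]; }; then
        new_names=1
    else
        new_names=0
    fi

    case "$uarch:$new_names" in
        sandybridge:1) echo sandybridge ;;
        sandybridge:0) echo corei7-avx ;;
        ivybridge:1)   echo ivybridge ;;
        ivybridge:0)   echo core-avx-i ;;
        haswell:1)     echo haswell ;;
        haswell:0)     echo core-avx2 ;;
        *) echo "unknown microarchitecture: $uarch" >&2; return 1 ;;
    esac
}

# Example: ask the installed compiler for its version, if gcc is present.
if command -v gcc >/dev/null 2>&1; then
    echo "-march=$(pick_march haswell "$(gcc -dumpversion)")"
fi
```

Keeping the mapping in one function means a build script only has to know the microarchitecture, not which spelling a given compiler release expects.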

To find out what the compiler will do with the -march=native option you can use:

gcc -march=native -Q --help=target
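
That command prints a long table, so it can help to filter it down. The following is a rough sketch, assuming the usual layout in which each line holds a switch followed by either its value or [enabled]/[disabled]; summarize_target is a made-up name, and the layout may differ between GCC versions.

```shell
#!/bin/sh
# summarize_target (hypothetical helper): read `gcc -Q --help=target`
# output on stdin and print the selected -march value plus every
# switch that is marked [enabled].
summarize_target() {
    awk '$1 == "-march=" && NF >= 2 { print "march: " $2 }
         $2 == "[enabled]"          { print "enabled: " $1 }'
}

# Only run the real query when gcc is installed.
if command -v gcc >/dev/null 2>&1; then
    gcc -march=native -Q --help=target 2>/dev/null | summarize_target || true
fi
```

Swapping native for a fixed name such as haswell in the gcc call shows what that setting would enable, which makes it easy to diff two targets.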

Newer versions of gcc have -march=native, which makes the compiler detect the build machine's CPU and pick the matching -march setting.

-march=native is okay for your own machine but bad for binary releases, since the result may use instructions that other CPUs lack.
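
One way to keep the two cases apart in a build script is sketched below. The march_flags name and the core2 default baseline are assumptions for illustration; the underlying idea is that -march fixes which instructions may be emitted, while -mtune only affects scheduling and so is safe to set for release builds.

```shell
#!/bin/sh
# march_flags (hypothetical helper): print CPU-related compiler flags.
#   local    -> optimize for the machine doing the build
#   release  -> fixed instruction-set baseline (default core2, an
#               assumption) with generic tuning
march_flags() {
    case $1 in
        local)   echo "-march=native" ;;
        release) echo "-march=${2:-core2} -mtune=generic" ;;
        *) echo "usage: march_flags local|release [baseline]" >&2; return 2 ;;
    esac
}

echo "local build:   gcc -O2 $(march_flags local)"
echo "release build: gcc -O2 $(march_flags release sandybridge)"
```

The release baseline should be the oldest CPU you intend to support, since anything older may hit illegal-instruction faults.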

-march=nocona is suggested for the Atom 330 (P4-class, 64-bit); -march=core2 is for Core 2.

I'm assuming you're going 64-bit.

The following will show you all the flags your processor supports:

grep flags /proc/cpuinfo | head -n 1
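
If you only care about a handful of features, the flags line can be checked programmatically. This is a sketch, assuming a Linux x86 /proc/cpuinfo with a "flags" line; has_cpu_flag is a made-up helper, and the flag names in the loop are the spellings the kernel uses (for example sse4_2 rather than sse4.2).

```shell
#!/bin/sh
# has_cpu_flag (hypothetical helper): succeed if FLAG appears as a whole
# word in the first "flags" line of a cpuinfo-style file.
has_cpu_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    grep '^flags' "$cpuinfo" 2>/dev/null | head -n 1 |
        cut -d: -f2 | tr ' \t' '\n\n' | grep -qx -- "$flag"
}

# Report a few features that matter for the -march choices above.
for f in sse4_2 avx avx2 avx512f; do
    if has_cpu_flag "$f"; then
        echo "$f: supported"
    else
        echo "$f: not reported"
    fi
done
```

Matching whole words matters here: a plain substring search for avx would also hit avx2 and avx512f.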

The best way to determine which optimizations exist for your processor depends not only on the model but also on the version of gcc on the system you are compiling on. Check which version of gcc you have, and cross-reference its documentation:

https://gcc.gnu.org/onlinedocs

For example, I have Slackware 14.1 x64, which has gcc 4.8.2, so I would go here:

https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options
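
As a complement to reading the documentation, you can ask the installed compiler directly whether it accepts a given name. The sketch below assumes that gcc rejects an unknown -march value with a non-zero exit status even under -fsyntax-only; accepts_march and first_accepted_march are made-up helper names, and CC is read from the environment with gcc as the default.

```shell
#!/bin/sh
# accepts_march (hypothetical helper): succeed if the compiler named by
# $CC (default: gcc) accepts -march=NAME when checking an empty C file.
accepts_march() {
    "${CC:-gcc}" -march="$1" -fsyntax-only -x c /dev/null >/dev/null 2>&1
}

# Print the first candidate name the compiler accepts; fail if none is.
first_accepted_march() {
    for name in "$@"; do
        if accepts_march "$name"; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# Prefer the newer spelling and fall back to the older one.
if march=$(first_accepted_march haswell core-avx2); then
    echo "use -march=$march"
else
    echo "compiler accepts neither name (or is not installed)"
fi
```

Listing candidates from newest to oldest lets the same build script work across several GCC releases.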

My experience with Intel CPUs on x86_64 has been that every time I told gcc to optimize for a specific CPU type, performance got worse than with -mtune=generic, not better. YMMV, of course, but I've played around with this many times over the years, and it has always been like that.

OTOH, on i386 it might make sense to target at least i686 or, if you want SSE math, at least Pentium 4.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow