gcc options to use i87, AVX simultaneously but nor SSE

Question 1

You normally don't want this; I don't think gcc is smart enough at deciding to use x87 when there's high register pressure to make it worth it.

x87 and SSE/AVX instructions compete for the same FP execution units on normal x86 CPUs (Intel and AMD), so you don't get more throughput from interleaving them.

Normally you should just use -mfpmath=sse (which means AVX when used with -mavx, or better -mfpmath=sse -march=native. The default for x86-64 is sse, so -mfpmath=sse only changes anything for -m32, AFAIK.

The main benefit of -mfpmath=both is more total registers, but managing the x87 register stack often costs extra instructions. Moving data between x87 and AVX also costs a store/reload (store-forwarding round trip, ~6 cycle latency on Haswell, http://agner.org/optimize/), so it's only really useful if you have two independent sets of calculations for the compiler to interleave. Otherwise it's not better than normal spill/reload.

Last time I looked at gcc -O3 -mfpmath=both, the results were not impressive: https://godbolt.org/g/p2KLEC shows gcc5.4 using some store/reloads to bounce data between x87 and xmm (AVX) registers. It would be better off just keeping some of the constants in memory.

Question 2

The AVX registers are only extensions of the SSE registers. You cannot mix SSE and AVX instructions to increase the number of available registers (you can still mix x87 and AVX instructions, I assume this is what -mfpmath=both does in this case).

See for instance the discussion “Mixing AVX and SSE” on this page.