Question

Is there any data on AVX2 gather latency?

(for instance a _mm256_i32gather_ps instruction accessing a single cache line)

Was it helpful?

Solution

This page gives latency data for all intrinsics:

Intel Intrinsics Guide

The latency for _mm256_i32gather_ps is 6.

OTHER TIPS

Actually, this really depends on the hardware. If you look at Agner Fog's instruction tables, you'll see that there are no latencies listed for Zen1 and Zen2, but have reciprocal throughputs of 13-20 and 9-16 for VGATHERDPS. For Intel processors we have:

                     xmm                 ymm
Processor    throughput latency  throughput latency
-------------------------------------------------------
Haswell          9                    12
Broadwell        6                     7
Skylake          4         12          5       13
SkylakeX         4         12          5       13
Coffee Lake      4         12          5       13

Also, Intel's site no longer lists the throughput/latencies of of the gather instructions for AVX2, but there are some for AVX512.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top