Question

Is there any data on AVX2 gather latency?

(for instance a _mm256_i32gather_ps instruction accessing a single cache line)

Solution

This page gives latency data for all intrinsics:

Intel Intrinsics Guide

The latency for _mm256_i32gather_ps is 6.

Other tips

Actually, this really depends on the hardware. Agner Fog's instruction tables list no latencies for VGATHERDPS on Zen1 and Zen2, only reciprocal throughputs of 13-20 and 9-16 cycles respectively. For Intel processors we have:

                  xmm                      ymm
Processor    recip. tput  latency    recip. tput  latency
----------------------------------------------------------
Haswell           9          -           12          -
Broadwell         6          -            7          -
Skylake           4         12            5         13
SkylakeX          4         12            5         13
Coffee Lake       4         12            5         13

(All values are in clock cycles; "-" means no latency is given.)
Also, Intel's site no longer lists the throughput and latency of the AVX2 gather instructions, though it does list some for AVX512.
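
Since the published numbers vary this much between CPUs, it may be easier to measure on your own machine. Below is a rough sketch of a latency test, not a calibrated benchmark. It chains gathers so that each result becomes the index vector of the next one, with all 16 table entries sitting in a single 64-byte cache line. Assumptions on my part: GCC or Clang on x86-64 (for `__attribute__` and `__builtin_cpu_supports`), and the integer form `_mm256_i32gather_epi32` instead of `_mm256_i32gather_ps`, because integer results can be fed straight back in as indices. The names `have_avx2`, `next_index`, `gather_chain` and `ns_per_gather` are made up for this example.

```c
#include <assert.h>
#include <immintrin.h>
#include <time.h>

/* Hypothetical helper: nonzero if the CPU running this code supports AVX2. */
static int have_avx2(void)
{
    return __builtin_cpu_supports("avx2");
}

/* One 64-byte cache line holding 16 ints. Entry i stores (i + 1) % 16, so
   every gathered value is itself a valid index for the next gather. */
static int next_index[16] __attribute__((aligned(64))) = {
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0
};

/* Run `iters` dependent gathers. Each gather needs the previous result as
   its index vector, so the loop is limited by latency, not throughput. */
__attribute__((target("avx2")))
static void gather_chain(long iters, int out[8])
{
    assert(iters >= 0);
    __m256i idx = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    for (long i = 0; i < iters; i++)
        idx = _mm256_i32gather_epi32(next_index, idx, 4); /* scale = sizeof(int) */
    _mm256_storeu_si256((__m256i *)out, idx);
}

/* Average CPU time per dependent gather, in nanoseconds. */
static double ns_per_gather(long iters)
{
    int out[8];
    clock_t t0 = clock();
    gather_chain(iters, out);
    clock_t t1 = clock();
    return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}
```

Call `have_avx2()` first, then something like `ns_per_gather(100000000)`, and multiply the result by your core clock in GHz to get an approximate cycle count. Keep in mind that this measures the index-to-result path of the integer gather with every element in one cache line, so it is only an estimate for the `_ps` case in the question.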

License: CC-BY-SA with attribution
Not affiliated with StackOverflow