Is there any data on AVX2 gather latency?

(for instance a _mm256_i32gather_ps instruction accessing a single cache line)

Solution

This page gives latency data for all intrinsics:

Intel Intrinsics Guide

The latency for _mm256_i32gather_ps is 6.

Other tips

Actually, this really depends on the hardware. If you look at Agner Fog's instruction tables, you'll see that no latencies are listed for Zen1 and Zen2, only reciprocal throughputs of 13-20 and 9-16 cycles, respectively, for VGATHERDPS. For Intel processors we have (values in clock cycles):

                      xmm                     ymm
Processor     recip. tput  latency    recip. tput  latency
----------------------------------------------------------
Haswell            9                       12
Broadwell          6                        7
Skylake            4         12             5         13
SkylakeX           4         12             5         13
Coffee Lake        4         12             5         13

Also, Intel's site no longer lists the throughput/latency of the AVX2 gather instructions, but it does list some for AVX512.
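
Since the numbers vary this much between microarchitectures, it can be worth measuring on your own machine. Below is a minimal sketch of a dependent-chain test, not a rigorous benchmark. Assumptions: GCC or Clang on x86-64; the names `table`, `init_table`, `chase_lane0` and `ns_per_gather` are made up for illustration; and it uses the integer gather `_mm256_i32gather_epi32` (VPGATHERDD) rather than `_mm256_i32gather_ps`, because an integer result can be fed straight back in as the next index vector. I would expect the float form to behave similarly, but that is an assumption, not something this code checks. The whole table is 64 bytes and 64-byte aligned, so every access stays within a single cache line.

```c
#include <immintrin.h>
#include <stdint.h>
#include <time.h>

/* 16 x 4 bytes = 64 bytes, aligned so every index hits the same cache line. */
static _Alignas(64) int32_t table[16];

/* table[i] = (i + 1) % 16, so following the indices walks a 16-step cycle. */
void init_table(void) {
    for (int i = 0; i < 16; i++)
        table[i] = (i + 1) % 16;
}

/* Each gather uses the previous gather's result as its index vector,
   so the loop is one serial dependency chain of `iters` gathers.
   The target attribute lets this compile without -mavx2. */
__attribute__((target("avx2")))
int chase_lane0(int start, long iters) {
    __m256i idx = _mm256_set1_epi32(start);
    for (long i = 0; i < iters; i++)
        idx = _mm256_i32gather_epi32((const int *)table, idx, 4);
    /* Return lane 0 so the compiler cannot discard the chain. */
    return _mm_cvtsi128_si32(_mm256_castsi256_si128(idx));
}

/* Average CPU time per chained gather, in nanoseconds (coarse: uses clock()). */
double ns_per_gather(long iters) {
    clock_t t0 = clock();
    volatile int sink = chase_lane0(0, iters);
    clock_t t1 = clock();
    (void)sink;
    return 1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / (double)iters;
}
```

To use it, check `__builtin_cpu_supports("avx2")` first, call `init_table()`, then call `ns_per_gather` with a large iteration count (say 100 million) and multiply the result by your core clock in GHz to get an approximate latency in cycles. Frequency scaling and loop overhead will blur the figure, so treat it as a sanity check against the table above rather than a replacement for it.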

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow