Besides the argument, the mix of other instructions in flight may affect latency and throughput. These instructions are microcoded, which means they generate a sequence of µops that must contend with other instructions for ALU resources; under such contention, performance may be adversely affected.
x86: latency and throughput of transcendental functions
17-01-2022
Question
Intel® 64 and IA-32 Architectures Optimization Reference Manual lists latency and throughput figures for various CPU instructions.
For transcendental functions (FSIN etc.), some of the figures are listed as ranges (page C-29). Footnote 4 explains:
Latency and Throughput of transcendental instructions can vary substantially in a dynamic execution environment. Only an approximate value or a range of values are given for these instructions.
My question is: what factors affect the throughput and latency of such instructions? I imagine the value of the argument is one factor. Are there any others?
Solution
OTHER TIPS
The x87 control word specifies the precision of computations (64-bit, 53-bit, or 24-bit mantissa), and it can affect the performance of transcendental functions, especially those that internally use division or square root. In general, I advise avoiding the trigonometric x87 instructions because by design they are very inaccurate for large input values.
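To make the control-word point concrete, here is a minimal sketch in C of how the precision-control (PC) field, bits 8 and 9 of the x87 control word, maps to the three mantissa widths mentioned above. The helper names (`x87_precision_bits`, `x87_with_precision`, `x87_read_cw`) are made up for this illustration and are not part of any library. The read helper assumes GCC-style inline assembly on an x86 target and is compiled out elsewhere.

```c
#include <stdint.h>

/* Precision-control (PC) field of the x87 control word: bits 8 and 9. */
#define X87_PC_SHIFT 8
#define X87_PC_MASK  0x3u

/* Hypothetical helper: mantissa width in bits selected by the control
 * word `cw`, or -1 for the reserved encoding. */
int x87_precision_bits(uint16_t cw)
{
    switch ((cw >> X87_PC_SHIFT) & X87_PC_MASK) {
    case 0x0: return 24;  /* single precision */
    case 0x2: return 53;  /* double precision */
    case 0x3: return 64;  /* double-extended precision */
    default:  return -1;  /* encoding 01b is reserved */
    }
}

/* Hypothetical helper: copy of `cw` with the PC field set for a 24-, 53-,
 * or 64-bit mantissa. Any other width leaves `cw` unchanged. */
uint16_t x87_with_precision(uint16_t cw, int bits)
{
    uint16_t pc;
    switch (bits) {
    case 24: pc = 0x0; break;
    case 53: pc = 0x2; break;
    case 64: pc = 0x3; break;
    default: return cw;
    }
    return (uint16_t)((cw & ~(X87_PC_MASK << X87_PC_SHIFT)) |
                      ((unsigned)pc << X87_PC_SHIFT));
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Assumption: GCC/Clang inline assembly on x86. FNSTCW stores the current
 * control word to memory without checking for pending exceptions. */
static inline uint16_t x87_read_cw(void)
{
    uint16_t cw;
    __asm__ __volatile__("fnstcw %0" : "=m"(cw));
    return cw;
}
#endif
```

For example, `x87_precision_bits(0x037F)` gives 64, since 0x037F is the control word the FPU holds after FNINIT. Some runtimes change the field at startup, so when timing FSIN and friends it is worth checking the value actually in effect rather than assuming the default.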