Question

Intel® 64 and IA-32 Architectures Optimization Reference Manual lists latency and throughput figures for various CPU instructions.

For transcendental functions (FSIN etc.), some of the figures are listed as ranges (page C-29). Footnote 4 explains:

Latency and Throughput of transcendental instructions can vary substantially in a dynamic execution environment. Only an approximate value or a range of values are given for these instructions.

My question is: what factors affect the throughput and latency of such instructions? I imagine the value of the argument is one factor. Are there any others?


Solution

Besides the argument, the mix of other instructions in flight may affect the latency and throughput. These instructions are microcoded, which means they generate a sequence of µops that must contend with other instructions for ALU resources; when such contention occurs, performance may be adversely affected.

OTHER TIPS

The x87 control word specifies the precision of computations (64-bit, 53-bit, or 24-bit mantissa), and this setting can affect the performance of transcendental instructions, especially those that internally use division or square root. In general, I advise avoiding the x87 trigonometric instructions, because by design they are very inaccurate for large input values.
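To make the accuracy remark concrete, here is a rough Python model of the argument reduction step. It assumes the reduction uses a 66-bit approximation of pi (0.C90FDAA22168C234C hex, scaled by 2^2), which is my recollection of Intel's documentation and not something stated above. The names PI_REF, PI_66, reduction_error and sin_near_pi_66 are made up for this sketch. It models only the reduction, not the actual microcode, and it says nothing about timing.

```python
import math
from fractions import Fraction

# pi to 128 fractional bits, taken from its hexadecimal expansion.
PI_REF = Fraction(0x3243F6A8885A308D313198A2E03707344, 16**32)

# ASSUMPTION: 66-bit constant used for x87 range reduction,
# 0.C90FDAA22168C234C (hex) * 2^2.
PI_66 = Fraction(0xC90FDAA22168C234C, 2**66)


def reduction_error(x: float) -> float:
    """Absolute error in the reduced argument when PI_66 stands in for pi."""
    k = round(Fraction(x) / PI_REF)       # nearest multiple of pi
    exact = Fraction(x) - k * PI_REF      # reduction with the reference pi
    approx = Fraction(x) - k * PI_66      # reduction with the 66-bit pi
    return float(abs(approx - exact))


def sin_near_pi_66() -> float:
    """Modelled result for sin(double(pi)), using sin(x) ~ pi - x near pi."""
    return float(PI_66 - Fraction(math.pi))


if __name__ == "__main__":
    for x in (math.pi, 1e6, 1e12, 1e18):
        print(f"x = {x:.3e}  reduction error ~ {reduction_error(x):.3e}")
    print("modelled sin(pi): ", sin_near_pi_66())
    print("reference sin(pi):", float(PI_REF - Fraction(math.pi)))
```

Under this model the error in the reduced argument grows linearly with the number of multiples of pi removed, so it is negligible for small inputs and reaches about 1e-3 near 1e18. Inputs close to a multiple of pi are also affected, because the result is then about as small as the error itself.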

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow