Question

In AVX there are two instructions for a bitwise OR: VORPD and VORPS. The docs say:

VORPD (VEX.256 encoded version)
DEST[63:0] <- SRC1[63:0] BITWISE OR SRC2[63:0]
DEST[127:64] <- SRC1[127:64] BITWISE OR SRC2[127:64]
DEST[191:128] <- SRC1[191:128] BITWISE OR SRC2[191:128]
DEST[255:192] <- SRC1[255:192] BITWISE OR SRC2[255:192]

and

VORPS (VEX.256 encoded version)
DEST[31:0] <- SRC1[31:0] BITWISE OR SRC2[31:0]
DEST[63:32] <- SRC1[63:32] BITWISE OR SRC2[63:32]
DEST[95:64] <- SRC1[95:64] BITWISE OR SRC2[95:64]
DEST[127:96] <- SRC1[127:96] BITWISE OR SRC2[127:96]
DEST[159:128] <- SRC1[159:128] BITWISE OR SRC2[159:128]
DEST[191:160] <- SRC1[191:160] BITWISE OR SRC2[191:160]
DEST[223:192] <- SRC1[223:192] BITWISE OR SRC2[223:192]
DEST[255:224] <- SRC1[255:224] BITWISE OR SRC2[255:224]

Is there any actual difference between these two processor operations? If not, why are there two instructions? And if not, is it safe to use them for an integer bitwise OR?


Solution

The presence of PS and PD varieties of all (or nearly all) SSE/AVX instructions has a historical context. When Intel originally designed the first SSE instruction set, they thought that future chip architectures would have three domains: integer, single-precision floating point (32-bit), and double-precision floating point (64-bit).

Note: domains are segregated logic units within the CPU. They matter because there is a small delay when SSE/AVX register contents move between them. Hence, if the result of an instruction in the integer domain is used as an input to an instruction in a floating-point domain, a 1- or 2-cycle delay may occur.

For this reason, Intel mirrored most bitwise logic and shuffle instructions three times: one for integers, one for SP-FP, and one for DP-FP (for example POR, ORPS and ORPD). The operations performed by these mirrored instructions are identical, including between the integer and floating-point varieties.

At present, most x86 architectures have two domains: integer and floating point. The FP domain handles both single and double precision (32/64-bit). Some architectures have only one domain for all SSE/AVX instructions. It is plausible that a separate double-precision domain could be added in some future architectures.

OTHER TIPS

There is no difference in the result of the operation. Two versions exist for logical consistency, because there are two data types: packed single (float32) and packed double (float64).

For integers, the bitwise result is the same whichever instruction you use; just be consistent with the data type. If your packed integers are at most 32 bits wide, use the single (PS) form; if they are wider, use the double (PD) form. Think of it like a cast: you can promote a 32-bit int to a 64-bit int without loss, but going the other way is a route to disaster.

Licensed under: CC-BY-SA with attribution