Question

For testing purposes, I am writing short assembly snippets for Intel's Xeon Phi using icc's inline assembler. I now want to use masked vector instructions, but I cannot get the inline assembler to accept them.

For code like this:

vmovapd  -64(%%r14, %%r10), %%zmm0{%%k1} 

I get the error message

/tmp/icpc5115IWas_.s: Assembler messages:
/tmp/icpc5115IWas_.s:563: Error: junk `%k1' after register

I tried a lot of different combinations, but nothing worked. The compiler version is intel64/13.1up03 under Linux, using GAS syntax.

Edit: The code above actually works in basic (non-extended) inline assembly. So this:

__asm__("vmovapd  -64(%r14, %r10), %zmm0{%k1} ");

works, while the following does not:

__asm__("vmovapd  -64(%[src], %%r10), %%zmm0{%%k1} "
    :
    : [src]"r"(src)
    :);

I suspected this had something to do with having to write a double % before register names in extended mode, but using a single % for the k register does not work either.

Solution

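I asked the same question in the Intel Developer Zone (http://software.intel.com/en-us/forums/topic/499145#comment-1776563). The answer: to use the mask registers on the Xeon Phi in extended inline assembler, you have to put double curly braces around the mask register modifier. Presumably this is because single braces are consumed by the extended-asm template itself (in GCC-style extended asm, `{`, `|` and `}` delimit assembler-dialect alternatives), so the assembler only sees `%k1` directly after the register, which matches the "junk `%k1' after register" error.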

vmovapd     %%zmm30,         (%%r15,    %%r10){{%%k1}}

OTHER TIPS

I think you need to use the masked variant of the instruction: VMASKMOVPD

Licensed under: CC-BY-SA with attribution