Assembly mask logic question
Question
This is very simple, but I haven't been able to figure it out yet.
This question is about MMX assembly, but it's pure logic.
Imagine the following scenario:
MM0: 04 03 02 01 04 03 02 01 <-- input
MM1: 02 02 02 02 02 02 02 02
MM2: 04 03 02 01 04 03 02 01 <-- copy of input
after pcmpgtw MM0, MM1
MM0: FF FF 00 00 FF FF 00 00 <-- words where MM0 is greater than MM1 (comparing words)
MM1: 02 02 02 02 02 02 02 02
MM2: 04 03 02 01 04 03 02 01
after pand MM0, MM2
MM0: 04 03 00 00 04 03 00 00 <-- almost there...
MM1: 02 02 02 02 02 02 02 02
MM2: 04 03 02 01 04 03 02 01
What I want is to fill the zeros of MM0 with 02. I suppose I would have to invert the MM0 mask from the second step, changing the FFs to 00s and the 00s to FFs, then AND that with MM1, and finally OR the two results to merge them.
If I was able to get:
MM3: 00 00 FF FF 00 00 FF FF
then, after pand MM1, MM3
MM0: 04 03 00 00 04 03 00 00
MM1: 00 00 02 02 00 00 02 02
finally por MM0, MM1 would give me the desired outcome:
MM0: 04 03 02 02 04 03 02 02 <-- Aha!
Summing up, how can I get that MM3 register as 00 00 FF FF 00 00 FF FF? How can I invert the bits, provided I only have AND, OR, XOR and AND NOT (PANDN) instructions available on MMX registers?
Any answer is greatly appreciated. Thanks.
Solution
You can also generate the inverted mask with a second pcmpgtw that swaps the order of the arguments. That way you can save a register:
MM0: 04 03 02 01 04 03 02 01 <-- input
MM1: 02 02 02 02 02 02 02 02
MM2: 04 03 02 01 04 03 02 01 <-- copy of input
pcmpgtw MM0, MM1 ; MM0 = FF FF 00 00 FF FF 00 00
pcmpgtw MM1, MM2 ; MM1 = 00 00 FF FF 00 00 FF FF
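You may have to make a copy of MM1 first, because it is overwritten during mask generation, but this is often faster than loading or generating a 64-bit constant. Note that the two masks are exact complements only where the compared words differ: where a word pair is equal, both compares produce 00 00 for that word.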
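To check the two-compare idea without an assembler, here is a minimal sketch in portable C that models the registers as 64-bit integers. The helper names (cmpgt_w, merge_two_compares) are made up for illustration and are not real intrinsics; cmpgt_w only imitates what pcmpgtw does on four signed 16-bit words.

```c
#include <stdint.h>

/* Hypothetical model of PCMPGTW: for each of the four 16-bit words,
   produce 0xFFFF where a > b (signed compare), otherwise 0x0000. */
static uint64_t cmpgt_w(uint64_t a, uint64_t b)
{
    uint64_t mask = 0;
    for (int lane = 0; lane < 4; lane++) {
        /* Flipping the sign bit maps signed order onto unsigned order. */
        uint16_t x = (uint16_t)((uint16_t)(a >> (16 * lane)) ^ 0x8000u);
        uint16_t y = (uint16_t)((uint16_t)(b >> (16 * lane)) ^ 0x8000u);
        if (x > y)
            mask |= (uint64_t)0xFFFF << (16 * lane);
    }
    return mask;
}

/* Merge using two compares with swapped operands, no bit inversion. */
static uint64_t merge_two_compares(uint64_t input, uint64_t limit)
{
    uint64_t keep_input = cmpgt_w(input, limit); /* pcmpgtw MM0, MM1 */
    uint64_t keep_limit = cmpgt_w(limit, input); /* pcmpgtw MM1, MM2 */
    return (input & keep_input) | (limit & keep_limit);
}
```

With the values from the question (0x0403020104030201 and 0x0202020202020202) this yields 0x0403020204030202. Feeding it two identical inputs returns 0, because neither compare sets any mask bits for equal words, so this variant is only safe when equal words cannot occur or a zero result is acceptable for them.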
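An alternative is to use PANDN, which computes dest = (NOT dest) AND src:
pcmpgtw MM0, MM1 ; MM0 = FF FF 00 00 FF FF 00 00
pand MM2, MM0 ; MM2 = 04 03 00 00 04 03 00 00 (keep input where the mask is FF)
pandn MM0, MM1 ; MM0 = 00 00 02 02 00 00 02 02 (keep MM1 where the mask is 00)
por MM0, MM2 ; MM0 = 04 03 02 02 04 03 02 02 (combine the results)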
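The AND NOT approach can be modelled the same way. This is only a sketch in plain C with 64-bit integers standing in for the registers; pandn and blend are illustrative names, and pandn assumes the documented operand order of the PANDN instruction (the destination is the operand that gets inverted).

```c
#include <stdint.h>

/* Hypothetical model of PANDN dest, src: result = (NOT dest) AND src. */
static uint64_t pandn(uint64_t dest, uint64_t src)
{
    return ~dest & src;
}

/* Take bits of a where mask is 1 and bits of b where mask is 0. */
static uint64_t blend(uint64_t mask, uint64_t a, uint64_t b)
{
    uint64_t kept_a = a & mask;       /* pand  MM2, MM0 */
    uint64_t kept_b = pandn(mask, b); /* pandn MM0, MM1 */
    return kept_a | kept_b;           /* por   MM0, MM2 */
}
```

Because the second half uses the true complement of a single mask, words that compare equal fall through to b instead of becoming zero.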
OTHER TIPS
So you have a mask = 0xFFFF0000FFFF0000;
then:
all_ones = 0xFFFFFFFFFFFFFFFF;
inverted_mask = mask XOR all_ones;
merging M0 and M1 is:
M0 = M0 AND mask;
M1 = M1 AND inverted_mask;
M0 = M0 OR M1;
This edits M0 and M1 in place, so their original values are destroyed. If you want to preserve M1, store the intermediate result in a temporary variable, register, or memory location:
M0 = M0 AND mask;
TEMP = M1 AND inverted_mask;
M0 = M0 OR TEMP;
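The pseudocode above translates almost line for line into C, which makes it easy to try out. This is a sketch with 64-bit integers in place of the registers, and the function names are invented for the example. On MMX the XOR would be pxor, and an all-ones register is commonly produced by comparing a register with itself (for example pcmpeqw MM3, MM3) rather than loading a constant.

```c
#include <stdint.h>

/* Invert a mask by XOR-ing it with all ones. */
static uint64_t invert_mask(uint64_t mask)
{
    const uint64_t all_ones = UINT64_MAX; /* 0xFFFFFFFFFFFFFFFF */
    return mask ^ all_ones;
}

/* Merge m0 and m1 under mask; m1 is not modified because the
   masked copy lives in a temporary. */
static uint64_t merge_preserving_m1(uint64_t m0, uint64_t m1, uint64_t mask)
{
    uint64_t temp = m1 & invert_mask(mask); /* TEMP = M1 AND inverted_mask */
    m0 = m0 & mask;                         /* M0 = M0 AND mask */
    return m0 | temp;                       /* M0 = M0 OR TEMP */
}
```

Calling merge_preserving_m1(0x0403020104030201, 0x0202020202020202, 0xFFFF0000FFFF0000) gives 0x0403020204030202, the result the question asks for.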