I think the simplest solution would be to compare all 16 bytes in the SSE register anyway, and then mask off the unwanted bits of the result after the pmovmskb. Note that using 16-byte loads like you do is unsafe in general, because a load that extends past the end of the string may cross over into an inaccessible page.
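To make the page-crossing concern concrete, here is a minimal C sketch of a check that tells whether a 16-byte load starting at a given address would touch the next page. The helper name load16_crosses_page and the 4096-byte page size are assumptions made for illustration; they are not part of the original answer, and the real page size depends on the platform.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4096-byte pages (typical on x86-64, but not guaranteed). */
#define PAGE_SIZE_ASSUMED 4096u

/* Hypothetical helper: returns true if reading 16 bytes starting at p
   would extend into the following page. */
static bool load16_crosses_page(const void *p)
{
    /* Offset of p within its page. */
    uintptr_t offset = (uintptr_t)p & (PAGE_SIZE_ASSUMED - 1u);

    /* A load at offset 4080 covers bytes 4080..4095 and stays inside the
       page; any larger offset spills into the next page. */
    return offset > PAGE_SIZE_ASSUMED - 16u;
}
```

If the check returns false, the load cannot fault as long as its first byte is readable, since all 16 bytes then lie in the same page.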
How to "remove" bytes at the end of an SSE register?
Question
For a uni assignment, I need to write a function which counts the number of spaces in a string (defined by a pointer and an index) in assembly. There's a requirement to use pcmpeqb for this (that is, work with SSE registers), and a hint to use popcnt and pmovmskb. My basic approach is to process the string in 16-byte chunks, loading each chunk into %xmm8 and comparing it with %xmm9, which is initialised to contain 16 spaces. However, I need to handle the last chunk specially somehow.
My first thought was to use a shift instruction to delete the garbage past the end of the string. (The string is guaranteed to have some extra space allocated after the end to prevent segfaults, but the data there probably shouldn't be used for the comparison.) I stumbled upon PSRLDQ, but it doesn't seem to accept a non-immediate argument. (Or at the very least it refused what I threw at it.) So my question is: how can I remove the last X bytes of an SSE register, without zeroing half of it, or doing so word-by-word? (As I understand it, most of the available operations on them do.)
My code (modulo boilerplate) currently looks like this. The problematic bit is towards the end, after the label _last:
# === Arguments ===
# %rdi - char *input
# %rsi - size_t count
# === Temporaries ===
# %rdx - how many chars to process in final run
# %rcx - how many characters were "read" already
# %r8 - pop count of last iteration
# %r9
# %r11
# === SSE Temporaries ===
# %xmm8 - the chunk of the string being processed
# %xmm9 - 16 spaces
xor %rcx, %rcx
xor %rax, %rax
movdqu _spaces(%rip), %xmm9
_loop:
# set %rdx to number of characters left to process
mov %rsi, %rdx
sub %rcx, %rdx
# we've reached the end of the string
cmp %rdx, %rsi
jge _end
movdqu (%rdi, %rcx), %xmm8 # load chunk of string to process
add $16, %rcx
# less than 16 characters to process
cmp $16, %rdx
jg _last
_compare: #compare %xmm8 with spaces and add count of spaces to %eax
pcmpeqb %xmm9, %xmm8
pmovmskb %xmm8, %r8d
popcntl %r8d, %r8d
add %r8d, %eax
jmp _loop
_last: # last part of string, less than 16 chars
sub $16, %rdx
neg %rdx
# I need to delete possible garbage after the last chars
psrldq %edx, %xmm8
jmp _compare
_end:
ret
(The control flow there might still be buggy, but I'll deal with that later.)
Solution 2
OTHER TIPS
Don't bother trying to "delete" the extra bytes in the SSE register. Instead, after you compare and do PMOVMSKB, simply mask off the bits in the resulting mask that correspond to the extra bytes. This is a very standard approach in vectorization; instead of jumping through hoops to get just the data you want, process everything, and then clean up the bits you didn't want later.
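As an illustration of the masking approach, here is a minimal sketch in C using SSE2 intrinsics, which correspond directly to the instructions from the question: _mm_cmpeq_epi8 is pcmpeqb, _mm_movemask_epi8 is pmovmskb, and __builtin_popcount (a GCC/Clang builtin) stands in for popcnt. The function name count_spaces is made up for this example, and the code assumes, as the question states, that enough readable padding follows the string for the final 16-byte load.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: counts ' ' characters in the first `count` bytes of
   `input`. Assumes up to 15 readable bytes follow the string, so that the
   last 16-byte load does not fault. */
static size_t count_spaces(const char *input, size_t count)
{
    const __m128i spaces = _mm_set1_epi8(' '); /* 16 copies of ' ' */
    size_t total = 0;
    size_t i = 0;

    /* Full 16-byte chunks: every bit of the mask is meaningful. */
    for (; i + 16 <= count; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(input + i));
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, spaces));
        total += (size_t)__builtin_popcount(mask);
    }

    /* Final partial chunk: compare all 16 bytes, then discard the bits that
       belong to bytes past the end of the string. */
    size_t rest = count - i; /* 0..15 valid bytes remain */
    if (rest != 0) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(input + i));
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, spaces));
        mask &= (1u << rest) - 1u; /* keep only the low `rest` bits */
        total += (size_t)__builtin_popcount(mask);
    }
    return total;
}
```

Bit k of the pmovmskb result corresponds to byte k of the register, so the garbage bytes always occupy the high bits of the mask. In assembly the same cleanup is a variable shift plus an AND on a general-purpose register, which avoids the immediate-only restriction of PSRLDQ.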