Question

For a uni assignment, I need to write a function in assembly that counts the number of spaces in a string (given as a pointer and a length). There's a requirement to use pcmpeqb for this (that is, to work with SSE registers), and a hint to use popcnt and pmovmskb. My basic approach is to process the string in 16-byte chunks, loading each chunk into %xmm8 and comparing it with %xmm9, which is initialised to contain 16 spaces. However, I need to handle the last chunk specially somehow.

My first thought was to use a shift instruction to delete the garbage past the end of the string. (The string is guaranteed to have some extra space allocated after the end to prevent segfaults, but the data there probably shouldn't be used for the comparison.) I stumbled upon PSRLDQ, but it doesn't seem to accept a non-immediate argument. (Or at the very least it refused what I threw at it.) So my question is: how can I remove the last X bytes of an SSE register without zeroing half of it or going word by word? (As I understand it, most of the available operations on them do one or the other.)

My code (modulo boilerplate) currently looks like this; the problematic bit is towards the end, after the label _last:

    # === Arguments ===
    # %rdi - char *input
    # %rsi - size_t count
    # === Temporaries ===
    # %rdx - how many chars to process in final run
    # %rcx - how many characters were "read" already
    # %r8 - pop count of last iteration
    # %r9
    # %r11
    # === SSE Temporaries ===
    # %xmm8 - the chunk of the string being processed
    # %xmm9 - 16 spaces

    xor %rcx, %rcx
    xor %rax, %rax
    movdqu _spaces(%rip), %xmm9

_loop:
    # set %rdx to number of characters left to process
    mov %rsi, %rdx
    sub %rcx, %rdx

    # we've reached the end of the string
    cmp %rdx, %rsi
    jge _end

    movdqu (%rdi, %rcx), %xmm8 # load chunk of string to process
    add $16, %rcx

    # less than 16 characters to process
    cmp $16, %rdx
    jg _last

_compare: #compare %xmm8 with spaces and add count of spaces to %eax
    pcmpeqb %xmm9, %xmm8
    pmovmskb %xmm8, %r8d
    popcntl %r8d, %r8d
    add %r8d, %eax
    jmp _loop

_last: # last part of string, less than 16 chars
    sub $16, %rdx
    neg %rdx
    # I need to delete possible garbage after the last chars
    psrldq %edx, %xmm8 
    jmp _compare

_end:
    ret

(The control flow there might still be buggy, but I'll deal with that later.)


Solution 2

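I think the simplest solution would be to compare all 16 bytes in the SSE register, and then mask off the unwanted bits in the result of pmovmskb. Note that 16-byte loads like yours are unsafe in general, because a load near the end of the string may cross over into an inaccessible page; they are only safe if, as in your case, enough padding is guaranteed after the string.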
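To make the page-crossing concern concrete, here is a minimal C sketch of the usual check. It assumes 4096-byte pages (the smallest page size on x86-64); the names load_crosses_page, PAGE_BYTES and CHUNK_BYTES are made up for this illustration and do not come from the question.

```c
#include <stdint.h>

/* ASSUMPTION: 4096-byte pages, the smallest page size on x86-64. */
enum { PAGE_BYTES = 4096, CHUNK_BYTES = 16 };

/* Hypothetical helper: returns 1 if a 16-byte load starting at p would
   touch the following page, 0 if it stays inside the page containing p. */
static int load_crosses_page(const void *p)
{
    /* Offset of p within its page (PAGE_BYTES is a power of two). */
    uintptr_t offset = (uintptr_t)p & (uintptr_t)(PAGE_BYTES - 1);

    /* The load covers offset .. offset + 15, so it spills over exactly
       when offset is larger than PAGE_BYTES - CHUNK_BYTES. */
    return offset > (uintptr_t)(PAGE_BYTES - CHUNK_BYTES);
}
```

When this returns 0 and the first byte of the chunk belongs to the string, the whole load lies in a page that is known to be mapped, so it cannot fault even if it reads past the end of the string.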

OTHER TIPS

Don't bother trying to "delete" the extra bytes in the SSE register. Instead, after you compare and do PMOVMSKB, simply mask off the bits in the resulting mask that correspond to the extra bytes. PMOVMSKB puts the result for byte i into bit i, so if only the first n bytes of the chunk belong to the string, AND the mask with (1 << n) - 1 before the POPCNT. This is a very standard approach in vectorization: instead of jumping through hoops to get just the data you want, process everything, and then clean up the bits you didn't want later.
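As a rough illustration of the compare-then-mask approach, here is a sketch in C with SSE2 intrinsics, each of which corresponds to one of the instructions named in the question (noted in the comments). The function names count_spaces and tail_mask are invented for this example, and the code assumes, as the question states, that enough readable bytes (at least 15) follow the string for the final 16-byte load.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: mask with the low `valid` bits set, where `valid`
   is the number of bytes in the chunk that belong to the string. */
static unsigned tail_mask(size_t valid)
{
    assert(valid <= 16);
    return (1u << valid) - 1u;  /* valid == 16 gives 0xFFFF */
}

/* Hypothetical function: counts ' ' in input[0 .. count).
   ASSUMPTION (from the question): at least 15 readable bytes follow the
   string, so the last 16-byte load cannot fault. */
size_t count_spaces(const char *input, size_t count)
{
    const __m128i spaces = _mm_set1_epi8(' ');  /* 16 copies of ' ' */
    size_t total = 0;

    for (size_t i = 0; i < count; i += 16) {
        size_t left = count - i;
        size_t valid = left < 16 ? left : 16;

        __m128i chunk = _mm_loadu_si128((const __m128i *)(input + i)); /* movdqu */
        __m128i eq = _mm_cmpeq_epi8(chunk, spaces);                    /* pcmpeqb */
        unsigned mask = (unsigned)_mm_movemask_epi8(eq);               /* pmovmskb */

        /* Drop the bits that belong to bytes past the end of the string.
           For full chunks the mask is 0xFFFF and this changes nothing. */
        mask &= tail_mask(valid);

        /* GCC/Clang builtin; becomes a popcnt instruction with -mpopcnt. */
        total += (size_t)__builtin_popcount(mask);
    }
    return total;
}
```

In assembly the extra work for the last chunk is the same single AND between pmovmskb and popcnt; the mask can be built with a variable shift (which needs the count in %cl) or looked up in a small table indexed by the number of valid bytes.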

Licensed under: CC-BY-SA with attribution