Technically you can access individual bytes using PEXTRB, but that's not recommended for this task. I would do a SIMD compare using PCMPEQB, then PMOVMSKB to get the result mask, then find the first set bit using BSF, and create a blend mask from that. Avoid looping; use parallelism.
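As a rough illustration of those steps, here is a minimal sketch in C intrinsics. It assumes x86-64 (where SSE2 is baseline), GCC or Clang (for __builtin_ctz), and 16-byte blocks. The function name replace_prefix_sse2 and the choice to build the blend mask by comparing lane indices against the NUL position are my own assumptions, not something stated above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: overwrite the bytes of `input` that precede its first
   NUL with the corresponding bytes of `replace`. Both are 16-byte blocks. */
static void replace_prefix_sse2(const uint8_t input[16],
                                const uint8_t replace[16],
                                uint8_t out[16])
{
    __m128i in  = _mm_loadu_si128((const __m128i *)input);
    __m128i rep = _mm_loadu_si128((const __m128i *)replace);

    /* PCMPEQB: 0xFF in every byte lane that holds a NUL. */
    __m128i is_nul = _mm_cmpeq_epi8(in, _mm_setzero_si128());

    /* PMOVMSKB: collapse the lanes to one bit per byte. */
    unsigned bits = (unsigned)_mm_movemask_epi8(is_nul);

    /* BSF: index of the first NUL, or 16 if the block contains none. */
    int pos = bits ? __builtin_ctz(bits) : 16;

    /* Blend mask: 0xFF for lanes whose index is below the NUL position. */
    const __m128i idx = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                                      8, 9, 10, 11, 12, 13, 14, 15);
    __m128i before_nul = _mm_cmplt_epi8(idx, _mm_set1_epi8((char)pos));

    /* SSE2-only blend: take `replace` before the NUL, `input` from it onward. */
    __m128i res = _mm_or_si128(_mm_and_si128(before_nul, rep),
                               _mm_andnot_si128(before_nul, in));

    _mm_storeu_si128((__m128i *)out, res);
}
```

The AND/ANDNOT/OR blend keeps this SSE2-only; with SSE4.1 available, PBLENDVB could do the final step in one instruction.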
Update: based on rwong's comment, here is a possible implementation using pcmpistrm:
3 movdqu input, %xmm1
(gdb) si
4 movdqu replace, %xmm2
(gdb)
5 movdqa %xmm1, %xmm0
(gdb)
6 pcmpistrm $0x78, %xmm1, %xmm1
(gdb) p/s $xmm1.v16_int8
$1 = "input\000----------"
(gdb) p/s $xmm2.v16_int8
$2 = "replacereplacere"
(gdb) si
7 pblendvb %xmm1, %xmm2
(gdb) si
8 ret
(gdb) p/s $xmm2.v16_int8
$3 = "repla\000----------"
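For readers who prefer intrinsics, the same sequence can be sketched in C. This is only an approximation of the assembly above, assuming GCC or Clang on an SSE4.2-capable x86 CPU; the function name replace_prefix_pcmpistrm and the macro TAIL_MASK_MODE are hypothetical. The immediate 0x78 decodes as unsigned bytes, "equal each" comparison, masked negative polarity, and a byte-wide mask, so comparing the input with itself yields 0x00 for each valid byte and 0xFF from the NUL terminator onward.

```c
#include <nmmintrin.h>  /* SSE4.2: _mm_cmpistrm; also provides SSE4.1 _mm_blendv_epi8 */
#include <stdint.h>

/* The immediate 0x78 from the listing, spelled with the header's named constants. */
#define TAIL_MASK_MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH | \
                        _SIDD_MASKED_NEGATIVE_POLARITY | _SIDD_UNIT_MASK)
_Static_assert(TAIL_MASK_MODE == 0x78, "mode must match the assembly listing");

/* Hypothetical helper; the target attribute (a GCC/Clang extension) lets this
   compile without passing -msse4.2 for the whole file. */
__attribute__((target("sse4.2")))
static void replace_prefix_pcmpistrm(const uint8_t input[16],
                                     const uint8_t replace[16],
                                     uint8_t out[16])
{
    __m128i in  = _mm_loadu_si128((const __m128i *)input);
    __m128i rep = _mm_loadu_si128((const __m128i *)replace);

    /* PCMPISTRM: 0x00 for bytes before the NUL, 0xFF from the NUL onward. */
    __m128i tail = _mm_cmpistrm(in, in, TAIL_MASK_MODE);

    /* PBLENDVB: take `input` where the mask is set, `replace` elsewhere. */
    __m128i res = _mm_blendv_epi8(rep, in, tail);

    _mm_storeu_si128((__m128i *)out, res);
}
```

If the 16-byte block contains no NUL, every byte counts as valid, the mask is all zeros, and the output is simply the replacement block.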