Since each bit only ever belongs to a single "stream", I suppose you could shift all the masks at the same time and check which stream each bit belongs to. This allows a modest amount of parallelism, although how efficient it is will depend on the architecture and on the size of the masks. It also depends on how often you want to do this: a certain amount of pre-computation would make subsequent runs more efficient. It does sound a bit brute-force, and it may not be better than what you already did.
There is, of course, the lovely question "Extracting bits with a single multiplication" (to which I wrote the accepted answer…), which suggests a way to extract certain bits from a larger number with a single multiplication. That method has the disadvantage that it does not work for an arbitrary pattern of set bits: there need to be enough gaps between the bits you want so that the shifted copies produced by the multiplication do not collide.
This can be resolved, in principle, by repeating the process two or more times, and applying additional masks in between. Let's see how this would work for just the first mask in your problem above.
mask1  = 11001000
num    = abcdefgh
temp   = num & mask1 = ab..e...
magic  = 4 + 1, so temp * magic = (temp << 2) + temp
answer =   ab..e...
         ab..e..... +
         ------------
         ababe.e...
mask2  = 0011100000
answer & mask2 = ..abe..... (that is, abe..... in the low eight bits)
This puts the three bits you want into the top three positions of the byte with mask-multiply-mask (3 operations). That is not terribly efficient for an eight-bit number, but you can expand this to a 32-bit number and then it starts to look more interesting.
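To check the arithmetic, here is a minimal Python sketch of the mask-multiply-mask step for this one mask. The names `MASK1`, `MAGIC`, `MASK2`, `extract_top3` and `reference` are made up for illustration, and the `& 0xFF` is an assumption that emulates eight-bit arithmetic (Python integers do not overflow on their own).

```python
MASK1 = 0b11001000  # selects bits a, b and e of abcdefgh
MAGIC = 0b101       # 4 + 1: adds a copy of temp shifted left by two
MASK2 = 0b11100000  # top three bits of the low byte


def extract_top3(num: int) -> int:
    """Mask-multiply-mask: move bits a, b, e into the top three positions."""
    temp = num & MASK1                # ab..e...
    product = (temp * MAGIC) & 0xFF   # low byte of ababe.e... is abe.e...
    return product & MASK2            # abe.....


def reference(num: int) -> int:
    """Bit-by-bit version, used only to check the result."""
    a = (num >> 7) & 1
    b = (num >> 6) & 1
    e = (num >> 3) & 1
    return (a << 7) | (b << 6) | (e << 5)
```

For example, `extract_top3(0b10101010)` gives `0b10100000`, since a = 1, b = 0 and e = 1. The multiplication is safe here because the two shifted copies of `temp` never have a set bit in the same position, so no carries occur.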