These are my research results from around 03/14; I found no definitive answer to this problem. I did not try the possibility in Media Foundation mentioned above, since it sounded as if the result would have no transparency.
I was able to use a second, grayscale video as a mask for the RGB video inside a shader. The mask can come from a separate video stream, but then the two streams have to be kept in sync. Alternatively, you can encode a single video with the color frame and the mask frame side by side, but many hardware-accelerated video codecs do not allow this, WMF being the exception. Performance is not great, but I was able to play three 1080p30 videos simultaneously.
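In practice the masking runs in a pixel shader on the GPU. The shader samples the color at horizontal texture coordinate u * 0.5 and the mask at 0.5 + u * 0.5, then uses the mask value as alpha. The per-pixel math is easier to see in a small CPU-side sketch. It assumes 8-bit channels and a side-by-side layout with the RGB video in the left half and the grayscale mask in the right half. The function names `unpack_side_by_side` and `blend_over` are hypothetical illustrations, not part of any video API.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def unpack_side_by_side(frame: List[List[RGB]]) -> List[List[RGBA]]:
    """Turn a side-by-side frame (color | grayscale mask) into RGBA pixels.

    Assumed layout (hypothetical): the left half of each row is the RGB
    color video, the right half is the grayscale alpha mask at the same size.
    """
    out: List[List[RGBA]] = []
    for row in frame:
        if len(row) % 2:
            raise ValueError("side-by-side frame must have an even width")
        half = len(row) // 2
        color, mask = row[:half], row[half:]
        # A grayscale mask has r == g == b, so any one channel works as alpha.
        out.append([(r, g, b, m[0]) for (r, g, b), m in zip(color, mask)])
    return out


def blend_over(src: RGBA, dst: RGB) -> RGB:
    """Straight (non-premultiplied) 'over' composite of src onto an opaque dst."""
    r, g, b, a = src
    return tuple((c * a + d * (255 - a)) // 255 for c, d in zip((r, g, b), dst))


# One row, two visible pixels: opaque red and fully transparent green.
frame = [[(255, 0, 0), (0, 255, 0), (255, 255, 255), (0, 0, 0)]]
rgba = unpack_side_by_side(frame)
print(rgba)                                  # [[(255, 0, 0, 255), (0, 255, 0, 0)]]
print(blend_over(rgba[0][1], (0, 0, 255)))   # (0, 0, 255): background shows through
```

A separate mask stream would use the same math. The difference is that the two decoded frames come from different decoders, which is why they have to be synchronized before the shader combines them.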
On a side note, to my surprise Flash was able to play five or more 1080p30 videos with transparency simultaneously. The Flash video codec supports alpha values, but I only managed to use them inside Flash.