VMR-7, VMR-9, EVR - all of these renderers "insist" on their own memory allocator, backed by video surface memory. These allocators have specific requirements, and you cannot change this behavior.
InfTee, on the other hand, "insists" on its own output pin memory allocator and shares it between its output pins, so that no data copy takes place when tee'ing the feed (this is what you refer to as "dumb nature").
You cannot get all of this working together directly: you need an additional filter in between that copies data from the tee's buffers into the video renderer's memory. The ideal in terms of performance is a custom transform filter that copies the data while taking extended strides into consideration. Without one, you have to rely on the closest stock/registered filter that can serve the same purpose.
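To illustrate what "taking extended strides into consideration" means for the copy itself, here is a minimal sketch of a row-by-row copy routine. It is not tied to DirectShow: `CopyWithStride` and its parameters are hypothetical names chosen for this example, and in a real transform filter the pointers and strides would come from the input and output media samples and the media types negotiated on each pin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `rows` rows of `rowBytes` payload bytes each.
// srcStride / dstStride are the distances in bytes between the starts of
// consecutive rows. A renderer-side buffer typically has a stride larger
// than the payload width, so a single flat memcpy would shear the picture.
void CopyWithStride(const std::uint8_t* src, std::ptrdiff_t srcStride,
                    std::uint8_t* dst, std::ptrdiff_t dstStride,
                    std::size_t rowBytes, std::size_t rows)
{
    assert(src != nullptr && dst != nullptr);

    // Fast path: both buffers are tightly packed, so one copy is enough.
    if (srcStride == dstStride &&
        srcStride == static_cast<std::ptrdiff_t>(rowBytes)) {
        std::memcpy(dst, src, rowBytes * rows);
        return;
    }

    // General path: copy only the payload of each row, then step each
    // pointer by its own stride. Padding bytes in dst are left untouched.
    for (std::size_t y = 0; y < rows; ++y) {
        std::memcpy(dst, src, rowBytes);
        src += srcStride;
        dst += dstStride;
    }
}
```

For example, copying two tightly packed 3-byte rows into a buffer whose rows are 4 bytes apart would be `CopyWithStride(src, 3, dst, 4, 3, 2)`. The fourth byte of each destination row is padding and is not written.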