After a lot of profiling, I found out that the layered window update wasn't really the bottleneck. Refreshing the whole screen with the SelectBitmap method above took about 6-8 ms at 1920x1200. Not amazing, but more than enough to refresh at 30+ FPS.
In my case, the performance drain came from a thread requesting a refresh almost a hundred times per redraw, which made everything sluggish. The solution was to separate the refresh requests from the actual redraw. One side only adds (unions) the requested area into a pending region; the other, when it is not already drawing, takes that region, draws it, and then empties it.
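
To make the idea concrete, here is a minimal sketch of that split in portable C++. It is not my actual code: `Rect`, `DirtyRegion`, `invalidate` and `take` are hypothetical names, and the pending area is simplified to a single bounding rectangle rather than a true region (on Win32 you would more likely use `RECT` or an `HRGN`).

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>

// Hypothetical rectangle type, standing in for a Win32 RECT.
struct Rect {
    int left, top, right, bottom;
};

// Smallest rectangle that contains both a and b.
inline Rect unite(const Rect& a, const Rect& b) {
    return { std::min(a.left, b.left), std::min(a.top, b.top),
             std::max(a.right, b.right), std::max(a.bottom, b.bottom) };
}

// Accumulates refresh requests so that many requests cost one redraw.
class DirtyRegion {
public:
    // Called from any thread. Cheap: it only grows the pending rectangle
    // and never draws anything itself.
    void invalidate(const Rect& r) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_ = has_pending_ ? unite(pending_, r) : r;
        has_pending_ = true;
    }

    // Called by the drawing side when it is ready for the next frame.
    // Copies the pending area into `out`, empties it, and returns true;
    // returns false if nothing was requested since the last call.
    bool take(Rect& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!has_pending_) return false;
        out = pending_;
        has_pending_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    Rect pending_{};
    bool has_pending_ = false;
};
```

The drawing loop would then call `take` once per frame and redraw only if it returns true, so a hundred `invalidate` calls between two frames collapse into a single draw. A bounding rectangle can cover more pixels than were actually requested; that is the trade-off for keeping the sketch short.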