I implemented this by creating a hidden window at coordinates (-32000, -32000), which serves as the target for the primary screen output captured with DirectX IDXGIOutput1::DuplicateOutput().
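A minimal sketch of that setup follows. It assumes Windows 8 or later (Desktop Duplication requires it) and that output 0 of the default hardware adapter is the primary screen, which is not guaranteed on multi-GPU systems. CreateParkedWindow, DuplicatePrimaryOutput, IsOffAllMonitors, ScreenRect and the window class name are hypothetical names chosen for illustration. The frame loop (AcquireNextFrame / ReleaseFrame) and the way frames reach the hidden window are not shown, because the answer does not specify them. The portable helper only illustrates why (-32000, -32000) keeps the window off every monitor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical portable helper: a rectangle in virtual-screen coordinates.
struct ScreenRect { long left, top, right, bottom; };

// Returns true if the window rectangle (x, y, w, h) overlaps none of the monitors.
bool IsOffAllMonitors(long x, long y, long w, long h,
                      const std::vector<ScreenRect>& monitors) {
    assert(w >= 0 && h >= 0);
    for (const ScreenRect& m : monitors) {
        bool overlaps = x < m.right && x + w > m.left &&
                        y < m.bottom && y + h > m.top;
        if (overlaps) return false;
    }
    return true;
}

#ifdef _WIN32
#include <windows.h>
#include <d3d11.h>
#include <dxgi1_2.h>
#pragma comment(lib, "d3d11.lib")

// Creates a tool window parked at (-32000, -32000), outside every monitor.
HWND CreateParkedWindow(HINSTANCE hInst, int width, int height) {
    WNDCLASSW wc = {};
    wc.lpfnWndProc = DefWindowProcW;
    wc.hInstance = hInst;
    wc.lpszClassName = L"ParkedCaptureWindow";  // hypothetical class name
    RegisterClassW(&wc);
    HWND hwnd = CreateWindowExW(WS_EX_TOOLWINDOW, wc.lpszClassName, L"",
                                WS_POPUP, -32000, -32000, width, height,
                                nullptr, nullptr, hInst, nullptr);
    // Assumption: the window must be shown (just not on any monitor) so that
    // DWM keeps a surface for it that thumbnails can mirror.
    if (hwnd) ShowWindow(hwnd, SW_SHOWNOACTIVATE);
    return hwnd;
}

// Creates a D3D11 device and duplicates output 0 of its adapter
// (assumed here to be the primary screen).
HRESULT DuplicatePrimaryOutput(ID3D11Device** device,
                               IDXGIOutputDuplication** dup) {
    HRESULT hr = D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                                   nullptr, 0, D3D11_SDK_VERSION, device,
                                   nullptr, nullptr);
    if (FAILED(hr)) return hr;
    IDXGIDevice* dxgiDevice = nullptr;
    IDXGIAdapter* adapter = nullptr;
    IDXGIOutput* output = nullptr;
    IDXGIOutput1* output1 = nullptr;
    hr = (*device)->QueryInterface(IID_PPV_ARGS(&dxgiDevice));
    if (SUCCEEDED(hr)) hr = dxgiDevice->GetAdapter(&adapter);
    if (SUCCEEDED(hr)) hr = adapter->EnumOutputs(0, &output);
    if (SUCCEEDED(hr)) hr = output->QueryInterface(IID_PPV_ARGS(&output1));
    if (SUCCEEDED(hr)) hr = output1->DuplicateOutput(*device, dup);
    // Release the intermediates; the caller owns *device (and *dup on success).
    if (output1) output1->Release();
    if (output) output->Release();
    if (adapter) adapter->Release();
    if (dxgiDevice) dxgiDevice->Release();
    return hr;
}
#endif
```

The Win32 part is guarded by _WIN32 so the portable helper can be checked on any platform.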
Once created, this hidden window is mirrored into each destination window using DWM, as shown in my other answer:
hr = DwmRegisterThumbnail();
hr = DwmUpdateThumbnailProperties();
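The two calls above can be fleshed out roughly as follows. This is a sketch assuming the hidden window is the DWM thumbnail source and each visible window is a destination. MirrorWindow, FitPreservingAspect and FitRect are hypothetical helpers, and letterboxing the thumbnail to preserve the source aspect ratio is my own choice, not something the answer prescribes. DwmRegisterThumbnail, DwmQueryThumbnailSourceSize and DwmUpdateThumbnailProperties are the real dwmapi functions.

```cpp
#include <cassert>

// Hypothetical helper: destination rectangle in client coordinates.
struct FitRect { int left, top, right, bottom; };

// Largest rectangle with the source aspect ratio that fits, centered,
// inside a dstW x dstH client area (letterboxing).
FitRect FitPreservingAspect(int srcW, int srcH, int dstW, int dstH) {
    assert(dstW >= 0 && dstH >= 0);
    if (srcW <= 0 || srcH <= 0) return {0, 0, dstW, dstH};
    // Compare srcW/srcH with dstW/dstH by cross-multiplying in 64 bits.
    long long lhs = static_cast<long long>(srcW) * dstH;
    long long rhs = static_cast<long long>(dstW) * srcH;
    int w, h;
    if (lhs >= rhs) {  // source is relatively wider: use the full width
        w = dstW;
        h = static_cast<int>(static_cast<long long>(dstW) * srcH / srcW);
    } else {           // source is relatively taller: use the full height
        h = dstH;
        w = static_cast<int>(static_cast<long long>(dstH) * srcW / srcH);
    }
    int left = (dstW - w) / 2;
    int top = (dstH - h) / 2;
    return {left, top, left + w, top + h};
}

#ifdef _WIN32
#include <windows.h>
#include <dwmapi.h>
#pragma comment(lib, "dwmapi.lib")

// Mirrors hwndSource (the parked window) into the client area of hwndDest.
HRESULT MirrorWindow(HWND hwndDest, HWND hwndSource, HTHUMBNAIL* thumb) {
    HRESULT hr = DwmRegisterThumbnail(hwndDest, hwndSource, thumb);
    if (FAILED(hr)) return hr;

    SIZE src = {};
    hr = DwmQueryThumbnailSourceSize(*thumb, &src);
    if (FAILED(hr)) { DwmUnregisterThumbnail(*thumb); return hr; }

    RECT client = {};
    GetClientRect(hwndDest, &client);
    FitRect r = FitPreservingAspect(src.cx, src.cy, client.right, client.bottom);

    DWM_THUMBNAIL_PROPERTIES props = {};
    props.dwFlags = DWM_TNP_RECTDESTINATION | DWM_TNP_VISIBLE | DWM_TNP_OPACITY;
    props.rcDestination = RECT{r.left, r.top, r.right, r.bottom};
    props.fVisible = TRUE;
    props.opacity = 255;
    return DwmUpdateThumbnailProperties(*thumb, &props);
}
#endif
```

DWM composes the thumbnail itself, so no per-frame copying happens in the application. Call DwmUnregisterThumbnail when a destination window closes, and call DwmUpdateThumbnailProperties again if the destination is resized.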
Performance is sufficient even with several large (1920x1200) windows, and CPU load stays at or below 5%.