Yes, it's possible. One approach is to use zero-copy (i.e., GPU-mapped) host memory. The host places its data in the mapped region, and the GPU communicates back through the same mapped region. Obviously this requires polling, but that is inherent in your question.
This answer gives you most of the plumbing you need for a simple test case.
There is also the simple zero-copy sample code.
This answer provides a more involved, fully worked example.
Naturally, you'd want to do this in an environment where there are no timeout watchdogs enabled.
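To make the mapped-memory handshake concrete, here is a minimal sketch of my own (not the code from the linked answers). The `Mailbox` struct, the `server` kernel, and `run_handshake` are hypothetical names, and the "double the input" work is just a placeholder. It assumes a single device that supports mapped pinned memory and a setup where the kernel starts running right after launch (e.g. Linux, or a TCC-mode device on Windows); otherwise the host-side spin loop could hang.

```cuda
#include <atomic>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical mailbox placed in pinned, mapped (zero-copy) host memory.
struct Mailbox {
    volatile int data;    // host -> device: payload, negative means "quit"
    volatile int seq;     // host -> device: request number, bumped after data is written
    volatile int result;  // device -> host: reply payload
    volatile int done;    // device -> host: number of the last completed request
};

// Persistent single-thread kernel that polls the mailbox until told to quit.
__global__ void server(Mailbox *mb)
{
    int seen = 0;
    for (;;) {
        while (mb->seq == seen) { }   // spin until the host posts a new request
        seen = mb->seq;
        __threadfence_system();       // order the seq read before the data read
        int v = mb->data;
        if (v < 0) break;             // quit request
        mb->result = 2 * v;           // placeholder "work"
        __threadfence_system();       // make result visible before signalling done
        mb->done = seen;
    }
}

// Sends `rounds` requests to the running kernel; returns how many replies were correct.
int run_handshake(int rounds)
{
    Mailbox *h_mb = nullptr, *d_mb = nullptr;
    cudaSetDeviceFlags(cudaDeviceMapHost);  // needed on older setups; result ignored here
    if (cudaHostAlloc((void **)&h_mb, sizeof(Mailbox), cudaHostAllocMapped) != cudaSuccess)
        return -1;
    h_mb->data = 0; h_mb->seq = 0; h_mb->result = 0; h_mb->done = 0;
    if (cudaHostGetDevicePointer((void **)&d_mb, h_mb, 0) != cudaSuccess) {
        cudaFreeHost(h_mb);
        return -1;
    }

    cudaGetLastError();                     // clear any stale error before launching
    server<<<1, 1>>>(d_mb);
    if (cudaGetLastError() != cudaSuccess) { cudaFreeHost(h_mb); return -1; }

    int ok = 0;
    for (int i = 1; i <= rounds; ++i) {
        h_mb->data = i;                                       // write payload first
        std::atomic_thread_fence(std::memory_order_seq_cst);
        h_mb->seq = i;                                        // then publish the request
        while (h_mb->done != i) { }                           // poll for the reply
        std::atomic_thread_fence(std::memory_order_seq_cst);
        if (h_mb->result == 2 * i) ++ok;
    }

    h_mb->data = -1;                                          // ask the kernel to exit
    std::atomic_thread_fence(std::memory_order_seq_cst);
    h_mb->seq = rounds + 1;
    cudaDeviceSynchronize();
    cudaFreeHost(h_mb);
    return ok;
}

int main()
{
    int ok = run_handshake(5);
    printf("correct replies: %d of 5\n", ok);
    return ok == 5 ? 0 : 1;
}
```

The sequence counters are there so that each side can tell a new message from a stale one; the fences on both sides are a conservative choice to keep the payload write ordered before the flag write.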