I have found a (disappointing) answer to my own question, in this quote from someone on the CUDA development team [link]:
"I am not a Thrust expert, so take this feedback with a grain of salt; but I think this design element of Thrust deserves to be revisited. Thrust is expressive and useful in ways that sometimes are undermined by the emphasis on returning results to the host. I've had plenty of occasions where I wanted to do an operation strictly in device memory, so Thrust's predisposition toward returning a value to host memory actually got in the way; and if I want results returned to the host, I can always pass in a mapped device pointer (which, if UVA is in effect, means any host pointer that was allocated by CUDA)"
So it looks like I may be out of luck: Thrust appears to be built around returning results to the host. If so, what a design flaw in Thrust!