Yes, you can use CUDA and MPI independently (i.e. without GPUDirect), just as you describe.
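- Move the data from device to host (e.g. with `cudaMemcpy`)
- Transfer the data as you ordinarily would, using MPI
- On the receiving rank, copy the data from host to device if it is needed on the GPU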
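
The steps above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the helper names `stage_to_host` and `stage_to_device`, the `fill` kernel, the buffer size, and the two-rank layout are all assumptions made up for the example, and it also shows the matching host-to-device copy on the receiving rank. Only host pointers are ever handed to MPI, so a CUDA-aware MPI build is not required.

```cuda
// Build/run commands are assumptions and vary by installation, e.g.:
//   nvcc -ccbin mpicxx staged_mpi.cu -o staged_mpi
//   mpirun -np 2 ./staged_mpi
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Abort all ranks if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "CUDA error: %s\n",                 \
                         cudaGetErrorString(err_));                  \
            MPI_Abort(MPI_COMM_WORLD, 1);                            \
        }                                                            \
    } while (0)

// Hypothetical kernel that produces some data on the GPU: data[i] = i.
__global__ void fill(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = static_cast<float>(i);
}

// Step 1: copy a device buffer into ordinary host memory.
std::vector<float> stage_to_host(const float* d_buf, int n) {
    std::vector<float> h_buf(n);
    CUDA_CHECK(cudaMemcpy(h_buf.data(), d_buf, n * sizeof(float),
                          cudaMemcpyDeviceToHost));
    return h_buf;
}

// Receiving side: copy a host buffer back onto the device.
void stage_to_device(float* d_buf, const std::vector<float>& h_buf) {
    CUDA_CHECK(cudaMemcpy(d_buf, h_buf.data(),
                          h_buf.size() * sizeof(float),
                          cudaMemcpyHostToDevice));
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) std::fprintf(stderr, "Run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    const int n = 1024;  // arbitrary example size
    float* d_buf = nullptr;
    CUDA_CHECK(cudaMalloc(&d_buf, n * sizeof(float)));

    if (rank == 0) {
        // Produce data on the GPU, then stage it through the host.
        fill<<<(n + 255) / 256, 256>>>(d_buf, n);
        CUDA_CHECK(cudaGetLastError());
        std::vector<float> h_buf = stage_to_host(d_buf, n);
        // Step 2: an ordinary MPI send from a host pointer.
        MPI_Send(h_buf.data(), n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        // Ordinary MPI receive into host memory, then copy to the GPU.
        std::vector<float> h_buf(n);
        MPI_Recv(h_buf.data(), n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        stage_to_device(d_buf, h_buf);
        std::printf("rank 1 received %d floats\n", n);
    }

    CUDA_CHECK(cudaFree(d_buf));
    MPI_Finalize();
    return 0;
}
```

The blocking `cudaMemcpy` also waits for the preceding kernel on the default stream, so the host buffer is complete before `MPI_Send` is called. Ranks beyond the first two simply allocate and free their buffer in this sketch.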
You might be interested in this presentation, which explains CUDA-aware MPI and gives a side-by-side example, on slide 11, of regular (non-CUDA-aware) MPI and CUDA-aware MPI.