DMA operates on physical memory and memory-mapped devices, whereas most user space applications operate on virtual memory. This presents two problems:
- Mapping virtual memory to physical memory: you'll have to make sure the buffer is flushed from the cache, paged in, and locked so it can't be paged out again. Even then, it's unlikely to be physically contiguous; it'll be scattered all over physical memory in chunks as small as a single page
- Security of physical memory access: letting user space applications drive DMA gives them access to the whole of physical memory, which is great for sniffing passwords and crypto keys
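To see the "scattered all over physical memory" problem for yourself on Linux, you can ask the kernel which physical page backs each virtual page of a buffer via `/proc/self/pagemap`. This is a rough sketch in Python, assuming Linux and the 64-bit entry layout from the kernel's pagemap documentation (bit 63 = present, bit 62 = swapped, bits 0-54 = page frame number). Since Linux 4.2 the PFN field reads as zero unless you have CAP_SYS_ADMIN, so run it as root. The helper names (`decode_entry`, `contiguous_runs`, `physical_frames`) are made up for illustration. It only inspects the mapping; it does not pin the pages with `mlock()` or flush any caches, so the answer can change the moment it's printed.

```python
import ctypes
import mmap
import struct

PAGE_SIZE = mmap.PAGESIZE
PFN_MASK = (1 << 55) - 1  # bits 0-54 hold the page frame number
SWAPPED_BIT = 1 << 62
PRESENT_BIT = 1 << 63


def decode_entry(entry):
    """Split one 64-bit pagemap entry into (present, swapped, pfn)."""
    present = bool(entry & PRESENT_BIT)
    swapped = bool(entry & SWAPPED_BIT)
    # The PFN field is only meaningful for pages resident in RAM.
    pfn = entry & PFN_MASK if present and not swapped else None
    return present, swapped, pfn


def contiguous_runs(pfns):
    """Group consecutive page frame numbers into (first_pfn, page_count) runs."""
    runs = []
    for pfn in pfns:
        if runs and runs[-1][0] + runs[-1][1] == pfn:
            first, count = runs[-1]
            runs[-1] = (first, count + 1)
        else:
            runs.append((pfn, 1))
    return runs


def physical_frames(address, length):
    """Return the PFN (or None) backing each virtual page in [address, address+length)."""
    first_page = address // PAGE_SIZE
    n_pages = (address + length - 1) // PAGE_SIZE - first_page + 1
    with open("/proc/self/pagemap", "rb", buffering=0) as f:
        f.seek(first_page * 8)  # one 8-byte entry per virtual page
        data = f.read(n_pages * 8)
    entries = struct.unpack(f"={n_pages}Q", data)  # native byte order
    return [decode_entry(e)[2] for e in entries]


def main():
    n_pages = 64
    buf = mmap.mmap(-1, n_pages * PAGE_SIZE)  # anonymous mapping, no RAM behind it yet
    for offset in range(0, len(buf), PAGE_SIZE):
        buf[offset] = 1  # touch each page so the kernel actually allocates it
    address = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    try:
        pfns = physical_frames(address, len(buf))
    except (OSError, struct.error) as exc:
        print(f"cannot read pagemap: {exc}")
        return
    if any(p in (None, 0) for p in pfns):
        print("PFNs unavailable (page not present, or missing CAP_SYS_ADMIN)")
        return
    for first, count in contiguous_runs(pfns):
        print(f"pfn {first:#x}: {count} page(s)")


if __name__ == "__main__":
    main()
```

On a typical machine this often prints a handful of short runs rather than one 64-page block, though nothing guarantees either outcome. That list of (pfn, count) runs is essentially the scatter-gather list a DMA engine would need to be given.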
On a modern processor, memcpy() is likely to be pretty efficient in both software and hardware until you're copying tens of megabytes at a time, after which you need to think carefully about caching and virtual memory anyway. In short, I wouldn't bother.