I apologize if I wasn't clear enough about the purpose of this code. I just wanted to try the Streaming DMA API, so I needed to be able to simply map/unmap a kernel memory buffer (and try to access it from the CPU).
I did some further tests trying to set up the struct device in a way that dma_map_single() would accept... which resulted in a kernel panic. The logs revealed that the panic was raised in swiotlb_map_page() in lib/swiotlb.c (I also forgot to mention that my hardware platform is x86_64). I studied the source code and found out the following.
If the struct device supplied to dma_map_single() does not have its dma_mask set, the underlying code assumes that the bus address your kernel buffer has been mapped to is not DMA-able (it calls dma_capable(), which compares the highest mapped address to the mask). If the mapped address range is not DMA-capable, an attempt is made to use a bounce buffer that might be accessible to the device; but since the mask is not set, the code concludes that the bounce buffer is not DMA-able either and panics.
Note that dma_mask is a pointer to u64, so in order to use a meaningful value you need storage for it. Note also that while dma_set_mask() does set the mask value, it does not allocate storage for it. If dma_mask is NULL, that is equivalent to having the mask set to zero (the relevant code checks dma_mask for NULL before dereferencing the pointer).
I also noticed that x86-specific code uses a 'fallback' device structure for some requests; see arch/x86/kernel/pci-dma.c for details. Essentially, that structure has its coherent_dma_mask set to some value, and dma_mask simply points to coherent_dma_mask.
I modeled my device structure after this fallback structure and finally got dma_map_single() to work. The updated code looks as follows:
static struct device dev = {
    .init_name = "mydmadev",
    .coherent_dma_mask = ~0,            // dma_alloc_coherent(): allow any address
    .dma_mask = &dev.coherent_dma_mask, // other APIs: use the same mask as coherent
};

static void map_single(void)
{
    char *kbuf = kmalloc(size, GFP_KERNEL | GFP_DMA);
    dma_addr_t dma_addr;

    if (!kbuf)
        return;

    dma_addr = dma_map_single(&dev, kbuf, size, direction);
    if (dma_mapping_error(&dev, dma_addr)) {
        pr_info("dma_map_single() failed\n");
        kfree(kbuf);
        return;
    }
    pr_info("dma_map_single() succeeded\n");

    // the device can be told to access the buffer at dma_addr ...

    // get hold of the buffer temporarily to do some reads/writes
    dma_sync_single_for_cpu(&dev, dma_addr, size, direction);

    // release the buffer to the device again
    dma_sync_single_for_device(&dev, dma_addr, size, direction);

    // some further device I/O ...

    // done with the buffer, unmap it
    dma_unmap_single(&dev, dma_addr, size, direction);

    // check/store buffer contents, then free the buffer
    kfree(kbuf);
}
Of course, the trick with struct device might not be portable, but it did work on my x86_64 machine with 2.6.32/35 kernels, so others might find it useful if they want to experiment with the mapping API. Transfers are impossible without a physical device, but I was able to check the bus addresses that dma_map_single() generates and to access the buffer after calling dma_sync_single_for_cpu(), so I think it was worth investigating.
Thanks a lot for your answers. Any further suggestions/improvements to the above code are welcome.