I apologize if I wasn't clear enough about the purpose of this code. I just wanted to try the Streaming DMA API, so I needed to be able to simply map/unmap a kernel memory buffer (and try to access it from the CPU).
I did some further tests trying to set up the struct device in a way that dma_map_single() would accept... which resulted in a kernel panic. The logs revealed that the panic was raised in swiotlb_map_page() in lib/swiotlb.c (I also forgot to mention that my hardware platform is x86_64). I studied the source code and found out the following.
If the struct device supplied to dma_map_single() does not have its dma_mask set, the underlying code assumes that the bus address your kernel buffer has been mapped to is not DMA-able (it calls dma_capable(), which compares the highest mapped address to the mask). If the mapped address range is not DMA-capable, an attempt is made to use a bounce buffer that might be accessible to the device; but since the mask is not set, the code concludes that the bounce buffer is not DMA-able either and panics.
Note that dma_mask is a pointer to u64, so in order to use a meaningful value you need storage for it. Note also that while dma_set_mask() does set the mask value, it does not allocate storage for it. If dma_mask is NULL, that is equivalent to having the mask set to zero (the relevant code checks dma_mask for NULL before dereferencing the pointer).
I also noticed that x86-specific code uses a 'fallback' device structure for some requests; see arch/x86/kernel/pci-dma.c for details. Essentially, that structure has its coherent_dma_mask set to some value, and dma_mask simply points to coherent_dma_mask.
I modeled my device structure after this fallback structure and finally got dma_map_single() to work. The updated code looks as follows:
static struct device dev = {
    .init_name = "mydmadev",
    .coherent_dma_mask = ~0,            // dma_alloc_coherent(): allow any address
    .dma_mask = &dev.coherent_dma_mask, // other APIs: use the same mask as coherent
};

static void map_single(void)
{
    char *kbuf = kmalloc(size, GFP_KERNEL | GFP_DMA);
    dma_addr_t dma_addr;

    if (!kbuf)
        return;

    dma_addr = dma_map_single(&dev, kbuf, size, direction);
    if (dma_mapping_error(&dev, dma_addr)) {
        pr_info("dma_map_single() failed\n");
        kfree(kbuf);
        return;
    }
    pr_info("dma_map_single() succeeded\n");

    // the device can be told to access the buffer at dma_addr ...

    // get hold of the buffer temporarily to do some reads/writes
    dma_sync_single_for_cpu(&dev, dma_addr, size, direction);

    // release the buffer to the device again
    dma_sync_single_for_device(&dev, dma_addr, size, direction);

    // some further device I/O ...

    // done with the buffer, unmap it
    dma_unmap_single(&dev, dma_addr, size, direction);

    // check/store buffer contents, then free the buffer
    kfree(kbuf);
}
Of course, the trick with struct device might not be portable, but it did work on my x86_64 machine with 2.6.32/35 kernels, so others might find it useful if they want to experiment with the mapping API. Transfers are impossible without a physical device, but I was able to check the bus addresses that dma_map_single() generates and to access the buffer after calling dma_sync_single_for_cpu(), so I think it was worth investigating.
Thanks a lot for your answers. Any further suggestions/improvements to the above code are welcome.