I'm trying to allocate a DMA buffer for an HPC workload. It requires 64GB of buffer space. In between computations, some data is offloaded to a PCIe card. Rather than copy data into a bunch of dinky 4MB buffers given by pci_alloc_consistent, I would like to just create 64 1GB buffers, backed by 1GB HugePages.

Some background info:

kernel version: CentOS 6.4 / 2.6.32-358.el6.x86_64
kernel boot options: hugepagesz=1g hugepages=64 default_hugepagesz=1g

relevant portion of /proc/meminfo:

AnonHugePages:         0 kB
HugePages_Total:      64
HugePages_Free:       64
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
DirectMap4k:         848 kB
DirectMap2M:     2062336 kB
DirectMap1G:   132120576 kB

I can mount -t hugetlbfs nodev /mnt/hugepages. CONFIG_HUGETLB_PAGE is true. MAP_HUGETLB is defined.

I have read some info on using libhugetlbfs to call get_huge_pages() in user space, but ideally this buffer would be allocated in kernel space. I tried calling do_mmap() with MAP_HUGETLB but it didn't seem to change the number of free hugepages, so I don't think it was actually backing the mmap with huge pages.

So I guess what I'm getting at is: is there any way to map a buffer to a 1GB HugePage in kernel space, or does it have to be done in user space? Or does anyone know of any other way I can get an immense amount (1-64GB) of contiguous physical memory available as a kernel buffer?

Solution

PROBLEM

  1. Normally if you want to allocate a DMA buffer, or get a physical address, this is done in kernel space, as user code should never have to muck around with physical addresses.
  2. Hugetlbfs only provides user-space mappings: you can allocate 1GB huge pages through it, but what you get back are user-space virtual addresses.
  3. No function exists to map a user hugepage virtual address to a physical address.

EUREKA

But the function does exist! Buried deep in the 2.6 kernel source (arch/x86/mm/hugetlbpage.c) lies this function to get a struct page from a virtual address, marked as "just for testing" and blocked with #if 0:

#if 0   /* This is just for testing */
struct page *
follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
{
    struct page *page;
    struct vm_area_struct *vma;
    pte_t *pte;
    /* page-frame number of the address; pte and vpfn are not declared in the
     * original #if 0 block, so they must be added for the function to compile */
    unsigned long vpfn = address >> PAGE_SHIFT;

    vma = find_vma(mm, address);
    if (!vma || !is_vm_hugetlb_page(vma))
        return ERR_PTR(-EINVAL);

    pte = huge_pte_offset(mm, address);

    /* hugetlb should be locked, and hence, prefaulted */
    WARN_ON(!pte || pte_none(*pte));

    /* index the small page within the huge page's struct page array */
    page = &pte_page(*pte)[vpfn % (HPAGE_SIZE / PAGE_SIZE)];

    WARN_ON(!PageHead(page));

    return page;
}
#endif

SOLUTION: Since the function above isn't actually compiled into the kernel, you will need to copy it into your driver source (dropping the #if 0 / #endif guard).

USER SIDE WORKFLOW

  1. Allocate 1GB huge pages at boot with the kernel boot options
  2. Call get_huge_pages() from libhugetlbfs (or mmap a hugetlbfs file) to get a user-space pointer (virtual address)
  3. Pass the user virtual address (the pointer cast to unsigned long) to the driver via ioctl, as in the sketch below
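
The user side might look roughly like the following. This is only a sketch: the /dev/mydma node and the MY_IOCTL_MAP_HUGEPAGE ioctl number are made-up names that must match whatever your driver exposes, and mmap on a hugetlbfs file (mounted at /mnt/hugepages as above) is used in place of get_huge_pages(); either works.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical ioctl number -- must match whatever the driver defines. */
#define MY_IOCTL_MAP_HUGEPAGE _IOW('h', 1, unsigned long)

#define HUGE_1GB (1UL << 30)

int main(void)
{
    /* Step 2: back a mapping with one of the 1GB pages reserved at boot.
     * mmap on a hugetlbfs file is equivalent to get_huge_pages() here. */
    int hfd = open("/mnt/hugepages/dmabuf", O_CREAT | O_RDWR, 0600);
    if (hfd < 0) { perror("open hugetlbfs file"); return 1; }

    void *buf = mmap(NULL, HUGE_1GB, PROT_READ | PROT_WRITE, MAP_SHARED, hfd, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    /* Touch the buffer so the huge page is faulted in before the driver sees it. */
    ((volatile char *)buf)[0] = 0;

    /* Step 3: hand the user virtual address to the driver. */
    int dfd = open("/dev/mydma", O_RDWR);
    if (dfd < 0) { perror("open driver"); return 1; }

    unsigned long uaddr = (unsigned long)buf;
    if (ioctl(dfd, MY_IOCTL_MAP_HUGEPAGE, &uaddr) < 0)
        perror("ioctl");

    /* Keep the mapping alive for as long as the device may DMA into it. */
    close(dfd);
    return 0;
}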

KERNEL DRIVER WORKFLOW

  1. Accept the user virtual address via ioctl (a sketch of such a handler follows this list)
  2. Call follow_huge_addr to get the struct page*
  3. Call page_to_phys on the struct page* to get the physical address
  4. Provide the physical address to the device for DMA
  5. Call kmap on the struct page* if you also want a kernel virtual pointer
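
A rough sketch of that ioctl handler, under the same assumptions as the user-side sketch: the MY_IOCTL_MAP_HUGEPAGE number and the surrounding character-device plumbing are hypothetical, and the follow_huge_addr() copy from above is assumed to be compiled into the same driver file, ahead of this function.

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/highmem.h>
#include <linux/ioctl.h>
#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/uaccess.h>
#include <asm/io.h>

/* Must match the user-space definition. */
#define MY_IOCTL_MAP_HUGEPAGE _IOW('h', 1, unsigned long)

static long mydma_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
    unsigned long uaddr;
    struct page *page;
    dma_addr_t phys;
    void *kvaddr;

    if (cmd != MY_IOCTL_MAP_HUGEPAGE)
        return -ENOTTY;

    /* Step 1: accept the user virtual address. */
    if (copy_from_user(&uaddr, (void __user *)arg, sizeof(uaddr)))
        return -EFAULT;

    /* Step 2: resolve it to the struct page of the backing 1GB huge page,
     * holding mmap_sem around the VMA walk. */
    down_read(&current->mm->mmap_sem);
    page = follow_huge_addr(current->mm, uaddr, 1);
    up_read(&current->mm->mmap_sem);
    if (IS_ERR(page))
        return PTR_ERR(page);

    /* Step 3: physical address (== bus address on x86/x64). */
    phys = page_to_phys(page);

    /* Step 4: program `phys` into the device's DMA engine here. */

    /* Step 5: optional kernel virtual pointer. */
    kvaddr = kmap(page);
    printk(KERN_INFO "hugepage phys=0x%llx kvirt=%p\n",
           (unsigned long long)phys, kvaddr);
    kunmap(page);

    return 0;
}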

DISCLAIMER

  • The above steps are being recollected several years later. I have lost access to the original source code. Do your due diligence and make sure I'm not forgetting a step.
  • The only reason this works is because 1GB huge pages are allocated at boot time and their physical addresses are permanently locked. Don't try to map a user virtual address that is not backed by a 1GB huge page into a DMA physical address! You're going to have a bad time!
  • Test carefully on your system to confirm that your 1GB huge pages are in fact locked in physical memory and that everything is working exactly as expected. This code worked flawlessly on my setup, but there is great danger here if something goes wrong.
  • This code is only guaranteed to work on x86/x64 architecture (where physical address == bus address), and on kernel version 2.6.XX. There may be an easier way to do this on later kernel versions, or it may be completely impossible now.

OTHER TIPS

This is not commonly done in kernel space, so there are not many examples.

Just like any other page, huge pages are allocated with alloc_pages, to the tune of:

struct page *p = alloc_pages(GFP_TRANSHUGE, HPAGE_PMD_ORDER);

HPAGE_PMD_ORDER is a macro defining the order of a single huge page in terms of normal pages. The above implies that transparent huge pages are enabled in the kernel.

Then you can proceed to map the obtained page pointer in the usual fashion with kmap().
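
For what it's worth, here is a minimal sketch of that allocate-and-map flow, assuming CONFIG_TRANSPARENT_HUGEPAGE is enabled so that GFP_TRANSHUGE and HPAGE_PMD_ORDER exist; the function names grab_huge_block/release_huge_block are made up. Note that this yields one physically contiguous 2MB block, not one of the boot-time 1GB pages.

#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/mm.h>
#include <asm/io.h>

static struct page *hpage;

static int grab_huge_block(void)
{
    void *kvaddr;

    /* One contiguous block of 2^HPAGE_PMD_ORDER normal pages (2MB on x86). */
    hpage = alloc_pages(GFP_TRANSHUGE, HPAGE_PMD_ORDER);
    if (!hpage)
        return -ENOMEM;

    /* Kernel virtual address of the head page; on x86_64 (no highmem) this is
     * the direct-map address, so the whole block is contiguous behind it. */
    kvaddr = kmap(hpage);
    printk(KERN_INFO "block phys=0x%llx virt=%p\n",
           (unsigned long long)page_to_phys(hpage), kvaddr);
    kunmap(hpage);
    return 0;
}

static void release_huge_block(void)
{
    if (hpage)
        __free_pages(hpage, HPAGE_PMD_ORDER);
}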

Disclaimer: I never tried it myself, so you may have to do some experimenting. One thing to check: HPAGE_PMD_SHIFT corresponds to the smaller (2MB) "huge" page. If you want to use those giant 1GB pages, you will probably need to try a different order, probably PUD_SHIFT - PAGE_SHIFT.

This function returns the correct kernel-space virtual address if it is given the physical address of memory that was allocated in user space from huge pages.

static inline void * phys_to_virt(unsigned long address)

Look for the function in the kernel code; this approach has been tested with DPDK and a kernel module.
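
As a sketch of how that might be used, assuming the physical address of the hugepage-backed buffer was discovered in user space (e.g. from /proc/self/pagemap, the way DPDK does) and passed into the module; the function name here is made up and the ioctl plumbing is omitted.

#include <asm/io.h>

/* `phys` arrives from user space via some ioctl. phys_to_virt() is only valid
 * for RAM covered by the kernel's direct mapping, which on x86_64 is all of
 * RAM, including hugepage-backed memory. */
static void *hugebuf_kernel_view(unsigned long phys)
{
    return phys_to_virt(phys);
}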
