I'm trying to use monitor/mwait instructions to monitor DMA writes from a device to a memory location. In a kernel module (char device) I have the following code (very similar to this piece of kernel code) that runs in a kernel thread:

static int do_monitor(void *arg)
{
  struct page *p = arg; // p is a 'struct page *'; it's also remapped to user space
  uint32_t *location_p = phys_to_virt(page_to_phys(p)); 
  uint32_t prev = 0;
  int i = 0;
  while (i++ < 20) // to avoid infinite loop
  {
    if (*location_p == prev)
    {
        __monitor(location_p, 0, 0);
        if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
          clflush(location_p);
        if (*location_p == prev)
          __mwait(0, 0);
    }
    prev = *location_p;
    printk(KERN_NOTICE "%d", prev);
  }
}

In user space I have the following test code:

int fd = open("/dev/mon_test_dev", O_RDWR);
unsigned char *mapped = (unsigned char *)mmap(0, mmap_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
for (int i = 1; i <= 5; ++i)
  *mapped = i;
munmap(mapped, mmap_size);
close(fd);

And the kernel log looks like this:

1
2
3
4
5
5
5
5
5
5
5 5 5 5 5 5 5 5 5 5

I.e. it seems that mwait doesn't wait at all. What could be the reason?

有帮助吗?

解决方案

The definition of MONITOR/MWAIT semantics does not specify explicitly whether DMA transactions may or may not trigger it. It is supposed that triggering happens for logical processor's stores.

Current descriptions of MONITOR and MWAIT in the Intel's official Software Developer Manual are quite vague to that respect. However, there are two clauses in the MONITOR section that caught my attention:

  1. The content of EAX is an effective address (in 64-bit mode, RAX is used). By default, the DS segment is used to create a linear address that is monitored.

  2. The address range must use memory of the write-back type. Only write-back memory will correctly trigger the monitoring hardware.

The first clause states that MONITOR is meant to be used with linear addresses, not physical ones. Devices and their DMA are meant to work with physical addresses only. So basically this means that all agents relying on the same MONITOR range should operate in the same domain of virtual memory space.

The second clause requires the monitored memory region to be cacheable (write-back, WB). For DMA, respective memory range is usually has to be marked as uncacheable, or write-combining at best (UC or WC). This is even a stronger indicator that your intent to use MONITOR/MWAIT to be triggered by DMA is very unlikely to work on current hardware.


Considering your high-level goal - to be able to tell when a device has written to given memory range - I cannot remember any robust method to achieve it, besides using virtualization for devices (VTd, IOMMU etc.) Basically, the classic approach for a peripheral device is to issue an interrupt when it is done with writing to memory. Until an interrupt arrives, there is no way for CPU to tell if all DMA bytes have successfully reached their destination in memory.

Device virtualization allows to abstract physical addresses from a device in a transparent manner, and have an equivalent of a page fault when it attempts to write/read from memory.

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top