Question

I'm debugging a system-load problem that a customer encounters on their production system. They've written a test application that simulates the load so the problem can be reproduced:

[graph of the reproduced system load omitted]

In this particular workload, one of the things the coder did was to:

while(1)
  initialize inotify
  watch a directory for events
  receive event
  process event
  remove watch
  close inotify fd

Strangely enough, the high system load comes from the close() of the inotify fd:

inotify_init()                          = 4 <0.000020>
inotify_add_watch(4, "/mnt/tmp/msys_sim/QUEUES/Child_032", IN_CREATE) = 1 <0.059537>
write(1, "Child [032] sleeping\n", 21)  = 21 <0.000012>
read(4, "\1\0\0\0\0\1\0\0\0\0\0\0\20\0\0\0SrcFile.b8tlfT\0\0", 512) = 32 <0.231012>
inotify_rm_watch(4, 1)                  = 0 <0.000044>
close(4)                                = 0 <0.702530>
open("/mnt/tmp/msys_sim/QUEUES/Child_032", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4 <0.000031>
lseek(4, 0, SEEK_SET)                   = 0 <0.000010>
getdents(4, /* 3 entries */, 32768)     = 88 <0.000048>
getdents(4, /* 0 entries */, 32768)     = 0 <0.000009>
write(1, "Child [032] dequeue [SrcFile.b8t"..., 37) = 37 <0.000011>
unlink("/mnt/tmp/msys_sim/QUEUES/Child_032/SrcFile.b8tlfT") = 0 <0.059298>
lseek(4, 0, SEEK_SET)                   = 0 <0.000011>
getdents(4, /* 2 entries */, 32768)     = 48 <0.000038>
getdents(4, /* 0 entries */, 32768)     = 0 <0.000009>
close(4)                                = 0 <0.000012>
inotify_init()                          = 4 <0.000020>
inotify_add_watch(4, "/mnt/tmp/msys_sim/QUEUES/Child_032", IN_CREATE) = 1 <0.040385>
write(1, "Child [032] sleeping\n", 21)  = 21 <0.000903>
read(4, "\1\0\0\0\0\1\0\0\0\0\0\0\20\0\0\0SrcFile.mQgUSh\0\0", 512) = 32 <0.023423>
inotify_rm_watch(4, 1)                  = 0 <0.000012>
close(4)                                = 0 <0.528736>

What could possibly be causing the close() call to take such an enormous amount of time? I can see two possible causes:

  • Closing and reinitializing inotify on every iteration.
  • There are 256K files (flat) in /mnt/tmp/msys_sim/SOURCES, and a particular file in /mnt/tmp/msys_sim/QUEUES/Child_032 is hardlinked to one in that directory. But SOURCES is never opened by the above process.

Is it an artifact of using inotify wrong? What can I point at to say "What you're doing is WRONG!"?


Output of perf top (I had been looking for this!)

Events: 109K cycles
 70.01%  [kernel]      [k] _spin_lock
 24.30%  [kernel]      [k] __fsnotify_update_child_dentry_flags
  2.24%  [kernel]      [k] _spin_unlock_irqrestore
  0.64%  [kernel]      [k] __do_softirq
  0.60%  [kernel]      [k] __rcu_process_callbacks
  0.46%  [kernel]      [k] run_timer_softirq
  0.40%  [kernel]      [k] rcu_process_gp_end

Sweet! I suspect a spinlock somewhere, and the entire system becomes highly latent when this happens.


Solution

Usually the pseudocode for an inotify loop looks like this:

initialize inotify
watch a directory | file for events

while(receive event) {
  process event
}

[ remove watch ]
close inotify fd

There is no need to remove the watch and reinitialize inotify on every loop.
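
To make this concrete, here is a minimal sketch of that pattern, assuming a single watched directory passed on the command line and IN_CREATE only. This is illustrative only, not the customer's code, and error handling is abbreviated:

#include <stdio.h>
#include <unistd.h>
#include <err.h>
#include <sysexits.h>
#include <limits.h>
#include <sys/inotify.h>

int main(int argc, char **argv)
{
    const char *dir = argc > 1 ? argv[1] : ".";

    /* Room for a batch of events; each event carries a variable-length name. */
    char buf[64 * (sizeof(struct inotify_event) + NAME_MAX + 1)]
        __attribute__((aligned(__alignof__(struct inotify_event))));

    /* Initialize inotify once. */
    int ifd = inotify_init1(IN_CLOEXEC);
    if (ifd < 0)
        err(EX_OSERR, "inotify_init1");

    /* Add the watch once. */
    if (inotify_add_watch(ifd, dir, IN_CREATE) < 0)
        err(EX_OSERR, "inotify_add_watch %s", dir);

    /* Receive and process events inside the loop. */
    for (;;) {
        ssize_t len = read(ifd, buf, sizeof(buf));
        if (len < 0)
            err(EX_OSERR, "read");

        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *) p;
            if (ev->len)
                printf("created: %s\n", ev->name);
            p += sizeof(struct inotify_event) + ev->len;
        }
    }

    /* Not reached: the watch removal and close(ifd) happen once, at shutdown. */
    return 0;
}

With this structure, the expensive watch setup and teardown happens once for the lifetime of the process instead of once per event.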

Other tips

I've tried to duplicate your problem, but I don't get the same results you see. Still, yes, it's wrong to use inotify like that. Normally you initialize inotify once, then read or poll on its file descriptor.

I ran the program below with strace -T and see nothing like those close() times.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <err.h>
#include <sysexits.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/inotify.h>
#include <errno.h>

#define WATCHDIR "./watched"

void child_run(void)
{
    printf("Child spawned..\n");
    int fd;
    if (chdir(WATCHDIR))
        err(EX_OSERR, "Cannot chdir in child");

    /* Care not if this fails.. */
    unlink("myfile.dat");

    while (1) {
        fd = open("myfile.dat", O_CREAT|O_EXCL, S_IRUSR|S_IWUSR);
        if (fd < 0) {
            warn("Cannot create necessary file.. sleeping");
            sleep(1);
            continue;   /* don't close(-1) or unlink a file we never created */
        }
        close(fd);
        fd = -1;
        if (unlink("myfile.dat") < 0)
            err(EX_OSERR, "Cannot unlink file in watched directory");
    }

}

int main() 
{
    int watch_fd = -1;
    int watched = -1;
    /* Buffer space for a batch of events. */
    struct inotify_event ev[128];
    memset(ev, 0, sizeof(ev));

    if (mkdir(WATCHDIR, S_IRWXU) < 0) {
        if (errno != EEXIST) {
            err(EX_OSERR, "Cannot create directory");
        }
    }

    if (fork() == 0) {
        child_run();
        exit(0);
    }

    while (1) {
        if ((watch_fd = inotify_init1(IN_CLOEXEC)) < 0)
            err(EX_OSERR, "Cannot init inotify");

        if ((watched = inotify_add_watch(watch_fd, WATCHDIR, IN_CREATE)) < 0)
            err(EX_OSERR, "Cannot add watched directory");

        if (read(watch_fd, ev, sizeof(ev)) < 0)
            err(EX_OSERR, "Cannot read from watcher");

        if (inotify_rm_watch(watch_fd, watched) < 0)
            err(EX_OSERR, "Cannot remove watch");

        close(watch_fd);
    }
    return 0;
}

If you run this, do you get the same performance on that host?

I've found the smoking gun. From profiling the kernel (perf top is what I was looking for):

Events: 109K cycles
 70.01%  [kernel]      [k] _spin_lock
 24.30%  [kernel]      [k] __fsnotify_update_child_dentry_flags
  2.24%  [kernel]      [k] _spin_unlock_irqrestore
  0.64%  [kernel]      [k] __do_softirq
  0.60%  [kernel]      [k] __rcu_process_callbacks
  0.46%  [kernel]      [k] run_timer_softirq
  0.40%  [kernel]      [k] rcu_process_gp_end

Spending 70% of our time in _spin_lock (remember, we theorized this may be the cause) explains all the symptoms. The second entry on the list is likely the culprit:

http://lxr.free-electrons.com/source/fs/notify/fsnotify.c?a=sh#L52

Without thoroughly analyzing the code, it appears that with the test case provided, that code ends up looping over all 262K directory entries in SOURCES while holding a kernel lock. That behaviour is probably not what was intended and is a consequence of using the inotify API incorrectly.
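
For context, in kernels of that era the function at that line, __fsnotify_update_child_dentry_flags(), is structured roughly as in the paraphrase below. This is reconstructed from memory of 2.6.3x-era sources, so it is not an exact quote, and lock and list-field names varied between versions; the relevant point is that every cached child dentry of the watched directory is visited while a spinlock is held:

/* Simplified paraphrase of fs/notify/fsnotify.c:
 * __fsnotify_update_child_dentry_flags() as it looked around 2.6.3x.
 * Not an exact quote; lock and list-field names differ between versions. */
void __fsnotify_update_child_dentry_flags(struct inode *inode)
{
        struct dentry *alias;
        int watched;

        if (!S_ISDIR(inode->i_mode))
                return;

        /* should child dentries report events up to this directory? */
        watched = fsnotify_inode_watches_children(inode);

        spin_lock(&dcache_lock);        /* a global lock on older kernels */
        list_for_each_entry(alias, &inode->i_dentry, d_alias) {
                struct dentry *child;

                /* walk every cached child dentry of the watched directory */
                list_for_each_entry(child, &alias->d_subdirs, d_u.d_child) {
                        if (!child->d_inode)
                                continue;

                        spin_lock(&child->d_lock);
                        if (watched)
                                child->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED;
                        else
                                child->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED;
                        spin_unlock(&child->d_lock);
                }
        }
        spin_unlock(&dcache_lock);
}

This runs both when a watch is added and when the last watch on a directory is torn down (which is what the close() triggers here), so doing that once per event keeps the workload inside that spinlock, which lines up with the _spin_lock time in the profile.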

Remounting the filesystem (with the test still running) makes it behave better:

Events: 38K cycles                                                                                                          
 20.41%  [kernel]      [k] _spin_lock
 17.43%  [kernel]      [k] _spin_unlock_irqrestore
 12.40%  [kernel]      [k] __fsnotify_update_child_dentry_flags
  6.44%  [kernel]      [k] run_timer_softirq
  5.65%  [kernel]      [k] __do_softirq          
  5.18%  [kernel]      [k] update_shares
  5.02%  [kernel]      [k] __rcu_process_callbacks

But still not ideal.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow