Question

I'm debugging a system-load problem that a customer encounters on their production system. They've written a test application that simulates the load so the problem can be reproduced:

[graph of the reproduced system load omitted]

In this particular workload, one of the things the coder did was to:

while(1)
  initialize inotify
  watch a directory for events
  receive event
  process event
  remove watch
  close inotify fd

Strangely enough, the high system load comes from the close() of the inotify fd:

inotify_init()                          = 4 <0.000020>
inotify_add_watch(4, "/mnt/tmp/msys_sim/QUEUES/Child_032", IN_CREATE) = 1 <0.059537>
write(1, "Child [032] sleeping\n", 21)  = 21 <0.000012>
read(4, "\1\0\0\0\0\1\0\0\0\0\0\0\20\0\0\0SrcFile.b8tlfT\0\0", 512) = 32 <0.231012>
inotify_rm_watch(4, 1)                  = 0 <0.000044>
close(4)                                = 0 <0.702530>
open("/mnt/tmp/msys_sim/QUEUES/Child_032", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4 <0.000031>
lseek(4, 0, SEEK_SET)                   = 0 <0.000010>
getdents(4, /* 3 entries */, 32768)     = 88 <0.000048>
getdents(4, /* 0 entries */, 32768)     = 0 <0.000009>
write(1, "Child [032] dequeue [SrcFile.b8t"..., 37) = 37 <0.000011>
unlink("/mnt/tmp/msys_sim/QUEUES/Child_032/SrcFile.b8tlfT") = 0 <0.059298>
lseek(4, 0, SEEK_SET)                   = 0 <0.000011>
getdents(4, /* 2 entries */, 32768)     = 48 <0.000038>
getdents(4, /* 0 entries */, 32768)     = 0 <0.000009>
close(4)                                = 0 <0.000012>
inotify_init()                          = 4 <0.000020>
inotify_add_watch(4, "/mnt/tmp/msys_sim/QUEUES/Child_032", IN_CREATE) = 1 <0.040385>
write(1, "Child [032] sleeping\n", 21)  = 21 <0.000903>
read(4, "\1\0\0\0\0\1\0\0\0\0\0\0\20\0\0\0SrcFile.mQgUSh\0\0", 512) = 32 <0.023423>
inotify_rm_watch(4, 1)                  = 0 <0.000012>
close(4)                                = 0 <0.528736>

What could possibly be causing the close() call to take such an enormous amount of time? I can see two possible causes:

  • Closing and reinitializing inotify on every iteration.
  • There are 256K files (flat) in /mnt/tmp/msys_sim/SOURCES, and a particular file in /mnt/tmp/msys_sim/QUEUES/Child_032 is hardlinked to one in that directory. But SOURCES is never opened by the above process.

Is it an artifact of using inotify wrong? What can I point at to say "What you're doing is WRONG!"?


Output of perf top (I had been looking for this!)

Events: 109K cycles
 70.01%  [kernel]      [k] _spin_lock
 24.30%  [kernel]      [k] __fsnotify_update_child_dentry_flags
  2.24%  [kernel]      [k] _spin_unlock_irqrestore
  0.64%  [kernel]      [k] __do_softirq
  0.60%  [kernel]      [k] __rcu_process_callbacks
  0.46%  [kernel]      [k] run_timer_softirq
  0.40%  [kernel]      [k] rcu_process_gp_end

Sweet! I suspect a spinlock somewhere, and the entire system becomes highly latent when this happens.


Solution

Usually the pseudocode for an inotify loop looks like this:

initialize inotify
watch a directory | file for events

while(receive event) {
  process event
}

[ remove watch ]
close inotify fd

There is no need to remove the watch and reinitialize inotify on every loop.
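
To make this concrete, here is a minimal sketch of that pattern, assuming a single watched directory passed on the command line and IN_CREATE only. This is illustrative only, not the customer's code, and error handling is abbreviated:

#include <stdio.h>
#include <unistd.h>
#include <err.h>
#include <sysexits.h>
#include <limits.h>
#include <sys/inotify.h>

int main(int argc, char **argv)
{
    const char *dir = argc > 1 ? argv[1] : ".";

    /* Room for a batch of events; each event carries a variable-length name. */
    char buf[64 * (sizeof(struct inotify_event) + NAME_MAX + 1)]
        __attribute__((aligned(__alignof__(struct inotify_event))));

    /* Initialize inotify once. */
    int ifd = inotify_init1(IN_CLOEXEC);
    if (ifd < 0)
        err(EX_OSERR, "inotify_init1");

    /* Add the watch once. */
    if (inotify_add_watch(ifd, dir, IN_CREATE) < 0)
        err(EX_OSERR, "inotify_add_watch %s", dir);

    /* Receive and process events inside the loop. */
    for (;;) {
        ssize_t len = read(ifd, buf, sizeof(buf));
        if (len < 0)
            err(EX_OSERR, "read");

        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *) p;
            if (ev->len)
                printf("created: %s\n", ev->name);
            p += sizeof(struct inotify_event) + ev->len;
        }
    }

    /* Not reached: the watch removal and close(ifd) happen once, at shutdown. */
    return 0;
}

With this structure, the expensive watch setup and teardown happens once for the lifetime of the process instead of once per event.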

Other tips

I've tried to duplicate your problem, but I don't get the same results you see. Still, yes, it's wrong to use inotify like that. Normally you initialize inotify once, then read or poll on its file descriptor.

I ran the program below with strace -T and see nothing like those close() times.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <err.h>
#include <sysexits.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/inotify.h>
#include <errno.h>

#define WATCHDIR "./watched"

void child_run(void)
{
    printf("Child spawned..\n");
    int fd;
    if (chdir(WATCHDIR))
        err(EX_OSERR, "Cannot chdir in child");

    /* Care not if this fails.. */
    unlink("myfile.dat");

    while (1) {
        fd = open("myfile.dat", O_CREAT|O_EXCL, S_IRUSR|S_IWUSR);
        if (fd < 0) {
            warn("Cannot create necessary file.. sleeping");
            sleep(1);
            continue;   /* don't close(-1) or unlink a file we never created */
        }
        close(fd);
        fd = -1;
        if (unlink("myfile.dat") < 0)
            err(EX_OSERR, "Cannot unlink file in watched directory");
    }

}

int main() 
{
    int watch_fd = -1;
    int watched = -1;
    /* Buffer space for a batch of events. */
    struct inotify_event ev[128];
    memset(ev, 0, sizeof(ev));

    if (mkdir(WATCHDIR, S_IRWXU) < 0) {
        if (errno != EEXIST) {
            err(EX_OSERR, "Cannot create directory");
        }
    }

    if (fork() == 0) {
        child_run();
        exit(0);
    }

    while (1) {
        if ((watch_fd = inotify_init1(IN_CLOEXEC)) < 0)
            err(EX_OSERR, "Cannot init inotify");

        if ((watched = inotify_add_watch(watch_fd, WATCHDIR, IN_CREATE)) < 0)
            err(EX_OSERR, "Cannot add watched directory");

        if (read(watch_fd, ev, sizeof(ev)) < 0)
            err(EX_OSERR, "Cannot read from watcher");

        if (inotify_rm_watch(watch_fd, watched) < 0)
            err(EX_OSERR, "Cannot remove watch");

        close(watch_fd);
    }
    return 0;
}

If you run this, do you get the same performance on that host?

I've found the smoking gun. From profiling the kernel (perf top is what I was looking for):

Events: 109K cycles
 70.01%  [kernel]      [k] _spin_lock
 24.30%  [kernel]      [k] __fsnotify_update_child_dentry_flags
  2.24%  [kernel]      [k] _spin_unlock_irqrestore
  0.64%  [kernel]      [k] __do_softirq
  0.60%  [kernel]      [k] __rcu_process_callbacks
  0.46%  [kernel]      [k] run_timer_softirq
  0.40%  [kernel]      [k] rcu_process_gp_end

Spending 70% of our time in _spin_lock (remember, we theorized this may be the cause) explains all the symptoms. The second entry on the list is likely the culprit:

http://lxr.free-electrons.com/source/fs/notify/fsnotify.c?a=sh#L52

Without thoroughly analyzing the code, it appears that with the test case provided, that code ends up looping over all 262K directory entries in SOURCES while holding a kernel lock. That behaviour is probably not what was intended and is a consequence of using the inotify API incorrectly.
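
For context, in kernels of that era the function at that line, __fsnotify_update_child_dentry_flags(), is structured roughly as in the paraphrase below. This is reconstructed from memory of 2.6.3x-era sources, so it is not an exact quote, and lock and list-field names varied between versions; the relevant point is that every cached child dentry of the watched directory is visited while a spinlock is held:

/* Simplified paraphrase of fs/notify/fsnotify.c:
 * __fsnotify_update_child_dentry_flags() as it looked around 2.6.3x.
 * Not an exact quote; lock and list-field names differ between versions. */
void __fsnotify_update_child_dentry_flags(struct inode *inode)
{
        struct dentry *alias;
        int watched;

        if (!S_ISDIR(inode->i_mode))
                return;

        /* should child dentries report events up to this directory? */
        watched = fsnotify_inode_watches_children(inode);

        spin_lock(&dcache_lock);        /* a global lock on older kernels */
        list_for_each_entry(alias, &inode->i_dentry, d_alias) {
                struct dentry *child;

                /* walk every cached child dentry of the watched directory */
                list_for_each_entry(child, &alias->d_subdirs, d_u.d_child) {
                        if (!child->d_inode)
                                continue;

                        spin_lock(&child->d_lock);
                        if (watched)
                                child->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED;
                        else
                                child->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED;
                        spin_unlock(&child->d_lock);
                }
        }
        spin_unlock(&dcache_lock);
}

This runs both when a watch is added and when the last watch on a directory is torn down (which is what the close() triggers here), so doing that once per event keeps the workload inside that spinlock, which lines up with the _spin_lock time in the profile.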

Remounting the filesystem (with the test still running) makes it behave better:

Events: 38K cycles                                                                                                          
 20.41%  [kernel]      [k] _spin_lock
 17.43%  [kernel]      [k] _spin_unlock_irqrestore
 12.40%  [kernel]      [k] __fsnotify_update_child_dentry_flags
  6.44%  [kernel]      [k] run_timer_softirq
  5.65%  [kernel]      [k] __do_softirq          
  5.18%  [kernel]      [k] update_shares
  5.02%  [kernel]      [k] __rcu_process_callbacks

But still not ideal.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow