Question

I'm facing an issue when putting inotify watches on a huge number of folders (~2,000,000). I changed max_user_watches to 8388608:

echo 8388608 > /proc/sys/fs/inotify/max_user_watches
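
For reference, the same limit can also be applied persistently via sysctl (assuming a standard /etc/sysctl.conf setup), rather than echoing into /proc after every boot:

# in /etc/sysctl.conf (persistent equivalent of the echo above)
fs.inotify.max_user_watches = 8388608
# apply without rebooting
sysctl -p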

To support this number of watches, the server has 3 GB of free memory. But when I launch the inotify script, it gets killed after several hours (~15h). Here is the /var/log/messages trace:

inotify.sh invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0
inotify.sh cpuset=/ mems_allowed=0
Pid: 12488, comm: inotify.sh Not tainted 2.6.32-5-686 #1
Call Trace:
 [<c1089534>] ? oom_kill_process+0x60/0x201
 [<c1089ab1>] ? __out_of_memory+0xf4/0x107
 [<c1089b1e>] ? out_of_memory+0x5a/0x7c
 [<c108c3c9>] ? __alloc_pages_nodemask+0x3ef/0x4d9
 [<c108c4bf>] ? __get_free_pages+0xc/0x17
 [<c102f06c>] ? copy_process+0xb7/0xf28
 [<c11028ad>] ? security_file_alloc+0xc/0xd
 [<c1030017>] ? do_fork+0x13a/0x2bc
 [<c10b182d>] ? fd_install+0x1e/0x3c
 [<c103b996>] ? recalc_sigpending+0xf/0x2e
 [<c103bcae>] ? sigprocmask+0x9d/0xbc
 [<c1001dae>] ? sys_clone+0x21/0x27
 [<c10030fb>] ? sysenter_do_call+0x12/0x28
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:  41
HighMem per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:  34
active_anon:53061 inactive_anon:13287 isolated_anon:0
 active_file:2006 inactive_file:4143 isolated_file:0
 unevictable:0 dirty:272 writeback:0 unstable:0
 free:499244 slab_reclaimable:174921 slab_unreclaimable:25041
 mapped:1892 shmem:42 pagetables:240 bounce:0
DMA free:3492kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12080kB slab_unreclaimable:332kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 861 3032 3032
Normal free:59488kB min:3720kB low:4648kB high:5580kB active_anon:0kB inactive_anon:0kB active_file:1092kB inactive_file:3380kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:881880kB mlocked:0kB dirty:1088kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:687604kB slab_unreclaimable:99832kB kernel_stack:952kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1211 all_unreclaimable? yes
lowmem_reserve[]: 0 0 17366 17366
HighMem free:1933996kB min:512kB low:2856kB high:5200kB active_anon:212244kB inactive_anon:53148kB active_file:6932kB inactive_file:13192kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2222948kB mlocked:0kB dirty:0kB writeback:0kB mapped:7564kB shmem:168kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:960kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 19*4kB 43*8kB 16*16kB 6*32kB 17*64kB 12*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3492kB
Normal: 14406*4kB 99*8kB 1*16kB 3*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 59488kB
HighMem: 31*4kB 14*8kB 8*16kB 2*32kB 2*64kB 1*128kB 2*256kB 1*512kB 1*1024kB 1*2048kB 471*4096kB = 1933996kB
6194 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 1951736kB
Total swap = 1951736kB
786416 pages RAM
560130 pages HighMem
7712 pages reserved
7694 pages shared
275891 pages non-shared

I have almost 2 GB of free memory when the script gets killed, so I don't really understand this OOM. Can someone help me? The script is as simple as:

#!/bin/bash

INOTIFY_DIR="/data"
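# -r: watch recursively, -m: stay in monitor mode, -e: events of interest; stderr is appended to a log file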
inotifywait -rme modify,attrib,move,close_write,create,delete,delete_self $INOTIFY_DIR 2>> /datatest/inotify.log

Solution

The OOM killer is tied to the page fault handler. If the page fault handler cannot allocate a page because none is available, it invokes the OOM killer through the kernel call pagefault_out_of_memory(). The OOM killer logic then kicks in and selects the best candidate process to kill so it can clean up memory. The selection follows a heuristic approach to find the most killable process, freeing memory and keeping the system alive.

Whether the OOM killer kicks in is based purely on the availability of free pages, and an allocation can fail in any memory zone. Having plenty of high memory free does not mean that memory is available in the other zones (ZONE_DMA, low memory, etc.). In your trace, the Normal (low memory) zone reports slab_reclaimable:687604kB out of present:881880kB and is flagged all_unreclaimable, so low memory is effectively exhausted by kernel slab allocations even though HighMem still shows ~1.9 GB free.
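
As a rough way to watch this happen from userspace while the script runs, the per-zone free memory and the slab caches backing the watches can be inspected directly (a minimal sketch; the exact inotify_* slab cache names vary by kernel version, and reading /proc/slabinfo usually requires root):

# free pages per zone (DMA / Normal / HighMem), broken down by allocation order
cat /proc/buddyinfo
# slab caches used by inotify watches (cache names are kernel-version dependent)
grep -i inotify /proc/slabinfo
# low vs. high memory split and total slab usage (Low/HighFree appear on 32-bit highmem kernels)
grep -E 'LowFree|HighFree|Slab' /proc/meminfo

If the Normal zone's free pages keep shrinking while the inotify slab entries grow, low memory is being eaten by the watches, and an OOM kill is to be expected long before HighMem runs out.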

Take a look at pagefault_out_of_memory() in mm/oom_kill.c for more details about the OOM killer algorithm.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow