Question

What I'm doing

I'm implementing a Python/kqueue-based solution (on FreeBSD) to follow changes to a particular logfile. When the KQ_NOTE_WRITE fflag fires, the change to the file is picked up and processed by another function in my Python script.

Why I'm doing it

Ultimately, I'm taking the latest logfile entry and sending it off somewhere else as part of a quick'n'dirty accounting system.

What I think I need to know

1) As the logfile can see periods of high traffic, I wondered whether there is any guarantee of "atomicity", i.e., while passing off the latest logfile entry, could we "miss" a new entry coming in? Since kqueue is a "queue", I assumed not, but history has taught me that I usually end up feeling like a plonker for such assumptions.

2) Is kqueue guaranteed to fire for each event, or could multiple events slip through? The case I'm imagining is the logfile producing 2 separate entries almost simultaneously.

Any wisdom/advice is appreciated.

Solution

Your suspicions are correct. :-)

A kqueue "event" is "expanded" (that is, merged into the notice that is already pending rather than queued separately) if the earlier notice has not yet been consumed when a second identical event occurs. Suppose the sequence of events at the low level is something like this:

1: you start monitoring the log file for writes
2: something writes to the log file (this adds a "write" notice to the kqueue)
3: your process is notified, but does not have a chance to go look yet
4: something (same something as step 2, or different, does not matter)
   writes more to the log file (this merely "expands" the existing notice,
   with no effect in this case)
5: your process finally gets a chance to read the "file was written" notice
   from the kqueue

When step 5 occurs, you will get just the one "file was written" notice, no matter how many writes happened in the meantime. It is up to your code to figure out how much got written. For instance, you can use fstat() to check the length of the file at step 1, and then call fstat() again after step 5. If the file is only ever appended to, the size difference between these two points is the "new data" you care about.
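The fstat() bookkeeping described above might look roughly like this in Python. This is only a sketch: read_appended is a made-up helper name, and it assumes the file is append-only (it does not handle truncation).

```python
import os

def read_appended(fd, last_size):
    """Return (new_data, new_size) for an append-only file open on fd.

    last_size is the size recorded by the previous fstat().
    Hypothetical helper; truncation of the file is not handled.
    """
    current_size = os.fstat(fd).st_size
    if current_size <= last_size:
        # Nothing new since the last check.
        return b"", last_size
    # pread() reads at an explicit offset without moving the fd's position.
    data = os.pread(fd, current_size - last_size, last_size)
    return data, last_size + len(data)
```

One coalesced notice can therefore yield several log entries at once, so the caller still has to split the returned bytes into lines.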

Note that a notice can also arrive for data you have already accounted for. Suppose you see 100 bytes at step 1 and 500 bytes when you fstat the file after step 5, in a step 7:

7: you fstat the file

If you later get another "file was written" notice, it is possible that there was actually a "step 6", between steps 5 and 7, in which another write happened. That write queued a fresh notice, but its bytes were already included in the 500 you saw at step 7. So be prepared to find that 0 bytes were added even though you got a notice saying the file was written, because you may have already read those bytes after the notice was added to the kqueue.
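A possible shape for the whole loop, tolerating both merged notices and notices that turn out to carry no new bytes, is sketched below. The names consume and follow are invented for illustration, and select.kqueue only exists on BSD-derived systems (FreeBSD, macOS), so follow() cannot run elsewhere. Here an open file object keeps track of the read position instead of explicit fstat() calls.

```python
import os
import select

def consume(f, handle):
    """Read from the current position to EOF and pass any data to handle.

    Returns the number of bytes handed over, which may be 0 when a
    notice arrives for data that was already read.
    """
    data = f.read()
    if data:
        handle(data)
    return len(data)

def follow(path, handle, timeout=None):
    """Call handle(bytes) whenever new data is appended to path."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)  # only data written from now on
        kq = select.kqueue()
        change = select.kevent(
            f.fileno(),
            filter=select.KQ_FILTER_VNODE,
            # KQ_EV_CLEAR resets the notice once it has been retrieved.
            flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
            fflags=select.KQ_NOTE_WRITE,
        )
        try:
            kq.control([change], 0, 0)  # register, do not wait
            # Each wake-up means "one or more writes happened".
            while kq.control(None, 1, timeout):
                consume(f, handle)
        finally:
            kq.close()
```

With a timeout given, follow() returns once no write is seen for that many seconds; with timeout=None it blocks indefinitely.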

If you're watching syslog-type logs, note that they get rotated ("turned over"): the file is renamed (and then sometimes compressed, etc.) and a new file is created, e.g., "messages" becomes "messages.0.bz2" and a new "messages" is created. You can watch the directory along with the file and check for new file creations to catch such cases.
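Once the directory watch fires, one way to tell whether the log was actually turned over is to compare the file you have open with whatever the path names now. This inode comparison is an assumption on my part rather than something kqueue gives you, and needs_reopen is a made-up name:

```python
import os

def needs_reopen(path, fd):
    """Return True if path now names a different file than the one open on fd."""
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        # The old file was renamed away and its replacement does not
        # exist yet, so there is nothing to reopen for now.
        return False
    opened = os.fstat(fd)
    # Same device and inode means it is still the same file.
    return (on_disk.st_dev, on_disk.st_ino) != (opened.st_dev, opened.st_ino)
```

When it returns True, you would typically read the remainder of the old descriptor first, then open the new file and start again from offset 0.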

Licensed under: CC-BY-SA with attribution