One-to-one correspondence to files - in Unix - log files

https://stackoverflow.com/questions/12104058

28-06-2021

Question

I am writing a Log Unifier program. That is, I have a system that produces logs: my.log, my.log.1, my.log.2, my.log.3...

On each iteration, I want to store the number of lines I've read from a given file, so that on the next iteration I can continue reading from that place.

The problem is that when the files are full, they roll: the last log is deleted, ..., my.log.2 becomes my.log.3, my.log.1 becomes my.log.2, my.log becomes my.log.1, and a new my.log is created.

I can of course keep track of them using inodes, which are almost a one-to-one correspondence to files.

I say "almost" because I am afraid of the following scenario: between two of my iterations, some files are deleted (let's say the logging is very fast), then new files are created, and some of them get the inodes of the files just deleted. The problem is that I will then mistake these new files for old ones and start reading from line 500 (for example) instead of line 0.

So I am hoping to find a way to solve this. Here are a few directions I thought about that may help you help me:

Another 1-to-1 correspondence other than inodes.
An ability to mark a file. I thought about using chmod +x to mark a file as already seen, so that files without that permission would be known to be new. But if somebody changed the permissions manually, that would confuse my program, so I would welcome any other way to mark a file.
Creating soft links to a file that are deleted when the file is deleted. That would allow me to know which files got deleted.
Any way to get the "creation date".
Any other idea that comes to mind, maybe using the atime, ctime, and mtime timestamps in some clever way. Anything is good as long as it lets me know which files are new, or creates a one-to-one correspondence to files.

Thank you

Solution

I can think of a few alternatives:

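Use extended file attributes (xattrs) to store metadata about each log file that your program can use for its operation. They are not part of the POSIX standard itself, but Linux and most modern Unix filesystems support them.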
It should be a safe assumption that the contents of old log files are not modified after being archived, i.e. after my.log becomes my.log.1. You could generate a hash for each file (e.g. SHA-256) to uniquely identify it.
All decent log formats embed a timestamp in each entry. You could use the timestamp of the first entry - or even the whole entry itself - in the file for identification purposes. Log files are usually rolled on a periodic basis, which would ensure a different starting timestamp for each file.
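
The third alternative can be sketched in Python roughly as follows. This is only an illustration, not a definitive implementation: the names fingerprint, read_new_lines, and state are hypothetical, and the sketch assumes that every log line ends with a newline and that the first entry of each file (with its timestamp) is different from the first entry of every other file. It hashes only the first entry rather than the whole file, because the active my.log keeps growing while its first entry stays the same.

```python
import hashlib


def fingerprint(path):
    """Identify a log file by the SHA-256 of its first complete line.

    Returns None when the file has no complete first line yet.
    Assumption: the first entry (including its timestamp) is unique per file.
    """
    with open(path, "rb") as f:
        first = f.readline()
    if not first.endswith(b"\n"):
        return None
    return hashlib.sha256(first).hexdigest()


def read_new_lines(path, state):
    """Return the complete lines of `path` that were not read before.

    `state` is a hypothetical dict mapping fingerprint -> number of lines
    already read. Because the key depends on content, not on the file name
    or inode, it survives renames such as my.log -> my.log.1.
    """
    key = fingerprint(path)
    if key is None:
        return []
    already_read = state.get(key, 0)
    with open(path, "rb") as f:
        lines = f.readlines()
    # Ignore a trailing partial line that the logger is still writing.
    if lines and not lines[-1].endswith(b"\n"):
        lines = lines[:-1]
    state[key] = len(lines)
    return lines[already_read:]
```

On each iteration the program would call read_new_lines for my.log, my.log.1, and so on, and persist state between runs (for example as JSON). A file that reuses a deleted file's inode still gets a fresh fingerprint, so it is read from line 0. Hashing the whole file, as in the second alternative, would only be stable for files that have already been rolled.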

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow