Question

I have a directory (/home/myuser/logs) that contains the following log files for the last 5 days:

applogs_20130402.txt
applogs_20130401.txt
applogs_20130331.txt
applogs_20130330.txt
applogs_20130329.txt

Each line of every "applog" has the same structure, just different data:

<timestamp> | <fruit> | <color> | <cost>

So for example, applogs_20130402.txt might look like:

23:41:25 | apple | red | 53
23:41:26 | kiwi | brown | 12
23:41:29 | banana | yellow | 1023
... (etc., every line is pipe delimited like this)

I want to create one "master log" that combines all the pipe-delimited entries from all 5 log files into a single file, with every line in chronological order. Additionally, each file's date (taken from its file name) needs to be reflected in the timestamps as well.

So, for instance, if applogs_20130402.txt and applogs_20130401.txt were the only 2 applogs in the directory, and they both looked like this respectively:

applogs_20130402.txt:
=====================
23:41:25 | apple | red | 53
23:41:26 | kiwi | brown | 12
23:41:29 | banana | yellow | 1023

applogs_20130401.txt:
=====================
23:40:33 | blueberry | blue | 4
23:41:28 | apple | green | 81
23:45:49 | plumb | purple | 284

Then, I would want a masterlog.txt file that looks like:

2013-04-01 23:40:33 | blueberry | blue | 4
2013-04-01 23:41:28 | apple | green | 81
2013-04-01 23:45:49 | plumb | purple | 284
2013-04-02 23:41:25 | apple | red | 53
2013-04-02 23:41:26 | kiwi | brown | 12
2013-04-02 23:41:29 | banana | yellow | 1023

I'm on Ubuntu and have access to Bash, Python and Perl, and have no preference which solution is used. Ordinarily I would try a "best attempt" and post it, but I've never dealt with aggregating data like this on Linux. Obviously, the real logs are thousands of lines long, unlike my example above, so doing everything manually isn't an option ;-) Thanks in advance!

Solution

You can use Perl from the command line together with sort like this:

perl -n -e 'printf "%d-%02d-%02d %s", $ARGV =~ m/_(\d{4})(\d\d)(\d\d)/, $_;' *.txt | sort -n

Calling perl with -n wraps a while (<>) { ... } loop around your program (the code given with -e ''). Inside it, we printf the current line ($_), preceded by the date extracted from the current file name, which Perl keeps in $ARGV. The m// match runs in list context (supplied by printf's argument list), so it returns the captured year, month and day directly.

To this program, we pass all the .txt files in the directory. The result is piped to the command line tool sort. The -n flag sorts numerically, though it is not strictly necessary here: because the YYYY-MM-DD HH:MM:SS prefix is zero-padded, a plain lexicographic sort already orders the lines chronologically.
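Since the question also mentions Python, here is an equivalent sketch in Python. For illustration it recreates the two sample logs from the question in a scratch directory; in practice you would point the glob at /home/myuser/logs instead.

```python
import glob
import os
import re
import tempfile

# Recreate the two sample logs from the question in a scratch directory.
logdir = tempfile.mkdtemp()
samples = {
    "applogs_20130402.txt": [
        "23:41:25 | apple | red | 53",
        "23:41:26 | kiwi | brown | 12",
        "23:41:29 | banana | yellow | 1023",
    ],
    "applogs_20130401.txt": [
        "23:40:33 | blueberry | blue | 4",
        "23:41:28 | apple | green | 81",
        "23:45:49 | plumb | purple | 284",
    ],
}
for name, rows in samples.items():
    with open(os.path.join(logdir, name), "w") as f:
        f.write("\n".join(rows) + "\n")

# Prefix every line with the date taken from its file name, then sort.
merged = []
for path in glob.glob(os.path.join(logdir, "applogs_*.txt")):
    m = re.search(r"_(\d{4})(\d{2})(\d{2})\.txt$", os.path.basename(path))
    if not m:
        continue
    date = "-".join(m.groups())  # e.g. "2013-04-02"
    with open(path) as f:
        merged.extend(f"{date} {line.rstrip()}" for line in f if line.strip())

merged.sort()  # the ISO date-time prefix sorts chronologically as plain text

with open(os.path.join(logdir, "masterlog.txt"), "w") as out:
    out.write("\n".join(merged) + "\n")

print(merged[0])  # 2013-04-01 23:40:33 | blueberry | blue | 4
```

Unlike the shell pipeline, this sorts everything in memory, which is fine for logs of a few thousand lines.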

OTHER TIPS

Just for the sake of completeness, here is a (g)awk one-liner to accomplish the same:

gawk '{ printf "%s %s\n", gensub(/.+_([0-9]{4})([0-9]{2})([0-9]{2}).+/, "\\1-\\2-\\3", "g", FILENAME), $0 }' applogs_* | sort
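Both one-liners rely on the same property: a zero-padded YYYY-MM-DD HH:MM:SS prefix sorts chronologically under plain byte-wise comparison, which is why an unadorned sort is enough. A quick Python check of that property:

```python
from datetime import datetime

a = "2013-04-01 23:45:49"
b = "2013-04-02 23:41:25"

# Lexicographic order of the strings matches chronological order
# of the timestamps they encode.
fmt = "%Y-%m-%d %H:%M:%S"
assert (a < b) == (datetime.strptime(a, fmt) < datetime.strptime(b, fmt))
print("string order matches chronological order")
```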
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow