Extract and count value from standard .gz log files on an hourly basis

https://stackoverflow.com/questions/20166034

04-08-2022

Question

I'm trying to count the number of occurrences of a particular string in a bunch of .gz log files on an hourly basis. Each log line starts with a timestamp in the following format:

2013-11-21;09:07:23.433.

For example, to be clearer: find the count of occurrences of the string "abc" between 8am and 9am, then between 9am and 10am, and so on. Any ideas on how to do it?

Solution

Since you only want to count occurrences, you can zcat the files, grep for the word you're looking for, extract the time interval from each matching line, and then count the results with sort | uniq -c. The following should suffice:

zcat *.gz | grep <word> | grep -oP "^\d{4}-\d{2}-\d{2};\d{2}" | sort | uniq -c

The above command finds the lines in your log files that contain the <word> you're looking for, extracts the date and hour from each of them, and then counts how many lines fall into each date-and-hour bucket. If you don't want to take the day, month, and year into account, you can use:

zcat *.gz | grep <word> | grep -oP "^\d{4}-\d{2}-\d{2};\K\d{2}" | sort | uniq -c

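The \K added to the grep expression tells PCRE (Perl Compatible Regular Expressions) to drop everything matched so far from the reported match, so only the hour is printed. It behaves much like a look-behind.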
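To make the shape of the output concrete, here is a minimal sketch that runs the same kind of pipeline on a tiny made-up log. The sample lines, the temporary directory, and the function name count_by_hour are assumptions for illustration only. It uses gzip -dc in place of zcat and cut -c1-13 in place of grep -oP, a swapped-in technique that relies on the fixed-width timestamp and avoids depending on PCRE support in grep.

```shell
# Hypothetical sample data, created only for this illustration.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '2013-11-21;08:15:02.100 request abc ok' \
  '2013-11-21;08:47:10.250 request xyz ok' \
  '2013-11-21;09:07:23.433 request abc failed' \
  '2013-11-21;09:30:00.000 request abc ok' \
  | gzip > "$tmpdir/sample.log.gz"

# count_by_hour WORD FILE...
# Counts the lines containing WORD, grouped by date and hour.
count_by_hour() {
  word=$1
  shift
  # gzip -dc decompresses to stdout, like zcat.
  # grep -F treats WORD as a fixed string, not a regex.
  # cut keeps the first 13 characters, i.e. "YYYY-MM-DD;HH".
  gzip -dc "$@" | grep -F -- "$word" | cut -c1-13 | sort | uniq -c
}

count_by_hour abc "$tmpdir"/*.gz
```

On this sample the result has one row per hour: a count of 1 for 2013-11-21;08 and a count of 2 for 2013-11-21;09. The padding in front of each count depends on the uniq implementation.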

Other tips

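Try this, which counts the lines containing abc that were logged between 08:00 and 09:59 on 2013-11-21 (both hours together):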

zgrep -c '2013-11-21;0[89]:.*abc' file.gz

Alternatively, awk (gawk on Linux) will work:

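zcat *.gz | awk -F'[.;:]' '/abc/ {arr[$2]++} END{for(i in arr){print i, arr[i]} }'

The /abc/ pattern restricts the count to lines containing the string, and $2 is the hour because the fields are split on ".", ";" and ":". The dot needs no backslash inside a bracket expression. With the escaped form [\.;:], some awks, notably gawk, print a warning that would otherwise have to be silenced with 2>/dev/null.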
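The commands in this thread count lines, so a line that contains the string twice is counted once. If every occurrence should count, one possible sketch uses the return value of awk's gsub, which is the number of substitutions made. The function name count_occurrences_by_hour and the sample data are assumptions for illustration, and the word is interpreted as a regular expression, not a fixed string.

```shell
# Hypothetical sample data, created only for this illustration.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '2013-11-21;08:15:02.100 abc then abc again' \
  '2013-11-21;08:47:10.250 nothing here' \
  '2013-11-21;09:07:23.433 one abc' \
  | gzip > "$tmpdir/sample.log.gz"

# count_occurrences_by_hour WORD FILE...
# Sums every occurrence of WORD (treated as a regex) per hour of day.
count_occurrences_by_hour() {
  word=$1
  shift
  gzip -dc "$@" | awk -F'[.;:]' -v word="$word" '
    {
      hour = $2                 # second field is the hour, e.g. "08"
      n = gsub(word, "&")       # "&" re-inserts the match, so the text is unchanged
      if (n > 0) count[hour] += n
    }
    END { for (h in count) print h, count[h] }
  ' | sort
}

count_occurrences_by_hour abc "$tmpdir"/*.gz
# prints:
# 08 2
# 09 1
```

Because the hour alone is used as the key, entries from different days are merged into the same bucket, as in the hour-only grep variant.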

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow