Multiline log file processing with sed and regex

https://stackoverflow.com/questions/12385931

01-07-2021
Question

I have a log file that looks like this:

11-Sep-2012 00:00:00 clojure.contrib.logging$fn__43$impl_write_BANG___51 invoke
INFO: creditAcc(args=[1506112834429596390 7080851004 4500])
11-Sep-2012 00:00:00 clojure.contrib.logging$fn__43$impl_write_BANG___51 invoke
INFO: callProf|tupsCredit|180|[1506112834429596390 7080851004 45]
11-Sep-2012 00:00:00 clojure.contrib.logging$fn__43$impl_write_BANG___51 invoke
INFO: creditAcc(args=[1506112834429596390 7080851004 4500]) -> done.
11-Sep-2012 00:00:00 clojure.contrib.logging$fn__43$impl_write_BANG___51 invoke
INFO: return(1506112834429596390,0)

Each entry in the log file spans two lines: a timestamp line followed by a message line. I have managed to join each pair by replacing the linefeed at the end of the first line using sed, but somewhere in the middle of the log there are Java stack traces. Once sed passes through a stack trace, the pairing gets out of step: entries then begin with INFO, ERROR, etc., and the timestamp shows up as the second line. I am therefore looking for a solution that forces sed to recognize the timestamp line as the first line using a regex (something like ^\d{2}), replaces the linefeed at the end of that line with a space, and then breaks the values into columns for analysis. The stack trace lines begin with whitespace (^\s), so they are easy to identify and skip.

What is the best way to go about solving this using sed or awk?

Solution

sed '/^ /d; N; s/\n/ /' inputfile

This deletes lines that begin with a space. The d command ends the current cycle immediately, so the remaining commands are skipped for that line. If a line does not begin with a space, N appends the next input line to the pattern space, and the s command replaces the newline between the two lines with a space.
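To see the cycle described above in action, here is a minimal sketch that wraps the command in a shell function and feeds it a small made-up sample. The function name join_pairs and the sample lines (shortened class names, a single indented "at ..." line) are assumptions for illustration, not taken from the real log.

```shell
# join_pairs is a hypothetical wrapper name; the sed script is the one from the answer.
join_pairs() {
  # /^ /d     drop lines that start with a space and begin the next cycle
  # N         append the next input line to the pattern space
  # s/\n/ /   replace the embedded newline with a space
  sed '/^ /d; N; s/\n/ /'
}

# Made-up sample: two paired entries with one indented stack trace line between them.
printf '%s\n' \
  '11-Sep-2012 00:00:00 some.Class invoke' \
  'INFO: first' \
  '    at some.Class.method(File.java:1)' \
  '11-Sep-2012 00:00:01 some.Class invoke' \
  'INFO: second' | join_pairs
```

With this input the indented line is dropped and each timestamp line is joined to the message line that follows it, so the output is two lines:

11-Sep-2012 00:00:00 some.Class invoke INFO: first
11-Sep-2012 00:00:01 some.Class invoke INFO: second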

It only works properly if the log lines come in pairs. If a stack trace line directly follows a timestamp line, with the INFO/ERROR line appearing only after the stack trace, N joins the timestamp to the stack trace line instead and the output is wrong.
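If the lines are not strictly paired, one possible alternative is to key on the timestamp rather than on line position, using awk, which the question also allows. The following is a sketch under assumptions, not part of the original answer: the function name join_entries is hypothetical, the timestamp pattern assumes the DD-Mon-YYYY format shown in the question, and every non-indented line that is not a timestamp is treated as part of the current entry. That last assumption means an unindented exception header line would also be appended to the entry.

```shell
# join_entries is a hypothetical name; the timestamp regex assumes DD-Mon-YYYY.
join_entries() {
  awk '
    # Skip indented lines (spaces or tabs), assumed to be stack trace frames.
    /^[[:space:]]/ { next }

    # A timestamp line starts a new entry: flush the previous one first.
    /^[0-9][0-9]-[A-Za-z]+-[0-9][0-9][0-9][0-9] / {
      if (entry != "") print entry
      entry = $0
      next
    }

    # Any other line is appended to the current entry, separated by a space.
    { entry = (entry == "" ? $0 : entry " " $0) }

    # Flush the last entry at end of input.
    END { if (entry != "") print entry }
  '
}

# Made-up sample where a stack trace line sits between the timestamp and its message.
printf '%s\n' \
  '11-Sep-2012 00:00:00 some.Class invoke' \
  '    at some.Class.method(File.java:1)' \
  'ERROR: boom' \
  '11-Sep-2012 00:00:01 some.Class invoke' \
  'INFO: ok' | join_entries
```

Because each entry is anchored on the timestamp line, the indented line between the timestamp and the ERROR line no longer shifts the pairing. The digit classes are repeated instead of using {2} because some older awk implementations do not support interval expressions.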

Licensed under: CC-BY-SA with attribution
