Question

I have a log file that needs to be formatted into a readable form. However, the text file has no fixed number of lines or fixed primary values, and it has a random number of spaces. The only thing it does have is a log file header, which can be used to pinpoint the start and end of each logging session.

An example of the log file:

Log File header
<text>
<text>
Log File header
<text>

After the script has run, the output should look something like this:

Log File header
<text>
<text>

<space>

Log File header
<text>
<text>

So I need some advice on grepping out an entire paragraph every time the Perl script detects a "Log File header".

Here is my grep Perl script:

#!/usr/bin/perl
use strict;
use warnings;

#use 5.010; # must be present to import the new 5.10 functions; notice
#that it is 5.010, not 5.10

my $file = "/root/Desktop/Logfiles.log";
open( my $log, '<', $file ) or die "The file $file has the error of:\n =>  $!";

my @lines = <$log>;
close($log);

# Keeps only the lines that contain the header text itself
my @array = grep( /Log File header/, @lines );

print @array;

Can someone please give some advice on the code? Thanks.


Solution

So you just want vertical space in between your log file sections?

There are a few approaches, made easier by the fact that the header is always on a line of its own. All the following examples assume that @lines has already been populated from your input file.

The first technique is to insert blank lines before each header:

foreach my $line ( @lines ) {
    if ( $line =~ m/Log File header/ ) {
        print( "\n\n\n" ); # or whatever you want <space> to be
    }

    print( $line );
}
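One side effect of the loop above is that the blank lines are also printed before the very first header, so the output starts with empty space. A minimal variation, assuming you only want the gap between sections, remembers whether a header has already been seen. The sub name print_with_gaps and the sample input are hypothetical, added only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: prints @lines, putting $space before every header
# except the first one, so the output does not start with blank lines.
sub print_with_gaps {
    my ( $space, @lines ) = @_;
    my $seen_header = 0;
    foreach my $line (@lines) {
        if ( $line =~ m/Log File header/ ) {
            print($space) if $seen_header;
            $seen_header = 1;
        }
        print($line);
    }
}

# Sample input standing in for the real log file
my @lines = ( "Log File header\n", "<text>\n", "<text>\n",
              "Log File header\n", "<text>\n" );
print_with_gaps( "\n\n\n", @lines );
```

Because print without a filehandle writes to the currently selected handle, you can also select() a different handle first if you want the formatted log written to a file instead of STDOUT.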

The next technique is to use a regular expression to search/replace blocks of text:

my $space = "\n\n\n"; # or whatever you want <space> to be
my $everything = join( "", @lines );
$everything =~ s/(Log File header.*?)(?=Log File header)/$1$space/sg;
print( $everything );
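If you don't already have @lines, the whole file can be read straight into one string by temporarily undefining the input record separator $/ ("slurp mode"), which makes the join unnecessary. This is a minimal sketch: read_all is a hypothetical helper name, and the in-memory filehandle stands in for open( my $fh, '<', $file ) on the real log.

```perl
use strict;
use warnings;

# Hypothetical helper: returns everything left in $fh as one string.
# With $/ undefined, a single readline returns the rest of the file.
sub read_all {
    my ($fh) = @_;
    local $/;    # undef only inside this sub; restored automatically on return
    my $content = <$fh>;
    return defined $content ? $content : '';
}

# An in-memory filehandle stands in for the real log file here
my $sample = "Log File header\n<text>\nLog File header\n<text>\n";
open( my $fh, '<', \$sample ) or die "Cannot open in-memory file: $!";
my $everything = read_all($fh);
close($fh);

# Same substitution as above: add $space after each section that is
# followed by another header
my $space = "\n\n\n";
$everything =~ s/(Log File header.*?)(?=Log File header)/$1$space/sg;
print($everything);
```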

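Some explanation of the regexp: (?=...) is a look-ahead. It must match, but it is zero-width, so the following header is not consumed and is not part of the text being replaced. The /s modifier makes . match newlines as well, so a single match can span several lines, and /g makes the search-and-replace global. The .*? means match anything, but as little as possible to satisfy the expression (non-greedy), which is extremely important in this application: a greedy .* would run on to the last header in the file, so only one gap would be inserted, just before that last header.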
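A different technique, closer to "grepping out an entire paragraph" as the question puts it, is to split the text into one string per section using the same kind of zero-width look-ahead, and then handle each section on its own. This swaps in split for the substitution above; split_sections is a hypothetical name, and the sketch assumes every header starts at the beginning of its own line.

```perl
use strict;
use warnings;

# Hypothetical helper: splits the log text into sections, each one
# starting at a line that begins with "Log File header". The look-ahead
# is zero-width, so the header text stays at the start of its section.
# /m makes ^ match at the start of every line, not just of the string.
sub split_sections {
    my ($text) = @_;
    return split /(?=^Log File header)/m, $text;
}

my $text = "Log File header\n<text>\n<text>\nLog File header\n<text>\n";
my @sections = split_sections($text);

# Print the sections with a gap between them
print join( "\n\n\n", @sections );
```

With the sections in an array you can also filter or number them individually, rather than only inserting whitespace between them.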

Update: edited the first technique, in which I had failed to specify explicitly which variable to match against.

Licensed under: CC-BY-SA with attribution