Generate a summary report from logs: perform additions on the output of a command (using AWK, SED, or any other way) and format the output

StackOverflow https://stackoverflow.com/questions/21593299

Question

I am processing several files at a time, each of which has summary stats. At the end of the process I want to create a summary file that adds up all the stats. I already know how to dig the stats out of the log files, but I want to be able to add the numbers and echo the result to another file. Here is what I use to dig out the times:

find . -iname "$srch1*" -exec grep "It took" {} \; -print

The output looks like this:

It took 0 hours, 11 minutes and 4 seconds to process that file.
./filepart000010-20140204-154923.dat.gz.log
It took 0 hours, 11 minutes and 56 seconds to process that file.
./filepart000007-20140204-154923.dat.gz.log
It took 0 hours, 29 minutes and 54 seconds to process that file.
./filepart000001-20140204-154923.dat.gz.log
It took 0 hours, 22 minutes and 33 seconds to process that file.
./filepart000004-20140204-154923.dat.gz.log
It took 0 hours, 59 minutes and 38 seconds to process that file.
./filepart000000-20140204-154923.dat.gz.log
It took 0 hours, 11 minutes and 50 seconds to process that file.
./filepart000005-20140204-154923.dat.gz.log
It took 0 hours, 22 minutes and 10 seconds to process that file.
./filepart000002-20140204-154923.dat.gz.log
It took 0 hours, 10 minutes and 39 seconds to process that file.
./filepart000008-20140204-154923.dat.gz.log
It took 0 hours, 12 minutes and 27 seconds to process that file.
./filepart000009-20140204-154923.dat.gz.log
It took 0 hours, 22 minutes and 36 seconds to process that file.
./filepart000003-20140204-154923.dat.gz.log
It took 0 hours, 11 minutes and 40 seconds to process that file.
./filepart000006-20140204-154923.dat.gz.log

What I want is something like this:

Summary
filepart000006-20140204-154923.dat.gz.log  0 hours, 11 minutes and 40 seconds

Then find the LONGEST time among them and output a message like:

 Total time taken =____________

I am running the jobs in parallel, so the total time taken is the longest one.

Then do some calculations like this:

find . -iname "$srch*" -exec grep "Processed Files" {} \; -print

        Processed Files:   7936635
./filename-20131102-part000000-20140204-153310.dat.gz.log
        Processed Files:   3264805
./filename-20131102-part000001-20140204-153310.dat.gz.log
        Processed Files:   1607547
./filename-20131102-part000008-20140204-153310.dat.gz.log
        Processed Files:   3180478
./filename-20131102-part000003-20140204-153310.dat.gz.log
        Processed Files:   1595497
./filename-20131102-part000007-20140204-153310.dat.gz.log
        Processed Files:   1568532
./filename-20131102-part000009-20140204-153310.dat.gz.log
        Processed Files:   3259884
./filename-20131102-part000002-20140204-153310.dat.gz.log
        Processed Files:   3141542
./filename-20131102-part000004-20140204-153310.dat.gz.log
        Processed Files:   3124221
./filename-20131102-part000005-20140204-153310.dat.gz.log
        Processed Files:   3136845
./filename-20131102-part000006-20140204-153310.dat.gz.log

And if I want just the metrics:

( find . -iname "dl-aster-full-20131102*" -exec grep "Processed Files" {} \;) | cut -d":" -f2
   7936635
   3264805
   1607547
   3180478
   1595497
   1568532
   3259884
   3141542
   3124221
   3136845

Based on the two outputs above, create a summary file:

Filename                                                  Processed files 
filename-20131102-part000000-20140204-153310.dat.gz.log   7936635

... followed by a total, which is the sum of all the values above:

   ( 7936635 +
   3264805 +
   1607547 +
   3180478.....etc
   1595497
   1568532
   3259884
   3141542
   3124221
   3136845 ) as

 Total Files = ____________

So overall, something like this:

Filename                                                  Processed files
filename-20131102-part000000-20140204-153310.dat.gz.log   7936635
Total Files = ____________ (sum of all above)

All that needs to be done is to get the output in this format:

Filename                                                  Processed files
filename-20131102-part000000-20140204-153310.dat.gz.log   7936635

(in my commands above, the file name and the value are on different lines) and then to sum the numbers already output.

My questions are:
-- How can I perform the addition as shown above, using anything? I'd rather avoid Perl, since I am not sure it is installed everywhere the shell runs.
-- How can I format the output as shown above? I already know how to extract the values.


Solution

With the sed command below, you can put the file name and the grep result on one line, which makes the remaining steps easy. (This assumes grep returns exactly one matching line per file.)

find . -iname "$srch1*" -exec grep "It took" {} \; -print |sed -r 'N;s/(.*)\n(.*)/\2 \1/'

./filepart000010-20140204-154923.dat.gz.log It took 0 hours, 11 minutes and 4 seconds to process that file.
./filepart000007-20140204-154923.dat.gz.log It took 0 hours, 11 minutes and 56 seconds to process that file.
./filepart000001-20140204-154923.dat.gz.log It took 0 hours, 29 minutes and 54 seconds to process that file.
./filepart000004-20140204-154923.dat.gz.log It took 0 hours, 22 minutes and 33 seconds to process that file.
./filepart000000-20140204-154923.dat.gz.log It took 0 hours, 59 minutes and 38 seconds to process that file.
./filepart000005-20140204-154923.dat.gz.log It took 0 hours, 11 minutes and 50 seconds to process that file.
./filepart000002-20140204-154923.dat.gz.log It took 0 hours, 22 minutes and 10 seconds to process that file.
./filepart000008-20140204-154923.dat.gz.log It took 0 hours, 10 minutes and 39 seconds to process that file.
./filepart000009-20140204-154923.dat.gz.log It took 0 hours, 12 minutes and 27 seconds to process that file.
./filepart000003-20140204-154923.dat.gz.log It took 0 hours, 22 minutes and 36 seconds to process that file.
./filepart000006-20140204-154923.dat.gz.log It took 0 hours, 11 minutes and 40 seconds to process that file.


The same approach works for the second command:

find . -iname "$srch*" -exec grep "Processed Files" {} \; -print | sed -r 'N;s/(.*)\n(.*)/\2 \1/'
./filename-20131102-part000000-20140204-153310.dat.gz.log         Processed Files:   7936635
./filename-20131102-part000001-20140204-153310.dat.gz.log         Processed Files:   3264805
./filename-20131102-part000008-20140204-153310.dat.gz.log         Processed Files:   1607547
./filename-20131102-part000003-20140204-153310.dat.gz.log         Processed Files:   3180478
./filename-20131102-part000007-20140204-153310.dat.gz.log         Processed Files:   1595497
./filename-20131102-part000009-20140204-153310.dat.gz.log         Processed Files:   1568532
./filename-20131102-part000002-20140204-153310.dat.gz.log         Processed Files:   3259884
./filename-20131102-part000004-20140204-153310.dat.gz.log         Processed Files:   3141542
./filename-20131102-part000005-20140204-153310.dat.gz.log         Processed Files:   3124221
./filename-20131102-part000006-20140204-153310.dat.gz.log         Processed Files:   3136845
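
The -r option used above is a GNU sed extension, so it may not be available on every system. As a rough sketch of an alternative, the same line pairing can be done with awk alone. The helper name join_pairs is made up for this example, and it relies on the same assumption as the sed version: exactly one matching line per file, printed before the file name.

```shell
# join_pairs: hypothetical helper, not part of the original answer.
# Reads "grep line, file name" pairs (the order produced by
# find ... -exec grep ... {} \; -print) and prints each pair on one line
# as "file name<space>grep line".
join_pairs() {
    awk 'NR % 2 == 1 { text = $0; next }   # odd lines: the grep output
         { print $0, text }'               # even lines: the file name
}

# Sample input copied from the question.
joined=$(printf '%s\n' \
    'It took 0 hours, 11 minutes and 4 seconds to process that file.' \
    './filepart000010-20140204-154923.dat.gz.log' \
    'It took 0 hours, 11 minutes and 56 seconds to process that file.' \
    './filepart000007-20140204-154923.dat.gz.log' | join_pairs)
printf '%s\n' "$joined"
```

With real logs the helper would sit at the end of the pipeline in place of sed, for example: find . -iname "$srch1*" -exec grep "It took" {} \; -print | join_pairs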

If you need to calculate the longest time and the total time, use the script below (formatting the output further should be straightforward):

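find . -iname "$srch1*" -exec grep "It took" {} \; -print | sed -r 'N;s/(.*)\n(.*)/\2 \1/' > temp1
awk '
# s2t: split x seconds into the globals h (hours), m (minutes), s (seconds)
function s2t(x) { h = int(x / 3600); m = int((x - h * 3600) / 60); s = x - h * 3600 - m * 60 }
# $4 = hours, $6 = minutes, $9 = seconds of each joined line
{ a = $4 * 3600 + $6 * 60 + $9; max = (a > max) ? a : max; t += a }
END { s2t(max); print "max is", h, m, s
      s2t(t);   print "sum is ", h, m, s }' temp1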

max is 0 59 38
sum is  3 46 27
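
To get closer to the layout asked for in the question (one line per file, then the longest time as the total), the joined lines can be reformatted in a single awk pass. This is only a sketch: the helper name time_summary is hypothetical, and it assumes each input line has the exact wording shown above, so that fields 4, 6 and 9 hold hours, minutes and seconds.

```shell
# time_summary: hypothetical helper, not part of the original answer.
# Expects joined lines of the form
#   ./name.log It took H hours, M minutes and S seconds to process that file.
time_summary() {
    awk 'BEGIN { print "Summary" }
         {
             name = $1
             sub(/^\.\//, "", name)              # drop the leading "./"
             secs = $4 * 3600 + $6 * 60 + $9     # duration of this file in seconds
             if (NR == 1 || secs > max) {        # remember the longest run so far
                 max = secs
                 longest = $4 " hours, " $6 " minutes and " $9 " seconds"
             }
             printf "%s  %s hours, %s minutes and %s seconds\n", name, $4, $6, $9
         }
         END { print "Total time taken = " longest }'
}

# Three sample lines taken from the joined output above.
report=$(printf '%s\n' \
    './filepart000010-20140204-154923.dat.gz.log It took 0 hours, 11 minutes and 4 seconds to process that file.' \
    './filepart000000-20140204-154923.dat.gz.log It took 0 hours, 59 minutes and 38 seconds to process that file.' \
    './filepart000006-20140204-154923.dat.gz.log It took 0 hours, 11 minutes and 40 seconds to process that file.' \
    | time_summary)
printf '%s\n' "$report"
```

Since the jobs run in parallel, the sketch reports the longest single run rather than the sum. Running time_summary < temp1 > summary.txt would write the report to a file (summary.txt is just an example name).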

For the second one:

find . -iname "$srch*" -exec grep "Processed Files" {} \; -print| sed -r 'N;s/(.*)\n(.*)/\2 \1/'  > temp2
awk '{sum+=$NF}END{print "Total Files = ", sum}' temp2

Total Files =  31815986
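
The table requested in the question (a header, one row per file, then the total) could be produced from the same joined lines along these lines. The helper name count_summary and the column width of 58 are assumptions chosen to match the sample layout in the question.

```shell
# count_summary: hypothetical helper, not part of the original answer.
# Expects joined lines of the form
#   ./name.log         Processed Files:   N
count_summary() {
    awk 'BEGIN { printf "%-58s%s\n", "Filename", "Processed files" }
         {
             name = $1
             sub(/^\.\//, "", name)             # drop the leading "./"
             printf "%-58s%s\n", name, $NF      # the count is the last field
             total += $NF
         }
         END { printf "Total Files = %.0f\n", total }   # sum printed without decimals
        '
}

# Three sample lines taken from the joined output above.
table=$(printf '%s\n' \
    './filename-20131102-part000000-20140204-153310.dat.gz.log         Processed Files:   7936635' \
    './filename-20131102-part000001-20140204-153310.dat.gz.log         Processed Files:   3264805' \
    './filename-20131102-part000008-20140204-153310.dat.gz.log         Processed Files:   1607547' \
    | count_summary)
printf '%s\n' "$table"
```

With the real data, count_summary < temp2 would print all ten rows followed by the total.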
Licensed under: CC-BY-SA with attribution