Question

I have a large log file. I want to extract the lines containing java, javax, org, or com followed by a . or :. For every such line, I also want to extract the stack-trace lines that follow it, which start with at. For example:

Line1: java.line.something.somethingexception
line 2: at something something
line 3: at something something
line 4: at something something

line 5-20:Junk I don't want to extract.
line 21: javax.line.something.somethingexception
line 22: at something something
line 23: at something something
line 24: at something something

and so on...

Here I want to copy lines 1-4 and then lines 21-24. So far my code collects the lines that contain the keywords, but I can't figure out how to write a specific number of lines after that, skip a few lines, and start writing again. The number of lines starting with at varies, i.e. there can be 100 of them or 250, so there is no fixed pattern.

Here's my code:

import re
import sys
from itertools import islice

file = open(sys.argv[1], "r")
file1 = open(sys.argv[2],"w")
i = 0
for line in file:
    if re.search(r'[java|javax|org|com]+?[\.|:]+?', line, re.I) and not (re.search(r'at\s', line, re.I) or re.search(r'mdcloginid:|webcontainer|c\.h\.i\.h\.p\.u\.e|threadPoolTaskExecutor|caused\sby', line, re.I)):
          file1.write(line)

This code only extracts the lines containing the keywords. I'm stuck on the next part, i.e. copying the following lines that start with at to the new file, stopping where the at lines end, then searching for the next line containing the keywords and repeating.

Solution

This can be solved with a flag that you set when a line matches your conditions:

import re

java_regex = re.compile(...)  # matches the "java" header line
at_regex = re.compile(...)    # matches an "at" stack-trace line

copy = False  # flag that controls whether lines are copied to the output

for line in file_in:
    if java_regex.search(line):
        # start copying when a "java" line is found
        copy = True
    elif copy and not at_regex.search(line):
        # stop copying at the first line that is not an "at" line
        copy = False

    if copy:
        file_out.write(line)
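
To make the flag approach concrete, here is a minimal sketch that fills in the two placeholders with assumed patterns. The names `extract_traces`, `HEADER_RE`, and `AT_RE` are hypothetical, and the patterns are guesses based on the sample in the question (a header line contains java, javax, org, or com followed by . or :, and a stack-trace line is optional whitespace followed by "at "). Adjust them to your actual log format.

```python
import re

# Assumed patterns, not taken from the original answer.
HEADER_RE = re.compile(r'\b(?:java|javax|org|com)[.:]', re.I)
AT_RE = re.compile(r'^\s*at\s')


def extract_traces(lines):
    """Yield each header line plus the 'at' lines that directly follow it."""
    copy = False  # True while we are inside a stack trace
    for line in lines:
        if AT_RE.search(line):
            # An "at" line is kept only if it continues a trace
            if copy:
                yield line
        elif HEADER_RE.search(line):
            # A header line starts (or restarts) copying
            copy = True
            yield line
        else:
            # Anything else ends the current trace
            copy = False


sample = [
    "java.lang.SomethingException\n",
    "    at something something\n",
    "    at something something\n",
    "junk I don't want\n",
    "javax.line.SomethingException\n",
    "    at something something\n",
]
result = list(extract_traces(sample))
```

The "at" test runs first on purpose: a real stack frame often contains a package name such as com. or org., so it would otherwise be mistaken for a new header. With open file objects, the same generator could be used as `file_out.writelines(extract_traces(file_in))`.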

OTHER TIPS

Set a flag to indicate if the lines you are processing are in an exception block or not:

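import re
import sys

# Lines that start an exception block: java, javax, org, or com followed by '.' or ':'
start_re = re.compile(r'\b(?:java|javax|org|com)[.:]', re.I)
# Lines that must never be treated as the start of a block
skip_re = re.compile(r'at\s|mdcloginid:|webcontainer|c\.h\.i\.h\.p\.u\.e|threadPoolTaskExecutor|caused\sby', re.I)

with open(sys.argv[1], "r") as file, open(sys.argv[2], "w") as file1:
    ex = False  # True while we are inside an exception block
    for line in file:
        if start_re.search(line) and not skip_re.search(line):
            file1.write(line)
            ex = True
        elif ex:
            # Stack-trace lines are usually indented, so strip before testing
            if line.lstrip().startswith('at '):
                file1.write(line)
            else:
                ex = False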
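
One detail worth checking in the question's pattern: square brackets define a character class, so `[java|javax|org|com]` matches any single character from that set rather than one of the four words. The sketch below compares it with a non-capturing alternation group. The names `class_re` and `group_re` are hypothetical, the sample lines are made up, and the group-based pattern is a suggested replacement rather than something from the original posts.

```python
import re

# Pattern from the question: [...] is a character class, not a list of words
class_re = re.compile(r'[java|javax|org|com]+?[\.|:]+?', re.I)
# Assumed replacement: match one of the four words, then '.' or ':'
group_re = re.compile(r'\b(?:java|javax|org|com)[.:]', re.I)

junk = "2014-01-01 10:00:00 error: disk full"
header = "javax.servlet.ServletException: boom"

# The character class accepts "r:" in "error:", so the junk line matches
class_hits_junk = bool(class_re.search(junk))
# The alternation group needs a whole keyword, so the junk line is rejected
group_hits_junk = bool(group_re.search(junk))
group_hits_header = bool(group_re.search(header))
```

If the first filter lets through more lines than expected, this over-matching is a likely cause.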
Licensed under: CC-BY-SA with attribution