Question

I have a log file with several hundred thousand lines.

I am looping through these lines to find any line with some specific text, for example: !!event!!.
Then, once an !!event!! line is found, I need to continue looping after this !!event!! line until I find the next 3 lines which contain their own specific text ('flag1', 'flag2', and 'flag3').
Once I find the third line ('flag3'), I want to continue looping until the next !!event!! line and repeat the process until there are no more events.

Does anyone have suggestions on how I could structure my code to accomplish this?

For example:

f = open('samplefile.log','r')
for line in f:
    if '!!event!!' in line:
        L0 = line.split()
        # then get the lines after L0 containing: 'flag1', 'flag2', and 'flag3'
        # (a sample log file is shown below)
        # I am not sure how to accomplish this
        # (I am thinking a loop within the current loop)
        # I know the following is incorrect, but the
        # intended result would be something like this:
        if "flag1" in line:
            L1 = line.split()
        if "flag2" in line:
            L2 = line.split()
        if "flag3" in line:
            L3 = line.split()
print 'Event and flag times: ', L0[0], L1[0], L2[0], L3[0]

samplefile.log

8:41:05 asdfa   32423
8:41:06 dasd    23423
8:41:07 dfsd    342342
8:41:08 !!event!!   23423
8:41:09 asdfs   2342
8:41:10 asdfas  flag1
8:41:11 asda    42342
8:41:12 sdfs    flag2
8:41:13 sdafsd  2342
8:41:14 asda    3443
8:41:15 sdfs    2323
8:41:16 sdafsd  flag3
8:41:17 asda    2342
8:41:18 sdfs    3443
8:41:19 sdafsd  2342
8:41:20 asda    3443
8:41:21 sdfs    4544
8:41:22 !!event!!   5645
8:41:23 sdfs    flag1
8:41:24 sadfs   flag2
8:41:25 dsadf   32423
8:41:26 sdfa    23423
8:41:27 sdfsa   flag3
8:41:28 sadfa   23423
8:41:29 sdfas   2342
8:41:30 dfsdf   2342

the code from this sample should print:

Event and flag times: 8:41:08 8:41:10 8:41:12 8:41:16
Event and flag times: 8:41:22 8:41:23 8:41:24 8:41:27

Solution

Sure. Because both loops draw from the same file iterator, you can keep consuming the file in an inner loop and break out of it when you encounter flag3; the outer loop then resumes from the line after flag3:

for line in f:
    if '!!event!!' in line:
        L0 = line.split()
        for line in f:
            if "flag1" in line:
                L1 = line.split()
            elif "flag2" in line:
                L2 = line.split()
            elif "flag3" in line:
                L3 = line.split()
                break             # continue outer loop
        print 'Event and flag times: ', L0[0], L1[0], L2[0], L3[0]

# Event and flag times:  8:41:08 8:41:10 8:41:12 8:41:16
# Event and flag times:  8:41:22 8:41:23 8:41:24 8:41:27
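
The nested loop works only because both for statements advance the same iterator. A minimal Python 3 sketch of that idea follows. The function name event_times, the SAMPLE data, and the choice to skip events whose flags are incomplete are assumptions made for illustration and are not part of the answer above. Calling iter() explicitly lets the same code accept a plain list of lines as well as an open file.

```python
import io

# Hypothetical excerpt in the same layout as samplefile.log.
SAMPLE = """\
8:41:07 dfsd    342342
8:41:08 !!event!!   23423
8:41:09 asdfs   2342
8:41:10 asdfas  flag1
8:41:12 sdfs    flag2
8:41:16 sdafsd  flag3
8:41:22 !!event!!   5645
8:41:23 sdfs    flag1
8:41:24 sadfs   flag2
8:41:27 sdfsa   flag3
8:41:28 sadfa   23423
"""

FLAGS = ('flag1', 'flag2', 'flag3')


def event_times(lines):
    """Return one [event, flag1, flag2, flag3] timestamp list per complete event."""
    it = iter(lines)              # a single shared iterator; both loops advance it
    results = []
    for line in it:
        if '!!event!!' not in line:
            continue
        times = {'event': line.split()[0]}
        for line in it:           # resumes where the outer loop stopped
            for flag in FLAGS:
                if flag in line:
                    times[flag] = line.split()[0]
            if 'flag3' in times:
                break
        # Assumption: report only events for which all three flags were seen.
        if len(times) == 4:
            results.append([times['event']] + [times[flag] for flag in FLAGS])
    return results


for row in event_times(io.StringIO(SAMPLE)):
    print('Event and flag times:', ' '.join(row))
```

If the input ends before flag3 is found, the inner loop simply runs out of lines and the incomplete event is dropped instead of raising a NameError on an unset variable.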

OTHER TIPS

Here you go:

with open("in6.txt") as f:
    flag = False   # True while we are between an !!event!! line and its flag3 line
    d = []         # one list of timestamps per completed event
    data = []
    for line in f:
        if flag:
            if "flag1" in line or "flag2" in line:
                data.append(line.split()[0])
            elif "flag3" in line:
                data.append(line.split()[0])
                flag = False
                d.append(data)
            continue
        if "!!event!!" in line:
            flag = True
            data = [line.split()[0]]

for l in d:
    print "Event and flag times: ", l[0], l[1], l[2], l[3]

Output

>>> 
Event and flag times:  8:41:08 8:41:10 8:41:12 8:41:16
Event and flag times:  8:41:22 8:41:23 8:41:24 8:41:27

Keep a flag to track what you are looking for:

with open('samplefile.log') as f:
    events = []
    current_event = []
    for line in f:
        if not current_event:
            # still waiting for the next !!event!! line; ignore stray flags
            if '!!event!!' in line:
                current_event.append(line.split()[0])
        elif 'flag1' in line or 'flag2' in line or 'flag3' in line:
            current_event.append(line.split()[0])
            if 'flag3' in line:  # could also be `if len(current_event) == 4:`
                events.append(current_event)
                current_event = []

for event in events:
    print 'Event and flag times:', ' '.join(event)

Here I used current_event as the flag: adding the time of the !!event!! line makes it non-empty, which signals that we should start looking for the flags.

I collected the individual event times into an events list, but you could also just print the event data as soon as you find the flag3 line.

Output:

Event and flag times: 8:41:08 8:41:10 8:41:12 8:41:16
Event and flag times: 8:41:22 8:41:23 8:41:24 8:41:27
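
The alternative mentioned above, reporting each event as soon as its flag3 line is seen instead of collecting everything first, could be sketched in Python 3 as a generator. The name iter_events and the SAMPLE_LINES data (including the stray flag1 line before the first event) are hypothetical and only serve the illustration.

```python
FLAGS = ('flag1', 'flag2', 'flag3')

# Hypothetical lines in the same layout as samplefile.log.
SAMPLE_LINES = [
    '8:41:06 dasd    flag1',        # stray flag before any event: ignored
    '8:41:08 !!event!!   23423',
    '8:41:10 asdfas  flag1',
    '8:41:11 asda    42342',
    '8:41:12 sdfs    flag2',
    '8:41:16 sdafsd  flag3',
    '8:41:22 !!event!!   5645',
    '8:41:23 sdfs    flag1',
    '8:41:24 sadfs   flag2',
    '8:41:27 sdfsa   flag3',
]


def iter_events(lines):
    """Yield a list of timestamps each time an event's flag3 line is reached."""
    current_event = []                  # empty means "waiting for !!event!!"
    for line in lines:
        if not current_event:
            if '!!event!!' in line:
                current_event.append(line.split()[0])
        elif any(flag in line for flag in FLAGS):
            current_event.append(line.split()[0])
            if 'flag3' in line:
                yield current_event     # hand the finished event to the caller
                current_event = []


for event in iter_events(SAMPLE_LINES):
    print('Event and flag times:', ' '.join(event))
```

Because events are yielded one at a time, memory use stays flat even for a log with several hundred thousand lines; passing an open file object instead of a list should work the same way.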

Just loop over each line; when you find !!event!!, start looking for the flags, and once all the flags are found, go back to looking for the next event.

Something like:

def get_time(line):
    # split() with no arguments never returns empty strings
    return line.split()[0]

data = []
look_for_flags = False
with open('samplefile.log') as lines:
    for line in lines:
        if '!!event!!' in line:
            look_for_flags = True
            data.append([get_time(line)])
        elif look_for_flags:
            if 'flag1' in line or 'flag2' in line or 'flag3' in line:
                data[-1].append(get_time(line))
                if 'flag3' in line:
                    look_for_flags = False  # wait for the next !!event!!
print data

The clearest way to do this is with a generator function, which removes the need for explicit state variables: the position at which the generator is suspended is the state. Whenever you need to build a state machine (as you are doing here), think generator.

def find_target_lines(file_handle):
    target = yield                 # primed with next(); receives the first target
    for line in file_handle:
        if target in line:
            target = yield line    # hand back the match, receive the next target

targets = ['!!event!!', 'flag1', 'flag2', 'flag3']

with open('samplefile.log', 'r') as f:
    while True:
        found = list()
        finder = find_target_lines(f)
        next(finder)
        try:
            for target in targets:
                line = finder.send(target)
                if line:
                    found.append(line)
            print(found)
        except StopIteration:
            # the file ran out before the current target was found
            break
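
For comparison, here is a hedged Python 3 sketch of the same "search for the targets in order" idea written without send(). The names find_in_order, TARGETS, and LOG_LINES are assumptions for illustration. Note that, like the generator above, it looks for the targets strictly in sequence, so a flag2 line that appears before flag1 is skipped.

```python
TARGETS = ['!!event!!', 'flag1', 'flag2', 'flag3']

# Hypothetical lines in the same layout as samplefile.log.
LOG_LINES = [
    '8:41:07 dfsd    342342',
    '8:41:08 !!event!!   23423',
    '8:41:10 asdfas  flag1',
    '8:41:12 sdfs    flag2',
    '8:41:16 sdafsd  flag3',
    '8:41:22 !!event!!   5645',
    '8:41:23 sdfs    flag1',
    '8:41:24 sadfs   flag2',
    '8:41:27 sdfsa   flag3',
    '8:41:28 sadfa   23423',
]


def find_in_order(lines, targets):
    """Yield lists of lines, each list matching the targets in order."""
    it = iter(lines)                   # shared by every inner search below
    while True:
        found = []
        for target in targets:
            for line in it:
                if target in line:
                    found.append(line)
                    break
            else:
                return                 # input exhausted before this target was found
        yield found


for group in find_in_order(LOG_LINES, TARGETS):
    print('Event and flag times:', ' '.join(line.split()[0] for line in group))
```

The state of the search lives entirely in the iterator position and in which target the loop is currently on, so no flag variables are needed.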
Licensed under: CC-BY-SA with attribution