Parsing a specific file using Python [closed]

https://stackoverflow.com/questions/13846937

07-12-2021

Question

I have a 300 MB file that contains lines like these:

[0] ppt: (non-cluster) 4294967295 1 1.9.1.25
timestamp: 1355333137
states: 680 [138(average 2752 0)][139(average 2802 0)][2253(average 2008 0)][2484(average 2321 0)][2578(average 2792 0)][2615(average 3518 0)]
[1] ppt: (non-cluster) 4294967295 1 1.9.1.26
timestamp: 1355333137
states: 676 [138(average 2761 0)][139(average 2777 0)][2253(average 2075 0)][2484(average 2318 0)][2578(average 2792 0)][2615(average 3522 0)]

I would appreciate suggestions on how to use Python to parse the file and produce a list of dictionaries like

1.9.1.25 ( 138: 2752, 139: 2802, 2253: 2008, 2484: 2321, 2578: 2792, 2615: 3518)
1.9.1.26 ( 138: 2761, 139: 2777, 2253: 2075, 2484: 2318, 2578: 2792, 2615: 3522)

and store the list in a file.

Thanks

Solution

This is not very elegant, but here you go:

import re

# One record spans three lines: a header with the IP, a timestamp, and the states.
start_ln = re.compile(r'\[\d+\] ppt: \(.*?\) \d+ \d+ (?P<ivar>\d+\.\d+\.\d+\.\d+)')
tstamp_ln = re.compile(r'timestamp: \d+')
state_ln = re.compile(r'states: (?P<pcount>\d+) (?P<ggroup>(\[\d+\(average \d+ \d+\)\])+)')
group_p = re.compile(r'\[(?P<st>\d+)\(average (?P<avg>\d+) \d+\)\]')

state = 'WAIT'
llist = []
ldict = {}
cvar = None

# Iterate over the file lazily so the 300 MB input is never held in memory at once.
with open('pfile', 'r') as f:
    for ln in f:
        if state == 'WAIT':
            mtch = start_ln.match(ln)
            if mtch is not None:
                cvar = mtch.group('ivar')
                ldict = {}
                state = 'LINE#1'
        elif state == 'LINE#1':
            # The timestamp is not stored; this line only advances the state.
            if tstamp_ln.match(ln) is not None:
                state = 'LINE#2'
        elif state == 'LINE#2':
            mtch = state_ln.match(ln)
            if mtch is not None:
                # findall returns (state, average) pairs as strings.
                ldict[cvar] = dict(group_p.findall(mtch.group('ggroup')))
                llist.append(ldict)  # append only once a record is complete
                cvar = None
                state = 'WAIT'

for i in llist:
    print(i)

There is no error checking at all, and the state machine is rudimentary, but it should do the trick.
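
The question also asks to store the list in a file, which the code above does not do. One possible way, as a rough sketch rather than the only option, is to write one JSON object per line. The helper names save_records and load_records and the file name 'parsed.jsonl' are made up for illustration, and the small llist below only stands in for the list built by the parsing loop:

```python
import json

# Stand-in for the list built by the parsing loop; the values are strings
# because re.findall returns the matched text.
llist = [
    {'1.9.1.25': {'138': '2752', '139': '2802'}},
    {'1.9.1.26': {'138': '2761', '139': '2777'}},
]


def save_records(records, path):
    """Write one JSON object per line (hypothetical helper)."""
    with open(path, 'w') as out:
        for rec in records:
            # JSON object keys are always strings, so only the averages
            # are converted to integers.
            clean = {ip: {st: int(avg) for st, avg in states.items()}
                     for ip, states in rec.items()}
            out.write(json.dumps(clean) + '\n')


def load_records(path):
    """Read the records back as a list of dictionaries (hypothetical helper)."""
    with open(path) as src:
        return [json.loads(line) for line in src]
```

Calling save_records(llist, 'parsed.jsonl') after the loop writes the file, and load_records('parsed.jsonl') reads it back. With one record per line, the output can also be processed line by line later instead of being loaded all at once.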

Licensed under: CC-BY-SA with attribution