Question

I'm trying to work out the best way to print only the numbered lines. The code is only partially complete, as I'm still new to regex in general and may not be using the right method or syntax. Individually the re.match calls work fine; it's when I combine them that I get unwanted results:

Sample string:

file = '''
title|Head1|Head2|Head3|head4 
----|------|-----|-----|
1|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
2|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
3|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
4|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
5|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
All|processes:|MemAlloc|=|408125440|(None, None)|0.0.0.0
|(None, None)
0.0.0.0 ,text
''' 
import re
for line in file:
    pat= re.match('(^[A-Z][a-z])|(^--.+)',line) # or use re.match('^[0-9]',line) and match pat != None
    patIP = re.match ('^{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}',line)#
    if patIP == None  or pat == None:
        print(line)

I'm stuck on the logic for printing only the numbered lines; I may be completely off. Keep in mind I don't want to print the 0.0.0.0 (IP address) line.

desired output:

1|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
2|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
3|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
4|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
5|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)

Solution

import io
import re
import sys

file = io.StringIO('''
title|Head1|Head2|Head3|head4 
----|------|-----|-----|
1|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
2|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
3|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
4|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
5|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
All|processes:|MemAlloc|=|408125440|(None, None)|0.0.0.0
|(None, None)
0.0.0.0 ,text
''')

# Write only the lines that start with one or more digits followed by '|'.
sys.stdout.writelines(line for line in file if re.match(r'\d+\|', line))
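
For context, the loop in the question goes wrong in two places: iterating over a plain str yields single characters rather than lines, and the `or` test is true whenever at least one of the two patterns fails to match, so almost everything gets printed. Here is a minimal sketch of the same filter applied to a plain string instead of a StringIO object. The names `text` and `numbered_lines` are made up for this example, and the sample data is shortened:

```python
import re

# Shortened copy of the sample data from the question (hypothetical variable name).
text = '''
title|Head1|Head2|Head3|head4
----|------|-----|-----|
1|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
2|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
All|processes:|MemAlloc|=|408125440|(None, None)|0.0.0.0
|(None, None)
0.0.0.0 ,text
'''

def numbered_lines(s):
    """Return the lines of s that start with one or more digits followed by '|'."""
    # splitlines() yields whole lines; iterating over s directly would yield characters.
    # re.match only matches at the start of each line, so no '^' is needed.
    return [line for line in s.splitlines() if re.match(r'\d+\|', line)]

for line in numbered_lines(text):
    print(line)
```

Because the pattern requires a `|` right after the leading digits, the `0.0.0.0 ,text` line is skipped without a separate IP address check.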

OTHER TIPS

You can try this:

import re

file = '''
title|Head1|Head2|Head3|head4 
----|------|-----|-----|
1|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
2|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
3|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
4|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
5|1150976|0|25300992|bfa92720/bfa924f8|su|(None, None)
All|processes:|MemAlloc|=|408125440|(None, None)|10.93.103.73|(None, None)
0.0.0.0 ,text
''' 

matches = re.findall(r'^\d+\|.*$', file, re.MULTILINE)
for match in matches:
    print(match)

In multiline mode (re.MULTILINE), ^ and $ match at the beginning and end of each line, not just at the beginning and end of the whole string.
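
As a rough illustration of what the flag changes, this sketch runs the same pattern with and without re.MULTILINE on a shortened input. The variable names `sample`, `without_flag` and `with_flag` are made up for the example:

```python
import re

# Shortened, hypothetical input: a header line, two numbered lines, one IP line.
sample = "title|Head1\n1|1150976|0\n0.0.0.0 ,text\n2|1150976|0\n"

# Without re.MULTILINE, ^ only matches at the very start of the string.
# The string starts with "title", so nothing is found.
without_flag = re.findall(r'^\d+\|.*$', sample)

# With re.MULTILINE, ^ and $ also match at the start and end of every line.
with_flag = re.findall(r'^\d+\|.*$', sample, re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['1|1150976|0', '2|1150976|0']
```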

Licensed under: CC-BY-SA with attribution