Problem

I have BLAST output in the default format, and I want to parse it and extract only the information I need using a regex. However, in the line below

Query= contig1

there is a space between '=' and 'contig1', so my output is printed with a leading space. How can I avoid this? Below is the relevant part of my code:

import re
output = open('out.txt','w')
with open('in','r') as f:
    for line in f:
        if re.search('Query=\s', line) != None:
            line = line.strip()
            line = line.rstrip()
            line = line.strip('Query=\s')
            line = line.rstrip('\s/')
            query = line
            print >> output,query
output.close()

The output should look like this:

contig1

Solution

You could actually use the returned match to extract the value you want:

for line in f:
    match = re.search('Query=\s?(.*)', line)
    if match is not None:
        query = match.groups()[0]
        print >> output,query

Here we search for Query= optionally followed by one whitespace character and capture the rest of the line. match.groups()[0] returns that captured text, because the regular expression contains only one group.
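
As a minimal sketch of the same idea in Python 3 (the snippets in this thread use Python 2 print syntax), the capture can be wrapped in a small helper. The name extract_query is hypothetical, and \s* is used instead of \s? on the assumption that more than one space may follow the equals sign.

```python
import re

# Compiled once; the group captures everything after "Query=" and any whitespace.
QUERY_RE = re.compile(r'Query=\s*(.*)')

def extract_query(line):
    """Return the text after 'Query=' without surrounding whitespace, or None."""
    match = QUERY_RE.search(line)
    if match is None:
        return None
    # group(1) is the single capture group; strip() drops trailing blanks.
    return match.group(1).strip()

print(extract_query('Query= contig1\n'))  # prints: contig1
print(extract_query('Length=1500\n'))     # prints: None
```

Returning None for non-matching lines lets the caller decide whether to skip them, which mirrors the "if match is not None" check above.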

Also, depending on the nature of your data, simple string prefix matching may be enough, as in the following example:

output = open('out.txt','w')
with open('in.txt','r') as f:
    for line in f:
        if line.startswith('Query='):
            query = line.replace('Query=', '').strip()
            print >> output,query
output.close()

In this case you don't need the re module at all.
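
A Python 3 sketch of the prefix approach might look like the following. It slices off the prefix instead of calling replace, so a later occurrence of 'Query=' in the same line is left alone. The helper name extract_queries and the sample lines are made up for illustration, and io.StringIO objects stand in for the real input and output files.

```python
import io

PREFIX = 'Query='

def extract_queries(lines):
    """Yield the value of every line that starts with 'Query='."""
    for line in lines:
        if line.startswith(PREFIX):
            # Remove only the leading prefix, then drop the space and newline.
            yield line[len(PREFIX):].strip()

# In-memory stand-ins for the input and output files (made-up sample data).
blast_text = io.StringIO('Query= contig1\nLength=1500\nQuery= contig2\n')
output = io.StringIO()
for query in extract_queries(blast_text):
    print(query, file=output)

print(output.getvalue(), end='')
```

With real files, the two StringIO objects would be replaced by open('in.txt') and open('out.txt', 'w') inside a with statement.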

Other tips

If you are just looking for lines of the form tag=value, do you need a regex at all?

tag,value=line.split('=')
if tag == 'Query':
    print value.strip()

Alternatively, remove the prefix and strip the remaining whitespace:

a='Query= conguie'

print "".join(a.split('Query=')).strip()

#output conguie
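
One caveat with unpacking line.split('=') into two names is that it raises ValueError when a line contains no '=' or more than one. A hedged Python 3 sketch using str.partition, which always returns three parts, avoids that. The helper name parse_tag_value and the sample lines are hypothetical.

```python
def parse_tag_value(line):
    """Split 'tag=value' at the first '='; return None when there is no '='."""
    tag, sep, value = line.partition('=')
    if not sep:
        # partition returns an empty separator when '=' is absent.
        return None
    return tag.strip(), value.strip()

# Made-up sample lines: a normal one, one without '=', one with two '='.
for line in ['Query= contig1\n', 'no equals sign here\n', 'Query= a=b\n']:
    parsed = parse_tag_value(line)
    if parsed is not None and parsed[0] == 'Query':
        print(parsed[1])
```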

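In Python 2, a comma in a print statement inserts a space between the printed values. (In print >> output,query the first comma only separates the target file from the value, so no space is added there.) To print two values without the separating space, change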

print output,query

to

print "%s%s"%(output,query)
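
The leading space in the question's output can be traced to how str.strip treats its argument: it is a set of characters to remove from both ends, not a prefix and not a regex. The short Python 3 sketch below illustrates this; the second input string is a made-up example showing that the same call can also eat the end of a name.

```python
line = 'Query= contig1'

# The argument is the character set {Q, u, e, r, y, =, \, s}. Stripping stops
# at the first character outside that set, which here is the space.
stripped = line.strip(r'Query=\s')
print(repr(stripped))  # prints: ' contig1'

# Characters from the set are also removed at the right end, so a name
# ending in 's' loses its last letter.
print(repr('Query= reads'.strip(r'Query=\s')))  # prints: ' read'

# Removing the exact prefix by slicing, then stripping whitespace, is safer.
prefix = 'Query='
cleaned = line[len(prefix):].strip() if line.startswith(prefix) else line
print(repr(cleaned))  # prints: 'contig1'
```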

License: CC-BY-SA with attribution
Not affiliated with StackOverflow