Question

I wrote some code to remove all parentheses in a txt file, along with the text between them and any repeated whitespace.

However, I have very little experience with Python, and it's quite obvious that my code is inefficient.

What's the best way to do what I want?

import re

lines = open('test.txt', 'r+')
lines = [re.sub('\s+',' ', line) for line in lines] #this is to kill 'tab' whitespaces
lines = [re.sub(' +',' ', line) for line in lines] #regular whitespace, if more than 1
lines = [re.sub('\(.*?\)','', line) for line in lines] #brackets and the text
with open('test2.txt', 'w') as out:
    out.writelines(lines)

Solution

If you have enough lines to offset the cost of compiling the regexes, something like the following should serve.

#!/usr/bin/env python

import re

if __name__ == "__main__":
    lines = [' foo      (bar)    ']
    parens_regex = re.compile(r'\(.*?\)')  # Non-greedy
    space_regex = re.compile(r'\s+')

    for line in lines:
        print('Before: "%s"' % line)
        # Remove the parens first, so the space left around them gets collapsed too
        line_tmp = parens_regex.sub('', line)
        line_tmp = space_regex.sub(' ', line_tmp)
        line_tmp = line_tmp.strip()
        print('After: "%s"' % line_tmp)  # Prints: After: "foo"

Whether that's more elegant is debatable; probably not.
You already knew enough about regexes to make your parens regex non-greedy, but a future Stack Overflow reader might not.
And either of you might not have known about compiling regexes.
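
To tie this back to the files in the question, here is a rough sketch of the same two regexes applied line by line. The helper names `clean_line` and `clean_file` are made up for illustration. It also assumes you want one output line per input line and that nested parentheses don't occur (the non-greedy pattern won't handle those).

```python
import re

PARENS_RE = re.compile(r'\(.*?\)')  # non-greedy; nested parens are NOT handled
SPACE_RE = re.compile(r'\s+')       # any run of whitespace, tabs included


def clean_line(line):
    """Drop parenthesized text, then collapse whitespace (hypothetical helper)."""
    line = PARENS_RE.sub('', line)   # parens first, so leftover gaps get collapsed
    line = SPACE_RE.sub(' ', line)   # this also turns the trailing newline into a space
    return line.strip()


def clean_file(src_path, dst_path):
    """Write a cleaned copy of src_path to dst_path (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            # strip() removed the newline, so put it back explicitly
            dst.write(clean_line(line) + '\n')
```

Calling `clean_file('test.txt', 'test2.txt')` would then do the whole job. The `with` block closes both files, and streaming over the input avoids building three intermediate lists.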

Licensed under: CC-BY-SA with attribution