I want to use a regex in Python that reads text, finds every sentence in which <emotion> markup appears alongside <location> markup, and then prints each of those sentences on its own line of an output file:

import re
out = open('out.txt', 'w')

readfile = "<location> Oklahoma </location> where the wind comes <emotion> sweeping </emotion> down <location> the plain </location>. And the waving wheat. It can sure smell <emotion> sweet </emotion>." 

for match in re.findall(r'(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\bemotion>(?=\s|\.|$))(?=(?:(?!\.(?:\s|$)).)*?\blocation>(?=\s|\.|$)).*?\.(?=\s|$))', readfile, flags=re.I):
    line = ''.join(str(x) for x in match)
    out.write(line + '\n')

out.close()

The trouble is that if I read in a file that contains line breaks, the regex fails:

import re
out = open('out.txt', 'w')

readfile = "<location> Oklahoma </location> where the wind \n comes <emotion> sweeping </emotion> down <location> the plain </location>. And the waving wheat. It can sure smell <emotion> sweet </emotion>." 

for match in re.findall(r'(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\bemotion>(?=\s|\.|$))(?=(?:(?!\.(?:\s|$)).)*?\blocation>(?=\s|\.|$)).*?\.(?=\s|$))', readfile, flags=re.I):
    line = ''.join(str(x) for x in match)
    out.write(line + '\n')

out.close()

Is there any way to modify this regular expression so that it won't choke when it hits \n? I would be most grateful for any advice others can lend on this question.


Solution

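Add re.S or re.DOTALL (they are the same flag) to the flags in your regex. By default . matches any character except a newline, so the lookaheads and the .*? in your pattern cannot cross the \n; with re.S, . matches newlines as well. The new value for the flags argument would be re.I | re.S.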
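To see the effect of the flag in isolation, here is a minimal sketch. The short sample string and the pattern wind.*comes are made up for illustration and are not taken from the question's regex.

```python
import re

# Hypothetical sample: two words separated by a line break.
text = "wind \n comes"

# Without re.S, '.' stops at the newline, so the pattern cannot span both lines.
without_flag = re.search(r'wind.*comes', text, flags=re.I)

# With re.S (an alias of re.DOTALL), '.' also matches '\n'.
with_flag = re.search(r'wind.*comes', text, flags=re.I | re.S)

print(without_flag)               # None
print(repr(with_flag.group(0)))   # 'wind \n comes'
```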

Other tips

Use re.DOTALL / re.S

flags = re.DOTALL | re.I
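
Applied to the pattern from the question, that might look like the sketch below. The helper name find_tagged_sentences is hypothetical, and collapsing whitespace with split/join is an added assumption: with re.S the matched sentence still contains the original \n, so it would otherwise spill onto two lines of output.

```python
import re

# The question's pattern, unchanged; only the flags differ (re.S added).
SENTENCE_RE = re.compile(
    r'(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\bemotion>(?=\s|\.|$))'
    r'(?=(?:(?!\.(?:\s|$)).)*?\blocation>(?=\s|\.|$)).*?\.(?=\s|$))',
    flags=re.DOTALL | re.I,
)

def find_tagged_sentences(text):
    """Return sentences containing both tags, each flattened to one line."""
    # findall returns the single capture group as a string per match;
    # split/join collapses any embedded newlines into single spaces.
    return [' '.join(sentence.split()) for sentence in SENTENCE_RE.findall(text)]

readfile = ("<location> Oklahoma </location> where the wind \n comes "
            "<emotion> sweeping </emotion> down <location> the plain </location>. "
            "And the waving wheat. It can sure smell <emotion> sweet </emotion>.")

matches = find_tagged_sentences(readfile)
for sentence in matches:
    print(sentence)
```

To reproduce the question's output file, each item in matches can be written to out.txt followed by '\n' instead of being printed.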