Question

I wish to use a regex in Python that reads text, finds every sentence in which <emotion> markup appears together with <location> markup, and then prints each of those sentences to its own line of an output file:

import re
out = open('out.txt', 'w')

readfile = "<location> Oklahoma </location> where the wind comes <emotion> sweeping </emotion> down <location> the plain </location>. And the waving wheat. It can sure smell <emotion> sweet </emotion>." 

for match in re.findall(r'(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\bemotion>(?=\s|\.|$))(?=(?:(?!\.(?:\s|$)).)*?\blocation>(?=\s|\.|$)).*?\.(?=\s|$))', readfile, flags=re.I):
    # findall returns plain strings here because the pattern has a single capture group
    out.write(match + '\n')

out.close()
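The pattern is dense, so as a reading aid here is a sketch of the same regex rewritten with re.VERBOSE, which ignores unescaped whitespace in the pattern and allows # comments. The name SENTENCE_WITH_BOTH is hypothetical, and the comments are an editorial reading of the pattern rather than the asker's own description. Note that \bemotion> matches both the opening <emotion> tag and the closing </emotion> tag, since there is a word boundary after either < or /.

```python
import re

# The question's pattern, split across lines with re.VERBOSE so each part can
# carry a comment. The original pattern contains no literal spaces, so
# ignoring whitespace does not change what it matches.
SENTENCE_WITH_BOTH = re.compile(r"""
    (?:(?<=\.)\s+|^)              # start of text, or whitespace right after a period
    (                             # group 1: the sentence that findall returns
      (?=                         # lookahead 1: an emotion tag occurs before
        (?:(?!\.(?:\s|$)).)*?     #   the next "period + whitespace/end" boundary
        \bemotion>(?=\s|\.|$)
      )
      (?=                         # lookahead 2: the same check for a location tag
        (?:(?!\.(?:\s|$)).)*?
        \blocation>(?=\s|\.|$)
      )
      .*?\.(?=\s|$)               # consume up to the first sentence-ending period
    )
""", re.I | re.VERBOSE)

text = ("<location> Oklahoma </location> where the wind comes <emotion> sweeping "
        "</emotion> down <location> the plain </location>. And the waving wheat. "
        "It can sure smell <emotion> sweet </emotion>.")

# Only the first sentence contains both kinds of tag.
print(SENTENCE_WITH_BOTH.findall(text))
```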

The trouble is that if I read in a file that contains line breaks, the regex no longer finds any matches:

import re
out = open('out.txt', 'w')

readfile = "<location> Oklahoma </location> where the wind \n comes <emotion> sweeping </emotion> down <location> the plain </location>. And the waving wheat. It can sure smell <emotion> sweet </emotion>." 

for match in re.findall(r'(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\bemotion>(?=\s|\.|$))(?=(?:(?!\.(?:\s|$)).)*?\blocation>(?=\s|\.|$)).*?\.(?=\s|$))', readfile, flags=re.I):
    # findall returns plain strings here because the pattern has a single capture group
    out.write(match + '\n')

out.close()

Is there any way to modify this regular expression so that it won't choke when it hits \n? I would be most grateful for any advice others can lend on this question.


Solution

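Add re.S or re.DOTALL (they are the same flag) to the flags in your call. This makes . match newlines as well. Without it, . stops at the \n, so the lookaheads cannot scan past the line break and, in your example, never reach the <emotion> tag on the next line. The new value for the flags argument would therefore be re.I | re.S.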
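Applied to the question's second script, the fix might look like the following sketch. The regex is unchanged; only the flags differ. Two details go beyond the answer and are assumptions added for illustration: out.txt is opened in a with block, and the whitespace inside each match is collapsed with ' '.join(match.split()). The collapsing step matters because, with re.S, the matched sentence still contains the \n and would otherwise be split across two lines of out.txt.

```python
import re

# Same pattern as in the question; only the flags change.
PATTERN = r'(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\bemotion>(?=\s|\.|$))(?=(?:(?!\.(?:\s|$)).)*?\blocation>(?=\s|\.|$)).*?\.(?=\s|$))'

readfile = ("<location> Oklahoma </location> where the wind \n comes <emotion> sweeping "
            "</emotion> down <location> the plain </location>. And the waving wheat. "
            "It can sure smell <emotion> sweet </emotion>.")

# re.S (alias of re.DOTALL) lets "." match "\n", so the lookaheads and the
# final ".*?" can run across the line break instead of stopping at it.
matches = re.findall(PATTERN, readfile, flags=re.I | re.S)

with open('out.txt', 'w') as out:
    for match in matches:
        # Optional extra step (an assumption, not part of the answer):
        # collapse runs of whitespace, including "\n", so each sentence
        # occupies exactly one line of out.txt.
        out.write(' '.join(match.split()) + '\n')
```

With these inputs, out.txt should contain the single Oklahoma sentence on one line; dropping re.S from the flags makes findall return an empty list again.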

Other suggestions

Use re.DOTALL (or its one-letter alias re.S):

flags = re.DOTALL | re.I
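A tiny sketch of what the flag changes, using a hypothetical string that contains a line break, with the same flags value as above:

```python
import re

text = "wind \n comes"
flags = re.DOTALL | re.I

# Without DOTALL, "." refuses to match "\n", so ".*" stops at the line break.
print(repr(re.match(r".*", text).group()))               # 'wind '

# With DOTALL, "." also matches "\n", so ".*" consumes the whole string.
print(repr(re.match(r".*", text, flags=flags).group()))  # 'wind \n comes'

# re.S is simply the short alias of re.DOTALL.
print(re.S == re.DOTALL)                                  # True
```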
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow