Your problem appears to be that your regex expects whitespace (\s), a period, or the end of the string to follow the matching word, as seen with:
emotion(?=\s|\.|$)
When the word is part of a tag, it is followed by a > rather than a space, so that lookahead fails and no match is found. To fix it, you can just add the > after emotion, like:
for match in re.findall(r'(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\bemotion>(?=\s|\.|$))(?=(?:(?!\.(?:\s|$)).)*?\bomaha(?=\s|\.|$)).*?\.(?=\s|$))', text, flags=re.I):
    line = ''.join(str(x) for x in match)
Upon testing, this seems to solve your problem. Make sure to treat "LOCATION" similarly:
for match in re.findall(r'(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\bemotion>(?=\s|\.|$))(?=(?:(?!\.(?:\s|$)).)*?\bLOCATION>(?=\s|\.|$)).*?\.(?=\s|$))', text, flags=re.I):
    line = ''.join(str(x) for x in match)
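
If you want to see the difference in isolation, here is a minimal sketch. The sample text and the sentence_pattern helper are my own assumptions for illustration, not part of your code; the helper just assembles the same regex as above from a list of required words.

```python
import re

# Hypothetical sample text; the <emotion> and <LOCATION> tags are assumed for illustration.
text = "I felt <emotion> in <LOCATION>. Nothing tagged here."

# Original lookahead: the word must be followed by whitespace, a period, or end of string.
strict = re.compile(r'\bemotion(?=\s|\.|$)', flags=re.I)
# Adjusted pattern: match the closing '>' before the lookahead is applied.
tagged = re.compile(r'\bemotion>(?=\s|\.|$)', flags=re.I)

strict_hits = strict.findall(text)  # '>' follows the word, so the lookahead fails
tagged_hits = tagged.findall(text)  # '>' is consumed, then the space satisfies the lookahead


def sentence_pattern(*words):
    """Build the sentence-matching regex for the given required words (hypothetical helper)."""
    lookaheads = ''.join(
        r'(?=(?:(?!\.(?:\s|$)).)*?\b' + re.escape(w) + r'(?=\s|\.|$))'
        for w in words
    )
    return re.compile(r'(?:(?<=\.)\s+|^)(' + lookaheads + r'.*?\.(?=\s|$))', flags=re.I)


sentences = sentence_pattern('emotion>', 'LOCATION>').findall(text)

print(strict_hits)
print(tagged_hits)
print(sentences)
```

With this sample, only the first sentence contains both tags, so it is the only one returned. If the word can appear both inside and outside a tag, making the bracket optional (emotion>? before the lookahead) might be worth trying, though I have not tested that against your data.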