I don't think a regex will help you here in the general case. For your examples, though, this regex works as you want it to:
(?<=[^\(\)].{3})\bon the\b(?=.{3}[^\(\)])
Description:
(?<=[^\(\)].{3})  Positive lookbehind: assert that the following can be matched immediately before the current position
    [^\(\)]  match a single character that is not ( or ) (\( and \) match the parenthesis characters literally)
    .{3}  match any character (except newline), exactly 3 times
\b  assert position at a word boundary (^\w|\w$|\W\w|\w\W)
on the  match the characters "on the" literally (case sensitive)
\b  assert position at a word boundary (^\w|\w$|\W\w|\w\W)
(?=.{3}[^\(\)])  Positive lookahead: assert that the following can be matched immediately after the current position
    .{3}  match any character (except newline), exactly 3 times
    [^\(\)]  match a single character that is not ( or ) (\( and \) match the parenthesis characters literally)
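A quick way to see what the lookaround pattern above actually tests is to run it on a couple of strings. This is a minimal sketch; the sample sentences are made up for illustration and are not from the question.

```python
import re

# The lookaround pattern from above, as a Python raw string.
pattern = re.compile(r'(?<=[^\(\)].{3})\bon the\b(?=.{3}[^\(\)])')

samples = [
    "Sit down on the chair.",  # no parentheses at all
    "(ab on the cd)",          # '(' sits exactly 4 characters before "on the"
]

# Record whether the pattern finds "on the" in each sample.
results = {s: bool(pattern.search(s)) for s in samples}
for s, found in results.items():
    print(f"{s!r}: {found}")
```

The first sample matches and the second is rejected, because the parenthesis falls exactly on the position the lookbehind inspects.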
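If you want to generalize the problem to any amount of text between the parenthesis and the string you are searching for, this regex will not work. The lookbehind only checks the character exactly four positions before "on the" (and the lookahead the character exactly four positions after it), so a parenthesis at any other distance goes unnoticed. Covering every distance would need a variable-length lookbehind, but in Python's re module (as in many other regex engines) lookbehind patterns must have a fixed width.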
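Both limitations can be checked directly. The sketch below (sample string and the variable-length pattern are my own illustrations) shows the fixed-width pattern still matching text inside parentheses when the gap is 4 characters instead of 3, and Python's built-in `re` refusing a lookbehind containing `.*`.

```python
import re

# The fixed-width pattern only inspects the characters exactly 4 positions back and ahead.
pattern = re.compile(r'(?<=[^\(\)].{3})\bon the\b(?=.{3}[^\(\)])')

# "on the" is inside parentheses, but 5 characters away from '(': still matched.
gap_match = bool(pattern.search("x (abc on the def) y"))
print(gap_match)  # True

# A variable-length lookbehind would be needed for arbitrary gaps,
# but the re module rejects it when compiling the pattern.
try:
    re.compile(r'(?<!\(.*)\bon the\b')
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)
```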
In my regex I used a positive lookahead and a positive lookbehind; essentially the same result could be achieved with negative ones, but the issue remains.
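For comparison, here is one possible negative formulation (my own rewrite, not necessarily what was meant above): instead of requiring a non-parenthesis character 4 positions away, it forbids a parenthesis there. It behaves the same on ordinary text, but differs at the start of a string, where the positive lookbehind fails for lack of 4 preceding characters while the negative one succeeds.

```python
import re

# Positive version: a non-parenthesis character must exist 4 positions back and ahead.
positive = re.compile(r'(?<=[^\(\)].{3})\bon the\b(?=.{3}[^\(\)])')
# Negative version: there must NOT be a parenthesis 4 positions back or ahead.
negative = re.compile(r'(?<![\(\)].{3})\bon the\b(?![\(\)].{3}|.{3}[\(\)])')

samples = ["Sit down on the chair.", "(ab on the cd)", "on the road"]

# Pair up (positive match?, negative match?) for each sample.
results = {s: (bool(positive.search(s)), bool(negative.search(s))) for s in samples}
for s, pair in results.items():
    print(f"{s!r}: {pair}")
```

Neither version fixes the distance problem: both still look at exactly one position on each side.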
Suggestion: since regex alone can't do the job, write a small piece of Python code that checks each whole line for your text outside parentheses.
Example:
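```python
import re

mystr = 'on the'
data = '\n'.join(mylist)  # the whole text as one string, so findall can scan every line
# collect the unwanted string series, which are easy to define with regex
# (re.escape makes sure mystr is matched literally)
unWanted = re.findall(r'\(.*' + re.escape(mystr) + r'.*\)|\)' + re.escape(mystr), data)
# delete lines containing an unwanted match; build a new list instead of
# calling remove() on the list while iterating over it
mylist = [line for line in mylist if not any(item in line for item in unWanted)]
# look for what you want
for line in mylist:
    if mystr in line:
        print(line)
```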
Where:
mylist: a list containing all the lines you want to search through.
mystr: the string you want to find.
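As a different technique from the findall/filter code above (not the method shown there), here is a minimal sketch that tracks parenthesis depth character by character and reports whether the target occurs at depth 0, which handles any distance and nested parentheses. The function name `contains_outside_parens` is hypothetical, and it assumes the target contains no parentheses itself and starts and ends with word characters (so the `\b` word boundaries behave as in the regex above).

```python
import re


def contains_outside_parens(line: str, target: str) -> bool:
    """Return True if `target` occurs as whole words in `line` outside any (...)."""
    # depth_at[i] = number of unclosed '(' before line[i]
    depth = 0
    depth_at = []
    for ch in line:
        depth_at.append(depth)
        if ch == '(':
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1  # ignore a stray ')' that has no matching '('

    # Check every whole-word occurrence; one at depth 0 is enough.
    for m in re.finditer(r'\b' + re.escape(target) + r'\b', line):
        if depth_at[m.start()] == 0:
            return True
    return False


lines = [
    "Sit down on the chair.",
    "x (abc on the def) y",
    "(note) put it on the shelf",
]
print([line for line in lines if contains_outside_parens(line, "on the")])
# prints ['Sit down on the chair.', '(note) put it on the shelf']
```

Because the target has no parentheses, the depth cannot change inside a match, so checking the depth at the match start is sufficient. An unclosed '(' makes everything after it count as "inside", which seems the safer default for malformed lines.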
Hope this helped.