The problem is that you want to accept '_
' as long as it is not the '_
' in the terminating '_Init
'. Here are two pyparsing solutions, one is more "pure" pyparsing, the other just says the heck with it and uses an embedded regex.
samples = """\
one
two_Init
threeInit
four_foo_Init
six_seven_Init_eight_Init
five_foo_bar_Init"""
from pyparsing import Combine, OneOrMore, Word, alphas, alphanums, Literal, WordEnd, Regex
# implement explicit lookahead: allow '_' as part of your Combined OneOrMore,
# as long as it is not followed by "Init" and the end of the word
option1 = Combine(OneOrMore(Word(alphas,alphanums) |
'_' + ~(Literal("Init")+WordEnd()))
+ "_Init")
# sometimes regular expressions and their implicit lookahead/backtracking do
# make things easier
option2 = Regex(r'\b[a-zA-Z_][a-zA-Z0-9_]*_Init\b')
for expr in (option1, option2):
print '\n'.join(t[0] for t in expr.searchString(samples))
print
Both options print:
two_Init
four_foo_Init
six_seven_Init_eight_Init
five_foo_bar_Init