Pyparsing: Detect tokens with a specific ending

https://stackoverflow.com/questions/16277839

13-04-2022

Question

I want to use pyparsing to detect tokens that end with the string _Init, but my grammar does not match them. Maybe someone can give me a hint about what I am doing wrong.

As an example, I have the following lines stored in text:

one
two_Init
threeInit
four_foo_Init
five_foo_bar_Init

I want to extract the following lines:

two_Init
four_foo_Init
five_foo_bar_Init

Currently, I have reduced my problem to the following lines:

    import pyparsing as pp

    ident = pp.Word(pp.alphas, pp.alphanums + "_")
    ident_init = pp.Combine(ident + pp.Literal("_Init"))

    for detected, s, e in ident_init.scanString(text):
        print(detected)

With this code I get no results. If I remove the "_" from the Word definition, I can at least detect the lines that end in _Init, but the results are incomplete:

['two_Init']
['foo_Init']
['bar_Init']

Does anyone have an idea what I am doing wrong here?

Solution

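The problem is that you want to accept '_' as long as it is not the '_' in the terminating '_Init'. Word(pp.alphas, pp.alphanums + "_") is greedy: it consumes the trailing '_Init' along with the rest of the identifier, and pyparsing does not backtrack into the Word to hand those characters back, so the following Literal("_Init") never matches. Dropping '_' from the Word has the opposite problem: the Word can no longer span the inner underscores, so only the last segment before '_Init' is picked up. Here are two pyparsing solutions: one is more "pure" pyparsing, the other just says the heck with it and uses an embedded regex.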

    from pyparsing import (Combine, OneOrMore, Word, alphas, alphanums,
                           Literal, WordEnd, Regex)

    samples = """\
    one
    two_Init
    threeInit
    four_foo_Init
    six_seven_Init_eight_Init
    five_foo_bar_Init"""

    # implement explicit lookahead: allow '_' as part of your Combined OneOrMore,
    # as long as it is not followed by "Init" and the end of the word
    option1 = Combine(OneOrMore(Word(alphas, alphanums) |
                                '_' + ~(Literal("Init") + WordEnd()))
                      + "_Init")

    # sometimes regular expressions and their implicit lookahead/backtracking do
    # make things easier
    option2 = Regex(r'\b[a-zA-Z_][a-zA-Z0-9_]*_Init\b')

    # print the matches found by each expression, followed by a blank line
    for expr in (option1, option2):
        print('\n'.join(t[0] for t in expr.searchString(samples)))
        print()

Both options print:

two_Init
four_foo_Init
six_seven_Init_eight_Init
five_foo_bar_Init
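
To see the two ideas side by side without pyparsing, here is a minimal sketch that uses only the standard re module. The first pattern is the one passed to Regex in option2. The second is only a rough regex analogue of option1's explicit lookahead, not an exact translation: it uses \b where pyparsing uses WordEnd, and it also allows a digit directly after an underscore. The names BACKTRACKING, LOOKAHEAD and find_init_tokens are made up for this illustration.

```python
import re

SAMPLES = """\
one
two_Init
threeInit
four_foo_Init
six_seven_Init_eight_Init
five_foo_bar_Init"""

# Same pattern as option2: the greedy character class first swallows the whole
# word, then the regex engine backtracks until the trailing "_Init" fits.
BACKTRACKING = re.compile(r'\b[a-zA-Z_][a-zA-Z0-9_]*_Init\b')

# Rough analogue of option1 (an assumption, see the text above): consume one
# character at a time, but refuse a '_' that starts a word-final "Init", so
# the closing "_Init" is left over for the end of the pattern.
LOOKAHEAD = re.compile(r'\b[a-zA-Z](?:[a-zA-Z0-9]|_(?!Init\b))*_Init\b')


def find_init_tokens(pattern, text):
    """Return every non-overlapping match of pattern in text, in order."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for pattern in (BACKTRACKING, LOOKAHEAD):
        print(find_init_tokens(pattern, SAMPLES))
```

On the sample text both patterns return the same four tokens as the pyparsing expressions above. The lookahead variant never has to give characters back, which is the same reason option1 works in a parser that does not backtrack.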

Licensed under: CC-BY-SA with attribution
