scanString end location: why is it end_index+1?

https://stackoverflow.com/questions/1834456

pyparsing

11-09-2019

Question

python/pyparsing

When I use the scanString method, it gives the start and end location of each matched token in the text.

e.g.

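from pyparsing import Word, alphas

line = "cat bat"
pat = Word(alphas)
# scanString yields a (tokens, start, end) tuple for each match
for i in pat.scanString(line):
    print(i)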

I get the following:

((['cat'], {}), 0, 3)
((['bat'], {}), 4, 7)

But the end location of "cat" should be 2, right? Why does it report the next location as the end location?

Solution

This is consistent with Python's [begin:end] slicing convention, where "end" is the index one past the last character of the match. Because the end is reported as the next location, the returned values can be used directly to extract the matching substring:

for t, start, end in pat.scanString(line):
    # end is exclusive, so this slice is exactly the matched text
    print(line[start:end])

You can see how this is used if you look in the pyparsing source code for the implementation of transformString.
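
To make the convention concrete, here is a minimal sketch in plain Python that uses the (start, end) pairs from the question directly, so it runs without pyparsing installed. The helper names describe and upper_matches are hypothetical and exist only for this illustration. The second helper shows the general stitching idea that an exclusive end makes easy; it is not a copy of pyparsing's own transformString code.

```python
line = "cat bat"

# (start, end) pairs copied from the scanString output in the question.
spans = [(0, 3), (4, 7)]

def describe(text, spans):
    """Return (substring, length, last_index) for each half-open span."""
    rows = []
    for start, end in spans:
        token = text[start:end]           # end is exclusive, so no +1 needed
        rows.append((token, end - start, end - 1))
    return rows

def upper_matches(text, spans):
    """Rebuild text with each matched span upper-cased, copying gaps as-is."""
    out = []
    last_end = 0
    for start, end in spans:
        out.append(text[last_end:start])  # unmatched text before this match
        out.append(text[start:end].upper())
        last_end = end                    # the next gap starts exactly at end
    out.append(text[last_end:])           # trailing text after the last match
    return "".join(out)

print(describe(line, spans))       # [('cat', 3, 2), ('bat', 3, 6)]
print(upper_matches(line, spans))  # CAT BAT
```

With an exclusive end, the length of a match is simply end - start, the index of its last character is end - 1, and the text between two matches is line[previous_end:start], with no off-by-one adjustments anywhere.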

Licensed under: CC-BY-SA with attribution
