scanString end location: why is it end_index+1?
11-09-2019
Question
python/pyparsing
When I use the scanString method, it returns the start and end location of each matched token in the text. For example:
from pyparsing import Word, alphas

line = "cat bat"
pat = Word(alphas)
for i in pat.scanString(line):
    print(i)
I get the following:
((['cat'], {}), 0, 3)
((['bat'], {}), 4, 7)
But the end location of "cat" should be 2, right? Why does it report the next location as the end location?
Solution
This is consistent with Python's [begin:end] slicing conventions, where "end" is the index of the character just past the match. Because the end is reported as the next location, it is very straightforward to extract the matching substring using the returned values:
for t, start, end in pat.scanString(line):
    print(line[start:end])
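To make the convention concrete, here is a small sketch that needs no pyparsing at all. It hard-codes the (start, end) pairs reported in the question and shows two things an exclusive end gives for free: the slice is the match, and end - start is its length. The variable names are only illustrative.

```python
line = "cat bat"

# Spans as reported in the question: (start, end), with end exclusive.
spans = [(0, 3), (4, 7)]

# An exclusive end plugs straight into slice syntax.
words = [line[start:end] for start, end in spans]

# The length of each match is simply end - start, with no +1 needed.
lengths = [end - start for start, end in spans]

print(words)    # ['cat', 'bat']
print(lengths)  # [3, 3]

# With an inclusive end (2 for "cat"), every slice would need a +1.
inclusive_end = 2
print(line[0:inclusive_end + 1])  # cat
```

Had scanString reported 2 as the end of "cat", each caller would have to remember that adjustment.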
You can see how this is used if you look in the pyparsing source code for the implementation of transformString.
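The following is a simplified sketch of that kind of stitching, not pyparsing's actual code. The function replace_matches and its parameters are hypothetical names made up for illustration. It takes (tokens, start, end) triples shaped like the ones scanString yields and rebuilds the text, relying on the exclusive end so that the gap between two matches is just text[last_end:start].

```python
def replace_matches(text, matches, transform):
    """Rebuild text, replacing each matched span with transform(tokens).

    matches: iterable of (tokens, start, end) triples with an exclusive
    end, assumed to be in left-to-right order and non-overlapping.
    (Hypothetical helper, not part of pyparsing.)
    """
    pieces = []
    last_end = 0
    for tokens, start, end in matches:
        pieces.append(text[last_end:start])  # untouched text before the match
        pieces.append(transform(tokens))     # replacement for the match
        last_end = end                       # next gap starts right here
    pieces.append(text[last_end:])           # trailing text after the last match
    return "".join(pieces)


line = "cat bat"
# Triples mirroring the scanString output shown in the question.
matches = [(["cat"], 0, 3), (["bat"], 4, 7)]

result = replace_matches(line, matches, lambda toks: toks[0].upper())
print(result)  # CAT BAT
```

With pyparsing installed, passing pat.scanString(line) as matches should work the same way, since it yields triples of this shape. Because each end is exclusive, no index arithmetic is needed anywhere in the loop.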
Licensed under: CC-BY-SA with attribution