Question

I am trying to write my own syntax highlighter in Sublime. I think it uses Python-based regular expressions. I just want to match all tokens in a line like:

description str.bla, str.blub, str.yeah, str.no

My regular expression looks like:

regex = "(description) (str\\.[\\w\\d]+)(,\\s*(str\\.[\\w\\d]+))*"

Now I expect 1 match in group 1 ("description"), 1 match in group 2 ("str.bla") and 3 matches in group 4 ("str.blub", "str.yeah", "str.no")

but I have only 1 match in my last group ("str.no"). What's going on there?

Thanks a lot!


Solution

When you have a repeated capture group (e.g. (a)* or (a)+), each repetition overwrites the previous capture, so the group will contain only the last match.

So, if I have the regex:

(123\d)+

And the string:

123412351236

You will find that the capture group will contain only 1236.
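A minimal sketch of this behaviour, assuming Python's re module (the engine Sublime actually uses may differ in details, but repeated groups behave the same way in most engines). The variable names are only illustrative:

```python
import re

text = "123412351236"

# The group (123\d) is repeated three times; every repetition
# overwrites the previous capture, so only the last one is kept.
match = re.fullmatch(r"(123\d)+", text)
last_capture = match.group(1)

# Scanning with findall instead returns every occurrence.
all_captures = re.findall(r"123\d", text)

print(last_capture)   # 1236
print(all_captures)   # ['1234', '1235', '1236']
```
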

I don't know of any way around this (besides hard-coding the number of subgroups to capture), but you can try capturing the whole repeated part as a single group, like so:

regex = "(description) (str\\.[\\w\\d]+)((?:,\\s*(?:str\\.[\\w\\d]+))*)"

Which should give you

['description', 'str.bla', ', str.blub, str.yeah, str.no']

Note how the elements are grouped: there are 3 items in the list, and the last one is a single string holding the whole comma-separated tail, not a nested list of the individual tokens.
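If you can post-process the match in code, the captured tail can be split into its individual tokens with a second pass. This is a rough sketch assuming Python's re module; it does not apply inside a Sublime syntax definition itself, where only the regex is available, and the names line, groups and tail_tokens are made up for illustration:

```python
import re

line = "description str.bla, str.blub, str.yeah, str.no"

# Same pattern as above: group 3 captures the whole repeated tail.
pattern = re.compile(
    r"(description) (str\.[\w\d]+)((?:,\s*(?:str\.[\w\d]+))*)"
)

match = pattern.match(line)
groups = list(match.groups())

# Second pass: pull the individual tokens out of the captured tail.
tail_tokens = re.findall(r"str\.[\w\d]+", groups[2])

print(groups)       # ['description', 'str.bla', ', str.blub, str.yeah, str.no']
print(tail_tokens)  # ['str.blub', 'str.yeah', 'str.no']
```
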

OTHER TIPS

Try this:

regex = "(description) (str\\.[\\w\\d]+)((?:,\\s*(?:str\\.[\\w\\d]+))*)"
Licensed under: CC-BY-SA with attribution