Question

I have a particular regular expression:

#\b[a-z0-9-_%"]+\b#gi

I have the following test string I am applying that regex filter to:

abc def ghi jkl mno %%car% __car_ tall-person "thing" 20% %30%

However, the detected word boundaries are as follows (square brackets represent boundaries):

[abc] [def] [ghi] [jkl] [mno] %%[car%] [__car_] [tall-person] "[thing"] [20%] %[30%]

So, certain types of punctuation ("_") are recognized at both the beginning and end of the word as "word characters." On the other hand, other types ("%" or double quotes) are ignored when they are at the beginning of the word. Why is this?


Solution

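In \b, "word" means whatever the \w metacharacter matches, which in most regular expression engines is [A-Za-z0-9_]. \b matches only at a position that has a word character on one side and a non-word character (or the start or end of the string) on the other. _ is in that set, so there is a boundary between a space and _. % and " are not in it, so there is no boundary between a space and % or ", and the match cannot start there.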
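A quick way to check this is to test each punctuation character against \w and then try \b directly in front of it. This is only a minimal sketch; the variable names are illustrative and the inputs are cut-down versions of the test string from the question.

```javascript
// Which punctuation characters from the question count as "word" characters?
const candidates = ['_', '%', '"', '-'];
const isWordChar = Object.fromEntries(
  candidates.map((ch) => [ch, /\w/.test(ch)])
);
// Only "_" is a word character; "%", '"' and "-" are not.

// \b needs a word character on exactly one side of the position.
// After a space, "_" creates a boundary but "%" does not.
const startsAtUnderscore = /\b_car/.test(' _car'); // true
const startsAtPercent = /\b%car/.test(' %car');    // false
```

The hyphen is not a word character either; tall-person still matches as a whole because it begins and ends with letters, and the hyphen sits inside the match.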

I think you don't need the word boundaries at all. The character class on its own already matches each run of allowed characters:

// JavaScript example (hyphen escaped inside the class, i flag kept from the original regex)
> 'abc def ghi jkl mno %%car% __car_ tall-person "thing" 20% %30%'.match(/[a-z0-9\-_%"]+/gi)
["abc", "def", "ghi", "jkl", "mno", "%%car%", "__car_", "tall-person", "\"thing\"", "20%", "%30%"]
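If this is needed in more than one place, the same idea can be wrapped in a small function. The sketch below is an assumption about how one might package it, not part of the original answer; extractTokens and tokenPattern are hypothetical names.

```javascript
// Hypothetical helper: collect every run of allowed characters without using \b.
function extractTokens(text) {
  // The hyphen is escaped so it cannot be read as a range inside the class.
  const tokenPattern = /[a-z0-9\-_%"]+/gi;
  // String.prototype.match returns null when nothing matches; normalise to [].
  return text.match(tokenPattern) || [];
}

const tokens = extractTokens('%%car% __car_ "thing" 20% %30%');
const none = extractTokens('   ');
```

Returning an empty array instead of null lets callers loop over the result without a separate null check.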
Licensed under: CC-BY-SA with attribution