Question

I have a particular regular expression:

#\b[a-z0-9-_%"]+\b#gi

I have the following test string I am applying that regex filter to:

abc def ghi jkl mno %%car% __car_ tall-person "thing" 20% %30%

However, the detected word boundaries are as follows (square brackets represent boundaries):

[abc] [def] [ghi] [jkl] [mno] %%[car%] [__car_] [tall-person] "[thing"] [20%] %[30%]

So, certain types of punctuation ("_") are recognized at both the beginning and end of the word as "word characters." On the other hand, other types ("%" or double quotes) are ignored when they are at the beginning of the word. Why is this?


Solution

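For a word boundary, "word" means the \w metacharacter class, which in most regular expression engines is [A-Za-z0-9_]. The underscore is in that class, so \b matches between whitespace and "_". The characters % and " are not in it, so the position between whitespace and a leading % (or ") has non-word characters on both sides and is not a word boundary; the first boundary is between the % and the letter that follows it.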
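To see where \b actually matches, here is a minimal sketch in JavaScript. The helper name wordBoundaryPositions is hypothetical and not part of any library; it just collects the index of every zero-width \b match using String.prototype.matchAll.

```javascript
// Hypothetical helper: return every index in `text` where \b matches.
// matchAll advances past zero-width matches on its own, so there is no
// risk of looping forever on the same position.
function wordBoundaryPositions(text) {
  return [...text.matchAll(/\b/g)].map((m) => m.index);
}

// '%' is not a \w character, so there is no boundary at index 0 or at the end.
console.log(wordBoundaryPositions('%car%')); // [ 1, 4 ]

// '_' is a \w character, so the string edges themselves are boundaries.
console.log(wordBoundaryPositions('_car_')); // [ 0, 5 ]
```

The two results differ only because "_" belongs to \w and "%" does not.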

I don't think you need the word boundary at all:

// JavaScript example
> 'abc def ghi jkl mno %%car% __car_ tall-person "thing" 20% %30%'.match(/[a-z0-9-_%"]+/gi)
["abc", "def", "ghi", "jkl", "mno", "%%car%", "__car_", "tall-person", "\"thing\"", "20%", "%30%"]
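If some boundary-like anchoring is still wanted, one option is to define the boundary yourself with lookarounds instead of \b. The sketch below assumes that a "word" must be delimited by whitespace or the string edges, which is an assumption and not something stated in the question. It also assumes an engine with lookbehind support (ES2018 or later); findTokens is a hypothetical helper name.

```javascript
// Assumption: a token is a run of the question's allowed characters that is
// preceded and followed by whitespace or a string edge.
// (?<!\S) = "not preceded by a non-space", (?!\S) = "not followed by a non-space".
const token = /(?<!\S)[a-z0-9\-_%"]+(?!\S)/gi;

// Hypothetical helper: return all whitespace-delimited tokens, or [] if none.
function findTokens(text) {
  return text.match(token) || [];
}

console.log(findTokens('%%car% "thing" 20% foo,bar x'));
// [ '%%car%', '"thing"', '20%', 'x' ]
```

Unlike the plain character class, this version skips "foo,bar" entirely, because the comma is neither an allowed character nor whitespace.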
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow