Question

I have a simple scenario where I want to match the following and capture the value:

stuff_in_string,
env: 'local', // want to match this and capture the content in quotes
more_stuff_in_string

I have never written a regex pattern before so excuse my attempt, I am well aware it is totally wrong.

This is what I am trying to say:

  • Match "env:"
  • Followed by none or more spaces
  • Followed by a single or double quote
  • Capture all until..
  • The next single or double quote

/env:*?\s+('|")+(.*?)+('|")/g

Thanks

PS here is a #failed fiddle: http://jsfiddle.net/DfHge/

Note: this is the regex I ended up using (not the answer below as it was overkill for my needs): /env:\s+(?:"|')(\w+)(?:"|')/

Solution

You can use this:

/\benv: (["'])([^"']*)\1/g

Here \1 is a backreference to the first capturing group (the opening quote), so your content is in the second group. This is the simple approach for simple cases.
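As a minimal sketch of how the second group would be read in JavaScript (the input text is modelled on the question's sample, and the variable names are only illustrative):

```javascript
// Sample text modelled on the question; not taken from the linked fiddle.
const input = [
  "stuff_in_string,",
  "env: 'local', // want to match this",
  "more_stuff_in_string",
].join("\n");

// Group 1 is the opening quote, group 2 is the quoted content.
const simple = /\benv: (["'])([^"']*)\1/;

const m = simple.exec(input);
const env = m ? m[2] : null; // null when nothing matched
console.log(env); // local
```

The g flag is left out here because only the first match is needed; with g, exec keeps state in lastIndex between calls.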

Now, other cases like:

env: "abc\"def"
env: "abc\\"
env: "abc\\\def"
env: "abc'def"

For these you need a stricter pattern:

first: avoid the different quotes problem:

/\benv: (["'])((?:[^"']+|(?!\1)["'])*)\1/g

I put all the possible content in a non-capturing group that I can repeat at will, and I use a negative lookahead (?!\1) to check that a quote inside the content is not the same as the captured opening quote.
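A small sketch comparing the simple pattern with the lookahead version on the mixed-quote case (the test string and variable names are assumptions for illustration):

```javascript
// A double-quoted value that contains a single quote.
const mixed = `env: "abc'def"`;

const simplePattern = /\benv: (["'])([^"']*)\1/;
const lookaheadPattern = /\benv: (["'])((?:[^"']+|(?!\1)["'])*)\1/;

// The simple pattern stops at the inner ' and then fails to find a closing ".
const simpleHit = simplePattern.exec(mixed);
console.log(simpleHit); // null

// The lookahead version accepts the inner ' because it differs from the opening ".
const lookaheadHit = lookaheadPattern.exec(mixed);
console.log(lookaheadHit && lookaheadHit[2]); // abc'def
```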

second: the backslash problem:

If a quote is escaped, it can't be the closing quote! Thus you must check if the quote is escaped or not and allow escaped quotes in the string.

I remove the backslashes from allowed content:

/\benv: (["'])((?:[^"'\\]+|(?!\1)["'])*)\1/g

I allow escaped characters:

/\benv: (["'])((?:[^"'\\]+|(?!\1)["']|\\[\s\S])*)\1/g

To allow a variable number of spaces before the quoted part, you can replace : by :\s*

/\benv:\s*(["'])((?:[^"'\\]+|(?!\1)["']|\\[\s\S])*)\1/g

You now have a working pattern.
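To check the pattern against the four problem cases listed earlier, something like this could be used (String.raw keeps the backslashes exactly as typed; the variable names are illustrative):

```javascript
// The four tricky inputs from above, one per line, with backslashes kept literal.
const cases = [
  String.raw`env: "abc\"def"`,
  String.raw`env: "abc\\"`,
  String.raw`env: "abc\\\def"`,
  `env: "abc'def"`,
].join("\n");

const working = /\benv:\s*(["'])((?:[^"'\\]+|(?!\1)["']|\\[\s\S])*)\1/g;

// matchAll requires the g flag and yields one match object per occurrence.
const contents = Array.from(cases.matchAll(working), (m) => m[2]);

// Each captured value still contains its backslashes; nothing is unescaped here.
contents.forEach((c) => console.log(c));
```

Note that the pattern only finds where the string ends. Turning \" into " afterwards is a separate step that is not covered here.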

third: pattern optimization

a simple alternation:

Using a capture group and a backreference can be tempting for dealing with the different types of quotes, since it lets you write the pattern concisely. However, this approach has to create a capture group and test a lookahead at (?!\1)["'], so it is not very efficient. Writing a simple alternation makes the pattern longer and needs two capture groups for the two cases, but it is more efficient:

/\benv:\s*(?:"((?:[^"\\]+|\\[\s\S])*)"|'((?:[^'\\]+|\\[\s\S])*)')/g

(Note: if you decide to do that, you must check which one of the two capture groups is defined.)
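One possible way to do that check in JavaScript (envValues is a hypothetical helper name, not something from the question):

```javascript
const altPattern = /\benv:\s*(?:"((?:[^"\\]+|\\[\s\S])*)"|'((?:[^'\\]+|\\[\s\S])*)')/g;

// Hypothetical helper: returns the quoted content of every env entry in text.
function envValues(text) {
  return Array.from(text.matchAll(altPattern), (m) =>
    // Group 1 is set for double quotes, group 2 for single quotes.
    // Compare against undefined so that an empty string "" is still accepted.
    m[1] !== undefined ? m[1] : m[2]
  );
}

const values = envValues(`env: 'local', env: "it's", env: ""`);
console.log(values); // [ 'local', "it's", '' ]
```

Using m[1] || m[2] would also work for non-empty values, but it picks the wrong group when the double-quoted content is empty.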

unrolling the loop:

To match the content inside the quotes we use (?:[^"\\]+|\\[\s\S])* (for double quotes here). That works, but it can be improved to reduce the number of steps needed. To do that we unroll the loop, which gets rid of the alternation:

[^"\\]*(?:\\[\s\S][^"\\]*)*

finally the whole pattern can be written like this:

/\benv:\s*(?:"([^"\\]*(?:\\[\s\S][^"\\]*)*)"|'([^'\\]*(?:\\[\s\S][^'\\]*)*)')/g
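A quick sanity check that the unrolled pattern captures the same content as the alternation version, at least on a few sample inputs (the samples and the capture helper are assumptions for illustration, and this says nothing about speed):

```javascript
const alternation = /\benv:\s*(?:"((?:[^"\\]+|\\[\s\S])*)"|'((?:[^'\\]+|\\[\s\S])*)')/g;
const unrolled = /\benv:\s*(?:"([^"\\]*(?:\\[\s\S][^"\\]*)*)"|'([^'\\]*(?:\\[\s\S][^'\\]*)*)')/g;

// Illustrative helper: collect whichever of the two groups took part in each match.
const capture = (re, text) =>
  Array.from(text.matchAll(re), (m) => (m[1] !== undefined ? m[1] : m[2]));

const samples = [
  String.raw`env: "abc\"def"`,
  String.raw`env: 'it\'s'`,
  `env: ""`,
].join("\n");

const fromAlternation = capture(alternation, samples);
const fromUnrolled = capture(unrolled, samples);

console.log(JSON.stringify(fromAlternation) === JSON.stringify(fromUnrolled)); // true
```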

Other tips

env: *('|").*?\1 is what you're looking for

the * means none or more

('|") matches either a single or double quote, and also saves it into a group for backreferencing

.*? is a reluctant (lazy) match of any characters, so it stops at the first quote that can close the string

\1 will reference the first group, which was either a single or double quote
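A rough sketch of why the reluctant quantifier matters here. The colon after env and the extra capture group around the content are my additions so that the value can be read out, and the sample line is made up:

```javascript
// A line with two quoted values, to show where each quantifier stops.
const line = "env: 'local', other: 'x'";

const lazy = /env: *('|")(.*?)\1/;   // stops at the first matching quote
const greedy = /env: *('|")(.*)\1/;  // runs on to the last matching quote

const lazyValue = line.match(lazy)[2];
const greedyValue = line.match(greedy)[2];

console.log(lazyValue);   // local
console.log(greedyValue); // local', other: 'x
```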

regex=/env: ?['"]([^'"]+)['"]/
answer=str.match(regex)[1]

even better:

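regex=/env: ?(['"])((?:(?!\1).)*)\1/
answer=str.match(regex)[2]

Note that \1 cannot be used inside a character class such as [^\1], so a negative lookahead (?!\1) is used to stop at the matching quote.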
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow