Question

I’m trying to create a regex for form validation but it always returns true. The user must be able to add something like {user|2|S} as input but also use brackets if they are escaped with \.

This code checks for the left bracket { for now.

$regex = '/({(?=([a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)}))|[^{]|(?<=\\\){)*/';
if (preg_match($regex, $value)) {
    return TRUE;
} else {
    return FALSE;
}

A possible correct input would be:

Hello {user|1|S}, you have {amount|2|D2}

or

Hello {user|1|S}, you have {amount|2|D2} in \{the_bracket_bank\}

However, this should return false:

Hello {user|1|S}, you have {amount|2}

and this also:

Hello {user|1|S}, you have {amount|2|D2} in {the_bracket_bank}

A live example can be found here: http://regexr.com?37tpu Note that there is a \ in the lookbehind at the end, PHP was giving me error messages because I had to escape it an extra time in my code.


Solution

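The main error is that you do not require the regex to match from the beginning to the end of the checked string. Because the whole group is quantified with *, preg_match is satisfied by any matching prefix, even an empty one, so it always reports a match. Use the ^ and $ assertions.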
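To make the effect of the anchors concrete, here is a minimal sketch. It uses the pattern from the question with the braces escaped, once without and once with ^ and $; the variable names are only illustrative.

```php
<?php
// The pattern with escaped braces, but without ^ and $.
$unanchored = '/(\{(?=([a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\}))|[^{]|(?<=\\\\)\{)*/';
// The same pattern anchored to the whole string.
$anchored = '/^(\{(?=([a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\}))|[^{]|(?<=\\\\)\{)*$/';

// One of the inputs that should be rejected.
$bad = 'Hello {user|1|S}, you have {amount|2}';

// Unanchored: the engine matches the valid prefix and stops there,
// so preg_match still reports a match.
$unanchoredResult = preg_match($unanchored, $bad, $m);
// $m[0] is the part that matched: everything before the invalid "{".
$matchedPrefix = $m[0];

// Anchored: the whole string has to fit the pattern, so there is no match.
$anchoredResult = preg_match($anchored, $bad);

var_dump($unanchoredResult, $matchedPrefix, $anchoredResult);
```

The unanchored call succeeds because a prefix of the input is valid; only the anchored call forces the invalid {amount|2} to be examined as part of the match.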

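I think you should escape { and } in your regex, as they have special meaning: together they delimit a quantifier such as {2,3}. PCRE treats a { that does not start a valid quantifier as a literal character, but escaping it removes the ambiguity.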

The (?<=\\\) is better written as (?<=\\\\). The backslash has to be escaped twice because it has special meaning both in a single-quoted string and in a PCRE regex. Using \\\ works too: in a single-quoted string, any backslash sequence other than \\ and \' is kept as a literal backslash followed by the character, so \) is taken literally. But explicitly escaping the backslash twice seems easier to read to me.
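A small sketch of the single-quoted string rule described above, showing that both spellings produce the same pattern text. The variable names and the test subjects are assumptions made for illustration.

```php
<?php
// Both literals are written inside single quotes.
$threeBackslashes = '(?<=\\\)';   // \\ becomes \, then \) is kept as-is
$fourBackslashes  = '(?<=\\\\)';  // \\ becomes \, twice

// After PHP has processed the literals, both hold the same 7 characters:
// an opening lookbehind, two backslashes, and a closing parenthesis.
$same = ($threeBackslashes === $fourBackslashes);

// Used in a pattern, the lookbehind requires a backslash before the brace.
$pattern = '/' . $fourBackslashes . '\{/';
$escapedBrace = preg_match($pattern, 'a\{b');   // brace preceded by a backslash
$plainBrace   = preg_match($pattern, 'a{b');    // brace without a backslash

var_dump($same, $escapedBrace, $plainBrace);
```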

The regex should be

$regex = '/^(\{(?=([a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\}))|[^{]|(?<=\\\\)\{)*$/';

But notice that the look-around assertions are not necessary. This regex should do the job too:

$regex = '/^([^{]|\\\{|\{[a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\})*$/';

Any non-{ character is matched by the first alternative. When a { is read, one of the remaining two alternatives has to be used. Either the pattern for the brace placeholder matches, or the regex engine backtracks one character and tries to match the \{ character sequence. If both fail, it backtracks further until it reaches the start of the string and fails completely.
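As a quick check, the lookaround-free regex can be run against the four inputs from the question. This is only a sketch; isValidTemplate() is a hypothetical helper name, not something from the original code.

```php
<?php
$regex = '/^([^{]|\\\{|\{[a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\})*$/';

// Hypothetical helper: true when the whole value fits the pattern.
function isValidTemplate(string $value, string $regex): bool
{
    return preg_match($regex, $value) === 1;
}

// The two valid and the two invalid inputs from the question, in that order.
$samples = [
    'Hello {user|1|S}, you have {amount|2|D2}',
    'Hello {user|1|S}, you have {amount|2|D2} in \{the_bracket_bank\}',
    'Hello {user|1|S}, you have {amount|2}',
    'Hello {user|1|S}, you have {amount|2|D2} in {the_bracket_bank}',
];

$results = [];
foreach ($samples as $sample) {
    $results[] = isValidTemplate($sample, $regex);
}

var_dump($results);
```

The first two samples are accepted and the last two are rejected. Note that this pattern only checks the left brace, as in the question, so a stray } is still accepted.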

Other tips

Matching without lookbehind

You can make a regex for this without using lookbehinds/lookaheads (avoiding them is usually recommended).

For example, if your requirement is to match any character except { and }, unless the brace is preceded by a \, you can also say:

Match any character but a { and a } OR match a \{ or a \}. To match any character but a { and a } use:

[^{}]

To match a \{ use:

\\\{

The first two backslashes form an escaped literal backslash, and the third one escapes the { (which might not be necessary, depending on your regex compiler).

You would end up with this:

(?:
    [^{}]
|
    \\\{
|
    \\\}
)+

I formatted this regex so that it's readable. If you want to use it in your code like this, make sure to use the PCRE_EXTENDED (x) modifier.
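A sketch of how the formatted pattern could be used from PHP with the x modifier. The ^ and $ anchors are an addition assumed here so that the whole string is validated, and the extra backslashes are needed because the pattern sits in a single-quoted string. This pattern only covers plain text and escaped braces, not the {user|1|S} placeholders.

```php
<?php
// With the x modifier, whitespace inside the pattern (outside character
// classes) is ignored, so the multi-line layout can be kept as written.
$regex = '/^(?:
    [^{}]
  |
    \\\\\{
  |
    \\\\\}
)+$/x';

$escaped   = preg_match($regex, 'in \{the_bracket_bank\}'); // escaped braces only
$unescaped = preg_match($regex, 'in {the_bracket_bank}');   // raw braces

var_dump($escaped, $unescaped);
```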

This looks more like a job for a negative lookbehind to me:

/((?<!\\\\)\{[a-zA-Z0-9]+\|[0-9]+\|[SD][0-9]*\})/

However, the obfuscation factor is so high that I would rather recognize all bracketed strings and parse them later.
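One way to read "recognize all bracketed strings and parse them later" is sketched below. findInvalidPlaceholders() is a hypothetical helper, and the per-field rules are assumed to be the ones from the accepted answer (name, optional number, S or D with optional digits). It does not detect a lone unbalanced brace.

```php
<?php
// Hypothetical helper: returns the bodies of all unescaped {...} groups
// that do not have the name|number|format shape.
function findInvalidPlaceholders(string $value): array
{
    // Step 1: grab every {...} group whose "{" is not preceded by a backslash.
    preg_match_all('/(?<!\\\\)\{([^{}]*)\}/', $value, $matches);

    $invalid = [];
    foreach ($matches[1] as $body) {
        // Step 2: split the body on "|" and check each field on its own.
        $parts = explode('|', $body);
        $ok = count($parts) === 3
            && preg_match('/^[a-zA-Z0-9]+$/D', $parts[0]) === 1
            && preg_match('/^[0-9]*$/D', $parts[1]) === 1
            && preg_match('/^(S|D[0-9]*)$/D', $parts[2]) === 1;
        if (!$ok) {
            $invalid[] = $body;
        }
    }
    return $invalid;
}

var_dump(findInvalidPlaceholders('Hello {user|1|S}, you have {amount|2}'));
```

Splitting the work this way keeps each pattern short, and the returned list can be used to tell the user which placeholder is malformed.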

Licensed under: CC-BY-SA with attribution