Question

The regex for a quoted string has been solved over and over. A good answer can be seen here: https://stackoverflow.com/a/5696141/692331

$re_dq = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';

This seems to be the standard solution for PHP.

My issue is that my quotes are escaped by another quote rather than by a backslash. Example:

="123 4556 789 ""Product B v.24"""

="00 00 F0 FF ""Licence key for blah blah"" hfd.34"

=""

The previous strings should match the following, respectively:

string '123 4556 789 ""Product B v.24""' (length=31) 

string '00 00 F0 FF ""Licence key for blah blah"" hfd.34' (length=48) 

string '' (length=0) 

The examples given are just illustrations of what the string may look like and are not the actual strings I will be matching, which can number in the tens of thousands.

I need a regex pattern that will match a double quoted string which may OR MAY NOT contain sequences of two double quotes.

UPDATE 5/5/14:

See Answer Below


Solution 2

I found that the pattern from zx81

$re_dq_answer = '/="(?:[^"]|"")*"/';

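makes the engine save a backtracking position after every single matched character, because each character is consumed by a separate iteration of the alternation group. I found that I could adapt the pattern at the very top of my question to suit my needs.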

$re_dq_original = '/="[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';

becomes

$re_dq_modified = '/="([^"]*(?:""[^"]*)*)"/';

The 's' pattern modifier isn't necessary because the modified pattern does not use the dot (.) metacharacter, which is the only thing that modifier affects.
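As a quick check, here is a minimal sketch that runs $re_dq_modified against the three sample strings from the question. The $samples and $results arrays are only illustrative names; capture group 1 should hold the text between the outer quotes.

```php
<?php
// Pattern from this answer: ="..." where a literal quote inside is written as "".
$re_dq_modified = '/="([^"]*(?:""[^"]*)*)"/';

// Sample strings copied from the question (illustrations, not real data).
$samples = [
    '="123 4556 789 ""Product B v.24"""',
    '="00 00 F0 FF ""Licence key for blah blah"" hfd.34"',
    '=""',
];

$results = [];
foreach ($samples as $sample) {
    // preg_match returns 1 on a match; $m[1] is the text inside the outer quotes.
    if (preg_match($re_dq_modified, $sample, $m) === 1) {
        $results[] = $m[1];
        printf("length=%d: %s\n", strlen($m[1]), $m[1]);
    }
}
```

The doubled quotes are left as they are in the captured text, which is what the expected results in the question show.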

The longest string I have had to match was 28,000 characters, which caused Apache to crash with a stack overflow. I had to increase the stack size to 32 MB (the Linux default is 8 MB, Windows is 1 MB) just to get by! I didn't want every thread to have such a large stack, so I started looking for a better solution.
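For anyone wanting to try a subject of that size, a sketch along these lines may help. The 28,000-character payload is synthetic (built with str_repeat as a stand-in for the real data, which is not shown here), and the check on preg_last_error() is my own addition to tell "no match" apart from a PCRE failure.

```php
<?php
$re_dq_modified = '/="([^"]*(?:""[^"]*)*)"/';

// Synthetic payload of exactly 28,000 characters with one doubled-quote pair
// in the middle (made up for illustration; not the real data).
$payload = str_repeat('a', 13995) . '""quoted""' . str_repeat('b', 13995);
$subject = '="' . $payload . '"';

$matched = preg_match($re_dq_modified, $subject, $m);

if ($matched === false || preg_last_error() !== PREG_NO_ERROR) {
    // preg_match returns false when PCRE gives up, e.g. on a backtrack limit.
    echo 'PCRE error code: ' . preg_last_error() . "\n";
} else {
    echo 'Matched ' . strlen($m[1]) . " characters\n";
}
```

Whether the older alternation pattern survives the same input depends on the PHP and PCRE versions and their settings, so it is left out of the sketch.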

Example (tested on Regex101): A string (length=3,200) which required 6,637 steps to match using $re_dq_answer now requires 141 steps using $re_dq_modified. Slight improvement I'd say!

OTHER TIPS

Edit: Per your request, minor mod to account for empty quotes.

(?<!")"(?:[^"]|"")*"

Original solution:

(?<!")"(?:[^"]|"")+"

Demo:

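<?php
// Two of the sample strings, one per line (without the leading "=").
$string = '
"123 4556 789 ""Product B v.24"""
"00 00 F0 FF ""Licence key for blah blah"" hfd.34"';
// Original (+) variant: an opening quote not preceded by another quote,
// then any run of non-quote characters or doubled quotes, then a closing quote.
$regex = '~(?<!")"(?:[^"]|"")+"~';
// preg_match_all returns the number of full matches; $m[0] holds them.
$count = preg_match_all($regex, $string, $m);
echo $count . "<br /><pre>";
print_r($m[0]);
echo "</pre>";
?>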

Output:

2

Array
(
    [0] => "123 4556 789 ""Product B v.24"""
    [1] => "00 00 F0 FF ""Licence key for blah blah"" hfd.34"
)
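
The demo above uses the original + variant. As a minimal sketch of how the edited * variant differs, the following runs both against the empty-string case from the question (the variable names are illustrative):

```php
<?php
$with_star = '~(?<!")"(?:[^"]|"")*"~'; // edited version, allows empty quotes
$with_plus = '~(?<!")"(?:[^"]|"")+"~'; // original version, needs at least one character

$empty = '=""';

// Count the matches each variant finds in the same subject.
$star_count = preg_match_all($with_star, $empty, $m_star);
$plus_count = preg_match_all($with_plus, $empty, $m_plus);

echo "star: $star_count, plus: $plus_count\n";
```

The + variant finds nothing here because the group must match at least once, and the lookbehind stops a match from starting at the second quote.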
Licensed under: CC-BY-SA with attribution