Question

When the HTML is formatted across multiple lines, my regex works correctly:

Regex

\<\/a\>\ \:\ (.+)\<\/div\>

HTML

<ul>
    <li>
        <div> <a href="#"><strong>1</strong></a> : test1</div>
    </li>
    <li>
        <div> <a href="#"><strong>2</strong></a> : test2</div>
    </li>
    <li>
        <div> <a href="#"><strong>3</strong></a> : test3</div>
    </li>
</ul>

Using preg_match_all with the above, I get:

Array
(
    [0] => test1
    [1] => test2
    [2] => test3
)

But when the HTML is not formatted (everything on a single line), the regex matches up to the last </div> instead of producing a separate match for each item with preg_match_all:

Regex

\<\/a\>\ \:\ (.+)\<\/div\>

HTML

<ul> <li> <div> <a href="#"><strong>1</strong></a> : test1 </div> </li> <li> <div> <a href="#"><strong>2</strong></a> : test2 </div> </li> <li> <div> <a href="#"><strong>3</strong></a> : test3 </div> </li> </ul>

With this input, I get a single-element array:

Array
(
    [0] => test1 </div> </li> <li> <div> <a href="#"><strong>2</strong></a> : test2 </div> </li> <li> <div> <a href="#"><strong>3</strong></a> : test3 
)

How can I fix this?

Solution

By default, the + quantifier is greedy, meaning (loosely) that it will match as much as it can while still letting the regex produce an overall match.

For example, when .+</div> is applied to abc</div>efg</div>, the .+ part matches abc</div>efg: each character of the first </div> can be matched by the dot ., and the greedy quantifier eats up as much as possible, leaving only the last </div> for the literal part of the pattern.
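
A minimal sketch of that behaviour, using the same sample string. The variable names are only illustrative, and # is used as the delimiter so that / does not need escaping:

```php
<?php
// Sample string from the explanation above.
$subject = 'abc</div>efg</div>';

// Greedy: .+ first consumes the whole string, then backtracks only as far
// as needed for the trailing </div> to match.
preg_match('#(.+)</div>#', $subject, $greedy);

// Lazy: .+? grows one character at a time and stops at the first </div>.
preg_match('#(.+?)</div>#', $subject, $lazy);

echo $greedy[1], PHP_EOL; // abc</div>efg
echo $lazy[1], PHP_EOL;   // abc
```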

What you want to do is either make it lazy, so that it matches the least amount possible, with +?:

</a> : (.+?)</div>

Or, if you know your text can't contain <, use [^<] (i.e. anything except a <) instead of the dot: that way [^<]+ can't eat up </div>:

</a> : ([^<]+)</div>
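
As a quick check, both variants can be run against the single-line HTML from the question. This is a sketch rather than a definitive implementation; the trim step is an addition, included only because the captured text ends with a space in this input:

```php
<?php
// The single-line HTML from the question.
$html = '<ul> <li> <div> <a href="#"><strong>1</strong></a> : test1 </div> </li> '
      . '<li> <div> <a href="#"><strong>2</strong></a> : test2 </div> </li> '
      . '<li> <div> <a href="#"><strong>3</strong></a> : test3 </div> </li> </ul>';

// Fix 1: lazy quantifier, stops at the first </div> after each item.
preg_match_all('#</a> : (.+?)</div>#', $html, $lazy);

// Fix 2: negated character class; assumes the captured text never contains "<".
preg_match_all('#</a> : ([^<]+)</div>#', $html, $negated);

// Both variants capture "test1 ", "test2 ", "test3 " (note the trailing space),
// so trim each capture before printing.
print_r(array_map('trim', $lazy[1]));
print_r(array_map('trim', $negated[1]));
```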

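Your regex worked on the formatted HTML because, by default, the dot . doesn't match newlines, so .+ could never run past the end of a line and reach a later </div>. As a side note, there's no need to escape everything in your regex: none of <, >, : or the space is a special character here, and / only needs escaping when it is also the pattern delimiter.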
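
The role of the newline can be seen by running the original greedy pattern on multi-line input with and without the s (PCRE_DOTALL) modifier. This is an illustrative sketch; the input is a shortened version of the formatted HTML from the question:

```php
<?php
// Two items, one per line (shortened from the question's formatted HTML).
$html = "<div> <a href=\"#\"><strong>1</strong></a> : test1</div>\n"
      . "<div> <a href=\"#\"><strong>2</strong></a> : test2</div>\n";

// Without the s modifier, "." stops at "\n", so the greedy .+ stays on one line.
preg_match_all('#</a> : (.+)</div>#', $html, $perLine);

// With the s modifier, "." also matches "\n", and the greedy .+ runs on
// to the last </div> in the whole string.
preg_match_all('#</a> : (.+)</div>#s', $html, $dotAll);

echo count($perLine[1]), PHP_EOL; // 2
echo count($dotAll[1]), PHP_EOL;  // 1
```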

Other suggestions

Try it this way:

<?php

$string = '<ul> <li> <div> <a href="#"><strong>1</strong></a> : test1 </div> </li> <li> <div> <a href="#"><strong>2</strong></a> : test2 </div> </li> <li> <div> <a href="#"><strong>3</strong></a> : test3 </div> </li> </ul>';
$pattern = '#</a>\s*:\s*(.+?)</div>#';
preg_match_all($pattern, $string, $out);

print_r($out);
?>

Result:

Array
(
    [0] => Array
        (
            [0] => </a> : test1 </div>
            [1] => </a> : test2 </div>
            [2] => </a> : test3 </div>
        )

    [1] => Array
        (
            [0] => test1 
            [1] => test2 
            [2] => test3 
        )

)

The whitespace may vary (spaces or tabs), so it's better to use \s, which matches any whitespace character, including \n and \r. Note that \s?+ would match at most one whitespace character, whereas \s* allows any amount:

</a>\s*:\s*(.*?)\s*</div>
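
A sketch of a whitespace-tolerant variant using \s*, which accepts any run of whitespace (including none). The input string here is hypothetical, made up to show irregular spacing:

```php
<?php
// Hypothetical input with irregular whitespace: a tab, a newline,
// several spaces, and no whitespace at all in the second item.
$html = "<div><a href=\"#\"><strong>1</strong></a>\t:\n  test1   </div>"
      . "<div><a href=\"#\"><strong>2</strong></a>:test2</div>";

// \s* absorbs the whitespace around the colon; the lazy (.*?) followed by
// \s* leaves the captured text without surrounding whitespace.
preg_match_all('#</a>\s*:\s*(.*?)\s*</div>#', $html, $m);

print_r($m[1]);
```

With this pattern no separate trim step is needed for the captures.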

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow