Question

I have the following code:

$string = "Manual balls knit cardigan @120rb

ORDER
BB 28AFF6A6 atau 25AE5DB3 
Phone 081298249949 atau 081310570229 
Line indy2212 atau indy2281 
FORMAT
Nama 
Alamat 
Telp 
Kode barang";

if (preg_match('/(?<= )@([^@ ]+)/', $string, $matches)) {
    var_dump(count($matches));
    var_dump('first ' . $matches[0]);
    var_dump('second ' . $matches[1]);
}

However, this results in $matches being an array with 2 elements, with the following output:

2
@120rb ORDER BB
120rb ORDER BB

My question is why? Why does it match the string twice? What is wrong with my regex?


Solution

preg_match() stores the matches into an array which you supply as the third parameter. In this case your preg_match() statement looks like:

preg_match('/(?<= )@([^@ ]+)/', $string, $matches);

So $matches contains the results of that single match, where:

  • $matches[0] will contain the text that matched the full pattern
  • $matches[1] will have the text matched by the first capturing group
  • $matches[2] will have the text matched by the second capturing group
  • and so on...

The regular expression here is (?<= )@([^@ ]+). It matches @120rb ORDER BB completely, so it will be stored in $matches[0], whereas the capturing group ([^@ ]+) will only capture the part after the @ (120rb ORDER BB) and it will be stored in $matches[1].
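To see how the indexes line up, here is a minimal sketch using the pattern from the question on a shortened, hypothetical input (the variable names $sample and $found are illustrative, not from the original code):

```php
<?php
// Shortened, hypothetical input; the newline after "120rb" mirrors the original string.
$sample = "cardigan @120rb\nORDER BB 28AFF6A6";

// A single preg_match() call reports at most one match of the pattern.
$found = preg_match('/(?<= )@([^@ ]+)/', $sample, $matches);

var_dump($found);          // 1: the pattern matched once
var_dump(count($matches)); // 2: the full match plus one capturing group
var_dump($matches[0]);     // full match, starting at "@" and running to the next space
var_dump($matches[1]);     // group 1: the same text without the leading "@"
```

The two elements are therefore one match reported at two levels of detail, not two separate matches.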

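Currently, the regular expression doesn't detect a mention at the beginning of the string, because the lookbehind (?<= ) requires a literal space before the @. Also, it incorrectly matches across line breaks: [^@ ] excludes only the @ symbol and the space character, so newlines and other whitespace are still matched. I'd use the following expression with preg_match_all(), which accepts either the start of the string or any whitespace before the @ and stops at the next whitespace character: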

(?<=^|\s)@([^@\s]+)

Code:

if (preg_match_all('/(?<=^|\s)@([^@\s]+)/', $string, $matches)) {
    print_r($matches[1]);
}

To get the number of matches, you can just use echo count($matches[0]);.


OTHER TIPS

Both preg_match() and preg_match_all() accept a variable, passed by reference, as their third parameter. If you provide the variable, the default behavior is to put the fullstring match(es) in its first element.

When you only want to extract the last portion of your pattern's fullstring match, you can use \K to discard the leading/unwanted characters from the reported match. This avoids the expense of a lookbehind and the need for a capturing group.
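As a rough comparison, the sketch below runs a capture-group pattern and a \K pattern on the same hypothetical input ($text, $withGroup and $withK are illustrative names):

```php
<?php
$text = 'knit cardigan @120rb'; // hypothetical sample input

// Capture-group version: the mention text lands in element 1.
preg_match('/(?:^|\s)@(\w+)/', $text, $withGroup);

// \K version: everything matched before \K is dropped from the reported match,
// so the mention text is the fullstring match in element 0.
preg_match('/(?:^|\s)@\K\w+/', $text, $withK);

var_dump($withGroup[0]); // " @120rb" (includes the leading space and the "@")
var_dump($withGroup[1]); // "120rb"
var_dump($withK[0]);     // "120rb"
```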

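Both preg_match() and preg_match_all() provide the number of fullstring matches found as their return value (preg_match() stops at the first match, so it returns 1 or 0; both return false on failure). This means that it is not necessary to call count() on the matches array just to count the matches.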
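A small sketch of the return values, using a hypothetical input with two mentions (the variable names are illustrative):

```php
<?php
$text = '@one and @two'; // hypothetical sample input

// preg_match() stops after the first match.
$first = preg_match('/(?:^|\s)@\K\w+/', $text, $m);

// preg_match_all() keeps scanning and counts every fullstring match.
$all = preg_match_all('/(?:^|\s)@\K\w+/', $text, $mAll);

var_dump($first);   // 1
var_dump($all);     // 2
var_dump($mAll[0]); // the two mentions: "one" and "two"
```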

My pattern below will match the starting position of the string (^) or a whitespace character (\s), then match a literal @ symbol, then forget these matched characters, then match one or more "word characters" which consist of letters, numbers, and underscores. This pattern should eliminate false matches like email addresses and non-mentions.

If you need to ensure that the mention is not immediately followed by invalid characters, you can write a lookahead at the end of the pattern to require the end position of the string or a whitespace character ((?=$|\s)).
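The effect of that trailing lookahead can be sketched as follows; the input string and variable names are hypothetical, chosen so that one mention is followed by punctuation:

```php
<?php
$text = '@ok @bad! @fine'; // hypothetical sample input

// Without the lookahead, "bad" is accepted even though "!" follows it.
$loose = preg_match_all('/(?:^|\s)@\K\w+/', $text, $looseMatches);

// With the lookahead, a mention must be followed by whitespace or the end of the string.
$strict = preg_match_all('/(?:^|\s)@\K\w+(?=$|\s)/', $text, $strictMatches);

var_dump($loose);            // 3
var_dump($strict);           // 2
var_dump($strictMatches[0]); // "ok" and "fine"
```

Whether a mention followed by punctuation should count is a design choice; the stricter pattern simply rejects it.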

Code: (Demo)

$string = '@mention_1 @$badmention Manual balls knit cardigan @120rb
email me @ example@example.com';

$count = preg_match_all(
    '/(?:^|\s)@\K\w+/',
    $string,
    $matches
);

var_export([
    'count' => $count,
    'matches' => $matches[0]
]);

Output:

array (
  'count' => 2,
  'matches' => 
  array (
    0 => 'mention_1',
    1 => '120rb',
  ),
)
Licensed under: CC-BY-SA with attribution