Question

I'm trying to find the correct regular expression to match all RT scenarios on Twitter (can't wait for Twitter's new retweet API). The way I see it, RTs can be at the beginning, middle, or end of the string returned from Twitter. So, I need something at the beginning and end of this regular expression:

([Rr])([Tt])

No matter what I try, I cannot match all scenarios in one regular expression.
I tried

[^|\s+]

to match the scenario where the RT appears either at the beginning of the string or after one or more whitespace characters, but it didn't work. I had the same problem with the end of the string.
I tried

[\s+|$]

to match the case where the RT appears either at the end of the string or is followed by one or more whitespace characters. As with the prefix, it didn't work.

Can someone please explain what I am doing wrong here? Any help or suggestions will be highly appreciated (as always :) )


Solution

You'll probably be happiest with something like:

/\brt\b/i

This finds isolated instances of RT (that is, surrounded by word boundaries). The /i modifier at the end of the regex makes it case-insensitive.

You want the word boundaries so that you don't mistake tweets containing words like "Art" and "Quartz" for retweets. Even then, expect some false positives, since "RT" can show up as a standalone token for other reasons.
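As a rough illustration of the word-boundary behaviour, here is a minimal sketch in Python, where re.IGNORECASE plays the role of the /i modifier. The helper name looks_like_retweet and the sample tweets are made up for this example.

```python
import re

# Word-bounded, case-insensitive "RT"; re.IGNORECASE stands in for /i.
RT_PATTERN = re.compile(r"\brt\b", re.IGNORECASE)


def looks_like_retweet(tweet: str) -> bool:
    """Return True if the tweet contains a standalone RT token."""
    return RT_PATTERN.search(tweet) is not None


# Hypothetical sample tweets, not real data.
samples = [
    "RT @alice: hello",          # RT at the start
    "so true rt @bob: hi",       # RT in the middle
    "great point! RT",           # RT at the end
    "Art and Quartz are words",  # "rt" only inside longer words
]
results = [looks_like_retweet(s) for s in samples]
print(results)
```

The first three samples should be reported as retweets and the last one should not, because "rt" inside "Art" and "Quartz" has a word character on at least one side.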

By default, a regular expression can (and will) match anywhere inside a string, so you don't need to account for what precedes or follows your match unless you care what it is or whether it is present.
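As for why the bracketed attempts in the question fail: in most regex flavours, square brackets define a character class that matches exactly one character. Inside the class, |, + and $ are literal characters, and a leading ^ negates the class. So [^|\s+] means "one character that is not |, whitespace, or +", which can never match the empty position at the start of the string. If you do want explicit alternation, group it with parentheses instead. The following is a sketch in Python under those assumptions; the variable names and sample strings are made up, and other engines may differ in detail.

```python
import re

# [^|\s+] is a negated class: one character that is not '|', whitespace, or '+'.
# It requires a real character before "rt", so it cannot match at the start.
bracket_prefix = re.compile(r"[^|\s+]rt", re.IGNORECASE)

# (?:^|\s) is the intended alternation: start of string OR one whitespace
# character. (?=\s|$) then requires whitespace or end of string after "rt".
grouped = re.compile(r"(?:^|\s)rt(?=\s|$)", re.IGNORECASE)

# Hypothetical sample strings.
at_start = "RT @alice: hello"
at_end = "well said RT"
inside_word = "Art"

bracket_at_start = bracket_prefix.search(at_start) is not None
bracket_in_word = bracket_prefix.search(inside_word) is not None
grouped_hits = [
    grouped.search(s) is not None for s in (at_start, at_end, inside_word)
]
print(bracket_at_start, bracket_in_word, grouped_hits)
```

The bracketed version misses RT at the start of the string and matches inside "Art", which is the opposite of what was intended. The grouped version behaves much like the word-boundary pattern for these inputs, although \b is shorter and also treats punctuation as a boundary.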

OTHER TIPS

// Capture the username that follows "RT @" (case-insensitive).
if (preg_match('/\brt\s*@(\w+)/i', $tweet, $match)) {
    echo 'Somebody retweeted ' . $match[1] . "\n";
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow