Question

I'm trying to linkify a Twitter post, but hashtags that look like "#löövet" don't get filtered the way I want. They get cut off before the foreign characters. The foreign characters should be allowed.

Anyone know how to alter the regex for this purpose?

Below is my example:

//Hashtag
$tweet = preg_replace("/ +#([a-z0-9_]*)?/i", " <a href=\"http://twitter.com/tag/\\1\" target=\"_blank\">#\\1</a>", $tweet);



//Problem: 
/*
* The function above does not match foreign characters as å/ä/ö
* Tag result example: tag = #löövet
* After preg_replace: tag = #l öövet
* Desired after preg_replace: tag = #löövet
*/   

Solution

How about:

$tweet = preg_replace("/ +#(\p{Xwd}*)/u", " <a href=\"http://twitter.com/tag/$1\" target=\"_blank\">#$1</a>", $tweet);

\p{Xwd} has the same meaning as \w, but covers all Unicode letters and numbers plus the underscore. The u modifier makes PCRE treat both the pattern and the subject as UTF-8.

If you don't want the underscore, use \p{Xan}.
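To see the pattern above in action, here is a minimal sketch that wraps it in a function and runs it on the tag from the question. The function name linkify_hashtags and the sample sentence are made up for illustration; only the pattern and replacement string come from the answer.

```php
<?php
// Hypothetical helper name; the pattern and replacement are the ones from the answer.
function linkify_hashtags(string $tweet): string
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8,
    // so \p{Xwd} can match letters such as å, ä and ö.
    return preg_replace(
        '/ +#(\p{Xwd}*)/u',
        ' <a href="http://twitter.com/tag/$1" target="_blank">#$1</a>',
        $tweet
    );
}

// Sample input invented for this sketch.
echo linkify_hashtags('Jag älskar #löövet idag'), "\n";
```

The source file and the input string are both assumed to be UTF-8 encoded; with the u modifier, preg_replace returns null on invalid UTF-8 input.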

OTHER TIPS

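Use \p{L} instead of a-z to match all Unicode letters (or \p{L}\p{N} to include numbers as well). Add the u modifier so that the pattern and the subject are treated as UTF-8; the i modifier is no longer needed because \p{L} already covers both cases:

$tweet = preg_replace("/ +#([\p{L}\p{N}_]*)/u", " <a href=\"http://twitter.com/tag/\\1\" target=\"_blank\">#\\1</a>", $tweet);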

To find out more about Unicode in regexps, look here.

Instead of chasing Unicode character classes, you can try this one if your hashtags do not contain any whitespace:

/ +#(\S+)/
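One thing worth checking with the \S+ approach is that it captures everything up to the next whitespace, punctuation included. The sketch below illustrates this; the helper name first_hashtag and the sample strings are invented here, and the u modifier is an addition to the pattern above so that \S is evaluated per UTF-8 character rather than per byte.

```php
<?php
// Hypothetical helper: returns the text captured by the \S+ pattern, or null.
function first_hashtag(string $text): ?string
{
    // The u modifier is an assumption added for this sketch.
    if (preg_match('/ +#(\S+)/u', $text, $m) === 1) {
        return $m[1];
    }
    return null;
}

var_dump(first_hashtag('Hej #löövet'));           // tag with nothing after it
var_dump(first_hashtag('Hej #löövet, vad fint')); // the trailing comma is captured too
```

If trailing punctuation should stay outside the link, one of the character-class patterns is probably a better fit.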

If you want to limit allowed letters to latin letters, you can use:

$tweet = preg_replace('/ +#([\p{Latin}0-9_]*)/u', ' <a href="http://twitter.com/tag/$1" target="_blank">#$1</a>', $tweet);
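As a rough illustration of what the Latin-only restriction does, the sketch below runs the same pattern on a Latin-script tag and on a Cyrillic one. The function name linkify_latin_hashtags and both sample strings are made up for this example.

```php
<?php
// Hypothetical helper wrapping the \p{Latin} pattern from the answer above.
function linkify_latin_hashtags(string $tweet): string
{
    return preg_replace(
        '/ +#([\p{Latin}0-9_]*)/u',
        ' <a href="http://twitter.com/tag/$1" target="_blank">#$1</a>',
        $tweet
    );
}

echo linkify_latin_hashtags('Tack #löövet'), "\n";    // the whole tag is linked
echo linkify_latin_hashtags('Спасибо #привет'), "\n"; // only the bare "#" is linked
```

Because the quantifier is *, the pattern still matches a "#" followed by zero allowed characters, so the Cyrillic tag ends up with an empty link around the "#". Changing * to + should leave such tags untouched instead, if that is the behaviour you want.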
Licensed under: CC-BY-SA with attribution