I'm trying to linkify a Twitter post, but hashtags that look like "#löövet" don't get matched the way I want. They get cut off before the non-ASCII characters, which should be allowed.

Does anyone know how to alter the regex for this purpose?

Below is my example:

//Hashtag
$tweet = preg_replace("/ +#([a-z0-9_]*)?/i", " <a href=\"http://twitter.com/tag/\\1\" target=\"_blank\">#\\1</a>", $tweet);



//Problem:
/*
* The pattern above does not match non-ASCII characters such as å/ä/ö
* Tag example: tag = #löövet
* After preg_replace: tag = #l öövet (cut off before the first ö)
* Desired after preg_replace: tag = #löövet
*/

Solution

How about:

$tweet = preg_replace("/ +#(\p{Xwd}*)/u", " <a href=\"http://twitter.com/tag/$1\" target=\"_blank\">#$1</a>", $tweet);

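\p{Xwd} matches the same kind of characters as \w, but covers all Unicode letters and numbers plus the underscore. It relies on the u modifier so that the pattern and the subject are treated as UTF-8.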

If you don't want underscore, use \p{Xan}.
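
To make the difference between the two properties concrete, here is a minimal sketch. The sample tweet and the variable names are made up for illustration, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Sketch only: sample text and variable names are illustrative assumptions.
$tweet = "Jag gillar #löövet och #snö_2024 idag";

// \p{Xwd}: Unicode letters, numbers and underscore.
preg_match_all('/#(\p{Xwd}+)/u', $tweet, $withUnderscore);

// \p{Xan}: Unicode letters and numbers only, so the match stops at "_".
preg_match_all('/#(\p{Xan}+)/u', $tweet, $withoutUnderscore);

print_r($withUnderscore[1]);    // löövet, snö_2024
print_r($withoutUnderscore[1]); // löövet, snö
```

Which of the two fits depends on whether underscores should count as part of a tag.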

Other tips

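Use \p{L} instead of a-z to match all Unicode letters (or \p{L}\p{N} to include numbers). Add the u modifier so that the pattern and the subject are treated as UTF-8; without it, multi-byte characters are tested one byte at a time:

$tweet = preg_replace("/ +#([\p{L}\p{N}_]*)?/iu", " <a href=\"http://twitter.com/tag/\\1\" target=\"_blank\">#\\1</a>", $tweet);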

To find out more about Unicode in regular expressions, look here.
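
As a rough check of why the u modifier matters with Unicode properties, the sketch below runs the same character class with and without it. The sample tweet and variable names are assumptions for illustration, and the file is assumed to be UTF-8 encoded.

```php
<?php
// Sketch only: sample text and variable names are illustrative assumptions.
$tweet = "Älskar #löövet";
$class = '[\p{L}\p{N}_]';

// Without "u" the subject is read byte by byte, so the two bytes
// that encode "ö" are tested separately and the match is cut short.
preg_match('/#(' . $class . '*)/', $tweet, $bytes);

// With "u" the pattern and subject are treated as UTF-8,
// so "ö" is seen as a single letter.
preg_match('/#(' . $class . '*)/u', $tweet, $utf8);

var_dump(($bytes[1] ?? null) === 'löövet');
var_dump($utf8[1] === 'löövet'); // bool(true)
```

Only the second call should capture the whole tag.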

Instead of chasing Unicode properties, you can try this one if your hashtags do not contain any whitespace.

/ +#(\S+)/
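
One trade-off worth knowing with this approach: \S+ takes everything up to the next whitespace, including trailing punctuation. A small sketch, using a made-up sample tweet:

```php
<?php
// Sketch only: the sample text is an illustrative assumption.
$tweet = "Fint #löövet, eller hur?";

// \S+ matches any run of non-whitespace, so the comma is captured too.
preg_match('/ +#(\S+)/', $tweet, $m);

echo $m[1], "\n"; // löövet,
```

If tags are often followed by commas or periods, one of the Unicode-property patterns may be a better fit.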

If you want to limit allowed letters to latin letters, you can use:

$tweet = preg_replace('/ +#([\p{Latin}0-9_]*)/u', ' <a href="http://twitter.com/tag/$1" target="_blank">#$1</a>', $tweet);
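
A quick way to see what the Latin restriction does is to run it against tags in different scripts. This is only a sketch: the sample tweet is made up, and it uses + instead of * so that a tag with no Latin characters is skipped rather than matched as an empty string.

```php
<?php
// Sketch only: the sample text is an illustrative assumption.
$tweet = "Hej #löövet och #привет";

// \p{Latin} accepts accented Latin letters such as "ö"
// but rejects Cyrillic, so only the first tag is captured.
preg_match_all('/ +#([\p{Latin}0-9_]+)/u', $tweet, $m);

print_r($m[1]); // löövet
```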
Licensed under: CC-BY-SA with attribution