Preg_match for different language URLs

https://stackoverflow.com/questions/23221483

07-07-2023

Question

I have some text like this:

$text = "Some thing is there http://example.com/جميع-وظائف-فى-السليمانية 
         http://www.example.com/جميع-وظائف-فى-السليمانية nothing is there
         Check me http://example.com/test/for_me first
         testing http://www.example.com/test/for_me the url 
         Should be test http://www.example.com/翻译-英语教师-中文教师-外贸跟单
         simple text";

I need to match the URLs with preg_match, but they contain characters from different languages.
So I need to extract the URL itself from each line.

This is what I was doing:

$text = preg_replace("/[\n]/", " <br>", $text);
$lines = explode("<br>", $text);
foreach($lines as $textLine){
   if (preg_match("/(http\:\/\/(.*))/", $textLine, $match )) {
     // some code
     // Here I need the url
   }
}

My current regex is /(http\:\/\/(.*))/. How can I make it work with URLs in different languages?

Solution

A regular expression like the one below may work for you.
In my test it worked with the sample text you gave, although it is not very advanced. It simply selects all characters after http:// or https:// until a whitespace character occurs (space, newline, tab, etc.).
Note that the trailing g flag is regex101 notation. PHP does not accept it as a modifier; use preg_match_all() to get every match instead.

/(https?\:\/\/(?:[^\s]+))/gi

Here is a visual example of what would be matched from your sample string:
http://regex101.com/r/bR0yE9
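
As a rough illustration of how that pattern might be used from PHP, here is a minimal sketch. The helper name extractUrls() is hypothetical, and the u modifier is an addition not present in the answer above (it asks PCRE to treat the pattern and subject as UTF-8). The sample text is a short excerpt of the text in the question.

```php
<?php
// Sketch only: extractUrls() is a hypothetical helper, not from the original answer.
function extractUrls(string $text): array
{
    // PHP has no "g" modifier; preg_match_all() already returns every match.
    // "i" makes the scheme case-insensitive, "u" (an assumption added here)
    // treats pattern and subject as UTF-8.
    preg_match_all('~https?://\S+~iu', $text, $matches);

    // $matches[0] holds the full matches; fall back to an empty list on error.
    return $matches[0] ?? [];
}

$text = "Check me http://example.com/test/for_me first
         Should be test http://www.example.com/翻译-英语教师-中文教师-外贸跟单
         simple text";

print_r(extractUrls($text));
```

Because the match stops at the first whitespace character, punctuation that directly follows a URL in running text (a trailing period, for instance) would be captured as part of it.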

Other suggestions

You don't need to work line by line, you can search directly:

if (preg_match_all('~\bhttp://\S+~', $text, $matches))
     print_r($matches);

Here \S matches any character that is not a whitespace character.
Since the pattern only looks for whitespace, non-ASCII URLs pose no special internationalization problem.
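
If the association between a line and its URLs still matters, the same \S+ idea can be applied per line. The following is only a sketch under a few assumptions: urlsByLine() is a hypothetical helper name, https is accepted in addition to http, and the u modifier is added so the text is read as UTF-8.

```php
<?php
// Sketch only: urlsByLine() is a hypothetical helper, not from the original post.
function urlsByLine(string $text): array
{
    $result = [];

    // Split on any newline style; fall back to an empty list if splitting fails.
    $lines = preg_split('~\R~u', $text) ?: [];

    foreach ($lines as $index => $line) {
        // Collect every URL on the line, not just the first one.
        if (preg_match_all('~\bhttps?://\S+~u', $line, $m)) {
            $result[$index + 1] = $m[0]; // 1-based line number => list of URLs
        }
    }

    return $result;
}

print_r(urlsByLine("Check me http://example.com/test/for_me first\nsimple text"));
```

Lines without a URL are simply left out of the result, so the array keys show which lines contained links.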

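Note: if you want to replace all newlines with <br/> afterwards, I suggest using $text = preg_replace('~\R~', '<br/>', $text);, because \R handles several types of newline (\r\n, \n, \r), whereas \n matches only Unix newlines. For UTF-8 text such as the sample above, add the u modifier (~\R~u): in byte mode \R can also match the single byte 0x85, which occurs inside some multibyte characters.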
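
To make the newline handling concrete, here is a small sketch. normalizeBreaks() is a hypothetical helper name, and the u modifier is an assumption added so that the subject is processed as UTF-8 characters rather than raw bytes.

```php
<?php
// Sketch only: normalizeBreaks() is a hypothetical helper, not from the original post.
function normalizeBreaks(string $text): string
{
    // \R treats "\r\n" as one line break and also matches a lone "\n" or "\r".
    // preg_replace() returns null on error (e.g. invalid UTF-8), so fall back
    // to the unchanged input in that case.
    return preg_replace('~\R~u', '<br/>', $text) ?? $text;
}

$mixed = "first\r\nsecond\nthird\rfourth";
echo normalizeBreaks($mixed), PHP_EOL; // first<br/>second<br/>third<br/>fourth
```

A plain \n pattern would leave the \r characters from Windows or old Mac line endings in the string, which is the case \R is meant to cover.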

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow