Question

I have big problems understanding how to use preg_replace with backreferencing.

I have a plain-text string and want to replace every link with the HTML syntax for a link. So "www.mydomain.tld" or "http://www.mydomain.tld" or "http://mydomain.tld" should be wrapped in an HTML a-tag. I have found a working function that does this online, but I want to understand how to do it myself.

In the function I found, this is the replacement:

"\\1<a href=\"http://\\2\" target=\"_blank\" rel=\"nofollow\">\\2</a>"

I see some escaped quotation marks in there and these bits: \\1 \\2.
According to the PHP documentation these are backreferences. But how do I use them, what do they do?

I found nothing about that in the spec, so any help would be greatly appreciated!


Solution

This will do the job for you. Please see below for an explanation of how it all works.

$string = 'some text www.example.com more text http://example.com more text https://www.example.com more text';

$string = preg_replace('#\b(?:http(s?)://)?((?:[a-z\d-]+\.)+[a-z]+)\b#', "<a href='http$1://$2'>http$1://$2</a>", $string);

echo $string; // some text <a href='http://www.example.com'>http://www.example.com</a> more text <a href='http://example.com'>http://example.com</a> more text <a href='https://www.example.com'>https://www.example.com</a> more text

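\b matches a word boundary.

(?:http(s?)://)? optionally matches 'http://' or 'https://'. The (?: ... ) wrapper is a non-capturing group; only the inner (s?) is captured, so the 's' of https is available when we build the URL.

(?:[a-z\d-]+\.)+ matches one or more runs of letters, digits or hyphens, each followed by a period (for example 'www.' and 'example.').

[a-z]+ matches one or more letters: the TLD. Note that TLDs are now open for purchase, so their length can no longer be limited. See http://tinyurl.com/cle6jqb

The parts we need later are captured by enclosing them in plain parentheses: (s?) is the first capturing group and holds the optional 's', and the parentheses around the whole domain pattern form the second capturing group. Groups that start with (?: are non-capturing and do not get a number.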
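To see what each group actually holds, it can help to run the pattern through preg_match and print the result. This is only a small sketch: the sample sentence and the variable names $pattern and $m are made up for illustration, the pattern itself is the one from the answer.

```php
<?php
// Same pattern as in the answer; $pattern and $m are names chosen for this sketch.
$pattern = '#\b(?:http(s?)://)?((?:[a-z\d-]+\.)+[a-z]+)\b#';

preg_match($pattern, 'visit https://www.example.com today', $m);

// $m[0] is the whole match, $m[1] and $m[2] are the two capturing groups.
// The (?:...) groups are skipped when groups are numbered.
echo $m[0], "\n"; // https://www.example.com
echo $m[1], "\n"; // s
echo $m[2], "\n"; // www.example.com
```

Whatever ends up in $m[1] and $m[2] here is what $1 and $2 stand for in the replacement string of preg_replace.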

We then build the URL:

<a href='http$1://$2'>http$1://$2</a>

http$1:// builds the scheme: if the link was HTTPS, the backreference $1 contains an 's'; otherwise it is empty and the result is plain http://.

$2 contains the domain name. The full URL is used both as the href and as the link text.
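The \\1 and \\2 from the question are the same kind of reference as $1 and $2, just written differently. The sketch below compares the three spellings that preg_replace accepts in a replacement string; the sample text and the variable names are invented for illustration, and the link text is shortened to the domain to keep the lines readable.

```php
<?php
$text = 'see www.example.com';
$pattern = '#\b(?:http(s?)://)?((?:[a-z\d-]+\.)+[a-z]+)\b#';

// 1) $n form, in a single-quoted string so PHP leaves the $ alone.
$a = preg_replace($pattern, '<a href="http$1://$2">$2</a>', $text);

// 2) \n form. Inside a double-quoted PHP string "\\2" is the two
//    characters \2 by the time preg_replace sees it.
$b = preg_replace($pattern, "<a href=\"http\\1://\\2\">\\2</a>", $text);

// 3) ${n} form, useful when the reference is directly followed by a digit.
$c = preg_replace($pattern, '<a href="http${1}://${2}">${2}</a>', $text);

var_dump($a === $b && $b === $c); // bool(true)
echo $a, "\n"; // see <a href="http://www.example.com">www.example.com</a>
```

The escaped quotation marks (\") in the question's replacement have nothing to do with the regex; they are only needed because that string is written inside double quotes.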

OTHER TIPS

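You might want to use something similar to this. Note that the pattern is anchored with ^ and $, so it only matches when the whole string is a single URL, and that $0 (the entire match) is used for the href, since $1 would only contain the optional scheme:

$string = preg_replace('/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/', "<a href=\"$0\">Link</a>", $yourtext);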

Some useful links:
Try Regex with this tool: click
Regex from: Nettuts
Named backreferences: Click
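On named groups: as far as I know, the replacement string of preg_replace only understands numbered references, so names are mostly useful together with preg_replace_callback, where the match array is also keyed by group name. The following is a sketch under that assumption; the group names s and domain, the sample text and the use of htmlspecialchars are my own choices, not part of the answers above.

```php
<?php
// The answer's pattern with the two capturing groups given names.
$pattern = '#\b(?:http(?<s>s?)://)?(?<domain>(?:[a-z\d-]+\.)+[a-z]+)\b#';

$html = preg_replace_callback(
    $pattern,
    function (array $m): string {
        // $m['s'] is 's' for https links and empty (or missing) otherwise.
        $url  = 'http' . ($m['s'] ?? '') . '://' . $m['domain'];
        $safe = htmlspecialchars($url, ENT_QUOTES);
        return "<a href=\"$safe\">$safe</a>";
    },
    'docs at www.example.com and https://example.org'
);

echo $html, "\n";
```

The callback form is a bit longer than a replacement string, but it lets you run ordinary PHP (escaping, lookups, conditions) on each match before building the link.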

Licensed under: CC-BY-SA with attribution