Question

I am using PHP and want to turn certain phrases in my text into links to other sections of the site. For example:

I fell into the media industry aged 30, when David Mansfield, now on the board of
Ingenious Media, gave me my first break at Thames TV. From there, I worked at the
(now-defunct) Sunday Correspondent and IPC, before joining TDI, which became Viacom
and then CBS Outdoor. After 12 years in outdoor, I spent a year out doing overseas
outdoor consultancy work in Russia, Dubai and Spain, as well as launching the media 
CRM business, Media By Permission. I have been lucky enough to work across a range of 
media, but outdoor would definitely be my specialist subject on 'Mastermind'.

I want to link "Ingenious Media" to a page all about Ingenious Media, but I would also like to link every other mention of "Media" to a media-related page.

Obviously, I don't want to link the word "Media" inside "Ingenious Media".

How could I go about doing this without double linking some words?

Thanks in advance

Solution

Step 1. Create a new array containing the names of the entities you want to 'tag' and order it longest entity name to shortest entity name.

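Step 2. Loop through this array and replace each occurrence of the entity in the text with a unique token (for example '##' . rand(100, 999) * rand(100, 999)). Because the longest names are replaced first, any shorter name they contain is already hidden behind a token, which avoids creating links around entities that form part of another entity. Random numbers are not guaranteed to be unique, so either check for collisions or build the token from a counter.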

Step 3. Create your link and store it in another array where the key for each entry in the array is the unique token and the value is the link you just made.

Step 4. Loop through the links array and replace the tokens in the text with the links that correspond to the token in the array.
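
The four steps above could be sketched roughly as follows. This is only an illustration, not the answerer's own code: the function name linkEntities, the shape of the $links map (entity name => URL) and the file names are assumptions, and the token is built from a counter wrapped in NUL bytes rather than from rand(), on the assumption that neither the text nor the entity names contain NUL bytes or consist only of digits.

```php
<?php
// Hypothetical helper: $links maps each entity name to the URL it should point to.
function linkEntities(string $text, array $links): string
{
    // Step 1: order the entity names from longest to shortest.
    uksort($links, function ($a, $b) {
        return strlen((string) $b) <=> strlen((string) $a);
    });

    $replacements = [];
    $i = 0;
    foreach ($links as $name => $url) {
        // Step 2: replace every occurrence with a unique token. A counter is
        // used here (an assumption, not the rand() idea from the answer) so
        // that two entities can never end up with the same token.
        $token = "\x00" . $i++ . "\x00";
        $text = str_replace((string) $name, $token, $text);

        // Step 3: remember which link each token stands for.
        $replacements[$token] = '<a href="' . htmlspecialchars($url) . '">' . $name . '</a>';
    }

    // Step 4: swap the tokens for the finished links in one pass.
    return strtr($text, $replacements);
}
```

Because the links are only inserted in the last step, a later, shorter entity such as "Media" never gets a chance to match inside the markup or text of an earlier link.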

OTHER TIPS

I'm not sure if this is possible with regexp. I would do something like this:

  1. search for the phrase
  2. check whether the phrase is inside a link (search to the right for the next a tag: if it is an opening tag, you are probably not inside a link; if it is a closing tag, you are inside one)
  3. if you are not inside a link, replace the phrase
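
One way to get the effect of the steps above is to split the text on existing links and only replace in the pieces between them. This is a minimal sketch under assumptions: the function name linkOutsideAnchors and the file names are made up for illustration, links are assumed not to be nested, and phrases that appear inside other tags' attributes are not handled.

```php
<?php
// Hypothetical helper: link $phrase to $url wherever it is not already inside an <a> element.
function linkOutsideAnchors(string $html, string $phrase, string $url): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the result alternates between text
    // outside links (even indexes) and complete <a>...</a> elements (odd indexes).
    $parts = preg_split('#(<a\b[^>]*>.*?</a>)#is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return $html; // regex error: leave the text untouched
    }

    $link = '<a href="' . htmlspecialchars($url) . '">' . $phrase . '</a>';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Only chunks outside existing links are modified.
            $parts[$i] = str_replace($phrase, $link, $part);
        }
    }
    return implode('', $parts);
}
```

Calling it first for 'Ingenious Media' and then for 'Media' links the longer phrase first, and the second call skips the "Media" that is already inside that link.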

Maybe you could use greedy regular expressions to match as much of a phrase as possible. Have a look at these links: http://www.exampledepot.com/egs/java.util.regex/Greedy.html and http://www.regular-expressions.info/repeat.html
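
As a rough illustration of the greedy idea, an optional "Ingenious " prefix can be put in front of "Media" so that the longer phrase is matched whenever it is present, and a callback then picks the link for whatever was matched. The $urls map, the file names and the function name linkGreedy are assumptions made for this sketch.

```php
<?php
// Hypothetical phrase-to-URL map; the file names are placeholders.
$urls = [
    'Ingenious Media' => 'ingenious_media.html',
    'Media'           => 'media.html',
];

function linkGreedy(string $text, array $urls): string
{
    // "(?:Ingenious )?" is greedy: the engine first tries to include
    // "Ingenious " and only falls back to the bare "Media" when that fails.
    return preg_replace_callback(
        '#\b(?:Ingenious )?Media\b#',
        function (array $m) use ($urls): string {
            // $m[0] is the full match: either "Ingenious Media" or "Media".
            return '<a href="' . $urls[$m[0]] . '">' . $m[0] . '</a>';
        },
        $text
    );
}
```

Since the whole text is handled in a single pass, the "Media" inside "Ingenious Media" is consumed by the longer match and cannot be linked a second time.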

$string = '...your string from above....';

// Here we replace only "Media" when there is no "Ingenious " in front of it.
$string = preg_replace('#(?<!Ingenious )Media#', '<a href="media.html">Media</a>', $string);

// Here we don't need a regex, a plain string replacement is enough.
$string = str_replace('Ingenious Media', '<a href="ingenious_media.html">Ingenious Media</a>', $string);
echo $string;

I'm sure there is a better regex, 'cause there always is ;) but this way it works, I just tested it :)

Licensed under: CC-BY-SA with attribution