Question

For the sake of brevity...
I want to take items out of a string, put them into a separate array, replace the values extracted from the string with ID'd tokens, parse the string, then put the extracted items back in their original positions (in the correct order). (If that makes sense, then skip the rest :D)

I have the following string;
"my sentence contains URLs to [url] and [url] which makes my life difficult."

For various reasons, I would like to remove the URLs. But I need to keep their place, and reinsert them later (after manipulating the rest of the string).

Thus I would like;
"my sentence contains URLs to [url] and [url] which makes my life difficult."
to become;
"my sentence contains URLs to [token1fortheURL] and [token2fortheURL] which makes my life difficult."

I've tried doing this several times, in various ways. All I do is hit brick walls and invent new swear words!

I use the following code to set things up;

$mystring = 'my sentence contains URLs to http://www.google.com/this.html and http://www.yahoo.com which makes my life difficult.';
$myregex = '/(((?:https?|ftps?)\:\/\/)?([a-zA-Z0-9:]*[@])?([a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}|([0-9]+))([a-zA-Z0-9-._?,\'\/\+&%\$#\=~:]+)?)/';
$myextractions = array();

I then do a preg_replace_callback;

$matches = preg_replace_callback($myregex,'myfunction',$mystring);

And I have my function as follows;

function myfunction ($matches) {}

And it's here that the brick walls start happening. I can put stuff into the blank extraction array, but the entries are not available outside the function. I can update the string with tokens, but I lose access to the URLs that are replaced. I cannot seem to pass additional values to the function call within the preg_replace_callback.

I'm hoping someone can help, as this is driving me nuts.


UPDATE:

Based on the solution suggested by @Lepidosteus, I think I have the following working?

$mystring = 'my sentence contains URLs to http://www.google.com/this.html and http://www.yahoo.com which makes my life difficult.';
$myregex = '/(((?:https?|ftps?)\:\/\/)?([a-zA-Z0-9:]*[@])?([a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}|([0-9]+))([a-zA-Z0-9-._?,\'\/\+&%\$#\=~:]+)?)/';
$tokenstart = ":URL:";
$tokenend = ":";


// collect every full-pattern match and pair it with a numbered token
function extraction ($myregex, $mystring, $tokenstart, $tokenend) {
    $mymatches = array();
    preg_match_all($myregex, $mystring, $mymatches);
    $thematches = array();

    // $mymatches[0] holds the full matches, in order of appearance
    foreach ($mymatches[0] as $key => $match) {
        $thematches[] = array($match, $tokenstart.$key.$tokenend);
    }

    return $thematches;
}
$matches = extraction ($myregex, $mystring, $tokenstart, $tokenend);
echo "1) ".$mystring."<br/>";
// 1) my sentence contains URLs to http://www.google.com/this.html and http://www.yahoo.com which makes my life difficult.



function substitute($matches,$mystring) {
foreach ($matches as $match) {
    $mystring = str_replace($match[0], $match[1], $mystring);
}
return $mystring;
}
$mystring = substitute($matches,$mystring);
echo "2) ".$mystring."<br/>";
// 2) my sentence contains URLs to :URL:0: and :URL:1: which makes my life difficult.


function reinsert($matches,$mystring) {
foreach ($matches as $match) {
    $mystring = str_replace($match[1], $match[0], $mystring);
}
return $mystring;
}
$mystring = reinsert($matches,$mystring);
echo "3) ".$mystring."<br/>";
// 3) my sentence contains URLs to http://www.google.com/this.html and http://www.yahoo.com which makes my life difficult.

That appears to work?


Solution

The key to solving your problem here is to store the list of urls in an outside container that can be accessed both by your callbacks and by your main code, so you can make the changes you need on them. To remember the urls' positions, we will use a custom token in the string.

Note that I use closures to access the container. If you can't use php 5.3 for some reason, you will need to replace them with another way to access the $url_tokens container from within the callback, which shouldn't be a problem.
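
As an illustration of that last point, here is a minimal sketch of one closure-free way to do it: the callback is a method on an object, and the object holds the container. The class name UrlTokenizer, the method name store, the sample string and the simplified pattern are all made up for this example and are not part of the code below.

```php
<?php
// Hypothetical helper class, assumed for this sketch only: it keeps the
// extracted urls in a property so the callback can reach them without a closure.
class UrlTokenizer
{
    public $url_tokens = array();

    // callback for preg_replace_callback: store the match, return its token
    public function store($matches)
    {
        $index = count($this->url_tokens);
        $this->url_tokens[$index] = $matches;
        return '[[URL::' . $index . ']]';
    }
}

// simplified pattern and sample text, assumed for illustration only
$pattern = '~https?://\S+~';
$string = 'see http://www.google.com/this.html and http://www.yahoo.com now';

$tokenizer = new UrlTokenizer();

// array($object, 'method') is the callable form for an object method
$tokenized = preg_replace_callback($pattern, array($tokenizer, 'store'), $string);

echo $tokenized, "\n";
```

After the call, $tokenizer->url_tokens plays the same role as $url_tokens in the closure-based version that follows.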

<?php
// the string you start with

$string = "my sentence contains URLs to http://stackoverflow.com/questions/7619843/php-preg-replace-call-extract-specific-values-for-later-reinsertion and http://www.google.com/ which makes my life difficult.";

// the url container, you will store the urls found here

$url_tokens = array();

// the callback for the first replace: it takes each url, stores it in $url_tokens, then replaces it with [[URL::X]], X being a unique number for each url
//
// note that the closure uses $url_tokens by reference, so that we can add entries to it from inside the function

$callback = function ($matches) use (&$url_tokens) {
  static $token_iteration = 0;

  $token = '[[URL::'.$token_iteration.']]';

  $url_tokens[$token_iteration] = $matches;

  $token_iteration++;

  return $token;
};

// replace our urls with our callback

$pattern = '/(((?:https?|ftps?)\:\/\/)?([a-zA-Z0-9:]*[@])?([a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}|([0-9]+))([a-zA-Z0-9-._?,\'\/\+&%\$#\=~:]+)?)/';

$string = preg_replace_callback($pattern, $callback, $string);

// some debug code to check what we have at this point

var_dump($url_tokens);
var_dump($string);

// you can make changes to the urls you found in $url_tokens here

// now we will replace our previous tokens with a specific string, just as an example of how to re-replace them when you're done

$callback_2 = function ($matches) use ($url_tokens) {
  $token = $matches[0];
  $token_iteration = $matches[1];

  if (!isset($url_tokens[$token_iteration])) {
    // if we don't know what this token is, leave it untouched
    return $token;
  }

  return '- there was an url to '.$url_tokens[$token_iteration][4].' here -';
};

$string = preg_replace_callback('/\[\[URL::([0-9]+)\]\]/', $callback_2, $string);

var_dump($string);

Which gives this result when executed:

// the $url_tokens array after the first preg_replace_callback
array(2) {
  [0]=>
  array(7) {
    [0]=>
    string(110) "http://stackoverflow.com/questions/7619843/php-preg-replace-call-extract-specific-values-for-later-reinsertion"
    [1]=>
    string(110) "http://stackoverflow.com/questions/7619843/php-preg-replace-call-extract-specific-values-for-later-reinsertion"
    [2]=>
    string(7) "http://"
    [3]=>
    string(0) ""
    [4]=>
    string(17) "stackoverflow.com"
    [5]=>
    string(0) ""
    [6]=>
    string(86) "/questions/7619843/php-preg-replace-call-extract-specific-values-for-later-reinsertion"
  }
  [1]=>
  array(7) {
    [0]=>
    string(22) "http://www.google.com/"
    [1]=>
    string(22) "http://www.google.com/"
    [2]=>
    string(7) "http://"
    [3]=>
    string(0) ""
    [4]=>
    string(14) "www.google.com"
    [5]=>
    string(0) ""
    [6]=>
    string(1) "/"
  }
}
// the $string after the first preg_replace_callback
string(85) "my sentence contains URLs to [[URL::0]] and [[URL::1]] which makes my life difficult."

// the $string after the second replace
string(154) "my sentence contains URLs to - there was an url to stackoverflow.com here - and - there was an url to www.google.com here - which makes my life difficult."
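
If the goal is to put the original urls back unchanged, as the question asks, the second callback can simply return the stored full match. The following is only a sketch under assumptions: the sample string, the simplified pattern and the strtoupper() step (standing in for whatever processing you do on the text) are invented for the example, and it stores just the full match rather than the whole $matches array.

```php
<?php
// simplified pattern and sample text, assumed for illustration only
$pattern = '~https?://\S+~';
$string = 'links: http://www.google.com/this.html and http://www.yahoo.com here';

$url_tokens = array();

// step 1: swap each url for a numbered token and remember the full match
$tokenized = preg_replace_callback($pattern, function ($matches) use (&$url_tokens) {
    $url_tokens[] = $matches[0];
    return '[[URL::' . (count($url_tokens) - 1) . ']]';
}, $string);

// step 2: work on the text while the urls are out of the way
// (strtoupper is a placeholder for the real processing)
$processed = strtoupper($tokenized);

// step 3: put each original url back where its token sits
$restored = preg_replace_callback('/\[\[URL::([0-9]+)\]\]/', function ($matches) use ($url_tokens) {
    $index = (int) $matches[1];
    // unknown tokens are left untouched
    return isset($url_tokens[$index]) ? $url_tokens[$index] : $matches[0];
}, $processed);

echo $restored, "\n";
// LINKS: http://www.google.com/this.html AND http://www.yahoo.com HERE
```

Because the tokens are looked up by number, the urls come back in their original positions even if the processing step reorders or removes parts of the surrounding text, as long as it leaves the tokens themselves intact.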
Licensed under: CC-BY-SA with attribution