Question

Well, I want to extract parameters from a URL using preg_replace. For now I have something like this:

preg_replace("%(?<=[\?&])$param=.+(&|$)%", '', $url);

But it doesn't work well. The problem is with the end of the expression. How can I write an expression that means "match an & character or the end of the string"?


Solution

I'm a little confused by your post: you say you want to "extract" the parameters, but then you say you want to use preg_replace. If you actually want to extract the parameters so you can process them, you'd probably want preg_match_all instead. If you want to replace the question mark and everything after it with nothing (which is what I think you're trying to do, judging from your sample code), that's pretty easy to accomplish. Something like this would do it:

$string = 'https://www.google.com/search?q=Metallica+Turn+The+Page&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a&channel=sb';

$string = preg_replace('~\?.*~', '', $string);

print $string;

This looks for a question mark (\?), followed by any character (.) any number of times (*), and replaces the whole match with nothing. This will give you:

https://www.google.com/search
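For the "extract" case mentioned at the start, here is a minimal sketch with preg_match_all that collects the parameters into an array instead of rewriting the string. The function name extractQueryParams is hypothetical, and it assumes keys use only the characters in the answer's key class and that the URL has no #fragment; values are left URL-encoded.

```php
<?php
// Hypothetical helper: collect a URL's query parameters into a key => value array.
// Assumes no #fragment; values are returned still URL-encoded (no decoding is done).
function extractQueryParams(string $url): array
{
    // [?&] anchors each pair, ([A-Z0-9+%]+) is the key, ([^&]*) is everything
    // up to the next & (possibly empty).
    preg_match_all('~[?&]([A-Z0-9+%]+)=([^&]*)~i', $url, $matches, PREG_SET_ORDER);

    $params = [];
    foreach ($matches as $m) {
        // $m[1] is the key, $m[2] the value; a repeated key keeps its last value.
        $params[$m[1]] = $m[2];
    }
    return $params;
}
```

If a regex isn't a requirement, PHP's parse_url($url, PHP_URL_QUERY) followed by parse_str() does the same job and also URL-decodes the values.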

If you actually want to replace each parameter with something, you can do that as well:

// REPLACE ALL OF THE PARAMS WITH DUMMY-PARAM
$string = 'https://www.google.com/search?q=Metallica+Turn+The+Page&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a&channel=sb';

$string = preg_replace('~[?&]\K([A-Z0-9+%]+)=.+?(?=&|$)~i', '$1=DUMMY-PARAM', $string);

print "\n\n".$string;

Or

// REPLACE ALL OF THE KEYS WITH DUMMY-KEY
$string = 'https://www.google.com/search?q=Metallica+Turn+The+Page&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a&channel=sb';

$string = preg_replace('~[?&]\K[A-Z0-9+%]+=(.+?)(?=&|$)~i', 'DUMMY-KEY=$1', $string);

print "\n\n".$string;

Or

// PULL OUT ALL KEYS/VALUES
$string = 'https://www.google.com/search?q=Metallica+Turn+The+Page&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a&channel=sb';

$string = preg_replace('~[?&]\K([A-Z0-9+%]+)=(.+?)(?=&|$)~i', "\n$1: $2", $string);

print "\n\n".$string;

All three of these start with a character class looking for a question mark or an ampersand. The \K resets the start of the reported match, so the ? or & matched before it is left in place and not replaced. Then we look for the key, which is a character class of some of the characters allowed in URLs, followed by an equals sign and at least one more character, taking as few as possible (.+?) until the next part of the expression can match. I've used .+? here, but it would be even better to use a character class instead; something like ([-.:A-Z0-9+%]+)? worked for this expression. In any case, the last part of the expression is a lookahead, (?=&|$), that checks whether an ampersand or the end of the string comes next.
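Applied to the pattern in the question: its (&|$) group already means "an & or the end of the string". The real problem is the greedy .+ before it, which runs to the end of the URL, so $ is always what ends up matching and everything after the parameter gets removed. Here is a minimal sketch that stops the value at the next & with [^&]* instead. The name removeQueryParam is hypothetical, and it assumes the URL has no #fragment.

```php
<?php
// Hypothetical helper: remove every occurrence of one query parameter from a URL.
// Assumes no #fragment. Note the final rtrim also strips a ? or & the URL already ended with.
function removeQueryParam(string $url, string $param): string
{
    // (?<=[?&]) : the key must follow ? or & (so "q" does not match inside "aq")
    // [^&]*     : the value stops at the next &, unlike a greedy .+
    // (?:&|$)   : "an & or the end of the string"
    $pattern = '~(?<=[?&])' . preg_quote($param, '~') . '=[^&]*(?:&|$)~';
    $result = preg_replace($pattern, '', $url);

    // If the parameter was last (or the only one), a dangling & or ? is left over.
    return rtrim($result, '?&');
}
```

Because the lookbehind only inspects the subject rather than consuming the ? or &, a repeated key such as ?a=1&a=2 has both occurrences removed.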

Here is a working demo for you to compare the differences.

As I mentioned earlier, I was a little confused by your question, so hopefully this answers it properly. If not, let me know and I will reevaluate.

Licensed under: CC-BY-SA with attribution