Question

In PHP, I want to compare two relative URLs for equality. The catch: URLs may differ in percent-encoding, e.g.

  • /dir/file+file vs. /dir/file%20file
  • /dir/file(file) vs. /dir/file%28file%29
  • /dir/file%5bfile vs. /dir/file%5Bfile

According to RFC 3986, servers should treat these URIs identically. But if I use == to compare, I'll end up with a mismatch.

So I'm looking for a PHP function that accepts two strings and returns TRUE if they represent the same URI (discounting encoded/decoded variants of the same character, upper-case/lower-case hex digits in encoded characters, and + vs. %20 for spaces), and FALSE if they're different.

I know in advance that these strings contain only ASCII characters; no Unicode.


Solution

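function uriMatches($uri1, $uri2)
{
    // urldecode() decodes %XX sequences (either hex case) and turns '+' into a space.
    // Strict comparison avoids PHP's loose numeric-string matching (e.g. "1e1" == "10").
    return urldecode($uri1) === urldecode($uri2);
}

// var_dump() is used because echo prints a boolean TRUE as "1".
var_dump(uriMatches('/dir/file+file', '/dir/file%20file'));      // bool(true)
var_dump(uriMatches('/dir/file(file)', '/dir/file%28file%29'));  // bool(true)
var_dump(uriMatches('/dir/file%5bfile', '/dir/file%5Bfile'));    // bool(true)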

See the PHP manual entry for urldecode().

OTHER TIPS

EDIT: Please look at @webbiedave's response. His is much better (I wasn't even aware that PHP had a function to do that; you learn something new every day).

You will have to parse the strings and look for anything matching %XX (a percent sign followed by two hex digits) to find the occurrences of percent-encoding. Convert the two hex digits to a number with hexdec(), then pass that number to the chr() function to get the character each sequence stands for. Rebuild the strings and you should be able to match them.
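A minimal sketch of that manual approach, assuming a hypothetical helper named decodePercent() (the name is not from the original answer). It only handles %XX sequences; unlike urldecode(), it does not turn + into a space.

```php
<?php
// Hypothetical helper: decode every %XX sequence by hand.
function decodePercent(string $uri): string
{
    return preg_replace_callback(
        '/%([0-9A-Fa-f]{2})/',           // a percent sign plus two hex digits
        function (array $m): string {
            return chr(hexdec($m[1]));   // hex digits -> integer -> single character
        },
        $uri
    );
}

var_dump(decodePercent('/dir/file%28file%29') === '/dir/file(file)');               // bool(true)
var_dump(decodePercent('/dir/file%5bfile') === decodePercent('/dir/file%5Bfile'));  // bool(true)
```

Because hexdec() accepts both upper-case and lower-case digits, %5b and %5B decode to the same character.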

Not sure that's the most efficient method, but considering URLs are not usually that long, it shouldn't be too much of a performance hit.

I know this problem here seems to be solved by webbiedave, but I had my own problems with it.

First problem: The hex digits in percent-encoded characters are case-insensitive. %C3 and %c3 encode exactly the same byte, so both URIs point to the same location even though they differ as strings.

Second problem: folder%20(2) and folder%20%282%29 are both valid URL-encoded URIs that point to the same location, although they are different strings.

Third problem: If I simply decode all the URL-encoded characters, two different locations end up with the same string, as with bla%2Fblubb and bla/blubb.
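The three problems can be reproduced in a few lines. This is only an illustration; the variable names are made up for the example.

```php
<?php
// Problems 1 and 2: equivalent URIs that differ as raw strings.
$hexCaseEqual = ('%C3' === '%c3');
$parensEqual  = ('folder%20(2)' === 'folder%20%282%29');

// Problem 3: decoding the whole string makes an encoded slash
// indistinguishable from a real path separator.
$slashEqual = (urldecode('bla%2Fblubb') === urldecode('bla/blubb'));

var_dump($hexCaseEqual); // bool(false), although both mean the same byte
var_dump($parensEqual);  // bool(false), although both name the same location
var_dump($slashEqual);   // bool(true), although the locations differ
```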

So what to do then? To compare two URIs, I need to normalize both of them: split each one into its components, urldecode every path segment and query part exactly once, rawurlencode the result, and glue the pieces back together. Then the normalized strings can be compared.
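For the path component alone, the idea can be sketched like this. normalizeSegments() is a hypothetical name used only for this illustration; it assumes the input is a bare path with no scheme, host or query.

```php
<?php
// Hypothetical helper: normalize a bare path segment by segment.
function normalizeSegments(string $path): string
{
    $segments = explode('/', $path);                   // split first, so %2F stays inside its segment
    $segments = array_map('urldecode', $segments);     // decode each segment once
    $segments = array_map('rawurlencode', $segments);  // re-encode in one canonical form
    return implode('/', $segments);
}

var_dump(normalizeSegments('%c3') === normalizeSegments('%C3'));                       // bool(true)
var_dump(normalizeSegments('folder%20(2)') === normalizeSegments('folder%20%282%29')); // bool(true)
var_dump(normalizeSegments('bla%2Fblubb') === normalizeSegments('bla/blubb'));         // bool(false)
```

Splitting on / before decoding is what keeps bla%2Fblubb and bla/blubb apart.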

And this could be the function to normalize it:

function normalizeURI($uri) {
    $components = parse_url($uri);
    if ($components === false) {
        return $uri; // seriously malformed URI: leave it unchanged
    }
    $normalized = "";
    // isset() avoids "undefined array key" warnings for components that are absent.
    if (isset($components['scheme'])) {
        $normalized .= $components['scheme'] . ":";
    }
    if (isset($components['host'])) {
        $normalized .= "//";
        if (isset($components['user'])) { // user info is rare in URIs, but handle it anyway
            $normalized .= rawurlencode(urldecode($components['user']));
            if (isset($components['pass'])) {
                $normalized .= ":".rawurlencode(urldecode($components['pass']));
            }
            $normalized .= "@";
        }
        $normalized .= $components['host'];
        if (isset($components['port'])) {
            $normalized .= ":".$components['port'];
        }
    }
    if (isset($components['path'])) {
        // Split on "/" before decoding so an encoded slash (%2F) stays inside its segment.
        // A path that follows a host already starts with "/", so no separator is added here.
        $path = explode("/", $components['path']);
        $path = array_map("urldecode", $path);
        $path = array_map("rawurlencode", $path);
        $normalized .= implode("/", $path);
    }
    if (isset($components['query'])) {
        // Normalize each key and value separately so "&" and "=" keep their meaning.
        $query = explode("&", $components['query']);
        foreach ($query as $i => $c) {
            $c = explode("=", $c);
            $c = array_map("urldecode", $c);
            $c = array_map("rawurlencode", $c);
            $c = implode("=", $c);
            $query[$i] = $c;
        }
        $normalized .= "?".implode("&", $query);
    }
    return $normalized; // note: any fragment (#...) is dropped
}

Now you can alter webbiedave's function to this:

function uriMatches($uri1, $uri2) {
    return normalizeURI($uri1) === normalizeURI($uri2);
}

That should do it. And yes, it is considerably more complicated than I wanted it to be.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow