Question

The Problem

This is a very simple-to-understand question.

I'm having a user submit a URL, for example "http://example.com/path/filename.html".

I'm using PHP's dirname() function to get the so-called "base" of this URL. For the above example, that would be "http://example.com/path".

My problem arises when the user enters this:

http://example.com/blog

If you type the above into your browser, you will see the index.php or .html page in the folder called "blog". However, PHP's dirname() will return only "http://example.com".

I'm not sure whether it treats "blog" as an extension-less file (if such a thing exists), but I can't find a solution.
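For reference, dirname() works on the string alone: it does not contact the server, so it cannot know whether "blog" is a file or a folder, and it simply drops whatever follows the last slash. A minimal sketch using the URLs from the question (the trailing-slash variant is added here for illustration):

```php
<?php
// dirname() only trims the last path segment of the string it is given.
echo dirname('http://example.com/path/filename.html'), "\n"; // http://example.com/path
echo dirname('http://example.com/blog'), "\n";               // http://example.com
// A trailing slash is ignored first, so the result is the same.
echo dirname('http://example.com/blog/'), "\n";              // http://example.com
```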

Things I've Tried

I first tried getting the extension of the URL using this quick method:

$url = 'http://example.com/index.php';
$parts = explode('.', $url);
$file_extension = end($parts); // end() needs a variable, not a function result

Then, I would check if the extension existed using PHP empty(). If the extension exists, that means that a filename was entered after the folder, such as "http://example.com/path/file.html", and dirname() is perfect. If the extension doesn't exist, no file was entered and the last item in the path is a folder, so it is already "the base".

However, in the case of simply "http://example.com/path/", the above would return "com/path/" as the file extension, which is obviously not a real extension. Because the extension is not empty, my check would then call dirname() and wrongly cut off "/path/".

EDIT:

Taking the extension of basename($url) won't work either: if the user enters "http://example.com", basename() returns "example.com", whose extension is supposedly "com".
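One way around the ".com" confusion might be to look for an extension only in the path component of the URL, so the host name is never inspected. The following is a rough sketch, not a tested solution: the helper name lastSegmentHasExtension() is made up for illustration, it assumes PHP 7 or later for the type declarations, and it assumes the URL includes a scheme such as "http://" (without one, parse_url() treats the host as part of the path).

```php
<?php
// Hypothetical helper: true when the last path segment looks like "name.ext".
function lastSegmentHasExtension(string $url): bool
{
    // parse_url() returns null here when the URL has no path at all.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '' || substr($path, -1) === '/') {
        return false; // no path, or the path ends in a folder
    }
    // Only the path is examined, so "example.com" is never mistaken for a file.
    return pathinfo($path, PATHINFO_EXTENSION) !== '';
}

var_dump(lastSegmentHasExtension('http://example.com/path/file.html')); // bool(true)
var_dump(lastSegmentHasExtension('http://example.com/path/'));          // bool(false)
var_dump(lastSegmentHasExtension('http://example.com/blog'));           // bool(false)
var_dump(lastSegmentHasExtension('http://example.com'));                // bool(false)
```

If the helper returns true, dirname() gives the base; otherwise the URL can be used as the base as it stands.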

Hopefully, someone has had the same problem and knows the solution. I'm still looking, but any answers are wholly appreciated!!


Solution

EDIT Ok, last time before I give up:

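function getPath($url){
    $parts=explode("/",$url);
    // dirname() yields one of these when the URL is only a domain, with or without a scheme
    $patharray=array(".","http:","https:");
    // drop the last segment only if it looks like a filename (contains a ".") and is not the domain itself
    if(!in_array(pathinfo($url,PATHINFO_DIRNAME),$patharray) && strpos($parts[count($parts)-1], ".")!==false)
        unset($parts[count($parts)-1]);
    $url=implode("/",$parts);
    // always return the base with a trailing slash
    if(substr($url,-1)!='/')
        $url.="/";
    return $url;
}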
echo getPath("http://www.google.com/blog/testing.php")."\n";
echo getPath("www.google.com/blog/testing.php")."\n";
echo getPath("http://www.google.com/blog/")."\n";
echo getPath("http://www.google.com/blog")."\n";
echo getPath("http://www.google.com")."\n";
echo getPath("http://www.google.com/")."\n";
echo getPath("www.google.com/")."\n";
echo getPath("www.google.com")."\n";

If the last segment of the URL contains a ".", it is treated as a filename and removed; otherwise the URL is left alone. The pathinfo() check detects a URL that is just a domain ("google.com" or "http://www.google.com"); in that case the last segment is kept even though it contains a ".". A trailing slash is added if one is missing. Here is the script output:

http://www.google.com/blog/
www.google.com/blog/
http://www.google.com/blog/
http://www.google.com/blog/
http://www.google.com/
http://www.google.com/
www.google.com/
www.google.com/
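
Because the check is only "does the last segment contain a dot", a few inputs may not behave as expected. The sketch below repeats getPath() unchanged so that it runs on its own; the example.com URLs are made up for illustration.

```php
<?php
// getPath() as given in the answer above, repeated so this block is self-contained.
function getPath($url){
    $parts=explode("/",$url);
    $patharray=array(".","http:","https:");
    if(!in_array(pathinfo($url,PATHINFO_DIRNAME),$patharray) && strpos($parts[count($parts)-1], ".")!==false)
        unset($parts[count($parts)-1]);
    $url=implode("/",$parts);
    if(substr($url,-1)!='/')
        $url.="/";
    return $url;
}

// A folder whose name contains a dot is mistaken for a file and removed.
echo getPath("http://example.com/v1.2")."\n";        // http://example.com/
// With a trailing slash the same folder is kept.
echo getPath("http://example.com/v1.2/")."\n";       // http://example.com/v1.2/
// A file without an extension is mistaken for a folder.
echo getPath("http://example.com/blog/README")."\n"; // http://example.com/blog/README/
```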
Licensed under: CC-BY-SA with attribution