Question

I am using PHP Simple HTML DOM Parser to get the href values of links with this code:

$html = file_get_html($url);
foreach($html->find('a') as $element) $array[] = $element->href . '<br>'; 

The problem is that if a link starts with a slash (/), the resulting value is not a valid URL.

How can I get valid links?

For example, the link looks like this:

<a href="/news45454.html">Test link</a>

With the Simple HTML DOM code above, I get:

/news45454.html

But I want to get:

http://example.com/news45454.html

How can I get this?

Can we test whether the link starts with a slash and, if so, prepend the site URL to it? How?

Solution

Basically, you need to test whether the href attribute is already a valid full URL. If the validation passes, you can add it to the array as is. If the validation fails, you need to prepend the site's scheme and host (for example http://example.com), which parse_url() can extract from the page URL.

$html = file_get_html($url);
// Build the site root (scheme + host) from the page URL, e.g. http://example.com
$parts = parse_url($url);
$base = $parts['scheme'] . '://' . $parts['host'];
foreach ($html->find('a') as $element) {
    if (filter_var($element->href, FILTER_VALIDATE_URL)) {
        // Already a full URL, add it to the array as is.
        $array[] = $element->href . '<br>';
    } else {
        // Not a full URL, so prepend the site root.
        $array[] = $base . $element->href . '<br>';
    }
}

This may need a bit of tweaking for other cases (such as <a href="#"> or relative paths without a leading slash), but it should work for the situation you outlined.
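As a rough sketch of that tweaking, the helper below handles a few more href shapes: in-page anchors, non-HTTP schemes, protocol-relative links, root-relative links, and paths relative to the current page. The function name resolve_href is hypothetical and not part of Simple HTML DOM. It assumes the page URL contains a scheme and a host, and it does not normalize ../ or ./ segments.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): turn an href into an
// absolute URL, or return null for links that should be skipped.
function resolve_href(string $href, string $pageUrl): ?string
{
    $href = trim($href);

    // Skip empty links and in-page anchors such as href="#".
    if ($href === '' || $href[0] === '#') {
        return null;
    }

    // The href already has a scheme: keep http(s) links, skip mailto:, javascript:, etc.
    if (preg_match('#^([a-z][a-z0-9+.-]*):#i', $href, $m)) {
        return in_array(strtolower($m[1]), ['http', 'https'], true) ? $href : null;
    }

    // Assumes $pageUrl has both a scheme and a host.
    $parts = parse_url($pageUrl);
    $root  = $parts['scheme'] . '://' . $parts['host']
           . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative link, e.g. //cdn.example.com/x.js
    if (strpos($href, '//') === 0) {
        return $parts['scheme'] . ':' . $href;
    }

    // Root-relative link, e.g. /news45454.html
    if ($href[0] === '/') {
        return $root . $href;
    }

    // Otherwise treat the href as relative to the directory of the current page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $href;
}
```

Inside the loop, this could be used as $abs = resolve_href($element->href, $url); and the result added to the array only when it is not null.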

OTHER TIPS

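To remove backslash escapes from a string, use:

string stripcslashes ( string $str )

Note that this only processes backslashes (\); it does not remove a leading forward slash (/) such as the one in /news45454.html.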

Also see the PHP Manual: stripcslashes
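A small sketch, using the href from the question, of what stripcslashes() does with a forward slash, next to ltrim(), which is one way to drop a leading slash. The variable names are only illustrative. Removing the slash does not by itself make the link absolute; the site URL still has to be prepended.

```php
<?php
$href = '/news45454.html';

// stripcslashes() only interprets backslash escapes, so this href is unchanged.
$unchanged = stripcslashes($href);

// ltrim() with a character list removes leading forward slashes.
$trimmed = ltrim($href, '/');

echo $unchanged, "\n";
echo $trimmed, "\n";
```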

Licensed under: CC-BY-SA with attribution