Question

I'm counting all the links (a href) on a page with this loop:

foreach($html->find('a[href!="#"]') as $ahref) {
    $ahrefs++;
}

I want to do something like this:

foreach($html->find('a[href!="#"]') as $ahref) {
    if(isexternal($ahref)) {
        $external++;
    }
    $ahrefs++;
}

Where isexternal is a function:

function isexternal($url) {
    // FOO...

    // Test if link is internal/external
    if(/*condition is true*/) {
        return true;
    }
    else {
        return false;
    }
}

Help!

Solution

Use parse_url and compare the URL's host to your own host (often, but not always, the same as $_SERVER['HTTP_HOST']):

function isexternal($url) {
  $components = parse_url($url);
  // an empty host indicates a relative URL like '/relative.php', which is internal
  return !empty($components['host']) && strcasecmp($components['host'], 'example.com') !== 0;
}

However, this treats www.example.com and example.com as different hosts. If you want all your subdomains to be treated as local links, the function becomes somewhat larger:

function isexternal($url) {
  $components = parse_url($url);
  if ( empty($components['host']) ) return false;  // a URL like '/relative.php' has no host: treat it as internal
  $host = strtolower($components['host']);
  if ( $host === 'example.com' ) return false;     // the URL host is exactly the local host
  // external unless the URL host ends with '.example.com', i.e. it is a subdomain
  return substr($host, -strlen('.example.com')) !== '.example.com';
}
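
To avoid hard-coding the host, the same idea can be sketched with the local host passed in as a parameter. The name is_external_link and the $localHost parameter are assumptions made for illustration, not part of the answer above. Note also that Simple HTML DOM yields element objects, so the loop in the question would presumably pass $ahref->href rather than $ahref itself.

```php
<?php
// Hypothetical helper: same logic as above, with the local host as a parameter.
function is_external_link(string $url, string $localHost): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (empty($host)) {
        // Relative URL such as '/relative.php' (or an unparseable one): treat as internal.
        return false;
    }
    $host = strtolower($host);
    $localHost = strtolower($localHost);
    if ($host === $localHost) {
        return false; // exactly the local host
    }
    // A subdomain ends with '.' followed by the local host.
    $suffix = '.' . $localHost;
    return substr($host, -strlen($suffix)) !== $suffix;
}

var_dump(is_external_link('/relative.php', 'example.com'));              // bool(false)
var_dump(is_external_link('http://www.example.com/a', 'example.com'));   // bool(false)
var_dump(is_external_link('http://notexample.com/', 'example.com'));     // bool(true)
```

Comparing the suffix including its leading dot is what keeps a host such as notexample.com from passing as a subdomain.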

Other tips

This is how you can simply detect external URLs:

$url    = 'https://my-domain.com/demo/';
$domain = 'my-domain.com';

$internal = (
    false !== stripos( $url, '//' . $domain ) || // include "//my-domain.com" and "http://my-domain.com"
    false !== stripos( $url, '.' . $domain ) ||  // include subdomains, like "www.my-domain.com". DANGEROUS (see below)!
    (
        0 !== strpos( $url, '//' ) &&            // exclude protocol relative URLs, like "//example.com"
        0 === strpos( $url, '/' )                // include root-relative URLs, like "/demo"
    )
);

The above check will treat www.my-domain.com and my-domain.com as being "internal".

Why this rule is dangerous:

The subdomain logic introduces a weakness that could be exploited: an external URL that contains your domain inside its path, for example https://external.com/www.my-domain.com, is treated as internal!
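
The weakness can be reproduced with a few lines. This is a minimal sketch that isolates only the subdomain rule; the helper name matches_subdomain_rule is hypothetical and exists just for the demonstration.

```php
<?php
// Hypothetical helper isolating the substring-based subdomain rule.
function matches_subdomain_rule(string $url, string $domain): bool
{
    return false !== stripos($url, '.' . $domain);
}

$domain = 'my-domain.com';

// Intended match: a real subdomain.
var_dump(matches_subdomain_rule('https://www.my-domain.com/demo/', $domain));        // bool(true)

// Unintended match: the domain only appears in the path.
var_dump(matches_subdomain_rule('https://external.com/www.my-domain.com', $domain)); // bool(true)

// parse_url shows which host the second URL really points to.
var_dump(parse_url('https://external.com/www.my-domain.com', PHP_URL_HOST));         // string(12) "external.com"
```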

More secure code:

This problem can be eliminated by removing subdomain support (which I recommend):

$url    = 'https://my-domain.com/demo/';
$domain = 'my-domain.com';

$internal = (
    false !== stripos( $url, '//' . $domain ) || // include "//my-domain.com" and "http://my-domain.com"
    (
        0 !== strpos( $url, '//' ) &&            // exclude protocol relative URLs, like "//example.com"
        0 === strpos( $url, '/' )                // include root-relative URLs, like "/demo"
    )
);
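
One caveat: a substring match on '//' . $domain still accepts a host that merely begins with the domain, such as https://my-domain.com.evil.test/. If that matters, a possible variant compares the parsed host exactly. The sketch below assumes a hypothetical helper name, is_internal_url, and keeps the same rules otherwise (no subdomain support, root-relative paths are internal).

```php
<?php
// Hypothetical helper: exact host comparison, no subdomain support.
function is_internal_url(string $url, string $domain): bool
{
    // Root-relative path such as "/demo", but not protocol-relative "//host/...".
    if (0 === strpos($url, '/') && 0 !== strpos($url, '//')) {
        return true;
    }
    // parse_url returns null or false when there is no usable host.
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) && 0 === strcasecmp($host, $domain);
}

$domain = 'my-domain.com';
var_dump(is_internal_url('https://my-domain.com/demo/', $domain));      // bool(true)
var_dump(is_internal_url('/demo', $domain));                            // bool(true)
var_dump(is_internal_url('//example.com', $domain));                    // bool(false)
var_dump(is_internal_url('https://my-domain.com.evil.test/', $domain)); // bool(false)

// For comparison, the substring check accepts the last URL:
var_dump(false !== stripos('https://my-domain.com.evil.test/', '//' . $domain)); // bool(true)
```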

I know this post is old, but here is a function I just wrote. Maybe someone else needs it too.

function IsResourceLocal($url){
    if( empty( $url ) ){ return false; }
    $urlParsed = parse_url( $url );
    /* parse_url() omits 'host' for relative links, so default to an empty string */
    $host = isset( $urlParsed['host'] ) ? $urlParsed['host'] : '';
    if( empty( $host ) ){
        /* maybe we have a relative link like: /wp-content/uploads/image.jpg */
        /* prepend the document root and check if the file exists */
        $doc_root = $_SERVER['DOCUMENT_ROOT'];
        $maybefile = $doc_root.$url;
        if( file_exists( $maybefile ) ){
            /* maybe you want to convert to full url? */
            return true;
        }
        return false;
    }
    /* strip a leading www. if it exists */
    $host = preg_replace( '/^www\./i', '', $host );
    $thishost = preg_replace( '/^www\./i', '', $_SERVER['HTTP_HOST'] );
    /* host names are case-insensitive */
    return strcasecmp( $host, $thishost ) === 0;
}
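
function isexternal($url) {
    // Test if link is internal/external:
    // a link is internal if it contains the domain or starts with "/".
    // strpos returns the integer 0 for a match at the start, so compare with 0, not '0'.
    $internal = strpos($url, 'domainname.com') !== false || strpos($url, '/') === 0;

    return !$internal;
}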

You probably want to check whether the link is on the same domain. That only works, though, if all your href attributes are absolute and contain the domain. Relative ones like /test/file.html are tricky, because a folder can have the same name as a domain. So, if you have full URLs in each link:

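function isexternal($url) {

  // Test if link is internal/external: internal links contain the domain or start with "/"
  // (use === 0, because a loose comparison also matches strpos's false for "not found")
  if(stristr($url, "myDomain.com") !== false || strpos($url, "/") === 0)
    return false;
  else
    return true;
}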
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow