Question

I am using WordPress. One of its files, functions.php, contains a function do_robots() {... that blocks Google from crawling the site. I have replaced this function with the following:

function do_robots() {
    header( 'Content-Type: text/plain; charset=utf-8' );

    do_action( 'do_robotstxt' );

    if ( '0' == get_option( 'blog_public' ) ) {
        echo "User-agent: *";
        echo "\nDisallow: /wp-admin";
        echo "\nDisallow: /wp-includes";
        echo "\nDisallow: /wp-content";
        echo "\nDisallow: /stylesheets";
        echo "\nDisallow: /_db_backups";
        echo "\nDisallow: /cgi";
        echo "\nDisallow: /store";
        echo "\nDisallow: /wp-includes\n";
    } else {
        echo "User-agent: *";
        echo "\nDisallow: /wp-admin";
        echo "\nDisallow: /wp-includes";
        echo "\nDisallow: /wp-content";
        echo "\nDisallow: /stylesheets";
        echo "\nDisallow: /_db_backups";
        echo "\nDisallow: /cgi";
        echo "\nDisallow: /store";
        echo "\nDisallow: /wp-includes\n";
    }
}
  1. I am not quite sure about Allow. Is everything allowed by default, as long as I do not explicitly Disallow it?
  2. Why does Googlebot still get blocked by the above function?

Solution

The original function from WordPress's SVN (below) blocks fewer paths than your replacement, so I would recommend removing some of the extra directories (e.g., wp-content, which holds your theme's stylesheets, scripts, and uploaded images) and seeing if that gives you the behavior you're looking for. You could also try a WordPress sitemap plugin to generate a Google Sitemap for their engine to read.

function do_robots() {
    header( 'Content-Type: text/plain; charset=utf-8' );

    do_action( 'do_robotstxt' );

    $output = "User-agent: *\n";
    $public = get_option( 'blog_public' );
    if ( '0' == $public ) {
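        // blog_public is '0' when the site owner has chosen to block search engines.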
        $output .= "Disallow: /\n";
    } else {
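        // Public blog: hide only the admin and includes directories,
        // prefixing the site path in case WordPress lives in a subdirectory.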
        $site_url = parse_url( site_url() );
        $path = ( !empty( $site_url['path'] ) ) ? $site_url['path'] : '';
        $output .= "Disallow: $path/wp-admin/\n";
        $output .= "Disallow: $path/wp-includes/\n";
    }

    echo apply_filters( 'robots_txt', $output, $public );
}
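
Note the apply_filters( 'robots_txt', $output, $public ) call at the end: instead of editing core files (which get overwritten on upgrade), you can adjust the output from your theme or a plugin by hooking that filter. A minimal sketch, where my_robots_txt is just an illustrative name and /_db_backups is taken from your list above:

function my_robots_txt( $output, $public ) {
    // Append an extra rule only when the blog is publicly crawlable.
    if ( '0' != $public ) {
        $output .= "Disallow: /_db_backups\n";
    }
    return $output;
}
add_filter( 'robots_txt', 'my_robots_txt', 10, 2 );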

The rule for robots.txt files is that everything is allowed unless it is explicitly disallowed. Also keep in mind that robots.txt is purely advisory: whether a search engine obeys it is a matter of trust, not enforcement.
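
For example, on a public blog installed at the domain root, the core function above serves:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

Every path not listed (your posts, pages, and so on) is crawlable by default, so no Allow line is needed. Major crawlers, including Googlebot, also honor an explicit Allow directive for carving an exception out of a disallowed path, e.g. Allow: /wp-admin/admin-ajax.php to re-open a single file inside a blocked directory.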

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow