Question

I am using WordPress. One of its files, functions.php, contains the function do_robots() { ... , which blocks Google from crawling the site. I have replaced this function with the following:

function do_robots() {
    header( 'Content-Type: text/plain; charset=utf-8' );

    do_action( 'do_robotstxt' );

    if ( '0' == get_option( 'blog_public' ) ) {
        echo "User-agent: *";
        echo "\nDisallow: /wp-admin";
        echo "\nDisallow: /wp-includes";
        echo "\nDisallow: /wp-content";
        echo "\nDisallow: /stylesheets";
        echo "\nDisallow: /_db_backups";
        echo "\nDisallow: /cgi";
        echo "\nDisallow: /store";
        echo "\nDisallow: /wp-includes\n";
    } else {
        echo "User-agent: *";
        echo "\nDisallow: /wp-admin";
        echo "\nDisallow: /wp-includes";
        echo "\nDisallow: /wp-content";
        echo "\nDisallow: /stylesheets";
        echo "\nDisallow: /_db_backups";
        echo "\nDisallow: /cgi";
        echo "\nDisallow: /store";
        echo "\nDisallow: /wp-includes\n";
    }
}
  1. I am not quite sure about Allow. If I do not Disallow a path, is it allowed by default?
  2. Why does Googlebot still get blocked by the function above?

Solution

The original function from SVN blocks fewer paths than your version above, so I would recommend removing the extra Disallow entries (e.g. /wp-content) and checking whether that gives you the result you want. Note also that both branches of your if/else are identical, so the blog_public setting has no effect, and /wp-includes is disallowed twice. You could also try a WordPress plugin that generates a Google Sitemap for the crawler to read.

function do_robots() {
    header( 'Content-Type: text/plain; charset=utf-8' );

    do_action( 'do_robotstxt' );

    $output = "User-agent: *\n";
    $public = get_option( 'blog_public' );
    if ( '0' == $public ) {
        // Site is marked private: ask all crawlers to stay out entirely.
        $output .= "Disallow: /\n";
    } else {
        // Account for installs that live in a subdirectory.
        $site_url = parse_url( site_url() );
        $path     = ( ! empty( $site_url['path'] ) ) ? $site_url['path'] : '';
        $output  .= "Disallow: $path/wp-admin/\n";
        $output  .= "Disallow: $path/wp-includes/\n";
    }

    echo apply_filters( 'robots_txt', $output, $public );
}
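Because this version passes its output through the robots_txt filter, you can add or change rules from a theme or plugin instead of editing core files (which get overwritten on upgrade). A sketch of such a hook is below; the extra Disallow paths are only examples, not required rules:

```php
<?php
// In a theme's functions.php or a small plugin -- not in WordPress core.
add_filter( 'robots_txt', function ( $output, $public ) {
    // Only append extra rules when the site is public.
    if ( '0' != $public ) {
        $output .= "Disallow: /store/\n";       // example path
        $output .= "Disallow: /_db_backups/\n"; // example path
    }
    return $output;
}, 10, 2 );
```

The filter receives the generated robots.txt body and the blog_public option, so the callback can decide per-visibility which rules to add.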

The rule for robots.txt files is that everything is allowed unless explicitly disallowed. Keep in mind, though, that obeying robots.txt is voluntary: it is a trust system, not an enforcement mechanism.
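For example, with a minimal robots.txt like the following (paths chosen for illustration), every URL not matched by a Disallow line remains crawlable by default, so no Allow directives are needed:

```
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
```

Here /about/ or /2012/05/some-post/ would still be crawled, because nothing disallows them.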

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow