Question

I'm using this guide to protect a folder via .htaccess and a PHP script.

We use a Google Search Appliance to index this particular protected folder. However, I'm not sure how to let the crawler through.

To test, I used a Firefox add-on to spoof my user agent (to msnbot in this case) and used echo $_SERVER['HTTP_USER_AGENT']; to verify that msnbot/1.1 (+http://search.msn.com/msnbot.htm) was in fact the reported UA.

This is the chain of conditionals that the authentication script checks. All of these conditions work except the last.

current_user_can('edit_posts') || mm_member_decision( array ( "isMember"=>"true", "hasBundle"=>"1", "status" => "active" ) ) || auth_redirect() || ($_SERVER['HTTP_USER_AGENT'] == 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)');

Solution 2

Figured it out. || auth_redirect() should be last in the conditional. Since || short-circuits left to right and auth_redirect() sends unauthenticated visitors to the login page without returning, any condition placed after it (like the user-agent check) was never evaluated.
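For illustration, the corrected chain might look like this. This is a sketch using the same WordPress/MemberMouse helpers as the snippet in the question; the key change is that the crawler check now runs before auth_redirect():

```php
// The UA check must come before auth_redirect(), because auth_redirect()
// redirects unauthenticated visitors and never returns.
$is_crawler = ( $_SERVER['HTTP_USER_AGENT'] === 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)' );

current_user_can( 'edit_posts' )
    || mm_member_decision( array( 'isMember' => 'true', 'hasBundle' => '1', 'status' => 'active' ) )
    || $is_crawler
    || auth_redirect();
```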

OTHER TIPS

The Google Search Appliance user agent is named gsa-crawler.

A full user-agent string might look like this:

gsa-crawler (Enterprise; GID09999; name@company.com)

https://developers.google.com/search-appliance/documentation/614/help_gsa/crawl_headers

Allow that user agent for a successful crawl. And since you have already figured out that the user agent alone is not enough, add a check for the GID and/or the email as well.
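A minimal sketch of such a check, assuming the GID (GID09999) and email address from the example string above; substitute the values your appliance actually sends:

```php
// Hypothetical check: require the gsa-crawler prefix plus the expected
// GID or email from the appliance's user-agent string.
$ua = isset( $_SERVER['HTTP_USER_AGENT'] ) ? $_SERVER['HTTP_USER_AGENT'] : '';

$is_gsa = ( strpos( $ua, 'gsa-crawler' ) === 0 )
    && ( strpos( $ua, 'GID09999' ) !== false
         || strpos( $ua, 'name@company.com' ) !== false );
```

Note that the user agent is trivially spoofable (as the question itself demonstrates), so for stronger protection you could additionally restrict by the appliance's IP address.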

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow