Problem

I'm using this guide to protect a folder via .htaccess and a PHP script.

We use a Google Search Appliance to index this particular protected folder. However, I'm not sure how to let the crawler through.

To test, I used a Firefox add-on to fake my user agent (msnbot in this case) and ran echo $_SERVER['HTTP_USER_AGENT']; to verify that msnbot/1.1 (+http://search.msn.com/msnbot.htm) was in fact the reported UA.

This is the chain of conditionals that the authentication script checks against. All of these conditions work except the last.

current_user_can('edit_posts') || mm_member_decision( array ( "isMember"=>"true", "hasBundle"=>"1", "status" => "active" ) ) || auth_redirect() || ($_SERVER['HTTP_USER_AGENT'] == 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)');

Solution 2

Figured it out: || auth_redirect() should come last in the conditional.
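The ordering matters because PHP's || short-circuits left to right, and WordPress's auth_redirect() does not return for an unauthenticated visitor (it redirects and exits), so any check placed after it is unreachable. A minimal sketch of the corrected ordering; the WordPress/MemberMouse calls are shown as comments since they only exist inside WordPress:

```php
<?php
// Hypothetical helper: compare against the exact UA string the crawler sends.
function is_allowed_bot(): bool {
    return ($_SERVER['HTTP_USER_AGENT'] ?? '')
        === 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)';
}

// Corrected order: auth_redirect() runs only if every earlier check failed,
// because || stops evaluating at the first truthy operand.
//
// current_user_can('edit_posts')
//     || mm_member_decision(array("isMember" => "true", "hasBundle" => "1", "status" => "active"))
//     || is_allowed_bot()
//     || auth_redirect();

// Demonstrating the short-circuit behaviour with a faked UA:
$_SERVER['HTTP_USER_AGENT'] = 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)';
var_dump(is_allowed_bot()); // bool(true) — auth_redirect() would never be reached
```

With the original ordering, auth_redirect() fired before the bot check and bounced the crawler to the login page.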

Other tips

The Google Search Appliance user agent is named gsa-crawler.

A full user-agent string might look like this:

gsa-crawler (Enterprise; GID09999; name@company.com)

https://developers.google.com/search-appliance/documentation/614/help_gsa/crawl_headers

Try allowing that user agent for a successful crawl. And since you've already figured out that the user agent alone is not enough, also add a check for the GID and/or the email.
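One way to combine those checks is a sketch like the following; the GID and email values here are taken from the example string above and are placeholders for your appliance's actual configured values. Note that a user-agent string can be spoofed, so this is a weak gate on its own:

```php
<?php
// Hypothetical helper: accept the request only when the UA starts with
// "gsa-crawler" AND carries the expected appliance GID and contact email.
function is_gsa_crawler(string $ua, string $expected_gid, string $expected_email): bool {
    return strpos($ua, 'gsa-crawler') === 0
        && strpos($ua, $expected_gid) !== false
        && strpos($ua, $expected_email) !== false;
}

// Example UA from the GSA documentation format shown above:
$ua = 'gsa-crawler (Enterprise; GID09999; name@company.com)';
var_dump(is_gsa_crawler($ua, 'GID09999', 'name@company.com')); // bool(true)
```

If stronger assurance is needed, pairing this with an IP allowlist for the appliance is more robust than trusting the header alone.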

License: CC-BY-SA with attribution
Not affiliated with StackOverflow