Question

I need to give a crawler access to a private wiki.

The wiki is closed to all anonymous users: you have to log in to see the contents. I need to give a single crawler (identified by a user-agent string and a single IP) full read access so the contents can be indexed. It's an internal crawler, so the resources it indexes will only be available upon successful login.

Any suggestions on how to enable access to a single client (and not user, since a crawler is not able to log itself into the wiki)?

Solution

There actually is a solution to this problem. As I mentioned, the crawler uses a specific IP, and nothing else uses that IP. So a quick and dirty, but still civilised, way to do it in LocalSettings.php is:

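// Let the crawler's IP read the otherwise closed wiki.
$crawler_ip = '1.2.3.4';
// REMOTE_ADDR is not set when PHP runs from the command line.
if ( isset( $_SERVER['REMOTE_ADDR'] ) && $_SERVER['REMOTE_ADDR'] === $crawler_ip ) {
    $wgGroupPermissions['*']['read'] = true;
}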

Simple, huh? :)
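The question also identifies the crawler by its user-agent string. A minimal sketch of checking both, assuming the snippet lives in LocalSettings.php and the web server sees the crawler's real address in REMOTE_ADDR (behind a reverse proxy it would see the proxy's address instead). The variable names and the 'InternalCrawler' value are placeholders, not something from the original setup:

```php
// Placeholders: replace with the crawler's real IP and user-agent substring.
$crawlerIp = '1.2.3.4';
$crawlerAgent = 'InternalCrawler'; // hypothetical value

// Both keys can be missing (for example on the command line), so default to ''.
$remoteAddr = isset( $_SERVER['REMOTE_ADDR'] ) ? $_SERVER['REMOTE_ADDR'] : '';
$userAgent = isset( $_SERVER['HTTP_USER_AGENT'] ) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Keep the wiki closed to anonymous users by default.
$wgGroupPermissions['*']['read'] = false;

// Open read access only when both the IP and the user-agent match.
if ( $remoteAddr === $crawlerIp && strpos( $userAgent, $crawlerAgent ) !== false ) {
    $wgGroupPermissions['*']['read'] = true;
}
```

The user-agent header is trivially spoofed, so the IP check is what actually restricts access; the user-agent test only narrows it further.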

Other tips

If you have access to the database, you can use a database crawler in a system like Solr to index the contents directly instead.

You can make a custom user group for your crawler; let's say we call it 'crawler'. Since it has to log in anyway, that'd be the easiest solution.

Just give it read permissions like this:

$wgGroupPermissions['*']['read'] = false;
$wgGroupPermissions['crawler']['read'] = true;

Reference: http://www.mediawiki.org/wiki/Manual:User_rights#Changing_group_permissions

Edit: Hmm, wait, I misread. The crawler is probably not a logged-in account, right? Hold on, checking whether you can set permissions for an IP.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow