Question

We have an MU (Multisite) installation of Drupal 7 here at work, and are trying to temporarily hold back the swarm of bots we receive until we get a chance to load our content. I wrote a quick and dirty script to send 503 headers if a certain element is found via XPath (this can also be done with strpos/preg_match if the DOM is not well formed).

In order to get the ball rolling though I need to figure out how to either

A) Hijack the Drupal7 bootstrap and pull all content through this filter below

B) ob_flush content through the filter before content is loaded

Worth mentioning: we use a module called Domain Access, which I believe is what led me on this crazy chase in the first place. I know for a fact that it meddles with quite a few files...

The issue I am having is figuring out exactly where I can catch the content. It should be possible to push the output into a variable, strpos it, then release it, correct? I thought that index.php in Drupal 7 would be the place, but I'm a little confused as to where or how I should capture the contents. Here's the script; hopefully someone can point me in the right direction.

//error_reporting(-1);

    /* start query */

    $dom = new DOMDocument;
    $dom->preserveWhiteSpace = false;
    $dom->Load($_SERVER['PHP_SELF']);

    $xpath = new DOMXPath($dom);

        //if this exists we aren't ready to be read by bots
        $query = $xpath->query(".//*[@id='block-views-about-this-site-block']/div/div/div");
        //or $query = 'klat-badge'; //if this is a string not DOM

    /* end query */

if ($query !== false && $query->length > 0) { // DOM variant: the marker element is present

    //require banlist
    require('botlist.php'); 

    $str = strtolower('/'.implode('|', array_unique($list)).'/i'); 
    if(preg_match($str, strtolower($_SERVER['HTTP_USER_AGENT']))) {
        //so tell bots we're broken
        header('HTTP/1.1 503 Service Temporarily Unavailable');
        header('Status: 503 Service Temporarily Unavailable');
        exit;
    }
}

Solution

It would be a lot easier to just define a constant in a module and check that instead. You could then use hook_init() to make a decision on whether the page is ready before the content is even built:

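// Set to FALSE once the content has been loaded.
define('IN_DEVELOPMENT', TRUE);

/**
 * Implements hook_init().
 */
function mymodule_init() {
  if (IN_DEVELOPMENT && isset($_SERVER['HTTP_USER_AGENT'])) {
    // Load the ban list; botlist.php is expected to define $list.
    require 'botlist.php';

    // Case-insensitive pattern matching any listed bot name.
    $str = '/' . implode('|', array_unique($list)) . '/i';
    if (preg_match($str, $_SERVER['HTTP_USER_AGENT'])) {
      // Tell bots the site is temporarily unavailable.
      header('HTTP/1.1 503 Service Temporarily Unavailable');
      header('Status: 503 Service Temporarily Unavailable');
      exit;
    }
  }
}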
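
One caveat with the pattern built above: the entries in $list go into the regular expression unescaped, so a bot name containing a character such as / or . would change (or break) the pattern. A minimal sketch of a safer matcher follows, assuming $list is a flat array of user-agent substrings. The function name mymodule_is_listed_bot() is hypothetical and not part of Drupal.

```php
<?php
/**
 * Returns TRUE if the user agent contains any entry from the ban list.
 *
 * Hypothetical helper: $list is assumed to be a flat array of substrings,
 * such as the one botlist.php is expected to provide.
 */
function mymodule_is_listed_bot($user_agent, array $list) {
  $user_agent = (string) $user_agent;

  // Drop empty entries and duplicates before building the pattern.
  $list = array_unique(array_filter(array_map('trim', $list), 'strlen'));
  if (empty($list) || $user_agent === '') {
    return FALSE;
  }

  // Escape regex metacharacters, including the / delimiter.
  $quoted = array();
  foreach ($list as $entry) {
    $quoted[] = preg_quote($entry, '/');
  }

  // The i flag makes the match case-insensitive, so strtolower() is not needed.
  return preg_match('/' . implode('|', $quoted) . '/i', $user_agent) === 1;
}
```

Inside mymodule_init() the check could then read mymodule_is_listed_bot($_SERVER['HTTP_USER_AGENT'], $list), again assuming botlist.php defines $list.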

There might be a way to do what you want by loading the whole page content into a DOMDocument, but it won't be easy in Drupal (as I'm sure you've already discovered!) and it certainly won't be efficient.
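
For completeness, the buffer-and-inspect idea from the question can be sketched roughly as below. This is only an illustration under assumptions: the function name page_has_marker() is hypothetical, the echoed markup is a stand-in for a rendered page, and where exactly the buffer would wrap Drupal's page output is left open. The XPath expression is the one from the question.

```php
<?php
/**
 * Returns TRUE if the given HTML contains a node matching the XPath query.
 *
 * Hypothetical helper for illustration only.
 */
function page_has_marker($html, $xpath_query) {
  if (trim($html) === '') {
    return FALSE;
  }

  // Real-world HTML is rarely valid XML, so silence libxml parse warnings.
  $previous = libxml_use_internal_errors(TRUE);
  $dom = new DOMDocument();
  $loaded = $dom->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  if (!$loaded) {
    return FALSE;
  }

  $xpath = new DOMXPath($dom);
  $nodes = $xpath->query($xpath_query);
  return $nodes !== FALSE && $nodes->length > 0;
}

// Capture output in a variable instead of sending it to the client.
ob_start();
echo '<div id="block-views-about-this-site-block"><div><div><div>Placeholder</div></div></div></div>';
$html = ob_get_clean();

// TRUE means the placeholder block is still present, so the page is not ready.
$not_ready = page_has_marker($html, ".//*[@id='block-views-about-this-site-block']/div/div/div");
```

Every request pays for a full page build plus an HTML parse before the decision is made, which is why the constant check in hook_init() is the cheaper option.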

Hope that helps

Licensed under: CC-BY-SA with attribution