Question

Will there be a performance hit when I add this to my .htaccess file:

HOWTO stop automated spam-bots using .htaccess

or should I add it to my PHP file instead?

or leave it out completely? Because spammers might fake their useragent anyway?

Would it also make sense to prevent users from accessing your website via a proxy server? I know that this might also block people from accessing your website who didn't come here with bad intentions. But, what are some of the reasons why people would visit a website via a proxy server, other than spam, or when a website is blocked in their country?


Solution

Will there be a performance hit when I add this to my .htaccess file?

Possibly, if you have thousands or tens of thousands of user agent strings to match against. Apache has to check this rule on every request.

or should I add it to my PHP file instead?

No. Apache's parsing of .htaccess will still be quicker than doing the check in PHP, because every request then has to be handed to the PHP interpreter first (under CGI, Apache even has to start a new PHP process for every request).
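For comparison, the PHP-side check the question asks about might look roughly like the sketch below. The helper name isBlockedUserAgent and the bot names are hypothetical placeholders, not taken from the linked HOWTO, and the check is a plain case-insensitive substring match.

```php
<?php
// Hypothetical helper: returns true when the user agent contains
// any of the given substrings (case-insensitive).
function isBlockedUserAgent(string $userAgent, array $needles): bool
{
    foreach ($needles as $needle) {
        if ($needle !== '' && stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Placeholder names for illustration only, not a real blocklist.
$blocked = ['BadBot', 'EvilScraper'];
$ua      = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (isBlockedUserAgent($ua, $blocked)) {
    http_response_code(403);
    exit('Forbidden');
}
```

This runs only after PHP has already been invoked for the request, which is the overhead the answer above refers to.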

or leave it out completely? Because spammers might fake their useragent anyway?

Probably yes. It is very likely that most malicious spam bots will be faking a standard user agent.

But, what are some of the reasons why people would visit a website via a proxy server, other than spam, or when a website is blocked in their country?

There are many legitimate uses for a proxy server. One is mobile clients that use some sort of prefetching to save mobile traffic. There are also some ISPs that force their customers to use their proxy servers. In my opinion, locking out users who use a proxy server is not a wise move.

The bottom line is probably that these things are not worth worrying about unless you have a lot of traffic going to waste because of malicious activities.

OTHER TIPS

I personally would focus more on securing the basics of the website, such as forms, code and open ports, rather than on blocking. A visit counts anyway! ;)

...what's wrong with setting up domain.com/bottrap, disallowing access to it through robots.txt, capturing any naughty bot that visits it anyway, putting its IP in a .txt file, and denying it access with a 403 header forever?
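The bot trap described above can be sketched in PHP as follows. This is a minimal illustration under stated assumptions: the function names trapIp and isTrapped and the file path are hypothetical, the list is a plain text file with one IP per line, and robots.txt is assumed to contain "Disallow: /bottrap" so that well-behaved crawlers never reach the trap.

```php
<?php
// Hypothetical helper: true if the IP was recorded by the trap earlier.
function isTrapped(string $ip, string $file): bool
{
    if (!is_file($file)) {
        return false;
    }
    // One IP per line; fall back to an empty list if the read fails.
    $ips = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
    return in_array($ip, $ips, true);
}

// Hypothetical helper: append the IP to the list unless it is already there.
function trapIp(string $ip, string $file): void
{
    if (!isTrapped($ip, $file)) {
        file_put_contents($file, $ip . "\n", FILE_APPEND | LOCK_EX);
    }
}

// In the script served at /bottrap (assumed path):
//     trapIp($_SERVER['REMOTE_ADDR'], '/path/to/trapped_ips.txt');
//
// At the top of every other page:
//     if (isTrapped($_SERVER['REMOTE_ADDR'], '/path/to/trapped_ips.txt')) {
//         http_response_code(403);
//         exit;
//     }
```

The file must be writable by the web server and should live outside the document root so that it cannot be downloaded.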

PHP Limit/Block Website requests for Spiders/Bots/Clients etc.

Here I have written a PHP function which can block unwanted requests to reduce your website traffic. Good for spiders, bots and annoying clients.

CLIENT/Bots Blocker

DEMO: http://szczepan.info/9-webdesign/php/1-php-limit-block-website-requests-for-spiders-bots-clients-etc.html

CODE:

/* Function which can Block unwanted Requests
 * @return array messages for active blocks; empty array when the request may pass
 */
function requestBlocker()
{
        /*
        Version 1.0 11 Jan 2013
        Author: Szczepan K
        http://www.szczepan.info
        me[@] szczepan [dot] info
        ###Description###
        A PHP function which can block unwanted requests to reduce your website traffic.
        Good for spiders, bots and annoying clients.

        */

        $dir = 'requestBlocker/'; ## Create & set directory writeable!!!!

        $rules   = array(
                #You can add multiple rules in an array like this one here
                #Note that large "sek" definitions (like 60*60*60) will blow up your client file
                array(
                        //if >5 requests in 5 Seconds then Block client 15 Seconds
                        'requests' => 5, //5 requests
                        'sek' => 5, //5 requests in 5 Seconds
                        'blockTime' => 15 // Block client 15 Seconds
                ),
                array(
                        //if >10 requests in 30 Seconds then Block client 20 Seconds
                        'requests' => 10, //10 requests
                        'sek' => 30, //10 requests in 30 Seconds
                        'blockTime' => 20 // Block client 20 Seconds
                ),
                array(
                        //if >200 requests in 1 Hour then Block client 10 Minutes
                        'requests' => 200, //200 requests
                        'sek' => 60 * 60, //200 requests in 1 Hour
                        'blockTime' => 60 * 10 // Block client 10 Minutes
                )
        );
        $time    = time();
        $blockIt = array();
        $user    = array();

        #Set Unique Name for each Client-File 
        $user[] = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : 'IP_unknown';
        $user[] = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
        $user[] = strtolower(gethostbyaddr($user[0]));

        # Note that I use files because bots do not accept sessions
        $botFile = $dir . substr($user[0], 0, 8) . '_' . substr(md5(join('', $user)), 0, 5) . '.txt';


        if (file_exists($botFile)) {
                $file   = file_get_contents($botFile);
                $client = unserialize($file);

        } else {
                $client                = array();
                $client['time'][$time] = 0;
        }

        # Set/Unset Blocktime for blocked Clients
        if (isset($client['block'])) {
                foreach ($client['block'] as $ruleNr => $timestampPast) {
                        $left = $time - $timestampPast;
                        if (($left) > $rules[$ruleNr]['blockTime']) {
                                unset($client['block'][$ruleNr]);
                                continue;
                        }
                        $blockIt[] = 'Block active for Rule: ' . $ruleNr . ' - unlock in ' . ($rules[$ruleNr]['blockTime'] - $left) . ' Sec.';
                }
                if (!empty($blockIt)) {
                        return $blockIt;
                }
        }

        # log/count each access
        if (!isset($client['time'][$time])) {
                $client['time'][$time] = 1;
        } else {
                $client['time'][$time]++;

        }

        #check the Rules for Client
        $sum    = array();
        $maxSek = 0; // longest rule window; older timestamps are never needed again
        foreach ($rules as $ruleNr => $v) {
                $sum[$ruleNr] = 0; // must be numeric: '' + int is a TypeError in PHP 8
                $requests     = $v['requests'];
                $sek          = $v['sek'];
                $maxSek       = max($maxSek, $sek);
                foreach ($client['time'] as $timestampPast => $count) {
                        if (($time - $timestampPast) < $sek) {
                                $sum[$ruleNr] += $count;
                        }
                }

                if ($sum[$ruleNr] > $requests) {
                        $blockIt[]                = 'Limit : ' . $ruleNr . '=' . $requests . ' requests in ' . $sek . ' seconds!';
                        $client['block'][$ruleNr] = $time;
                }
        }
        #drop non-use Timestamps in File (older than the longest rule window)
        foreach ($client['time'] as $k => $v) {
                if (($time - $k) >= $maxSek) {
                        unset($client['time'][$k]);
                }
        }
        $file = file_put_contents($botFile, serialize($client));


        return $blockIt;

}


if ($t = requestBlocker()) {
        echo 'dont pass here!';
        print_r($t);
} else {
        echo "go on!";
}
Licensed under: CC-BY-SA with attribution