Question

I'm looking to prevent robots from crawling our site and downloading the thousands of images hosted there.

I recently read this https://github.com/remy/password-policy about best practices for password policy. One of the ideas was to delay repeated requests exponentially, thereby not limiting humans but punishing bots.

Would this be possible in PHP?


Solution

The easiest way is to rewrite the relevant URLs to a PHP script that implements the download. The script would:

  • keep a list of source addresses (e.g. in memcached) with their last request timestamp and current penalty time,
  • double the penalty time if the last request was recent, or reset it to zero otherwise,
  • write the record back,
  • and finally sleep for the penalty time and serve the download (a sketch follows below).
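Here is a minimal sketch of such a script in PHP. The rewrite rule, the Memcached address, the /var/www/images/ directory, the 10-second window, and the 60-second cap are all assumptions for illustration:

```php
<?php
// Hypothetical sketch. Assumes a rewrite rule sends image requests here,
// e.g. with Apache's mod_rewrite:
//   RewriteRule ^images/(.*)$ /throttle.php?file=$1 [L]
// and a Memcached server running on localhost:11211.

$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$ip  = $_SERVER['REMOTE_ADDR'];
$key = 'penalty:' . $ip;

// Fetch the record for this source address, or start a fresh one.
$record = $memcached->get($key);
if ($record === false) {
    $record = ['last' => 0, 'penalty' => 0];
}

$now = time();
if ($now - $record['last'] < 10) {
    // Repeat request within 10 seconds: double the penalty,
    // starting at 1 second and capping at 60 (assumed values).
    $record['penalty'] = min(60, max(1, $record['penalty'] * 2));
} else {
    // Enough time has passed: reset the penalty to zero.
    $record['penalty'] = 0;
}
$record['last'] = $now;

// Write the record back, expiring idle entries after an hour.
$memcached->set($key, $record, 3600);

// Sleep for the penalty time, then serve the download.
sleep($record['penalty']);

$base = '/var/www/images/';  // assumed image directory
$file = realpath($base . basename($_GET['file'] ?? ''));
if ($file === false || strpos($file, $base) !== 0 || !is_file($file)) {
    http_response_code(404);
    exit;
}

header('Content-Type: ' . mime_content_type($file));
header('Content-Length: ' . filesize($file));
readfile($file);
```

One caveat with this design: sleep() ties up a PHP worker for the whole delay, so an aggressive bot can still exhaust your worker pool. Returning an error status (e.g. 429) once the penalty exceeds some threshold avoids that.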
Licensed under: CC-BY-SA with attribution