There is no foolproof way to do this. The most effective approach for me was a combination of the following:
Implement a User-Agent check at the web server level (yes, this is not foolproof either). Aim to block the known, common programs people use to hit URLs, such as libwww-perl, Apache HttpClient, etc. You should be able to build such a list from your access logs.
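A minimal sketch of such a check in Python (the substrings in the blocklist are illustrative; in practice you would derive them from your own access logs, and you could run this as middleware or translate it into your web server's config):

```python
# Illustrative User-Agent blocklist; build your own from your access logs.
BLOCKED_UA_SUBSTRINGS = ("libwww-perl", "httpclient", "python-requests", "curl", "wget")

def is_blocked(user_agent):
    """Return True if the User-Agent matches a known bot/tool signature."""
    ua = (user_agent or "").lower()
    # An empty or missing User-Agent is itself suspicious for browser traffic
    if not ua:
        return True
    return any(sig in ua for sig in BLOCKED_UA_SUBSTRINGS)

print(is_blocked("libwww-perl/6.05"))                      # True  (blocked)
print(is_blocked("Mozilla/5.0 (Windows NT 10.0; Win64)"))  # False (allowed)
```

Keep in mind this only stops tools that send their default User-Agent; anything can spoof a browser string.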
Depending on your situation, you may or may not want search engine spiders to crawl your site; add a robots.txt to your server accordingly. Not all spiders/crawlers follow the instructions in robots.txt, but most legitimate ones do.
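For example, a robots.txt that allows Googlebot but asks all other crawlers to stay out might look like this (purely voluntary: well-behaved crawlers honor it, malicious bots simply ignore it):

```
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
```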
Use a specialized tool to detect abnormal access to your site, such as https://www.cloudflare.com/, which can track all access to your site and match it against an ever-growing database of known and suspected bots.
Note: I am in no way affiliated with Cloudflare :)