Question

I'm crawling an SNS with a crawler written in Python.

It worked fine for a long time, but a few days ago the pages my servers fetched started coming back as ERROR 403 FORBIDDEN.

I tried changing the cookie, changing the browser, and changing the account, but all of it failed.

It also seems that the forbidden servers are all in the same network segment.

What can I do? Steal someone else's IP? = =...

Thanks a lot


Solution

It looks like you've been blacklisted at the router level in that subnet, perhaps because you (or somebody else in the subnet) were violating the terms of use, robots.txt, the maximum crawl frequency specified in a sitemap, or something like that.

The solution is not technical but social: contact the webmaster, be properly apologetic, find out what exactly you (or one of your associates) did wrong, convincingly promise never to do it again, and keep apologizing until they remove the blacklisting. If you can give that webmaster any reason why they should want to let you crawl the site (e.g., your crawling feeds a search engine that will bring them traffic), so much the better!-)
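Once you are unblocked, one concrete way to stay on the right side of robots.txt and crawl-frequency rules is to check every URL against the site's robots.txt and honor its Crawl-delay before each request. Here is a minimal sketch using Python's standard urllib.robotparser; the robots.txt content below is hypothetical (in practice you would point the parser at the site's real https://…/robots.txt via set_url() and read()):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; normally fetched from the site
# with rp.set_url("https://example.com/robots.txt"); rp.read()
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

def fetch_allowed(url, user_agent="*"):
    """Return True only if robots.txt permits crawling this URL."""
    return rp.can_fetch(user_agent, url)

# Delay (in seconds) the site asks crawlers to wait between requests;
# fall back to a conservative default if none is specified.
delay = rp.crawl_delay("*") or 10

print(fetch_allowed("https://example.com/public/page"))   # True
print(fetch_allowed("https://example.com/private/page"))  # False
```

Sleeping for `delay` seconds (e.g. `time.sleep(delay)`) between requests, and identifying your crawler with an honest User-Agent string, makes it much easier for a webmaster to trust you instead of blacklisting your whole subnet again.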

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow