Question


I've written an application that parses a list of pages from a specific website and extracts their content with JSoup.
The problem is that, from my IP, I can request at most 3 pages from my list (all on the same domain) per day. After those 3 pages, every request is redirected to a page asking me to come back the next day for 3 more requests.
What I'm trying to do is have my application change its IP every 3 requests.

I've already tested SilverTunnel and JTor (which should let me take a new identity, and thus a new IP, every 3 requests), but those libraries are badly documented and have almost no examples of how to change identity every N requests.

I'm asking whether anyone knows a way to let my application change its IP, mask it, or even ask my ISP for a specific IPv6 address to use, release, and then replace with a different one.

Does anyone know a solution to this problem, or has anyone tested something similar?

Thanks all.


Solution

The best solution for this use case is to ask the website's owners for permission to do what you're doing. They can then whitelist you or, even better, point you to an internal API where you can fetch the data you're interested in far more efficiently than by parsing HTML.

[EDIT] I haven't heard of a (legal) technical solution for this. Criminals use huge botnets of thousands of hacked computers for things like this, but I strongly suggest that you stay away from that.

I also haven't yet met a page that only allows three downloads per day. This severe restriction tells me those people are really obsessed with their data. Trying to circumvent their defenses can get you into trouble (no matter how stupid it might look from your side). If they and you are in the US, prepare to be sued for violating the CFAA. This has happened before for lesser reasons.

Now some technical details. You don't say how you connect to the Internet. If you receive your IP via DHCP, then you need to ask your ISP to give you a different address. This will be a manual process for them, so expect little enthusiasm.

Tor sounds like a good solution, since onion routing should let your requests leave through a different exit node each time. But there is only a (relatively) small number of exit nodes, so chances are that after a relatively short time you will have used each node three times (this gets worse if other people connect to the same service through Tor, because they share the same exit nodes).
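To make the exit-node arithmetic concrete, here is a minimal sketch in Java. The class name, the method name and the idea of splitting the quota evenly between clients are assumptions made for illustration, and the numbers in main are hypothetical, not measured Tor statistics.

```java
// Hypothetical helper: estimates an upper bound on daily requests when every
// exit node is treated by the site as one IP with a fixed per-day quota.
class ExitNodeBudget {

    /**
     * @param exitNodes        number of usable exit nodes (assumed value)
     * @param perIpDailyLimit  requests the site allows per IP per day
     * @param competingClients other clients hitting the same site through the
     *                         same exit nodes (assumed to share the quota evenly)
     */
    static long maxRequestsPerDay(long exitNodes, int perIpDailyLimit, int competingClients) {
        long totalQuota = exitNodes * perIpDailyLimit;
        // Your share shrinks as more clients draw on the same exit nodes.
        return totalQuota / (competingClients + 1);
    }

    public static void main(String[] args) {
        // 1000 exit nodes is a made-up figure used only to show the calculation.
        System.out.println(maxRequestsPerDay(1000, 3, 0));
        System.out.println(maxRequestsPerDay(1000, 3, 2));
    }
}
```

However large the pool is, the ceiling is fixed by the number of exit nodes times the per-IP limit, and it is shared with everyone else using those nodes against the same site.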

[EDIT2] One possible solution might be to become an ISP and officially buy an IP address block (just like any normal ISP does).

An IPv6 block shouldn't be that expensive, but beware that it won't get you anywhere if the service only works with IPv4! If that's the case, there will be an IPv6-to-IPv4 bridge between you and the service, and the service will think you always use the same address.
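Before spending money on an IPv6 block, you could check whether the service is reachable over IPv6 at all. The following is a rough sketch using only java.net; the class and method names are made up for this example, and "example.com" is a placeholder host. Note that the result also depends on your own machine and JVM settings (for example java.net.preferIPv4Stack), so a negative answer is a hint rather than proof.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: reports whether a host resolves to any IPv6 address.
class Ipv6Check {

    /** Returns true if at least one of the given addresses is an IPv6 address. */
    static boolean hasIpv6(InetAddress[] addresses) {
        for (InetAddress address : addresses) {
            if (address instanceof Inet6Address) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Placeholder host; pass the real one as the first argument.
        String host = args.length > 0 ? args[0] : "example.com";
        InetAddress[] resolved = InetAddress.getAllByName(host);
        System.out.println(host + " has an IPv6 address: " + hasIpv6(resolved));
    }
}
```

If the host only resolves to IPv4 addresses, your requests will go through a translation gateway and the IPv6 block will not give you distinct source addresses from the service's point of view.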

Trying to buy IPv4 addresses at the moment is probably hopeless (well, maybe you can get a block if you're willing to spend a lot of money).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow