Question

I am running some web crawling jobs on an AWS-hosted server. The crawler scrapes data from an eCommerce website, but recently it has started getting "timeout errors" from the site. The website may be rate-limiting my requests based on my IP address. Allocating a new Elastic IP address solves the problem, but not for long.

My Question: Is there any service that I can use to automatically and dynamically allocate & associate new IPs to my instance? Thanks!


Solution

To change the Elastic IP (EIP), you can use the Python boto library.

Something like this:

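#!/usr/bin/python

import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1",
    aws_access_key_id='<key>',
    aws_secret_access_key='<secret>')

# Look up the instance to find its current public IP.
reservations = conn.get_all_instances(filters={'instance-id' : 'i-xxxxxxxx'})
instance = reservations[0].instances[0]

old_address = instance.ip_address
new_address = conn.allocate_address().public_ip

# Swap the addresses. These are EC2-Classic style calls; in a VPC, pass
# association_id / allocation_id instead of public IPs.
conn.disassociate_address(old_address)
conn.associate_address('i-xxxxxxxx', new_address)

# Release the old EIP so unused addresses do not count against the account limit.
conn.release_address(public_ip=old_address)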
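The snippet above uses the legacy boto library. As a rough sketch of the same rotation with boto3 in a VPC, something like the following could work. The function name `rotate_elastic_ip` is made up for illustration, and `ec2` is assumed to be a client created with `boto3.client("ec2", region_name="us-east-1")` and credentials that are allowed to manage addresses.

```python
def rotate_elastic_ip(ec2, instance_id):
    """Replace the instance's Elastic IP with a newly allocated one.

    `ec2` is assumed to be a boto3 EC2 client (hypothetical setup:
    boto3.client("ec2", region_name="us-east-1")).
    Returns the new public IP as a string.
    """
    # Elastic IPs currently attached to this instance (may be empty).
    current = ec2.describe_addresses(
        Filters=[{"Name": "instance-id", "Values": [instance_id]}]
    )["Addresses"]

    # Allocate first: if this fails (for example, the EIP limit is reached),
    # the instance still keeps its old address.
    new = ec2.allocate_address(Domain="vpc")

    # Detach the old address(es), then attach the new one.
    for address in current:
        ec2.disassociate_address(AssociationId=address["AssociationId"])
    ec2.associate_address(
        InstanceId=instance_id,
        AllocationId=new["AllocationId"],
    )

    # Release the old address(es) so they do not pile up in the account.
    for address in current:
        ec2.release_address(AllocationId=address["AllocationId"])

    return new["PublicIp"]
```

The instance is briefly without a public address between the disassociate and associate calls, just as in the boto version, so it is probably best to pause the crawler while this runs.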

OTHER TIPS

If you want to use the Tor network, install and start Tor:

sudo apt-get install tor 
sudo /etc/init.d/tor start

netstat -ant | grep 9050  # check that Tor is listening on its SOCKS port

Then, in your Java project, set the proxy like this:

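public static void main(String[] args) {
    // Route java.net socket traffic through the local Tor SOCKS proxy.
    System.setProperty("socksProxyHost", "127.0.0.1");
    System.setProperty("socksProxyPort", "9050");
    // ... start the crawler here ...
}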

You can schedule a cron job that restarts your application and Tor at a regular interval, so that Tor builds new circuits, which usually changes the exit IP.

Easy and secure.
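As an alternative to restarting Tor from cron, Tor can be asked for fresh circuits over its control port. This is a different technique from the restart described above, sketched here in Python under some assumptions: `ControlPort 9051` is enabled in torrc (it is not enabled by default), the control port uses password authentication or none, and the password contains no quote or backslash characters. The function name `request_new_tor_circuit` and the `connect` parameter are made up for illustration.

```python
import socket


def request_new_tor_circuit(host="127.0.0.1", port=9051, password="",
                            connect=socket.create_connection):
    """Send SIGNAL NEWNYM to a local Tor daemon through its control port.

    Assumes ControlPort 9051 is enabled in torrc; raises RuntimeError if
    Tor does not answer a command with a 250 status.
    """
    commands = ['AUTHENTICATE "%s"' % password, "SIGNAL NEWNYM"]
    with connect((host, port), timeout=10) as sock:
        for command in commands:
            sock.sendall((command + "\r\n").encode("ascii"))
            reply = sock.recv(1024).decode("ascii", "replace")
            if not reply.startswith("250"):
                # Report only the command name so the password is not leaked.
                raise RuntimeError(
                    "Tor rejected %s: %s" % (command.split()[0], reply.strip())
                )
```

Tor rate-limits NEWNYM requests, and a new circuit does not guarantee a different exit IP, so the crawler should still check its visible address after calling this.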

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow