Question

I'm currently running a REST API app on two EC2 nodes under a single load balancer. Rather than the standard load-balancing scenario of small amounts of traffic coming from many IPs, I get huge amounts of traffic from only a few IPs. Therefore, I'd like requests from each individual IP to be spread among all available nodes.

Even with session stickiness turned off, however, this doesn't appear to be the case. Looking at my logs, almost all requests are going to one server, with my smallest client going to the secondary node. This is detrimental, as requests to my service can last up to 30 seconds, and losing that primary node would mean a disproportionate number of requests get killed.

How can I instruct my ELB to round-robin for each client's individual requests?


Solution

You cannot. ELB uses a non-configurable round-robin algorithm. What you can do to mitigate (but not solve) the problem is add more servers behind your ELB and/or make the health check requests initiated by your ELB more frequent.
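For the health-check part, a sketch using the AWS SDK for PHP and the classic ELB ConfigureHealthCheck operation might look like the following; the load balancer name, target path, region, and thresholds are placeholder values you would adapt:

require 'vendor/autoload.php';

use Aws\ElasticLoadBalancing\ElasticLoadBalancingClient;

$elb = new ElasticLoadBalancingClient([
    'version' => '2012-06-01',  // classic ELB API version
    'region'  => 'us-east-1',   // placeholder region
]);

// Check instance health every 10 seconds instead of the default 30,
// so a dead node is taken out of rotation sooner.
$elb->configureHealthCheck([
    'LoadBalancerName' => 'my-elb',              // placeholder ELB name
    'HealthCheck' => [
        'Target'             => 'HTTP:80/ping',  // placeholder health-check path
        'Interval'           => 10,
        'Timeout'            => 5,
        'UnhealthyThreshold' => 2,
        'HealthyThreshold'   => 2,
    ],
]);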

Other tips

I understand where you're coming from. However, I think you should approach the problem from a different angle. It appears your problem isn't specifically that the load is unbalanced. Let's say you do get the balancing problem solved: you're still going to lose a large number of requests. I don't know how your clients connect to your services, so I can't go into detail on how you might fix it, but you may want to improve the client code to be more robust and plan for connections getting dropped. No service with connections lasting 30+ seconds should rely on the connection not being dropped. Back in the days of raw TCP/UDP sockets, a lot more work went into building for failure; somehow that's been lost in today's HTTP world.

What I'm trying to say is: if you write the code your clients use to connect, build it to be more robust and to handle failures with retries. Once you start performing retries, you'll need to make sure your API calls are atomic and use transactions where necessary; a retry sketch follows.
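To illustrate the retry idea only (this is my sketch, not code from the question or answers; the function name, timeout, and back-off values are assumptions):

function requestWithRetries($url, $maxAttempts = 3) {
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 35); // leave headroom for 30-second requests
        $body   = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        // Success: return the response body
        if ($body !== false && $status >= 200 && $status < 300) {
            return $body;
        }
        // Dropped connection or server error: back off briefly, then retry
        sleep($attempt);
    }
    throw new Exception("Request to $url failed after $maxAttempts attempts");
}

Because a retried call may already have been applied on the server before the connection dropped, this is only safe if the API call itself is atomic or idempotent, which is the point made above.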

Lastly, I'll answer your original question. Amazon's ELBs are round-robin even from the same computer / IP address. If your clients always connect to the same server, it's most likely the browser or code that is caching the response. If they're not accessing your REST API directly from a browser, most languages let you get the list of IPs for a given hostname. Those IPs will be the IPs of the load balancers, and you can simply shuffle the list and use the top entry each time. For example, you could use the following PHP code to send each request to a randomly chosen load balancer.

public function getHostByName($domain) {
    // Resolve every A record for the domain (these are the ELB's public IPs)
    $ips = gethostbynamel($domain);
    if ($ips === false) {
        return false; // DNS lookup failed
    }
    shuffle($ips);    // randomize so successive requests spread across ELB nodes
    return $ips[0];
}
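As a usage sketch (my addition, not part of the original answer), you would resolve a fresh IP per request and keep the real hostname in the Host header; api.example.com and the path are placeholders, and getHostByName() is assumed to be a method on your client class:

// Resolve a load balancer IP for this request
$ip = $this->getHostByName('api.example.com');
$ch = curl_init("http://$ip/v1/jobs"); // plain HTTP; HTTPS would also need hostname/SNI handling
// Preserve the original hostname so virtual hosting still matches
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Host: api.example.com'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);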

I have had similar issues with Amazon ELB, but for me it turned out that the HTTP client was using Connection: keep-alive. In other words, requests from the same client were served over the same connection, so the ELB never switched between servers. I don't know which server you use, but it is probably possible to turn off keep-alive, forcing the client to make a new connection for every request. This can be a good trade-off for a small number of requests carrying a lot of data; if you have a large number of requests with small payloads, it may hurt performance.
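The same effect can also be forced from the client side per request; here is a sketch (my assumption, not from the original answer; the URL is a placeholder):

$ch = curl_init('http://api.example.com/v1/jobs'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Disable keep-alive so each request opens a fresh connection
// and the ELB can route it to a different backend
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Connection: close'));
curl_setopt($ch, CURLOPT_FRESH_CONNECT, true); // don't reuse a cached connection
curl_setopt($ch, CURLOPT_FORBID_REUSE, true);  // don't keep this connection around
$response = curl_exec($ch);
curl_close($ch);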

This may happen when you have the two instances in different availability zones.

When one ELB is working with multiple instances in a single availability zone, it will round-robin the requests between the instances.

When the two instances are in two different availability zones, ELB instead creates two load-balancer nodes, each with its own IP, and balances the load between them using DNS.

When your client asks DNS for the IP address of your service, it receives two (or more) answers. The client then picks one IP and caches it (usually the OS does this). There isn't much you can do about that unless you control the clients.

If your problem is that the two instances are in different availability zones, the solution might be to run at least two instances in each availability zone. Then each ELB node can round-robin across the two servers behind its single IP, so when one server fails it is transparent to the clients.

PS: ELBs also create additional nodes with unique IPs when you have a lot of instances in a single availability zone and one ELB node can't handle all the load on its own. In that case another node is created for the extra instances, and the load is again distributed using DNS and multiple IPs.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow