Question

I'm working on a project where we have a number (5 at the moment) of servers spread across the world. Clients connect to one of those servers through a centralized broker. We know the originating country of the client but nothing else. We have full control of the servers, so we can get all the info we need on those. We don't control the clients; they have to connect through this broker, as the standard requires.

It's important that the broker picks a server that has low latency, so with the data we have I think proximity is our only available criterion.

The first idea that came to mind is pinging the client from each server, but we don't have an IP, only the country.

Another idea we had is to ping a root node in each country from each server. The problem there is finding a root node in each country.

Do you have any idea how to calculate/lookup proximity between "countries"? Do you have any insights or ideas on how to solve this problem in another way?

Solution

I think the term you need is "geographic load balancing". Most of the major load balancing vendors have a solution here - your broker could use one of these.

Googling geographic load balancing gets some useful looking results.

OTHER TIPS

This is tricky, more so than many imagine, but I feel there is a CORRECT answer.

Of course, the naive (but cool) solution is checking the client's IP. This is a good start, but in "the real world" geolocation isn't everything...

You asked for "low latency", which means you should measure ping times between servers and clients and assign accordingly. A very good example of this problem, one that has affected me personally many times, is that I work in Japan, and a server in, say, Taiwan is many times closer to me than a server in the USA. BUT the latency between Japan and the USA is many times lower (better response) than between Japan and Taiwan, because the cables, routers and what-have-you that connect Japan and Taiwan are not as good as those between Japan and the USA. So if you connected me to Taiwan because you figured my IP is closer, you'd be doing me a big disservice. Besides, a ping and a small real test at startup are easier to do than keeping a constantly updated geolocation database.

The best solution for this is called BGP anycast (link to a presentation). It's the cornerstone of all modern CDNs.

With BGP anycast, multiple different servers are spread around the world and announced to the Internet via BGP using the same IP. Then the Internet does the magic - as usual, the net routes traffic to that same IP via the shortest path, essentially selecting the nearest server (from a network topology perspective) for every user.

Unfortunately you can't just announce anything via BGP yourself - only large networks (normally datacenters) can do that. But affordable solutions are available, most of which are based on DNS anycast (i.e. resolving to a different web server IP based on client location) - this isn't perfect but is sufficient in many cases (examples: dnsmadeeasy, Route 53, edgedirector, and practically every cheap CDN - cloudflare, maxcdn, cloudfront etc). There are also solutions that do true BGP anycast, i.e. actually serve HTTP traffic via anycast (e.g. cachefly) or allow you to do it (e.g. hostvirtual - not cheap). This might also be an interesting read.

Paul has it, you want geographic load balancing -- but I will add, that your best bet, if it is at all an option, is to find somebody who specializes in it and throw money at them. It's in the class of problems that are much harder to reliably solve than it first appears.

Pinging them and picking the one with the lowest latency sounds good, but I have a feeling it won't scale (what happens when you have 100, or 1000?) - so maybe another solution is better? There are lots of vendors out there with systems which do just this; DNS anycast is quite widely used too.

If you did just ping them, you'd need to do several pings to each (ideally in parallel) to ensure that you're picking one with genuinely low latency rather than pot luck.
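The "several pings, in parallel" idea can be sketched roughly like this. It is only a sketch under assumptions: it times a TCP handshake instead of sending ICMP (which needs raw sockets and is often filtered), it takes the median of a few samples so one lucky or unlucky probe doesn't decide the outcome, and the function names, port and sample count are all made up for illustration. It also assumes the code runs somewhere that can actually reach the hosts being measured.

```python
import socket
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def tcp_connect_ms(host, port=443, timeout=2.0):
    """One probe: time a TCP handshake in milliseconds (inf on failure)."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return float("inf")


def median_latency_ms(host, probe=tcp_connect_ms, samples=5):
    """Probe one host several times and return the median of the samples."""
    return statistics.median([probe(host) for _ in range(samples)])


def pick_lowest_latency(hosts, probe=tcp_connect_ms, samples=5):
    """Measure all hosts in parallel and return the one with the lowest median."""
    if not hosts:
        raise ValueError("no hosts to probe")
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        medians = list(
            pool.map(lambda h: median_latency_ms(h, probe, samples), hosts)
        )
    return min(zip(medians, hosts))[1]
```

The `probe` parameter is there so the measurement can be swapped out (for a real ICMP ping, an application-level round trip, or a fake in tests) without touching the selection logic.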

Also, once traffic volumes get very high, you'd probably want some way to assign weights to them.

Finally you'd want some way to mark some of them as administratively down (for maintenance) - but maybe you can do this by having your broker just not advertise ones which aren't currently available.
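A rough sketch of how weights and an "administratively down" flag might fit together in the broker. Everything here (the `Server` class, the field names, the use of a weighted random pick) is an assumption for illustration, not how any particular load balancer does it.

```python
import random
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    weight: float = 1.0       # relative share of traffic (hypothetical field)
    admin_down: bool = False  # set to True while the server is in maintenance


def advertised(servers):
    """Servers the broker is willing to hand out right now."""
    return [s for s in servers if not s.admin_down and s.weight > 0]


def pick_weighted(servers, rng=random):
    """Pick one advertised server at random, in proportion to its weight."""
    candidates = advertised(servers)
    if not candidates:
        raise LookupError("no servers available")
    weights = [s.weight for s in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Taking a server out for maintenance is then just a matter of flipping `admin_down`; it stops being advertised without being removed from the configuration.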

Pinging won't work. Most of the clients would be behind a gateway and/or firewall, and your ping packets won't get through. Genehack said it best. You need server load balancing, and using just the geographical approach might not always be the best way to go about it. Better to throw money at someone who specializes in providing SLB solutions.

Okay so a few quick thoughts. I was a founder of Digital Envoy - they do geographic IP intelligence. I left the company a few years ago but about 6 years ago we built a joint product with Coyote Point Systems that did exactly this functionality - geographic based load balancing. Sure, there are edge cases (the Taiwan/China example mentioned in this thread) that may not work automatically but the product allowed the user to determine where a country's traffic would go. So if you decided that Taiwan was best served out of the US, it would be pushed that way.
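To make the "country decides, but the operator can override" idea concrete, here is a minimal sketch. It is not how the product described above worked; it just assumes a default of "nearest server by great-circle distance from a rough country centroid" plus a manual override table. The server names, their locations and the centroid coordinates are illustrative approximations, and the override for Taiwan mirrors the example in the paragraph above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Hypothetical server locations as (latitude, longitude) in degrees.
SERVERS = {
    "us-west": (37.4, -122.1),
    "eu-central": (50.1, 8.7),
    "sg": (1.35, 103.8),
}

# Rough country centroids, keyed by ISO country code (approximate values).
COUNTRY_CENTROIDS = {
    "DE": (51.2, 10.4),
    "JP": (36.2, 138.3),
    "TW": (23.7, 121.0),
}

# Manual overrides for countries where geography is a poor proxy for latency.
OVERRIDES = {"TW": "us-west"}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_server(country):
    """Default choice: the geographically closest server to the country."""
    origin = COUNTRY_CENTROIDS[country]  # KeyError for unknown countries
    return min(SERVERS, key=lambda name: haversine_km(origin, SERVERS[name]))


def route(country):
    """Use the operator's override if there is one, else the nearest server."""
    return OVERRIDES.get(country) or nearest_server(country)
```

With this data, `nearest_server("TW")` picks `"sg"`, while `route("TW")` returns `"us-west"` because of the override. The distance default is only a starting point; measured latency should win wherever the two disagree.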

Unfortunately, demand for the solution wasn't as great as we hoped and the product has, I believe, been discontinued. I'd suggest contacting Coyote Point to see if they can provide an equivalent solution for you. If not, I think they will have some ideas about how to go about doing what you want to do.

Another option, depending on what you need to serve, is to use something like Amazon's CloudFront service. Of course, if you need clients to connect to an app and not static files, that won't work for you.

BTW, full disclosure - I'm not only a founder of Digital Envoy but I currently serve on the board of Coyote Point.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow