Question

I have a handful of servers all connected over WAN links (moderate bandwidth, higher latency) that need to share info about connected clients. Each client can connect to any of the servers in the 'mesh'. I'm looking for some kind of distributed database each server can host and update. It's important that each server can catch up to the current state if it's been offline for any length of time.

If I can't find anything, the alternative will be to pick one server to host a MySQL DB that all the servers insert into; but I'd really like to avoid that single point of failure if possible (and the downtime associated with promoting a slave to master).

Is there any no-single-master distributed data store you have used before and would recommend?

It would be most useful if any solution had Python interfaces.

Solution

Have you looked at Python's multiprocessing.Manager objects?

You would still have to add the distributed-database logic yourself (choosing new masters, redundancy, and whatever other properties you want), which can be done by extending the Manager class and implementing your own Proxy objects, but the module takes care of all the synchronization and data transfer for you.

This way, instead of having a distributed database, your mesh would share a Python dict or a complex data type that you instruct your Manager object to share to connected proxies.
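As a minimal sketch of that idea (the registry name, port choice, and authkey are illustrative, not a recommendation): one server hosts a `BaseManager` exposing a shared dict of client-to-server mappings, and peers connect to it through proxies.

```python
import threading
import time
from multiprocessing.managers import BaseManager

registry = {}  # client_id -> server name, held by the hosting process

class RegistryManager(BaseManager):
    pass

# Expose the shared dict; AutoProxy gives remote callers its public methods
RegistryManager.register("get_registry", callable=lambda: registry)

# One server in the mesh hosts the manager (port 0 = pick a free port)...
host = RegistryManager(address=("127.0.0.1", 0), authkey=b"secret")
server = host.get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()
time.sleep(0.2)  # give the server thread a moment to start accepting

# ...and peers connect to it through proxies and update the shared state.
peer = RegistryManager(address=server.address, authkey=b"secret")
peer.connect()
reg = peer.get_registry()
reg.update({"client-1": "server-a"})
print(reg.get("client-1"))  # -> server-a
```

Note that the hosting process is still a single point of failure here, which is exactly the logic (master election, redundancy) you would have to layer on top.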

OTHER TIPS

Take a look at the doozerd project. There is a Python client based on gevent.

Maybe Hadoop or something like it would work for you?

http://hadoop.apache.org/

What you describe reminds me of an Apache Cassandra cluster configured so that each machine hosts a copy of the whole dataset, with reads and writes succeeding once they reach a single node (I never did that, but I think it's possible). Nodes should be able to remain functional when WAN links are down and receive pending updates as soon as they get back on-line. Still, there is no magic: if conflicting updates are issued on different servers, or outdated replicas are used to generate new data, consistency problems will arise on any architecture you select.
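To make the conflict problem concrete, here is a sketch of the last-write-wins reconciliation Cassandra effectively performs per column, in plain Python (the record shapes and timestamps are illustrative): each value carries a timestamp, and when two replicas merge, the newer write silently overwrites the older one.

```python
def merge_records(local, remote):
    """Last-write-wins merge: for each key, keep the (value, timestamp)
    pair with the newer timestamp. Concurrent older writes are dropped."""
    merged = dict(local)
    for key, (value, ts) in remote.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged

# Two servers updated the same client while the WAN link was down:
a = {"client-1": ("server-a", 100.0)}
b = {"client-1": ("server-b", 105.0)}
print(merge_records(a, b))  # -> {'client-1': ('server-b', 105.0)}
```

Whether losing `server-a`'s write is acceptable depends entirely on your application; no data store can decide that for you.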

A second issue is that for every local write, you'll get n-1 remote writes and your servers may spend a lot of time and bandwidth debating who has the latest record.
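The bandwidth cost is easy to estimate up front (the numbers below are made up for illustration):

```python
def replication_traffic(servers, writes_per_sec, record_bytes):
    # Every local write must be replicated to the other n-1 servers,
    # so each server sends this much per second over its WAN links.
    return writes_per_sec * (servers - 1) * record_bytes

# e.g. 5 servers, 200 writes/s, 1 KiB records:
print(replication_traffic(5, 200, 1024))  # -> 819200 bytes/s per server
```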

I strongly suggest you fire up a couple of EC2 instances and play with their connectivity to check whether everything works the way you expect. This seems to be in the "creative misuse" area, and your mileage may vary wildly, if you get any at all.

It would be important that each server is able to get updated with the current state if it's been offline for any length of time.

ZooKeeper's ephemeral nodes let you maintain presence information for all server nodes: the znode disappears automatically when the session that created it ends, so peers watching the parent path see servers come and go.
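A sketch of that pattern using the third-party kazoo client (the `/servers` path and server id are illustrative, and this requires a running ZooKeeper ensemble):

```python
def register_presence(hosts, server_id):
    """Register this server's presence under an ephemeral znode.
    ZooKeeper deletes the node automatically when the session ends,
    so peers can watch /servers to track which mesh members are alive."""
    from kazoo.client import KazooClient  # third-party: pip install kazoo

    zk = KazooClient(hosts=hosts)
    zk.start()
    zk.ensure_path("/servers")
    zk.create(f"/servers/{server_id}", ephemeral=True)
    return zk

# usage (against a live ensemble):
# zk = register_presence("zk1:2181,zk2:2181", "server-a")
# print(zk.get_children("/servers"))
```

Note that ZooKeeper gives you coordination and presence, not a general-purpose data store; for large shared state you would still pair it with something else.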

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow