What are good algorithms to keep consistency across multiple files in a network?

https://stackoverflow.com/questions/12831403

06-07-2021

Question

What are good algorithms to keep consistency in multiple files?

This is a school project. I have to implement some replication across a network in C.

I have 2 servers,

Server A1 Server A2

Both servers have their own file called "data.txt"

If I write something to one of them, I need the other to be updated.

I also have another scenario, with 3 Servers.

Server B1 Server B2 Server B3

I need these to do pretty much the same.

This would be fairly simple to implement on its own. But if one or two of the servers were down, they would have to update themselves when coming back up.

I'm sure there are algorithms that solve this efficiently. I know what I want, I just don't know exactly what I'm looking for!

Can someone point me in the right direction, please?

Thank you!

Solution

The fundamental issue here is known as the 'CAP theorem', which defines three properties that a distributed system can have:

Consistency: Reading data from the system always returns the most up-to-date data.
Availability: Every request gets a response, whether it succeeds or fails (it doesn't just keep waiting until things recover).
Partition tolerance: The system can keep operating when its servers are unable to communicate with each other (a server being down is one special case of this).

The CAP theorem states that you can have at most two of these at the same time. If your system is consistent and partition tolerant, then it loses the availability condition - you might have to wait for a partition to heal before you get a response. If you have consistency and availability, you'll have downtime when there's a partition, or when enough servers are down. If you have availability and partition tolerance, you might read stale data, or have to deal with conflicting writes.

Note that this applies separately between reads and writes - you can have an Available and Partition-Tolerant system for reads, but Consistent and Available system for writes. This is basically a master-slave system; in a partition, writes might fail (if they're on the wrong side of a partition), but reads will work (although they might return stale data).

So if you want to be Available and Partition Tolerant for reads, one easy option is to designate one host as the only one that can do writes, and have the others sync from it (e.g., using rsync from a cron script). In your C project, you'd just copy the file over periodically using some simple network code, and do an extra copy right after modifying it.
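The copy step of that single-writer scheme could look roughly like the following. This is a minimal sketch, not a full solution: both paths are local files standing in for the network transfer, `replicate_file` is a made-up name, and it assumes POSIX `rename` semantics (the temporary file replaces the old replica in one step).

```c
#include <assert.h>
#include <stdio.h>

/* Copy src to dst by writing "<dst>.tmp" first and renaming it over dst,
 * so a reader never sees a half-written replica. Assumes POSIX rename(),
 * which replaces an existing target atomically.
 * Returns 0 on success, -1 on failure. */
int replicate_file(const char *src, const char *dst)
{
    char tmp[4096];
    char buf[8192];
    size_t n;
    int ok = 1;

    assert(src != NULL && dst != NULL);

    int len = snprintf(tmp, sizeof tmp, "%s.tmp", dst);
    if (len < 0 || (size_t)len >= sizeof tmp)
        return -1;                       /* destination path too long */

    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(tmp, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    /* Stream the file in chunks; stop on the first short write. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)                /* fclose flushes; it can fail too */
        ok = 0;

    if (!ok || rename(tmp, dst) != 0) {
        remove(tmp);                     /* leave the old replica untouched */
        return -1;
    }
    return 0;
}
```

The writer would call something like `replicate_file("data.txt", "replica/data.txt")` right after each modification and again from a periodic timer. A server that was down simply picks up the latest copy on the next run.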

If you need partition tolerance for writes, though, it's more complex. You can have two servers that can't talk to each other both doing writes, so when they sync later you'll need to compare the two versions and decide which one wins. This can be as simple as 'let the highest timestamp win', or you can use vector clocks as in Dynamo to implement a more complex policy - which one is appropriate here depends on your application.
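To make the two policies concrete, here is a small sketch of a vector clock comparison with 'highest timestamp wins' as the fallback for true conflicts. All names (`vclock`, `vc_compare`, `remote_wins`, `NUM_SERVERS`) are invented for illustration, and the fixed set of three servers is an assumption taken from the B1/B2/B3 scenario. Note that comparing timestamps from different machines only makes sense if their clocks are roughly in sync.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SERVERS 3   /* assumption: fixed, known set of servers (B1..B3) */

/* One counter per server; counts[i] is the number of writes by server i
 * that this version of the file has seen. */
typedef struct {
    unsigned long counts[NUM_SERVERS];
} vclock;

typedef enum { VC_EQUAL, VC_BEFORE, VC_AFTER, VC_CONCURRENT } vc_order;

/* Record a local write made by server `self`. */
void vc_tick(vclock *vc, size_t self)
{
    assert(self < NUM_SERVERS);
    vc->counts[self]++;
}

/* VC_BEFORE: a is an ancestor of b, so b can simply replace a.
 * VC_AFTER: the reverse.
 * VC_CONCURRENT: neither side has seen the other's writes (a conflict). */
vc_order vc_compare(const vclock *a, const vclock *b)
{
    int a_behind = 0, b_behind = 0;
    for (size_t i = 0; i < NUM_SERVERS; i++) {
        if (a->counts[i] < b->counts[i]) a_behind = 1;
        if (a->counts[i] > b->counts[i]) b_behind = 1;
    }
    if (a_behind && b_behind) return VC_CONCURRENT;
    if (a_behind) return VC_BEFORE;
    if (b_behind) return VC_AFTER;
    return VC_EQUAL;
}

/* After a sync, the surviving version carries the element-wise maximum. */
void vc_merge(vclock *dst, const vclock *src)
{
    for (size_t i = 0; i < NUM_SERVERS; i++)
        if (src->counts[i] > dst->counts[i])
            dst->counts[i] = src->counts[i];
}

/* Returns 1 if the remote copy should replace the local one.
 * Causality decides when it can; timestamps only break real conflicts. */
int remote_wins(const vclock *local, long local_ts,
                const vclock *remote, long remote_ts)
{
    switch (vc_compare(local, remote)) {
    case VC_BEFORE:     return 1;
    case VC_CONCURRENT: return remote_ts > local_ts;
    default:            return 0;   /* equal, or local is already newer */
    }
}
```

With plain 'highest timestamp wins' you would skip the clocks and just compare `remote_ts > local_ts`. The vector clock version additionally tells you when two writes genuinely conflicted, so the application can merge them or ask the user instead of silently dropping one.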

Other tips

Check out rsync and how Dropbox works.

On every write to server A, fork a process that writes the same content to server B, so that all writes to server A are replicated on server B. If you have multiple servers, have the forked process write to all of the backup servers.
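A rough sketch of that fork-per-write idea on a POSIX system is shown below. The backup "servers" are plain local file paths here, standing in for whatever network send you end up writing, and `replicated_write` and `append_line` are invented names. Unlike a fire-and-forget fork, this version waits for the child so it can report a failed backup.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Append `line` to one file. Returns 0 on success, -1 on failure. */
static int append_line(const char *path, const char *line)
{
    FILE *f = fopen(path, "a");
    if (f == NULL)
        return -1;
    int ok = (fputs(line, f) != EOF);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Write to the primary file, then fork a child that repeats the write on
 * every backup. Returns 0 if the primary and all backups were written,
 * -1 otherwise. */
int replicated_write(const char *primary, const char *const backups[],
                     size_t nbackups, const char *line)
{
    assert(primary != NULL && line != NULL);

    if (append_line(primary, line) != 0)
        return -1;

    fflush(NULL);               /* don't let the child inherit unflushed stdio data */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {             /* child: push the same write to each backup */
        int failed = 0;
        for (size_t i = 0; i < nbackups; i++)
            if (append_line(backups[i], line) != 0)
                failed = 1;
        _exit(failed);          /* exit status tells the parent what happened */
    }

    int status;                 /* parent: wait and translate the exit status */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Note that this only covers the case where every backup is reachable at write time. A server that was down still needs a catch-up step when it comes back.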

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow