Question

I'm trying to create a distributed application that requires all the computers in a network to perform an event simultaneously.

Let's say we have two arrays of equal length.

labels = ["label1", "label2", "label3", ...]
duration = [0.5, 1.2, 0.1, 0.1, 3.0, ...]

I have a master and N slaves in a LAN. Let's say the arrays have been copied to all the slaves. What I want now is to run the following code on all of them.

import time

for i in range(len(labels)):
    print(labels[i])
    time.sleep(duration[i])

I need this code to start at exactly the same time on the clients.

How can I trigger an 'execute' event in all clients simultaneously? Assuming all clients are synced using the same NTP server, if I ask them to start at a pre-defined time, would the accuracy be reasonable? The duration[] array can have elements as small as 0.1 seconds, and I would like a reasonable degree of simultaneity.


Solution

According to http://en.wikipedia.org/wiki/Network_Time_Protocol, "NTP can usually maintain time to within tens of milliseconds over the public Internet, and can achieve better than one millisecond accuracy in local area networks under ideal conditions. Asymmetric routes and network congestion can cause errors of 100 ms or more."

So syncing to within 100ms is feasible with a public NTP server, and is probably your simplest route to getting this working.
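As a minimal sketch of the "start at a pre-defined time" idea, each machine could run something like the following. The names run_schedule, sleep_until and start_at are hypothetical and not from the question, and the sketch assumes the system clock is already disciplined by NTP. Each step waits for an absolute deadline rather than sleeping for a relative amount, so that small oversleeps should not accumulate over the run.

```python
import time

def sleep_until(deadline, now=time.time, sleep=time.sleep):
    """Block until the wall clock (as reported by now()) reaches deadline."""
    remaining = deadline - now()
    if remaining > 0:
        sleep(remaining)

def run_schedule(labels, duration, start_at, now=time.time, sleep=time.sleep, emit=print):
    """Emit labels[i] at start_at + sum(duration[:i]), measured on the local clock."""
    deadline = start_at
    for label, d in zip(labels, duration):
        # Wait for the absolute deadline of this step, then act.
        sleep_until(deadline, now, sleep)
        emit(label)
        deadline += d
    # Mirror the original loop, which also sleeps after the last label.
    sleep_until(deadline, now, sleep)
```

Every machine would call run_schedule with the same start_at value, for example an epoch timestamp a few seconds in the future that the master hands out in advance. How closely the machines line up then depends on how well their clocks agree.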

Since you're on a LAN, you might be able to achieve better performance by having the master send network messages to the subordinates giving its idea of the current time, or perhaps more simply by running your own NTP server on the LAN and pointing your cluster to it.
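If you go the route of having the master send its idea of the current time, a subordinate can estimate its clock offset from a request/reply exchange. The sketch below is one common way to do that (a Cristian-style estimate), not something the answer prescribes. The function names and the three-timestamp sample format are assumptions for illustration, and the estimate is only as good as the assumption that network delay is the same in both directions.

```python
def estimate_offset(t_request, master_time, t_reply):
    """Return (offset, round_trip), where offset estimates master clock minus local clock.

    t_request and t_reply are local timestamps taken when the request was sent
    and when the reply arrived; master_time is the timestamp carried in the reply.
    """
    round_trip = t_reply - t_request
    # Assume the master stamped its time halfway through the round trip.
    local_midpoint = t_request + round_trip / 2.0
    return master_time - local_midpoint, round_trip

def best_offset(samples):
    """Use the sample with the shortest round trip, which leaves the least room for asymmetric delay."""
    offset, _ = min((estimate_offset(*s) for s in samples), key=lambda r: r[1])
    return offset
```

A subordinate would collect several (t_request, master_time, t_reply) samples, call best_offset on them, and add the result to its local clock reading when deciding when to start.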

OTHER TIPS

Short answer: starting at exactly the same time is impossible. As Russell wrote, NTP is typically accurate to tens of milliseconds over the public Internet, and errors of 100 ms or more are possible. Running on a LAN, you could instead measure the communication latency and, when sending the tasks, attach a delay after which each task should start. If the master sends to N nodes one after another and each send takes L milliseconds on average, the first node would need to wait (N-1) * L ms before starting, the second (N-2) * L ms, and so on. The last one would start immediately.
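The staggered delays described above can be computed directly. This is a small sketch with a hypothetical function name, assuming the master contacts the nodes one at a time and each send costs about the same latency.

```python
def start_delays(n_nodes, latency_ms):
    """Delay in ms to attach to each task, in send order, so that all nodes start together."""
    # The first node contacted waits longest; the last one starts immediately.
    return [(n_nodes - 1 - i) * latency_ms for i in range(n_nodes)]
```

For four nodes and a measured latency of 2 ms, the delays come out as 6, 4, 2 and 0 ms.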


Just to see how complex the issue of synchronised clocks is, you can read Google's research paper, "Spanner: Google's Globally-Distributed Database". They use atomic clocks and GPS receivers to keep the clock uncertainty between machines down to a few milliseconds, even across datacenters.

Using an Internet NTP server might, as others have written, introduce too large an error.

I see two options:

  1. Set up a local NTP server and use this as a reference for all clients.
  2. Have your clients listen for UDP packets, and have the master send the start signal as a broadcast/multicast packet. Then they should start more or less at the same time.
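The second option could look roughly like this with Python's standard socket module. The START payload, the function names and the default broadcast address are assumptions chosen for illustration. UDP is unreliable, so a real deployment would probably want to repeat the packet or acknowledge it.

```python
import socket

START = b"START"  # hypothetical payload; any agreed-upon marker would do

def make_listener(port, bind_addr=""):
    """Client side: open a UDP socket that can receive the start signal."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_addr, port))
    return sock

def wait_for_start(sock, timeout=None):
    """Client side: block until a START datagram arrives, then return the sender address."""
    sock.settimeout(timeout)
    while True:
        data, sender = sock.recvfrom(1024)
        if data == START:
            return sender

def send_start(port, addr="255.255.255.255"):
    """Master side: send a single START datagram to the broadcast address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(START, (addr, port))
```

Each client would call make_listener and then wait_for_start before running its loop, while the master calls send_start once all clients are ready.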

Assuming you have decent performance characteristics on the LAN, an alternative to NTP would be to have the master send a message to the slaves telling them to perform the task upon receipt. If the latency of transmission and processing is within your tolerance for 'reasonably simultaneous', it should fit the bill. This could reasonably be expected to be under 10 milliseconds, depending on the architectures (interrupt vs. polling), system loads, and link quality (for example, wireless).

Additionally, the IEEE 802.1Q VLAN standard defines layer 2 QoS (CoS priority tags) that you could use to prioritize this control traffic on the switch(es).

Perhaps this approach would simplify the application by having a simpler "dumb" slave -- listen and respond to commands -- and then a master with all of the system's logic.
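With a "dumb" slave, the timing logic lives entirely on the master, which needs to know when to send each command. One possible way to derive that from the question's two arrays is sketched below. The function name and the (offset, label) output format are assumptions, not part of the answer.

```python
from itertools import accumulate

def command_times(labels, duration):
    """Pair each label with the offset in seconds from the start at which the master sends it."""
    # Label i goes out once all earlier durations have elapsed.
    offsets = [0.0] + list(accumulate(duration))[:-1]
    return list(zip(offsets, labels))
```

The master would then walk this list, wait until each offset has elapsed, and send the label to all slaves, which simply print whatever they receive.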

Licensed under: CC-BY-SA with attribution