How to get N computers in a network to start an activity simultaneously?

Question 1

According to http://en.wikipedia.org/wiki/Network_Time_Protocol, "NTP can usually maintain time to within tens of milliseconds over the public Internet, and can achieve better than one millisecond accuracy in local area networks under ideal conditions. Asymmetric routes and network congestion can cause errors of 100 ms or more."

So syncing to within 100ms is feasible with a public NTP server, and is probably your simplest route to getting this working.

Since you're on a LAN, you might be able to achieve better performance by having the master send network messages to the subordinates giving its idea of the current time, or perhaps more simply by running your own NTP server on the LAN and pointing your cluster to it.
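If you go the route of having the master tell the subordinates its idea of the current time, each subordinate needs to correct for the network delay. Here is a minimal sketch of one common way to do that, assuming the delay is roughly the same in both directions. The function names and the three-timestamp exchange are my own illustration, not something from the question.

```python
def estimate_offset(t_send, t_master, t_recv):
    """Estimate (master clock - local clock) from one request/reply exchange.

    t_send   -- local clock when the request was sent
    t_master -- master's clock reading carried in the reply
    t_recv   -- local clock when the reply arrived
    Assumption: the network delay is the same in both directions.
    """
    round_trip = t_recv - t_send
    # The master's reading corresponds roughly to the midpoint of the exchange.
    offset = t_master - (t_send + round_trip / 2)
    return offset, round_trip


def best_offset(samples):
    """Return the offset from the exchange with the shortest round trip,
    since that sample is least affected by queuing delays."""
    return min((estimate_offset(*s) for s in samples), key=lambda r: r[1])[0]
```

A subordinate would add the returned offset to its own clock reading to approximate the master's time. Taking several samples and keeping the one with the smallest round trip is a cheap way to reduce the effect of occasional congestion.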

Question 2

Short answer: starting at exactly the same time is impossible. As Russell wrote, NTP over the public Internet is only accurate to within tens of milliseconds, and errors can reach 100 ms or more. On a LAN you could instead measure the communication latency and, when sending the tasks, attach a delay telling each node when to start. If you send to N nodes one after another and the average latency is L milliseconds, the first node would need to wait (N-1) * L ms before starting, the second (N-2) * L ms, and so on. The last one would start immediately.
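The staggered delays described above can be computed like this. It is only a sketch: it assumes the messages go out one after another in list order and that each one takes about the same time L, which is a simplification of a real network. The function name is made up for illustration.

```python
def start_delays(n_nodes, latency_ms):
    """Delay (in ms) each node should wait after receiving its message.

    Assumption: messages are sent one after another in index order and
    each takes about latency_ms, so node i receives its message
    (n_nodes - 1 - i) * latency_ms before the last node does.
    """
    return [(n_nodes - 1 - i) * latency_ms for i in range(n_nodes)]


print(start_delays(4, 5))  # [15, 10, 5, 0]
```

With 4 nodes and 5 ms latency, the first node waits 15 ms and the last starts immediately.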

Just to see how complex the issue of synchronised clocks is, you can read Google's research paper "Spanner: Google's Globally-Distributed Database". They use atomic clocks and GPS receivers, and even then they only bound the clock uncertainty to within a few milliseconds across their datacenters.

Question 3

Using an Internet NTP server might, as others have written, introduce too large an error.

I see two options:

Set up a local NTP server and use this as a reference for all clients.
Have your clients listen for UDP packets and have the master send the start signal as a broadcast/multicast. Then they should start more or less at the same time.
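For the second option, a rough sketch using plain UDP sockets could look like the following. The port number, the "START" message and the function names are placeholders I picked for illustration. The broadcast address 255.255.255.255 is the generic limited-broadcast address, and your network may need the subnet's own broadcast address instead.

```python
import socket

PORT = 50000  # hypothetical port, pick any free one


def open_listener(port=PORT, bind_addr=""):
    """Client side: open a UDP socket that waits for the start signal."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_addr, port))
    return sock


def wait_for_start(sock, timeout=None):
    """Block until a datagram arrives and return its payload."""
    sock.settimeout(timeout)
    data, _sender = sock.recvfrom(1024)
    return data


def send_start(message=b"START", port=PORT, address="255.255.255.255"):
    """Master side: send one datagram to every listener on the LAN."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(message, (address, port))
```

Each client would call open_listener() once and then wait_for_start() before kicking off its activity, while the master calls send_start(). Keep in mind that UDP gives no delivery guarantee, so a lost packet means a client never starts.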

Question 4

Assuming you have decent performance characteristics on the LAN, an alternative to NTP would be to just have the master send a message to the slaves telling them to perform the task upon receipt. If the latency of transmission and processing is within what you consider 'reasonably simultaneous', it should fit the bill. This could reasonably be expected to be <10 milliseconds, depending on the architecture (interrupt vs. polling), system load, and link quality (e.g. wireless).

Additionally, the IEEE 802.1Q VLAN standard defines layer 2 QoS (CoS tags) that you could use to prioritize this control traffic on the switch(es).

Perhaps this approach would simplify the application by having a simpler "dumb" slave -- listen and respond to commands -- and then a master with all of the system's logic.
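To give an idea of how little logic such a "dumb" slave needs, here is a sketch of a command handler. The text-based command format ("NAME argument") and the handler table are assumptions made for the example. How the message arrives (UDP, TCP, ...) is left out.

```python
def handle_command(message, handlers):
    """Run the handler named by the first word of the message.

    handlers maps an upper-case command name to a function taking the
    rest of the message as a string. Returns a short text reply.
    """
    name, _, argument = message.strip().partition(" ")
    handler = handlers.get(name.upper())
    if handler is None:
        return "ERR unknown command"
    return "OK " + str(handler(argument))


# Hypothetical commands; the master decides when to send which one.
handlers = {
    "PING": lambda arg: "pong",
    "ECHO": lambda arg: arg,
}
print(handle_command("echo hi there", handlers))  # OK hi there
```

All decisions about what to run and when stay in the master, and the slave only maps incoming commands to actions.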