Measuring one way network latency

https://cs.stackexchange.com/questions/602

16-10-2019
|

Question

This is a puzzle about measuring network latency that I created. I believe the solution is that it's impossible, but friends disagree. I'm looking for convincing explanations either way. (Though it is posed as a puzzle I think it fits on this web site because of its applicability to the design and experience of communication protocols such as in online games, not to mention NTP.)

Suppose two robots are in two rooms, connected by a network with differing one-way latencies as shown in the graphic below. When robot A sends a message to robot B it takes 3 seconds for it to arrive, but when robot B sends a message to robot A it takes 1 second to arrive. The latencies never vary.

The robots are identical and do not have a shared clock, though they can measure the passage of time (e.g. they have stop watches). They do not know which of them is robot A (whose messages are delayed 3s) and which is robot B (whose messages are delayed by 1s).

A protocol to discover the round trip time is:

whenReceive(TICK).then(send TOCK)

// Wait for other other robot to wake up
send READY
await READY
send READY

// Measure RTT
t0 = startStopWatch()
send TICK
await TOCK
t1 = stopStopWatch()
rtt = t1 - t0  //ends up equalling 4 seconds

Is there a protocol to determine the one way trip delays? Can the robots discover which of them has the longer message sending delay?

Two robots one asymmetric network

Solution

The following diagram, from a blog post I wrote, is a visual proof that it's impossible:

Sliding the clock skew exactly offset by latency asymmetry

Notice how the packet arrival times on each side stay the same, even as the one-way latencies change (and even become negative!). The first packet always reaches the server at 1.5s on the server's clock, the second always reaches the client at 2s on the client's clock, etc. The packet contents and local arrival times are the only things a protocol could be based on, but the contents and arrival times can be held constant as the asymmetry varies by also varying the initial clock skew.

Basically, asymmetry in the one-way latencies looks exactly like clock skew. Since the problem states we don't start off knowing the initial clock skew or the one-way latency asymmetry, and varying one looks like varying the other so their effects are indistinguishable, we can't separate their contributions in order to solve for the one-way latency asymmetry. It's impossible.

More formally, you can't solve for edge lengths when given only the lengths of the cycles. The cycle basis has $n-1$ degrees of freedom, corresponding to $n-1$ unknown clock skews relative to one of the participants. You can always hide the one-way latencies, even when there are many participants:

Sea sickness

If you're not so visually inclined, I have another intuitive argument. Imagine a time portal to a hundred years in the future. As you chat across it to someone on the other side, you realize the conversation is totally normal despite the hundred year asymmetry in one-way delays. Any observable effect would have been obvious on that scale!

OTHER TIPS

I think it is impossible to figure out one-way latency just by comparing stopwatches.

$A$ sends $B$ his clock $C_{A1}$, let's say its value is 5.
$B$ acknowledges the message at time $C_{B1}=1$
$A$ sends his clock again, $C_{A2}=9$ (initial + round-trip latency).
$B$ receives at time $C_{B2}=5$.
And so on. Neither $A$ or $B$ can figure out when the other robot received a message with respect to their watches.

Maybe if you make it a bounty question someone will crack it. Until then, kudos.

I have found a way to BOTH discover which node is who (i.e. who has the longer message delay) AND estimate the one way trip delay. While the other answers are correct they are ONLY considering direct clock measurement approached which of course cannot work. However as I am proving here this is only part of the story as here is my working algorithm for the above:

Assume as in real life:

Links of finite bandwidth b
Each node has unique address (e.g. A and B)
Packet size p much smaller than the bandwidth*latency product
Nodes A and B are able to fill the channel
Nodes have a random() function

Each node fills the channel with its own packets (marked A or B respectively) OR forwards the packets it received from other nodes as follows:

Always fill the channel with my own packets except:
if I receive a packet from another node then
   Randomly choose to 
          either forward that packet from the other node
          or discard that packet and forward my own packet

Intuitive Explanation Since A's bandwidth*latency product is higher (because latency is higher) A will manage to have more packets received than B, therefore each Node can know who they are in the diagram.

Furthermore, with enough convergence time of running above algorithm the ratio of packets of A to B will denote the actual ratio of RTT delay of A to B and therefore the desired OTT.

SIMULATION RESULT TRACE Here is a simulation that proves the above and shows how A successfully converging toward 3 second delay and B converging around 1 second delay:

First seconds of Simulation

Subsequent seconds of Simulation

Explanation of Figures: Each line represents 1 second of time (packet size is chosen to have 1 second transmission time for clarity). Note that each node can start the algo at any time not in any particular sequence or time. The columns are as follows:

NODE A receives : What node A sees in its receiving side (this is also P4 below)
NODE A injects : What node A sends out (note this is A, or randomly A or B)
P1, P2, P3 : The three packets that are in transit (in order) between A and B (1 second transmission means 3 packets are in transit for a latency of 3)
NODE B receives : What B sees in its receiving side (this is P3)
NODE B injects : What B sends out (note this is B, or randomly A or B per algo)
P4 : The packet in transit from B to A (see also P1, P2, P3)
A counts A : What A counts for the A packets it has seen
A counts B : What A counts for the B packets it has seen
B counts A : What B counts for the A packets it has seen
B counts B : What B counts for the B packets it has seen
A->B : The latency that A estimates towards B (ratio of RTT of 4 seconds based on packets seen)
B->A : The latency that B estimates towards A (ratio of RTT of 4 seconds based on packets seen)

As we can see both nodes converge and stay around their true latency (actually we do not see that for A because more seconds are needed to converge but it does converge same behavior as B)

Better filters could converge faster but we can clearly see how they both converge around the correct values for their delays, therefore they can exactly know know their delay (even though I am showing their estimation only for illustration).

Also, even if bandwidths between the links are different the above method could still hold (although one will have to think about it more to be more certain) by using packet pairs to figure out bandwidth estimates and then just apply to the proportion equation above.

Conclusion We provided an algorithm for both A and B to know their position in the network and know their latency to the other node for the above diagram. We used a network measurement estimation method rather than clock-based approaches which indeed cannot lead to a solution due to recursive clock sync issue.

Note I now edited this answer providing all the simulations because no one would believe me I solved it so far as you can see in the first comments. Hopefully with these results someone can be more convinced and approve to help everyone at least find one error or correctness in this network measurement puzzle!

This is an answer to @user3134164 but is too big for a comment.

Here's my attempt at showing why this can't work (mathematical approach). Let's call $P_x$ the probability that robot $x$ chooses its own packets when it receives one of the other robot's packets, and $R_x$ the probability that a packet robot $x$ received is its own. After stationary regime has been achieved (that is, after an "infinite" amount of time, ie all convergence is done), we have the following :

$R_1 = (1 - R_2) \times (1 - P_2)$, and similarly $R_2 = (1 - R_1) \times (1 - P_1)$. The idea is that if a robot receives one of its own packets, it means the other robot received it instead of one of its own (hence the $1 - R_2$) and that it still chose to send it instead of one of its own (hence the product by $1 - P_2$).
This gives a system of two equations with two unknowns, $R_1$ and $R_2$. You might want to solve it, but it doesn't really matter. In fact, the ratios you're looking for are respectively $\frac{R_1}{1 - R_1}$ and $\frac{R_2}{1 - R_2}$. As you can notice, the only terms that appear in those expressions are the probabilities at which each robot chooses their own packet over the other's. The latencies don't even appear in the formula simply because both robots are pumping out packets constantly, so both are constantly receiving packets. They will indeed receive different ratios of packets, but it will only depend on the aforementionned probabilities.

This is why I believe this will lead you nowhere. Please do point out any mistake I could have made during this reasoning.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange