Question

It's my understanding that by "metric" or "metrics" we describe the length (in bytes, right?) of a packet traveling over the network. The problem is that, as far as I know, this value is ISP-dependent and it's almost impossible to find even two ISPs with the same metric.

If I'm writing P2P software to keep two applications synced, and I would like to estimate the optimal size for my packets, does it make sense to take metrics into account, especially given that this is an ISP-dependent value and there are a lot of ISPs in the world? Should I apply some "heuristic algorithm", like assuming that the best metric is the lowest one, and just pad with empty values up to the longest one?

Thanks.

PS: if you need a starting point for an example, I would prefer something in C++, since I'm interested in this language at the moment.

EDIT: a recap of the comments that you can find below: it looks like my question is too generic, so I'm now focusing on MTU and latency (lag) to keep things more to the point.


Solution 2

Networking (and in particular inter-networking) is a relatively young science, not yet completely settled. Not all of the "right behavior" is shared by everyone, simply because not everyone believes in (or is forced to follow) the same notion of "right". The consequence is that everyone must, at some level, take care of everything themselves, since full trust is never possible.

That said, to come closer to your problem, let me start by telling you that you yourself appear "untrusted", since you use improper terminology.

You speak about a networking "metric" (something that, in networking, is related to routing), but you are actually talking about something else: the MTU (Maximum Transmission Unit). If you speak to a network engineer that way, you will almost surely never find the answer to your question, simply because he will most likely understand a completely unrelated thing.

Now that everything is clear (I hope), let's go through a little bit of theory:

  • Every transmission medium, because of its very physics, introduces some errors.
    • At the link level this is mainly due to "electric noise" or to group dispersion, which makes the "signal" less and less intelligible.
    • At a circuit (or path, for connection-less network protocols like IP) level this can be due to packet loss caused by congestion.
  • The immediate consequence of the above points is that "an infinite end-to-end correct transmission" is not physically possible.
  • To take care of this, both link protocols and transport protocols have to introduce some "redundancy check" (CRC) and some mechanism to recover from the error in case the checksum fails. In this sense, the MTU is just the maximum number of bytes you can push into a packet without violating the underlying rules about how CRCs are computed and how the physical medium is managed. (A sketch of how to query the path MTU from code follows this list.)
  • Depending on the application's needs, IP offers different "transport protocols":
    • UDP is "don't care": if a packet gets lost, the transport protocol will do nothing to recover it.
    • TCP is "full care, up to a given time limit": if something gets lost, a retransmission will be requested. All this is managed at the TCP/IP driver level, so the application is not required to take care of it, unless the "problem" lasts longer than the timeout limit of a session.
  • Independently of the TCP and UDP behavior, IP also offers "fragmentation": if a transport unit is too long to fit into a link-protocol frame, the packet is split into smaller ones. This process can happen (at least in theory) at every hop (every time you traverse a router to go from one link to another, and another, and another, up to the final destination) and requires a complete reconstruction of the packet structure and CRC; hence it requires more computation time and CPU power on routers than would normally be needed to just move a packet from one port to another.
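
To make the MTU point above concrete: on Linux you can ask the kernel what MTU it currently believes applies to a connected socket. This is a minimal sketch assuming a Linux/POSIX socket API (the IP_MTU option is Linux-specific, and the function name is only illustrative):

#include <netinet/in.h>     // IPPROTO_IP, IP_MTU (Linux-specific)
#include <sys/socket.h>

// Ask the kernel for its current path-MTU estimate on a *connected* socket.
// Returns the MTU in bytes, or -1 on error.
int query_path_mtu(int connected_fd) {
    int mtu = 0;
    socklen_t len = sizeof(mtu);
    if (getsockopt(connected_fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
        return -1;
    return mtu;
}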

And this is why you have to take care: network performance is not the same for every packet length. Longer packets over TCP require less "waiting for acknowledge" time (so data transfer can flow faster), but longer packets also mean higher latency in case of intermediate fragmentation, and more processing power on routers (to the point that many ISPs don't fragment at all: if a packet cannot go through, they just discard it, and let TCP re-tune to a smaller MTU, or let the application shorten its UDP packets).

In other words, if you don't care about performance, just let TCP do its job, and the data will somehow find a way to flow (through intermediate fragmentation or an end-to-end MTU negotiation). But if you want to maximize performance, the fewer error-control-and-recovery mechanisms you "stimulate", the less latency you get, and hence the wider the "segments" you can use and the higher the data rate you get.
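
If you do let TCP handle it, you can still observe the segment size it settled on. A minimal sketch, assuming a POSIX socket API (TCP_MAXSEG read on an already connected TCP socket; the function name is illustrative):

#include <netinet/in.h>     // IPPROTO_TCP
#include <netinet/tcp.h>    // TCP_MAXSEG
#include <sys/socket.h>

// Read the maximum segment size TCP is currently willing to use on a
// connected socket. Returns the MSS in bytes, or -1 on error.
int query_tcp_mss(int connected_fd) {
    int mss = 0;
    socklen_t len = sizeof(mss);
    if (getsockopt(connected_fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) < 0)
        return -1;
    return mss;
}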

There is an "optimal length" for every path between the two endpoints, and you have to discover and respect it if you want better performance.

If you use UDP, things are a bit worse: since there is no MTU negotiation and recovery, if a too-long packet gets discarded along the way (because at a certain point it no longer fits the physical medium, and the ISP is not going to fragment it, to protect itself and its other clients), you have to take care of it yourself and reduce the packet size, otherwise you will never be able to transfer it.
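
On Linux, one way to take care of this from the application is to set the Don't Fragment bit on the UDP socket via IP_MTU_DISCOVER, so that a too-long datagram fails locally with EMSGSIZE instead of silently vanishing along the path. A hedged sketch; the function name and the halving policy in the comment are only illustrations:

#include <netinet/in.h>     // IP_MTU_DISCOVER, IP_PMTUDISC_DO (Linux-specific)
#include <sys/socket.h>
#include <cerrno>

// Set the DF bit on an IPv4 UDP socket so that datagrams larger than the
// path MTU are rejected locally with EMSGSIZE instead of being fragmented.
bool enable_dont_fragment(int udp_fd) {
    int val = IP_PMTUDISC_DO;
    return setsockopt(udp_fd, IPPROTO_IP, IP_MTU_DISCOVER,
                      &val, sizeof(val)) == 0;
}

// Example reaction to an oversized send (the policy is entirely up to the
// application; halving is just an illustration):
//   if (send(udp_fd, buf, len, 0) < 0 && errno == EMSGSIZE)
//       len /= 2;   // packet too big for the path: reduce and retry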

Feel free to consider the ISP "unfair" to you, but keep in mind that an excess of fragmentation activity can saturate a router to the point where no other transmission is possible, from anyone to anyone. And that is a bigger damage than simply dropping your flow.

OTHER TIPS

I think you're probably talking about the metric from the routing tables. If so, it's just a number that reflects the 'cost' to the next hop, most probably in terms of latency, but it isn't a direct measurement in milliseconds. It's nothing a programmer needs to be concerned with.

Okay, with regard to optimizing updates with respect to the Maximum Transmission Unit size (MTU): you are going to want to stuff as many bits into a packet as possible, to minimize the number of packets sent across the line. It should be noted that if your master machine (the one which gets updates first) updates slowly, you may decide not to wait for enough data to accumulate to reach the MTU, but rather to update the slave machine (the machine you're backing data up on) anyway. So, as an example, your logic may look something like this (in pseudo code):

while (keep_updating) {

    if (new_data_size >= (MTU - x)) {      // x leaves room for headers: approximately
                                           // equal to, but not larger than, the MTU
        send_update_packet();

    } else if (count_time > max_wait_time) {   // this deals with the minimum latency
                                                // discussed below
        send_update_packet();
    }
    count_time++;
}

Now, the above example is very dumbed down. If this is something you're actually writing, I assume it will be at the kernel level (not necessarily, but nonetheless); this will more than likely be a single thread while the rest of your service is doing other things in other threads.
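
For the C++ starting point you asked for, here is a hedged user-space sketch of the same idea, using std::chrono for the timeout instead of a raw counter. buffered_bytes(), send_update_packet(), kMtuPayload and kMaxWait are hypothetical placeholders, not a real API:

#include <chrono>
#include <cstddef>

// Hypothetical placeholders: in a real program these would report pending
// update data and flush it to the peer.
static std::size_t buffered_bytes()     { return 0; }
static void        send_update_packet() { /* flush pending data */ }

constexpr std::size_t kMtuPayload = 1400;                 // assumed usable payload per packet
constexpr auto        kMaxWait    = std::chrono::milliseconds(50);

void update_loop(const bool& keep_updating) {
    auto last_flush = std::chrono::steady_clock::now();
    while (keep_updating) {
        const auto now = std::chrono::steady_clock::now();
        if (buffered_bytes() >= kMtuPayload || now - last_flush > kMaxWait) {
            send_update_packet();                         // flush: buffer full or waited too long
            last_flush = now;
        }
        // a real loop would block on new data or sleep here instead of spinning
    }
}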

Now, in terms of latency or lag time: this is more specifically called network delay, something that in a P2P situation you most likely don't have control over, as that requires specific service agreements with an ISP. However, the sum of the parts below will give an approximation of latency in general terms, i.e. the time it takes from send_update_packet() until the slave machine receives the packet and your hook in the kernel/OS actually records the new updates. Anyway, these are the delays that are of interest to you (a small round-trip measurement sketch follows the list):

  • Processing delay - time routers take to process the packet header
  • Queuing delay - time the packet spends in routing queues
  • Transmission delay - time it takes to push the packet's bits onto the link
  • Propagation delay - time for a signal to reach its destination
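
There is no portable way for an application to measure those four components separately, but you can at least measure their sum. A minimal sketch, assuming an already connected UDP socket whose peer echoes back whatever it receives (that echo service is not shown, and the function name is illustrative):

#include <chrono>
#include <sys/socket.h>

// Send a small probe on a connected UDP socket and time how long the peer's
// echo takes to come back, i.e. the sum of the four delays above in both
// directions. Returns the round-trip time in milliseconds, or -1.0 on error.
double measure_rtt_ms(int udp_fd) {
    const char probe[] = "ping";
    char reply[64];

    const auto start = std::chrono::steady_clock::now();
    if (send(udp_fd, probe, sizeof(probe), 0) < 0) return -1.0;
    if (recv(udp_fd, reply, sizeof(reply), 0) < 0) return -1.0;   // blocks until the echo arrives
    const auto end = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(end - start).count();
}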

... now, as to which protocol you should choose: again, it depends. If you plan for the two machines to be talking all the time, TCP may be your best bet; it will at least ensure that packets are not lost (with UDP that luxury has to be handled by protocols higher in the stack). If you expect updates to be more spread apart, you may want to avoid TCP, simply because the TCP handshake process of opening a connection will cost roughly 3x (the sum of the delays mentioned above), whereas UDP will let you send the packets right away and then wait for a response on the same socket to ensure the packet was received correctly; if you don't get a response within time x, retry. A sketch of that send-and-retry pattern follows.
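
Here is a hedged sketch of that send-and-retry pattern on a connected UDP socket, using poll() for the timeout. The function name, retry count and timeout value are illustrative assumptions, not a fixed protocol:

#include <poll.h>
#include <sys/socket.h>
#include <cstddef>

// Send a datagram on a connected UDP socket, wait up to timeout_ms for any
// reply, and retry a few times before giving up. Returns true once a reply
// arrives in time.
bool send_with_retry(int udp_fd, const void* data, std::size_t len,
                     int timeout_ms = 200, int max_retries = 3) {
    char ack[64];
    for (int attempt = 0; attempt < max_retries; ++attempt) {
        if (send(udp_fd, data, len, 0) < 0)
            return false;

        pollfd pfd{udp_fd, POLLIN, 0};
        if (poll(&pfd, 1, timeout_ms) > 0 &&
            recv(udp_fd, ack, sizeof(ack), 0) > 0)
            return true;            // peer acknowledged within the time limit
        // timed out (or transient error): fall through and send again
    }
    return false;
}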

As per the answer @EJP provided, most of the time programmers do not need to worry about this, but at some level there are people worrying about it, specifically in cases like disaster management for database servers and similar areas where it is crucial that the DBs are as close to in sync as possible at all times... nonetheless, that's my two cents after writing some kernel code last year that handled that specific problem. Hope it helps!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow