Question

I have a general conceptual question about endianness and how it affects TCP socket communication in C/C++. Here's an example:

You have two servers communicating over TCP sockets, and one uses big endian while the other uses little endian. If you send an integer over the socket from one server to the other, I understand that the byte ordering is reversed and the integer will not print as expected. Correct? I saw somewhere (I can't find where anymore) that if you send a char over the socket, endianness doesn't change the value and it prints as expected. Is this correct? If so, why? I feel like I've done this before, but I could be delusional.

Could anybody clear this up for me?

Thanks.

Edit: Is it because char is only 1 byte?


Solution

Think about the size of each data type.

An int is typically four bytes, which you can think of as four individual bytes sitting side by side in memory. The endianness of an architecture determines whether the most significant byte is the first of those four bytes or the last. Endianness does not affect the order of the bits within each byte (see the diagram on Wikipedia's page on Endianness).

A char, however, is only one byte, so there is no alternative order; there is simply nothing to reverse.

If you send a char over a socket, it will be one byte on both machines. If you send an int over a socket, since it's four bytes, it's possible that one machine will interpret the bytes in a different order than the other, depending on their endianness. You should set up a simple test and report back with your results!
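To see this for yourself, here is a minimal sketch (plain C, assuming a platform where int is 4 bytes) that prints the raw bytes of an int as they sit in memory:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        int value = 0x01020304;              /* four easily recognizable bytes */
        unsigned char bytes[sizeof value];

        /* Copy the raw in-memory representation of the int. */
        memcpy(bytes, &value, sizeof value);

        /* A little-endian machine prints "04 03 02 01",
           a big-endian machine prints "01 02 03 04". */
        for (size_t i = 0; i < sizeof value; i++)
            printf("%02x ", bytes[i]);
        printf("\n");
        return 0;
    }

Run the same program on both of your servers and compare the output; that difference is exactly what shows up if you ship the int's raw bytes over the socket.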

OTHER TIPS

The only thing you can send over a TCP socket is bytes. You cannot send an integer over a TCP socket without first creating some byte representation for that integer. The C/C++ int type can be stored in memory in whatever way the platform likes. If that just happens to be the form in which you need to send it over the TCP socket, then fine. But if it's not, then you have to convert it into the form the protocol requires before you send it, and back into your native format after you receive it.
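As a rough sketch of what "creating a byte representation" can look like (the function names here are made up for illustration, and the wire format is assumed to be a 4-byte big-endian unsigned integer):

    #include <stdint.h>

    /* Pack a 32-bit value into 4 bytes, most significant byte first,
       no matter how the host itself stores integers. */
    void pack_u32_be(uint32_t value, unsigned char out[4])
    {
        out[0] = (unsigned char)(value >> 24);
        out[1] = (unsigned char)(value >> 16);
        out[2] = (unsigned char)(value >> 8);
        out[3] = (unsigned char)(value);
    }

    /* The receiving side rebuilds the value from the same 4 bytes. */
    uint32_t unpack_u32_be(const unsigned char in[4])
    {
        return ((uint32_t)in[0] << 24) |
               ((uint32_t)in[1] << 16) |
               ((uint32_t)in[2] << 8)  |
                (uint32_t)in[3];
    }

The 4-byte buffer is what actually travels through send() and recv(); neither end ever sees the other platform's in-memory int.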

As a bit of a sloppy analogy, consider the way I communicate with you. My native language might be Spanish, and who knows what goes on in my brain. Internally, I might represent the number three as "tres" or some weird pattern of neurons. Who knows? But when I communicate with you, I must represent the number three as "3" or "three" because that's the protocol you and I have agreed on, the English language. So unless I'm a terrible English speaker, how I internally store the number three won't affect my communication with you.

Since this group requires me to produce streams of English characters to talk to you, I must convert my internal number representations to streams of English characters. Unless I'm terrible at doing that, how I store numbers internally will not affect the streams of English characters I produce.

So unless you do foolish things, this will never matter. You will be sending and receiving bytes over the TCP socket, so the memory format of the int type won't matter: you won't be sending or receiving instances of the C/C++ int type, but logical integers.

For example, if the protocol specification for the data you are sending over TCP says that you need to send a four-byte integer in little-endian format, then you should write code to do exactly that. If the code takes your platform's endianness into consideration at all, it should be purely as an optimization that does not affect the code's behavior.
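A minimal sketch of such code, assuming the protocol wants a 4-byte little-endian unsigned value (the helper name is hypothetical):

    #include <stdint.h>

    /* Write a 32-bit value least significant byte first (little-endian
       wire format) without ever asking what the host's byte order is. */
    void put_u32_le(uint32_t value, unsigned char out[4])
    {
        out[0] = (unsigned char)(value & 0xff);
        out[1] = (unsigned char)((value >> 8) & 0xff);
        out[2] = (unsigned char)((value >> 16) & 0xff);
        out[3] = (unsigned char)((value >> 24) & 0xff);
    }

Because the shifts operate on the value itself rather than on its memory layout, this produces the same four bytes on a big-endian and a little-endian host.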

You have two servers that are communicating with tcp sockets and one uses big endian and the other little endian. If you send an integer, over the socket, from one server to the other, I understand that the byte ordering is reversed and the integer will not print what is expected.

This is a very well-known problem in network communication protocols. The correct answer is to never send a raw, in-memory integer at all.

Instead, you define the protocol very precisely, for example to contain a 32-bit signed integer stored in big-endian byte order. Big-endian happens to be what is mostly used in network protocols, which is why it is also called network byte order.

Inside your programs you might use, say, a signed long. The C standard only guarantees a minimum range for long; the actual storage may be very different from platform to platform. It will be at least 32 bits, but could be more.

On the platform where you compile your code there will be functions or macros for translating between the "internal" integer and the 32-bit big-endian value used on the network. Examples are htonl() and ntohl(). These expand to different code depending on which platform you are compiling for.
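A minimal sketch of how that typically looks with POSIX sockets (send_u32/recv_u32 are made-up helper names, and error handling is kept to a bare minimum):

    #include <arpa/inet.h>   /* htonl(), ntohl() */
    #include <stdint.h>
    #include <sys/socket.h>  /* send(), recv() */

    /* Sender: convert from host order to network (big-endian) order,
       then transmit the resulting 4 bytes. */
    int send_u32(int fd, uint32_t value)
    {
        uint32_t wire = htonl(value);
        return send(fd, &wire, sizeof wire, 0) == (ssize_t)sizeof wire ? 0 : -1;
    }

    /* Receiver: read 4 bytes, then convert from network order back to
       whatever byte order the local host uses. */
    int recv_u32(int fd, uint32_t *value)
    {
        uint32_t wire;
        if (recv(fd, &wire, sizeof wire, MSG_WAITALL) != (ssize_t)sizeof wire)
            return -1;
        *value = ntohl(wire);
        return 0;
    }

On a big-endian host htonl() and ntohl() compile to nothing at all; on a little-endian host they swap the bytes. Either way, the bytes on the wire are the same.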

Byte endianness refers to the order of individual bytes in a data type of more than 1 byte (such as short, int, long, etc.)

So your assumption is correct for int, since it must be at least 16 bits and is usually more nowadays. It is also correct for char, which is always exactly 1 byte by definition. On exotic platforms that byte can be wider than 8 bits, and then you do have to decide how to map it onto the octets that actually travel over the network.

It does not matter as long as you are transferring only bytes. And you should be transferring only bytes in standard networking.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow