Question

I am implementing a reliable data transfer protocol. The maximum packet length is 1000 bytes. How many bytes do I need to store the checksum for the maximum packet size? I tried using one byte to store the checksum, but the receiver is unable to obtain the same checksum as the stored one.

Checksum checksum = new CRC32();
checksum.update(out_data, 0, out_data.length - 1);

Long checksumValue = checksum.getValue();
out_data[out_data.length - 1] = checksumValue.byteValue();

Here's my code for checksum implementation. I used the last slot of the byte array for the checksum. Note that this is not a case of corrupted packet as the simulation of the unreliable network that I'm running can be made to be reliable.


Solution

You can checksum things of any size. Ethernet frames carry up to 1500 bytes of payload and use only 4 bytes for the checksum. If you want to be on the safe side, use SHA-1 (but it takes up 20 bytes and can be slow to calculate).

To put the result of CRC32 into the array, you need 4 bytes (32 bits / 8 bits per byte).
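A minimal sketch of storing all 4 bytes of the CRC32 value at the end of the packet, and verifying it on the receiving side. The class and method names (`CrcPack`, `writeChecksum`, `verifyChecksum`) and the big-endian byte order are illustrative choices, not something prescribed by the question:

```java
import java.util.zip.CRC32;

public class CrcPack {
    // Compute CRC32 over the payload (everything except the last 4 bytes)
    // and store the 32-bit value big-endian in the final 4 slots.
    static void writeChecksum(byte[] packet) {
        CRC32 crc = new CRC32();
        crc.update(packet, 0, packet.length - 4);
        long value = crc.getValue();             // fits in 32 bits
        packet[packet.length - 4] = (byte) (value >>> 24);
        packet[packet.length - 3] = (byte) (value >>> 16);
        packet[packet.length - 2] = (byte) (value >>> 8);
        packet[packet.length - 1] = (byte) value;
    }

    // Receiver side: recompute the CRC and compare all 4 stored bytes.
    static boolean verifyChecksum(byte[] packet) {
        CRC32 crc = new CRC32();
        crc.update(packet, 0, packet.length - 4);
        long stored = ((packet[packet.length - 4] & 0xFFL) << 24)
                    | ((packet[packet.length - 3] & 0xFFL) << 16)
                    | ((packet[packet.length - 2] & 0xFFL) << 8)
                    |  (packet[packet.length - 1] & 0xFFL);
        return crc.getValue() == stored;
    }

    public static void main(String[] args) {
        byte[] packet = new byte[1000];
        for (int i = 0; i < packet.length - 4; i++) packet[i] = (byte) i;
        writeChecksum(packet);
        System.out.println(verifyChecksum(packet)); // true
        packet[10] ^= 1;                            // flip one bit
        System.out.println(verifyChecksum(packet)); // false: CRC32 catches any single-bit error
    }
}
```

Note that both sides must agree on the byte order and on which bytes are covered by the checksum.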

OTHER TIPS

How many bytes of checksum for byte array length 1000

If you are asking how big it should be to detect corruption, that is a really hard question ...

Firstly, no matter how large the checksum, and no matter how it is calculated, you can never completely eliminate the possibility of an undetectable corruption. (Even if there are as many checksum values as message values, you cannot eliminate the possibility that transmission will corrupt both the message and the checksum ... in exactly the same way.)

If we average over the set of all possible corruptions, then the best we can hope for is that an N-bit checksum will fail to detect a corruption one time in 2^N. If that probability is too high, you need to increase N.

To go past that simple analysis, you have to consider specific checksum algorithms and their ability to detect common kinds of errors; e.g. single bit flips, transpositions of bits, and so on. I don't "know the math" for this, but I imagine it gets really complicated for practical checksumming algorithms, and that an empirical approach is going to give the best answers. But I also imagine that the recommended algorithms approach the theoretical "one time in 2^N", and that they don't have any particular weaknesses for common error syndromes.

So, probably the best advice is to figure out what probability you are comfortable with and use that to choose an N ... and an algorithm that gives you checksums of that size (or bigger).
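That sizing rule can be reduced to a one-liner: under the ideal 1-in-2^N assumption, the smallest adequate N is ceil(log2(1/p)) for a target undetected-corruption probability p. The class and method names here are made up for illustration:

```java
public class ChecksumSize {
    // Smallest N such that a random corruption slips through with
    // probability at most p, assuming the ideal 1-in-2^N detection rate.
    static int bitsNeeded(double p) {
        return (int) Math.ceil(Math.log(1.0 / p) / Math.log(2.0));
    }

    public static void main(String[] args) {
        // Tolerating one undetected corruption in a billion:
        System.out.println(bitsNeeded(1e-9)); // 30 -> a 32-bit CRC suffices
    }
}
```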


Note that this is not a case of corrupted packet as the simulation of the unreliable network that I'm running can be made to be reliable.

In that case, you could use any checksum algorithm you wanted. Even a one bit checksum; i.e. a parity bit. (Or even zero bits. A 100% reliable network doesn't need checksums.)
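For completeness, a one-bit parity checksum over a byte array can be sketched like this (the `Parity` class name is arbitrary; this computes even parity, i.e. the XOR of all bits in the message):

```java
public class Parity {
    // Parity bit over a byte array: 1 if the total number of set bits is odd.
    static int parityBit(byte[] data) {
        int acc = 0;
        for (byte b : data) acc ^= (b & 0xFF); // fold all bytes together with XOR
        return Integer.bitCount(acc) & 1;      // parity of the folded byte
    }

    public static void main(String[] args) {
        byte[] msg = { 0b0000_0001, 0b0000_0011 };
        System.out.println(parityBit(msg)); // 1 -> three set bits in total (odd)
    }
}
```

A single parity bit only detects an odd number of flipped bits, which is why real protocols use larger checksums.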


... the receiver is unable to obtain the same checksum as the stored one.

Your code as written uses CRC32. That is a 32-bit (4-byte) checksum ... irrespective of the number of bytes of data being checksummed. However, it only stores the least significant byte of the CRC32 in the out_data message. Unless the receiver is doing the same thing (comparing only the low byte), the checksums won't match.
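The truncation is easy to see in isolation: `byteValue()` on the boxed CRC keeps only the bottom 8 bits of the 32-bit result. A short demonstration (input string chosen arbitrarily):

```java
import java.util.zip.CRC32;

public class Truncation {
    public static void main(String[] args) {
        CRC32 crc = new CRC32();
        crc.update("hello".getBytes());
        long full = crc.getValue(); // full 32-bit checksum
        byte low = (byte) full;     // what Long.byteValue() would keep
        // The low byte alone cannot reproduce the full 32-bit value.
        System.out.printf("full = 0x%08X, low byte = 0x%02X%n", full, low & 0xFF);
    }
}
```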

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow