Question

I need to write a couple of floats to a text file and store a CRC32 checksum with them. Then when I read the floats back from the text file, I want to recompute the checksum and compare it to the one that was previously computed when saving the file. My problem is that the checksum sometimes fails. This is due to the fact that equal floating point numbers can be represented by different bit patterns. For completeness' sake, I will summarize the code in the next paragraphs.

I have adapted this CRC32 algorithm which I found after reading this question. Here's what it looks like:

uint32_t updC32(uint32_t octet, uint32_t crc) {
    return CRC32Tab[(crc ^ octet) & 0xFF] ^ (crc >> 8);
}

template <typename T>
uint32_t updateCRC32(T s, uint32_t crc) {
    // Walk the object's bytes; unsigned char avoids sign-extending bytes >= 0x80
    const unsigned char* buf = reinterpret_cast<const unsigned char*>(&s);
    size_t len = sizeof(T);

    for (; len; --len, ++buf)
        crc = updC32(*buf, crc);
    return crc;
}

CRC32Tab contains exactly the same values as the large array in the file linked above.
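The linked table is not reproduced in the question. As a self-contained sketch, assuming CRC32Tab is the standard reflected CRC-32 table (polynomial 0xEDB88320, the one used by zlib), the table can be generated at startup instead of copied. crc32Bytes is a hypothetical convenience wrapper added only for checking the result.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: CRC32Tab is the standard reflected CRC-32 table
// (polynomial 0xEDB88320). It is generated here rather than copied.
std::array<uint32_t, 256> makeCRC32Table() {
    std::array<uint32_t, 256> table{};
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t c = i;
        for (int k = 0; k < 8; ++k)
            c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        table[i] = c;
    }
    return table;
}

const std::array<uint32_t, 256> CRC32Tab = makeCRC32Table();

// One table-driven step: fold a single byte into the running CRC.
uint32_t updC32(uint32_t octet, uint32_t crc) {
    return CRC32Tab[(crc ^ octet) & 0xFF] ^ (crc >> 8);
}

// Feed every byte of an object's representation into the CRC.
template <typename T>
uint32_t updateCRC32(const T& s, uint32_t crc) {
    const unsigned char* buf = reinterpret_cast<const unsigned char*>(&s);
    for (std::size_t len = sizeof(T); len; --len, ++buf)
        crc = updC32(*buf, crc);
    return crc;
}

// Hypothetical helper: CRC-32 of a byte buffer with the usual
// initial value 0xFFFFFFFF and final inversion.
uint32_t crc32Bytes(const void* data, std::size_t n) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < n; ++i)
        crc = updC32(p[i], crc);
    return ~crc;
}
```

If the assumption about the table holds, crc32Bytes("123456789", 9) gives the well-known CRC-32 check value 0xCBF43926.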

This is an abbreviated version of how I write the floats to a file and compute the checksum:

float x, y, z;

// set them to some values

uint32_t crc = 0xFFFFFFFF;
crc = Utility::updateCRC32(x, crc);
crc = Utility::updateCRC32(y, crc);
crc = Utility::updateCRC32(z, crc);
const uint32_t actualCrc = ~crc;

// stream is a FILE pointer, and I don't mind the scientific representation
fprintf(stream, " ( %g %g %g )", x, y, z);
fprintf(stream, " CRC %u\n", actualCrc);

I read the values back from the file as follows. There is actually a lot more involved as the file has a more complex syntax and has to be parsed, but let's assume that getNextFloat() returns the textual representation of each float written before.

float x = std::atof(getNextFloat());
float y = std::atof(getNextFloat());
float z = std::atof(getNextFloat());

uint32_t crc = 0xFFFFFFFF;
crc = Utility::updateCRC32(x, crc);
crc = Utility::updateCRC32(y, crc);
crc = Utility::updateCRC32(z, crc);
const uint32_t actualCrc = ~crc;

const uint32_t fileCrc = getFileCrc(); // hypothetical placeholder: read the CRC from the file
assert(fileCrc == actualCrc); // fails often, but not always

The source of this problem seems to be that std::atof returns a float whose bit representation differs from that of the float that was used to write the string to the file.

So, my question is: Is there another way to achieve my goal of checksumming floats which are roundtripped through a textual representation other than to checksum the strings themselves?

Thanks for reading!

役に立ちましたか?

Solution

The source of the issue is apparent from your comment:

If I'm not completely mistaken, there is no rounding happening here. The %g specifier chooses the shortest string representation that exactly represents the number.

This is incorrect. If no precision is specified, %g defaults to 6 significant digits, so rounding will occur for any value that needs more than that, which is most floating-point inputs.
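A small sketch of this effect, assuming IEEE-754 single-precision float. The hypothetical helper roundTrip() prints a value with a given format and reads it back with std::atof, as the question does, and the hypothetical floatBits() exposes the bit pattern via std::memcpy.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit float");

// Raw bit pattern of a float (memcpy avoids aliasing problems).
uint32_t floatBits(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Hypothetical helper: print f with a printf format, then read it back
// with std::atof, mirroring the question's write/read path.
float roundTrip(float f, const char* fmt) {
    char buf[64];
    std::snprintf(buf, sizeof buf, fmt, f);  // f is promoted to double
    return static_cast<float>(std::atof(buf));
}
```

With "%g", 1.2345678f is written as 1.23457 and comes back as 1.23457f, a different bit pattern, while a value such as 0.5f survives unchanged. That matches "fails often, but not always".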

If you need a human-readable, round-trippable format, %a is by far the best choice. Failing that, you will need to print at least 9 significant digits (e.g. %.9g), assuming that float on your system is IEEE-754 single precision.
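A minimal sketch of the precision route, assuming IEEE-754 float and correctly rounded printf/strtof conversions. std::numeric_limits<float>::max_digits10 is 9 under that assumption, and std::strtof parses straight to float. formatFloat9, parseFloat and floatBits are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <limits>
#include <string>

// Raw bit pattern of a float.
uint32_t floatBits(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Format with max_digits10 significant digits (9 for IEEE-754 single),
// enough for float -> text -> float to reproduce the exact value.
std::string formatFloat9(float f) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g",
                  std::numeric_limits<float>::max_digits10, f);
    return buf;
}

// Parse back with strtof, which rounds directly to float.
float parseFloat(const std::string& s) {
    return std::strtof(s.c_str(), nullptr);
}
```

With this format, 1.2345678f is written as 1.23456776 and reads back to the identical bit pattern.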

You may still be tripped up by NaN encodings, since the standard does not specify how or if they must be printed.

Other tips

If the text file doesn't have to be human-readable, use hexadecimal float literals (printf's %a) instead. They are exact, so you won't have this problem of differences between the textual and in-memory values.
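A sketch of the hexadecimal route: %a prints the exact binary value of the (promoted) float, and std::strtof, which accepts hexadecimal input since C99/C++11, reads it back without rounding. The exact spelling of the output is implementation-specific (glibc, for instance, prints 0.5f as 0x1p-1). toHexFloat, fromHexFloat and floatBits are hypothetical names.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Raw bit pattern of a float.
uint32_t floatBits(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Write the exact binary value as a hexadecimal floating-point string.
std::string toHexFloat(float f) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%a", f);  // f is promoted to double
    return buf;
}

// strtof parses hex floats, so the exact value comes back.
float fromHexFloat(const std::string& s) {
    return std::strtof(s.c_str(), nullptr);
}
```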

If your standard library's float-to-text and text-to-float conversions are correctly rounded, you just need enough significant digits for the float->text->float round trip to be lossless. For an IEEE-754 64-bit double, 17 significant digits are just enough to make the round trip lossless with respect to the actual value. NaNs are the caveat: the round trip should be value-preserving but not necessarily bit-pattern-preserving, since NaN has many possible bit patterns (each infinity, by contrast, has exactly one).
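For doubles the same idea uses std::numeric_limits<double>::max_digits10, which is 17 for IEEE-754 double. A small sketch, assuming correctly rounded conversions; roundTripDouble is a hypothetical helper:

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>

// 17 significant digits for IEEE-754 double.
constexpr int kDoubleDigits = std::numeric_limits<double>::max_digits10;

// Print d with `digits` significant digits ("%.*g") and parse it back.
double roundTripDouble(double d, int digits) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits, d);
    return std::strtod(buf, nullptr);
}
```

For example, 0.1 + 0.2 prints as 0.3 with 15 digits and reads back as a different double, while 17 digits (0.30000000000000004) restore it exactly.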

Your CRC algorithm is flawed for any type which has multiple binary representations for a single value. IEEE 754 has two representations for 0.0, to wit +0.0 and -0.0. Other, non-finite values such as NaN are potentially troublesome too.
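The two zeros can be checked directly, assuming IEEE-754 single precision: they compare equal, yet their bytes, and therefore any CRC computed over those bytes, differ. floatBits and sameBits are hypothetical helpers.

```cpp
#include <cstdint>
#include <cstring>

// Raw bit pattern of a float.
uint32_t floatBits(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// True only if the two floats have identical bytes, which is what a
// byte-wise CRC effectively compares (as opposed to operator==).
bool sameBits(float a, float b) {
    return floatBits(a) == floatBits(b);
}
```

Here +0.0f has the bit pattern 0x00000000 and -0.0f has 0x80000000.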

Would it be acceptable to canonicalize your numbers before you update the CRC? So while saving, you would get a temporary string version of your number (with sprintf or whatever matches your serialization's format), then convert this string back to a numeric value, and then use this result to update the CRC. This way, you know that the CRC will match the deserialized value.
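A sketch of this canonicalization, assuming the file is written with "%g" and read with std::atof as in the question. A bitwise CRC-32 with the usual reflected polynomial 0xEDB88320 stands in for the table-driven code so the example is self-contained; canonicalize, crcForWrite and crcForRead are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Bitwise CRC-32 (reflected, polynomial 0xEDB88320); same result as the
// table-driven version, just slower.
uint32_t crc32Update(uint32_t crc, const void* data, std::size_t n) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < n; ++i) {
        crc ^= p[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc & 1) ? (0xEDB88320u ^ (crc >> 1)) : (crc >> 1);
    }
    return crc;
}

// Format f exactly as the file writer does ("%g" here), then parse it
// back the way the reader does: the result is the value the reader sees.
float canonicalize(float f) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%g", f);
    return static_cast<float>(std::atof(buf));
}

// Writer side: CRC over the canonicalized values, not the originals.
uint32_t crcForWrite(const float* v, std::size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < n; ++i) {
        const float c = canonicalize(v[i]);
        crc = crc32Update(crc, &c, sizeof c);
    }
    return ~crc;
}

// Reader side: CRC over the values parsed from the text.
uint32_t crcForRead(const float* v, std::size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < n; ++i)
        crc = crc32Update(crc, &v[i], sizeof v[i]);
    return ~crc;
}
```

Because the writer checksums exactly the values the reader will reconstruct, the two CRCs match even though "%g" still rounds the stored values.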

License: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top