Python CRC-32 Woes.

https://stackoverflow.com/questions/5047494

15-11-2019

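Question

I'm writing a Python program to extract data from the middle of a 6 GB bz2 file. A bzip2 file is made up of independently decodable blocks of data, so I only need to find a block (they are delimited by magic bits), then create a temporary one-block bzip2 file from it in memory, and finally pass that to the bz2.decompress function. Easy, no?

The bzip2 format has a CRC32 checksum for the file at the end. No problem, binascii.crc32 to the rescue. But wait. The data to be checksummed does not necessarily end on a byte boundary, and the crc32 function operates on a whole number of bytes.

My plan: use the binascii.crc32 function on all but the last byte, and then use a function of my own to update the computed CRC with the last 1 to 7 bits. But hours of coding and testing have left me bewildered, and my puzzlement can be boiled down to this question: how come crc32("\x00") is not 0x00000000? Shouldn't it be, according to the Wikipedia article?

You start with 0b00000000 and pad it with 32 zeros, then do polynomial division by 0x04C11DB7 until there are no ones left in the first 8 bits, which is immediately. The last 32 bits are the checksum, so how can that not be all zeros?

I've searched Google for answers and looked at the code of several CRC-32 implementations without finding any clue as to why this is so.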

Solution

how come crc32("\x00") is not 0x00000000?

The basic CRC algorithm is to treat the input message as a polynomial in GF(2), divide by the fixed CRC polynomial, and use the polynomial remainder as the resulting hash.
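As a rough illustration of that basic algorithm (not of CRC-32 itself), here is a minimal sketch in Python. The function name `gf2_remainder` and its parameters are made up for this example. With no modifications at all, a zero message does give a zero remainder, which matches the reasoning in the question.

```python
def gf2_remainder(message: bytes, poly: int = 0x104C11DB7, width: int = 32) -> int:
    """Remainder of (message * x^width) divided by poly, with GF(2) coefficients.

    Illustrative helper only: this is the plain textbook division,
    without any of the CRC-32 specific modifications.
    """
    # Treat the message as one big polynomial and append `width` zero bits.
    value = int.from_bytes(message, "big") << width
    # Long division: cancel each set bit above the low `width` bits, top down.
    for shift in range(len(message) * 8 - 1, -1, -1):
        if value & (1 << (shift + width)):
            value ^= poly << shift
    return value


print(hex(gf2_remainder(b"\x00")))      # 0x0
print(hex(gf2_remainder(b"\x01")))      # 0x4c11db7
print(hex(gf2_remainder(b"\x00\x01")))  # 0x4c11db7, leading zero bytes go unnoticed
```

The last line hints at one reason a real-world CRC adds extra steps: the plain remainder cannot tell a message apart from the same message with zero bytes in front of it.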

CRC-32 makes a number of modifications to the basic algorithm:

1. The bits in each byte of the message are reversed. For example, the byte 0x01 is treated as the polynomial x^7, not as the polynomial x^0.
2. The message is padded with 32 zero bits on the right side.
3. The first 4 bytes of this reversed and padded message are XOR'd with 0xFFFFFFFF.
4. The remainder polynomial is reversed.
5. The remainder polynomial is XOR'd with 0xFFFFFFFF.

And recall that the CRC-32 polynomial, in non-reversed form, is 0x104C11DB7.

Let's work out the CRC-32 of the one-byte string 0x00:

Message: 0x00
Bits reversed: 0x00
Padded with 32 zero bits: 0x00 00 00 00 00
First 4 bytes XOR'd with 0xFFFFFFFF: 0xFF FF FF FF 00
Remainder when divided by 0x104C11DB7: 0x4E 08 BF B4
Remainder XOR'd with 0xFFFFFFFF: 0xB1 F7 40 4B
Bits reversed: 0xD2 02 EF 8D

And there you have it: The CRC-32 of 0x00 is 0xD202EF8D.
(You should verify this.)
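One way to do that verification is to code the steps literally and compare against binascii.crc32. The sketch below is deliberately slow and bit-by-bit; the names `reverse_bits` and `crc32_by_division` are invented for this example and are not part of any library. The final XOR and the final bit reversal commute, so it does not matter which of the two is applied first.

```python
import binascii

POLY = 0x104C11DB7  # CRC-32 polynomial, non-reversed, including the x^32 term


def reverse_bits(value: int, width: int) -> int:
    """Return `value` with its lowest `width` bits in reverse order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def crc32_by_division(message: bytes) -> int:
    """CRC-32 computed by explicit polynomial division (illustrative, slow)."""
    # Reverse the bits within each byte of the message.
    data = bytes(reverse_bits(b, 8) for b in message)
    # Pad with 32 zero bits on the right.
    data += b"\x00" * 4
    # XOR the first 4 bytes of the padded message with 0xFFFFFFFF.
    data = bytes(b ^ 0xFF for b in data[:4]) + data[4:]
    # Divide by the polynomial over GF(2); what is left is the remainder.
    value = int.from_bytes(data, "big")
    for shift in range(len(data) * 8 - 33, -1, -1):
        if value & (1 << (shift + 32)):
            value ^= POLY << shift
    # Reverse the 32-bit remainder and XOR it with 0xFFFFFFFF.
    return reverse_bits(value, 32) ^ 0xFFFFFFFF


print(hex(crc32_by_division(b"\x00")))  # 0xd202ef8d
print(crc32_by_division(b"\x00") == binascii.crc32(b"\x00"))  # True
```

The same comparison can be run on any other byte string to check that the two functions keep agreeing.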

Other tips

In addition to the one-shot decompress function, the bz2 module also contains a class BZ2Decompressor that decompresses data as it is fed to the decompress method. It therefore does not care about the end-of-file checksum and provides the data needed once it reaches the end of the block.
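A small experiment with only the standard library can illustrate the difference. This is a sketch under an assumption: the sample data compresses into a single block, so cutting the last 10 bytes (the 80 bits that hold the end-of-stream marker and the file checksum, plus any padding) should leave that block intact. The variable names are made up for the example.

```python
import bz2

original = b"hello bzip2 block " * 50
stream = bz2.compress(original)

# Cut off the end-of-stream marker and the file CRC (80 bits plus padding).
truncated = stream[:-10]

# The one-shot function insists on seeing the end of the stream.
try:
    bz2.decompress(truncated)
    one_shot_failed = False
except ValueError:
    one_shot_failed = True

# The incremental decompressor hands back whatever complete blocks it has decoded.
decompressor = bz2.BZ2Decompressor()
recovered = decompressor.decompress(truncated)

print(one_shot_failed)
print(recovered == original)
print(decompressor.eof)  # the end-of-stream marker was never seen
```

If the description above holds, the one-shot call fails while the incremental decompressor still returns the full block contents and reports that it has not reached the end of the stream.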

To illustrate, assume I have located the block I wish to extract from the file and stored it in a bitarray.bitarray instance (other bit-twiddling modules will probably work as well). Then this function will decode it:

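from bz2 import BZ2Decompressor
from bitarray import bitarray

def bunzip2_block(block):
    """Decompress a single bzip2 block held in a bitarray, starting at its magic bits."""
    # Build a dummy stream: the 4-byte file header followed by the block bits.
    # "BZh9" declares the largest block size (900 kB), which covers smaller blocks too.
    dummy_file = bitarray(endian="big")
    dummy_file.frombytes(b"BZh9")  # must be bytes, not str, on Python 3
    dummy_file += block

    # tobytes() zero-pads the last partial byte; the decompressor returns the
    # block's data without waiting for an end-of-stream marker or file checksum.
    decompressor = BZ2Decompressor()
    return decompressor.decompress(dummy_file.tobytes())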

Note that the frombytes and tobytes methods of bitarray were previously called fromstring and tostring.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow