Question

I am trying to figure out the file layout of a tar.Z file (a so-called .taz file, i.e. a compressed tar file).

This file can be produced with the tar -Z option or with the Unix compress utility (the results are the same).

I tried to google for documentation of this file structure, but I could not find any.

I know that this is an LZW-compressed file and that it starts with the magic number "1F 9D", but that's all I can figure out. Could someone please tell me more about the file header or anything else?

I am not interested in how to uncompress this file, or in which Linux command can process it.

What I want to know is the internal file structure/header/format/layout. Thank you in advance.


Solution 2

A tar.Z file is just a compressed tar file, so you will only find the 1F 9D magic number telling you to uncompress it.

When uncompressed you can read the tar file header:

http://www.fileformat.info/format/tar/corion.htm
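As a rough illustration of what "reading the tar file header" can look like once the data is uncompressed, here is a minimal Python sketch. It assumes the common ustar layout (name at offset 0, size as octal text at offset 124, magic at offset 257) and builds its own tiny sample archive with the standard tarfile module; the function name read_tar_header is made up for this example, and sizes stored in the GNU base-256 form are not handled.

```python
import io
import tarfile


def read_tar_header(block: bytes) -> dict:
    """Pull a few fields out of the first 512-byte tar header block.

    Assumes the ustar field offsets; read_tar_header is a hypothetical helper.
    """
    if len(block) < 512:
        raise ValueError("a tar header block is 512 bytes")

    def text(field: bytes) -> str:
        # Fields are NUL-terminated (or NUL-padded) ASCII.
        return field.split(b"\x00", 1)[0].decode("ascii")

    return {
        "name": text(block[0:100]),
        # The size is stored as octal digits in text form.
        "size": int(text(block[124:136]).strip() or "0", 8),
        "magic": text(block[257:263]),
    }


# Build a small in-memory tar archive to have something to parse.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = 5
    tar.addfile(info, io.BytesIO(b"hello"))

header = read_tar_header(buf.getvalue()[:512])
print(header)
```

For a real .tar.Z file the 512-byte block would come from the output of the LZW decoder rather than from an in-memory archive.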

Other suggestions

A .Z file is compressed using compress and can be uncompressed with uncompress (or on some machines this is called uncompress.real). This .Z file can hold any data. .tar.Z or .taz is just a .tar file that is compressed with compress.

The first 2 bytes (MAGIC_1 and MAGIC_2) are used to check whether the .Z file really is a .Z file, and not something else that happens to have the same extension. These bytes are hardcoded in the sources.

The third byte is a settings byte and holds 2 values:

  • The most significant bit is the block mode.
  • The lowest 5 bits indicate the maximum code size in bits, which bounds the size of the code table (the code table is used for LZW compression).

From the original code: BLOCK_MODE=0x80; byte3=(BIT|BLOCK_MODE); and BIT is in an if/else block where it is 12..16.
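The three header bytes described above can be decoded with a few lines of Python. This is only a sketch: the names parse_z_header, BIT_MASK and the dictionary keys are invented for the example, while the magic bytes and BLOCK_MODE=0x80 are taken from the question and from the code quoted above.

```python
MAGIC = b"\x1f\x9d"   # MAGIC_1 and MAGIC_2
BLOCK_MODE = 0x80     # most significant bit of the settings byte
BIT_MASK = 0x1F       # lowest 5 bits: maximum code size in bits


def parse_z_header(data: bytes) -> dict:
    """Decode the 3-byte header at the start of a .Z file (hypothetical helper)."""
    if len(data) < 3 or data[:2] != MAGIC:
        raise ValueError("not a compress (.Z) stream")
    settings = data[2]
    max_bits = settings & BIT_MASK
    return {
        "block_mode": bool(settings & BLOCK_MODE),
        "max_bits": max_bits,
        # Number of entries the code table can hold at most.
        "max_codes": 1 << max_bits,
    }


# 0x90 = BLOCK_MODE | 16, i.e. block mode on with 16-bit codes.
print(parse_z_header(bytes([0x1F, 0x9D, 0x90])))
```

To inspect a real file, pass the first three bytes read from it in binary mode instead of the hand-written sample bytes.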

If block mode is turned on, an entry is added to the code table at position 256 (remember that 0..255 are filled with the values 0..255), and this entry is the CLEAR code. Whenever the CLEAR code is read from the file's data stream, the code table has to be reverted to its initial state (so it has only 0..256 in it).
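The initial state of the code table described here could be modelled like this in Python. It is a sketch of the idea only, not the data structure the original C source uses; the names initial_table and first_free_code are assumptions made for the example.

```python
CLEAR = 256  # code reserved for the CLEAR sign when block mode is on


def initial_table(block_mode: bool) -> dict:
    """Return the starting code table: codes 0..255 map to single bytes."""
    table = {code: bytes([code]) for code in range(256)}
    if block_mode:
        # Code 256 is reserved; it carries no data of its own.
        table[CLEAR] = None
    return table


def first_free_code(block_mode: bool) -> int:
    """First code number available for new table entries."""
    return 257 if block_mode else 256


table = initial_table(block_mode=True)
print(len(table), first_free_code(True))
```

A decoder built on this would simply call initial_table again whenever it reads code 256 in block mode.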

The maximum code size indicates how many bits wide a code in the code table can become. When the maximum is hit, no more entries are added to the code table. So if the maximum code size is 0b00001100, codes can be at most 12 bits wide, which gives a maximum of 2^12=4096 entries.

The highest value that compress uses is 16 bits. That means that 2 bits in this settings byte are unused.

After these 3 bytes the raw LZW data starts. Because the LZW codes start out 9 bits wide and the first code is always a plain byte value, the 4th byte will be the same as the first byte of the input (in the case of a .tar.Z or .taz file, this byte will be the first byte of the uncompressed .tar file).
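The claim about the 4th byte can be checked by pulling the first 9-bit code out of the stream. The sketch below assumes that codes are packed starting from the least significant bit of each byte, which is what makes the low 8 bits of the first code land in byte 4; the function name first_code and the sample bytes are made up for illustration.

```python
def first_code(data: bytes) -> int:
    """Return the first 9-bit LZW code that follows the 3-byte header.

    Assumes codes are packed least-significant-bit first.
    """
    payload = data[3:5]
    if len(payload) < 2:
        raise ValueError("need at least 2 bytes of LZW data")
    # Join the two bytes little-endian and keep the lowest 9 bits.
    value = payload[0] | (payload[1] << 8)
    return value & 0x1FF


# Hand-built sample: header, then data whose first code is 0x41 ("A").
sample = bytes([0x1F, 0x9D, 0x90, 0x41, 0x84])
code = first_code(sample)
print(hex(code), code == sample[3])
```

Since the first code is always below 256, its 9th bit is 0 and the code equals the 4th byte of the file.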

Q: this file can be produced with tar -Z option or using unix compress utility(result are same)

A: Yes. "tar -cvf myfile.tar myfiles; compress myfile.tar" is equivalent to using "-Z". An even better choice is often "j" (which uses bzip2 instead of compress).

Q: What is the layout of a tar file?

A: There are many references, and much freely available source. For example:

Q: What is the format of a Unix compressed file?

A: Again: many references; easy to find sample source code:

For a .tgz (compressed tar file) you'll need both formats: you must first uncompress it, then untar it. The "tar" utility will do both for you, automagically :)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow