Question

It is unclear to me what the correct .tar file format is, as all three of the scenarios below work properly for me.

Based on the .tar specification I have been working with, the magic field (ustar) is a null-terminated character string and the version field is an octal number with no trailing null.

However, I've reviewed several .tar files I found on my server and found different implementations of the magic and version fields. All three of them seem to work properly, probably because the system ignores those fields.

Compare the three bytes between the words ustar and root in the following examples:

Scenario 1 (20 20 00):

 000000F0      00 00 00 00 | 00 00 00 00 | 00 00 00 00      ............
 000000FC      00 00 00 00 | 00 75 73 74 | 61 72 20 20      .....ustar  
 00000108      00 72 6F 6F | 74 00 00 00 | 00 00 00 00      .root.......
 00000114      00 00 00 00 | 00 00 00 00 | 00 00 00 00      ............

Scenario 2 (00 20 20):

 000000F0      00 00 00 00 | 00 00 00 00 | 00 00 00 00      ............
 000000FC      00 00 00 00 | 00 75 73 74 | 61 72 00 20      .....ustar. 
 00000108      20 72 6F 6F | 74 00 00 00 | 00 00 00 00      root.......
 00000114      00 00 00 00 | 00 00 00 00 | 00 00 00 00      ............

Scenario 3 (00 00 00):

 000000F0      00 00 00 00 | 00 00 00 00 | 00 00 00 00      ............
 000000FC      00 00 00 00 | 00 75 73 74 | 61 72 00 00      .....ustar..
 00000108      00 72 6F 6F | 74 00 00 00 | 00 00 00 00      .root.......
 00000114      00 00 00 00 | 00 00 00 00 | 00 00 00 00      ............

Which one is the correct format?

Solution

In my opinion, none of your examples is correct, at least not for the POSIX format.
As the tar header definition below shows:

/* tar Header Block, from POSIX 1003.1-1990. */
/* POSIX header */

struct posix_header {   /* byte offset */
  char name[100];               /*   0 */
  char mode[8];                 /* 100 */
  char uid[8];                  /* 108 */
  char gid[8];                  /* 116 */
  char size[12];                /* 124 */
  char mtime[12];               /* 136 */
  char chksum[8];               /* 148 */
  char typeflag;                /* 156 */
  char linkname[100];           /* 157 */
  char magic[6];                /* 257 */
  char version[2];              /* 263 */
  char uname[32];               /* 265 */
  char gname[32];               /* 297 */
  char devmajor[8];             /* 329 */
  char devminor[8];             /* 337 */
  char prefix[155];             /* 345 */
};

#define TMAGIC   "ustar"        /* ustar and a null */
#define TMAGLEN  6
#define TVERSION "00"           /* 00 and no null */
#define TVERSLEN 2
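
To make the byte offsets in the comments above concrete, here is a minimal sketch that pulls the 8 bytes covering magic and version out of a raw 512-byte header block. The helper name copy_magic_and_version is hypothetical (it is not part of any tar implementation), and the sketch assumes the struct has no padding, which holds here because every member is a char or char array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct posix_header {   /* byte offset */
  char name[100];       /*   0 */
  char mode[8];         /* 100 */
  char uid[8];          /* 108 */
  char gid[8];          /* 116 */
  char size[12];        /* 124 */
  char mtime[12];       /* 136 */
  char chksum[8];       /* 148 */
  char typeflag;        /* 156 */
  char linkname[100];   /* 157 */
  char magic[6];        /* 257 */
  char version[2];      /* 263 */
  char uname[32];       /* 265 */
  char gname[32];       /* 297 */
  char devmajor[8];     /* 329 */
  char devminor[8];     /* 337 */
  char prefix[155];     /* 345 */
};

/* Hypothetical helper: copies magic (6 bytes) followed by version (2 bytes)
   from a raw header block into out. The two fields are contiguous. */
void copy_magic_and_version(const unsigned char *block, unsigned char out[8])
{
    assert(block != NULL && out != NULL);
    memcpy(out, block + offsetof(struct posix_header, magic), 8);
}
```

Since offsetof(struct posix_header, magic) is 257, the copied bytes are exactly the ones that differ between the three scenarios in the question.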

The format of your first example (Scenario 1) seems to match the old GNU header format:

/* OLDGNU_MAGIC uses both magic and version fields, which are contiguous.
   Found in an archive, it indicates an old GNU header format, which will be
   hopefully become obsolescent.  With OLDGNU_MAGIC, uname and gname are
   valid, though the header is not truly POSIX conforming */

#define OLDGNU_MAGIC "ustar  "  /* 7 chars and a null */

In both your second and third examples (Scenario 2 and Scenario 3), the magic field matches POSIX (ustar followed by a null), but the version field is set to an unexpected value (according to the definition above, the correct value is the ASCII string 00, i.e. 0x30 0x30 in hex), so this field is most likely ignored.
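
The distinction drawn here can be sketched as a small classifier over the 8 bytes at offset 257. This is only an illustration of the comparison, not how any particular tar implementation decides. The enum and the function name classify_tar_magic are made up for this example.

```c
#include <assert.h>
#include <string.h>

#define MAGIC_OFFSET 257  /* offset of the magic field in the header block */

/* Hypothetical result type for this sketch. */
enum tar_magic_kind {
    TAR_MAGIC_POSIX,             /* "ustar\0" followed by "00" */
    TAR_MAGIC_OLDGNU,            /* "ustar  \0" spanning magic + version */
    TAR_MAGIC_USTAR_ODD_VERSION, /* "ustar\0" but version is not "00" */
    TAR_MAGIC_UNKNOWN
};

/* Classifies the magic/version bytes of a raw 512-byte header block. */
enum tar_magic_kind classify_tar_magic(const unsigned char *block)
{
    assert(block != NULL);
    const unsigned char *p = block + MAGIC_OFFSET;

    /* The literal "ustar  " is 7 characters plus its terminating null. */
    if (memcmp(p, "ustar  ", 8) == 0)
        return TAR_MAGIC_OLDGNU;

    /* The literal "ustar" is 5 characters plus its terminating null. */
    if (memcmp(p, "ustar", 6) == 0) {
        if (memcmp(p + 6, "00", 2) == 0)
            return TAR_MAGIC_POSIX;
        return TAR_MAGIC_USTAR_ODD_VERSION;
    }
    return TAR_MAGIC_UNKNOWN;
}
```

Under this sketch, Scenario 1 is classified as the old GNU format, while Scenario 2 and Scenario 3 both land in the "ustar with unexpected version" case.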

OTHER TIPS

On Fedora 18, if I execute this command:

tar --format=posix -cvf testPOSIX.tar test.txt

I get a POSIX tar file with: ustar\0 (0x757374617200)

whereas if I execute this:

tar --format=gnu -cvf testGNU.tar test.txt

I get a GNU tar file with: ustar 0x20 0x20 0x00 (0x7573746172202000), which is the old GNU format.

From the /usr/share/magic file:

# POSIX tar archives
257 string      ustar\0     POSIX tar archive
!:mime  application/x-tar # encoding: posix
257 string      ustar\040\040\0 GNU tar archive
!:mime  application/x-tar # encoding: gnu

0x20 is 40 in octal.

I've also tried editing those three bytes in a hex editor to:

00 20 20

and the tar still worked correctly. I extracted test.txt without a problem.

But when I edited the bytes to:

00 00 00

the tar was not recognized.

So my conclusion is that the correct format is:

20 20 00
Licensed under: CC-BY-SA with attribution