Question

I'm trying to understand how to randomly access a file (or files) inside a .tar.gz using TrueZIP in a Java 6 environment (using the Files classes). I have found examples that use Java 7's Path, but I can't come up with an example of how to randomly read an archive on Java 6.

Additionally, does "random" reading mean that it first uncompresses the entire archive, or does it read only sections of the compressed file? The purpose is that I want to retrieve some basic information from the file without having to uncompress the entire thing just to read it (e.g. a username).

Solution

I am not familiar with TrueZIP in particular, but at least with ZIP, RAR and tar you can access individual files, retrieve details about them, and even extract them without touching the rest of the archive.

Additionally, does "random" reading mean that it first uncompresses the entire archive

If TrueZIP follows the ZIP/RAR/tar formats, then it does not uncompress the entire archive.

The purpose is that I want to retrieve some basic information from the file without having to uncompress the entire thing just to read it (e.g. a username).

As before, that should be fine. I don't know the TrueZIP API in particular, but file container formats let you inspect an entry's metadata without reading any of the file data, and optionally extract or read one file's contents without touching any other file in the container.
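To make the container case concrete, here is a minimal sketch that uses only the JDK's java.util.zip.ZipFile (not TrueZIP) and avoids Java 7 features. It assumes a plain .zip archive, where the central directory lets the library locate one entry directly. The class name, the helper methods and the entry names (big.bin, meta/username.txt) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipRandomAccessSketch {

    /** Reads one named entry as UTF-8 text, or returns null if it is absent. */
    static String readEntry(File zip, String entryName) throws IOException {
        ZipFile zf = new ZipFile(zip);
        try {
            // Looked up via the central directory; other entries are not inflated.
            ZipEntry entry = zf.getEntry(entryName);
            if (entry == null) {
                return null;
            }
            InputStream in = zf.getInputStream(entry);
            try {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toString("UTF-8");
            } finally {
                in.close();
            }
        } finally {
            zf.close();
        }
    }

    /** Writes a small demo archive (hypothetical entry names) to a temp file. */
    static File writeSample() throws IOException {
        File f = File.createTempFile("sketch", ".zip");
        f.deleteOnExit();
        ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(f));
        try {
            zos.putNextEntry(new ZipEntry("big.bin"));
            zos.write(new byte[100000]);
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("meta/username.txt"));
            zos.write("alice".getBytes("UTF-8"));
            zos.closeEntry();
        } finally {
            zos.close();
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        File zip = writeSample();
        System.out.println(readEntry(zip, "meta/username.txt")); // prints "alice"
    }
}
```

This only covers ZIP-style containers. It says nothing about how TrueZIP itself exposes entries, and it does not apply to a gzip-compressed tar.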

OTHER TIPS

The way gzip compresses a file (especially a .tar.gz) generally means the output is not randomly accessible: the whole tar stream is compressed as one unit, and to decompress a given block you need context from the data before it (in deflate, back-references can point into the preceding 32 KB of uncompressed output). This is one of the ways it achieves (somewhat) better compression than ZIP/PKZIP, which compresses each file individually before adding it to a container archive. That per-file compression is what makes it possible to seek to a specific file and uncompress just that file.

So, to pick a .tar.gz apart, you need to decompress it from the beginning, either to a temporary file or in memory (if it's not too large). You can then jump to specific entries in the underlying .tar file, although even that has to be done sequentially by skipping from header to header, because tar does not include a central index/directory of files.
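The header-to-header walk described above can be sketched with only the JDK's GZIPInputStream and a hand-rolled tar header parser, which runs on Java 6. This is an illustration under assumptions, not a full tar reader: it handles only plain ustar-style headers with short names and octal sizes, ignores the checksum, and the class, method and entry names are invented for the example. Note that "skipping" an entry still means inflating its bytes and throwing them away.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class TarGzScanSketch {
    static final int BLOCK = 512; // tar headers and data are padded to 512 bytes

    /** Returns the bytes of the named entry, or null if it is not found. */
    static byte[] findEntry(InputStream tarGz, String wanted) throws IOException {
        InputStream tar = new GZIPInputStream(tarGz);
        byte[] header = new byte[BLOCK];
        while (readFully(tar, header)) {
            if (header[0] == 0) {
                break; // an all-zero block marks the end of the archive
            }
            String name = field(header, 0, 100);
            long size = Long.parseLong(field(header, 124, 12).trim(), 8);
            if (name.equals(wanted)) {
                byte[] data = new byte[(int) size];
                if (!readFully(tar, data)) {
                    throw new EOFException("truncated entry");
                }
                return data;
            }
            // Not the entry we want: skip its data plus padding.
            skipFully(tar, (size + BLOCK - 1) / BLOCK * BLOCK);
        }
        return null;
    }

    /** Fills buf completely; returns false only on a clean EOF before any byte. */
    static boolean readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                if (off == 0) {
                    return false;
                }
                throw new EOFException("truncated block");
            }
            off += n;
        }
        return true;
    }

    /** Skips n bytes by reading them, so the gzip layer still inflates them. */
    static void skipFully(InputStream in, long n) throws IOException {
        byte[] scratch = new byte[8192];
        while (n > 0) {
            int r = in.read(scratch, 0, (int) Math.min(scratch.length, n));
            if (r < 0) {
                throw new EOFException("truncated entry");
            }
            n -= r;
        }
    }

    /** Extracts a NUL-terminated ASCII header field. */
    static String field(byte[] h, int off, int len) throws IOException {
        int end = off;
        while (end < off + len && h[end] != 0) {
            end++;
        }
        return new String(h, off, end - off, "US-ASCII");
    }

    /** Demo-only writer: the header checksum is left blank, so real tar tools may reject it. */
    static void writeEntry(OutputStream out, String name, byte[] data) throws IOException {
        byte[] h = new byte[BLOCK];
        byte[] n = name.getBytes("US-ASCII");
        System.arraycopy(n, 0, h, 0, n.length);
        byte[] size = String.format("%011o", data.length).getBytes("US-ASCII");
        System.arraycopy(size, 0, h, 124, size.length);
        h[156] = '0'; // typeflag: regular file
        out.write(h);
        out.write(data);
        out.write(new byte[(BLOCK - data.length % BLOCK) % BLOCK]);
    }

    /** Builds a tiny in-memory .tar.gz with two hypothetical entries. */
    static byte[] sample() throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(raw);
        writeEntry(gz, "big.bin", new byte[3000]);
        writeEntry(gz, "username.txt", "alice".getBytes("US-ASCII"));
        gz.write(new byte[2 * BLOCK]); // end-of-archive marker
        gz.close();
        return raw.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] found = findEntry(new ByteArrayInputStream(sample()), "username.txt");
        System.out.println(new String(found, "US-ASCII")); // prints "alice"
    }
}
```

Because the scan stops as soon as the wanted entry has been read, nothing after it is decompressed, and nothing needs to be written to disk. Everything before it still has to pass through the inflater.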

The comment at the top of the zran source code describes how such tools work: http://svn.ghostscript.com/ghostscript/tags/zlib-1.2.3/examples/zran.c

In short, the complete file has to be processed once to generate the necessary index. That pass still has to run through the whole compressed stream, but the decompressed output can be discarded rather than stored, which makes it cheaper than extracting everything. The index splits the file into blocks that can each be decompressed without first decompressing the blocks before them, and that is how random access is emulated.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow