Question

I'm wondering if there is a way to delete the already decompressed portion of a file as it is being decompressed. I've got an external backup of the compressed file, so I'm not worried about losing data. The file is a bz2. I'm looking to do this because I've only got 50 GB available on the drive and the compressed file is 33 GB. If I can't delete portions of the file while extracting, there won't be enough space for the decompressed file.

There are other things I could do to get around this but I am interested to know if what I mentioned above is possible.

Solution

In general, it is not possible to delete the initial portion of a file - you can only truncate a trailing portion of it.

Newer Linux kernels, however, support punching holes into files on specific filesystems, using the fallocate() system call with the FALLOC_FL_PUNCH_HOLE flag. A punched range is deallocated and reads back as zeros, while the file size stays the same. The corresponding fallocate utility can be used for the same purpose, although you need a relatively recent version (2.21 or later) of the util-linux package for it to include hole punching support.
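As a rough illustration of how hole punching could be applied to the question, here is a minimal sketch. It assumes GNU coreutils (stat -c, dd iflag=fullblock), a util-linux fallocate that supports --punch-hole, and a filesystem that supports hole punching. The function name decompress_and_punch and its arguments are made up for this example. It destroys the compressed file as it goes, so it only makes sense with a backup.

```shell
#!/bin/sh
# Sketch: decompress SRC into DST, releasing each chunk of SRC once it has
# been handed to bzip2. WARNING: SRC is destroyed, even if decompression
# fails part way through. Keep a backup.
# CHUNK (bytes) should be a multiple of the filesystem block size.
decompress_and_punch() {
    src=$1 dst=$2 chunk=${3:-67108864}
    size=$(stat -c %s "$src") || return 1
    offset=0
    while [ "$offset" -lt "$size" ]; do
        # Feed the next chunk of compressed data into the pipe; stop this
        # side of the pipeline if the reader (bzip2) has gone away.
        dd if="$src" bs="$chunk" skip=$((offset / chunk)) count=1 \
            iflag=fullblock 2>/dev/null || exit 1
        # dd has returned, so the chunk has left the file: release its blocks.
        fallocate --punch-hole --offset "$offset" --length "$chunk" "$src" ||
            echo "hole punching failed at offset $offset" >&2
        offset=$((offset + chunk))
    done | bzip2 -dc > "$dst"
}
```

Called as decompress_and_punch big.bz2 big.out, the apparent size of big.bz2 (ls -l) should stay the same while its disk usage (du) shrinks as the loop advances.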

Keep in mind that hole punching is still relatively new and kernel bugs still pop up, so you might be better off just cleaning up your filesystem to free some space.

OTHER TIPS

If I understand you right, you want to delete the portions at the beginning of a compressed file once they have been read, decompressed and written.

This is generally impossible since under Unix there is no way to delete an initial part of a file without rewriting the rest of it (it is possible to truncate a file from the end without rewriting but that does not solve the problem at hand). File systems with the concept of holes may be an option, though.

However, maybe it is possible for you to create smaller compressed files, for example 33 compressed files of about 1 GB each. Then it is easy to remove each file once you have uncompressed it.
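One way this could look in practice is sketched below. It assumes GNU split (for the --filter option) and that the re-splitting is done somewhere with enough free space, for instance on the machine holding the backup. The function names split_recompress and extract_pieces are invented for the example.

```shell
#!/bin/sh
# Step 1 (where space is plentiful): decompress the big archive and
# recompress it as pieces holding PIECE_SIZE bytes of uncompressed data each.
# Usage: split_recompress big.bz2 piece. 1G
split_recompress() {
    # split sets $FILE to each output name; the single quotes keep the
    # calling shell from expanding it.
    bzip2 -dc "$1" | split -b "$3" --filter='bzip2 > "$FILE.bz2"' - "$2"
}

# Step 2 (on the constrained drive): append each piece to DST and delete the
# piece as soon as it has been decompressed successfully.
# Usage: extract_pieces piece. output
extract_pieces() {
    : > "$2"
    # split's suffixes (aa, ab, ...) sort in creation order under the glob.
    for piece in "$1"*.bz2; do
        bzip2 -dc "$piece" >> "$2" && rm -- "$piece" || return 1
    done
}
```

With this layout the extra space needed at any moment is roughly one compressed piece rather than the whole archive.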

The most obvious solution is to write a filter that reads the decompressed output and extracts whatever you need from it.

bunzip2 -c compressedfile.bz2 | yourfilterprogram

-c directs bunzip2 to decompress to stdout.

Using this technique, the uncompressed file is not stored on disk at all.
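As a concrete stand-in for yourfilterprogram, any program that reads standard input will do. The sketch below uses grep -c to count matching lines; the function name count_matching_lines is made up for the example.

```shell
#!/bin/sh
# Count the lines in a bz2-compressed file that match PATTERN, without ever
# writing the decompressed data to disk.
# Usage: count_matching_lines file.bz2 PATTERN
count_matching_lines() {
    bzip2 -dc "$1" | grep -c -- "$2"
}
```

Only the filter's (usually much smaller) output needs to be stored, for example by redirecting it to a file.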

Licensed under: CC-BY-SA with attribution