Question

I'm using Apache Commons Compress for Java to compress multiple log files into a single tar.bz2 archive.

However, compression takes very long (more than 12 hours), because I compress around 20 GB of files a day.

As this library compresses on a single thread, I'd like to know whether there is a way to do this multi-threaded.

I found many solutions (the command-line tool pbzip2 and some C++ libraries), but all I found for Java is this blog post:

https://plus.google.com/117421466255362255970/posts/3jfKVu325zh

It seems that I can't use it in my Java application.

Is there anything out there? What would you recommend? Or is there another, faster solution with compression ratios similar to bzip2?


Solution

As you have multiple files, you can compress each file in a different thread. As your process is CPU-bound, I suggest creating a fixed-size thread pool, i.e. an ExecutorService sized to roughly the number of CPU cores, and adding one task for each file to compress.
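To illustrate the thread-pool idea, here is a minimal sketch, not a definitive implementation. The class name ParallelCompressor, the Codec interface and the method names are hypothetical and chosen only for this example. The compressor is passed in as a parameter so the sketch itself needs nothing beyond the JDK. Note that this produces one compressed file per log file rather than a single tar.bz2 archive.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCompressor {

    /** Hypothetical hook: wraps a raw output stream in a compressing stream. */
    public interface Codec {
        OutputStream wrap(OutputStream raw) throws IOException;
    }

    /** Compresses every input to "<name><suffix>" beside it, one pool task per file. */
    public static List<Path> compressAll(List<Path> inputs, String suffix, Codec codec, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path in : inputs) {
                pending.add(pool.submit(() -> compressOne(in, suffix, codec)));
            }
            List<Path> outputs = new ArrayList<>();
            for (Future<Path> f : pending) {
                outputs.add(f.get()); // blocks until done; rethrows a task failure
            }
            return outputs;
        } finally {
            pool.shutdown();
        }
    }

    private static Path compressOne(Path in, String suffix, Codec codec) throws IOException {
        Path out = in.resolveSibling(in.getFileName() + suffix);
        try (OutputStream raw = Files.newOutputStream(out);
             OutputStream dst = codec.wrap(raw)) {
            Files.copy(in, dst); // streams the whole input through the compressor
        }
        return out;
    }
}
```

With Commons Compress on the classpath, a call such as ParallelCompressor.compressAll(files, ".bz2", BZip2CompressorOutputStream::new, Runtime.getRuntime().availableProcessors()) should give one .bz2 file per input. Passing GZIPOutputStream::new with ".gz" works with the JDK alone and is handy for trying the sketch out.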

Note: if pbzip2 does what you want, I would call it from Java. You might find it is faster even with one thread, as the BZIP2 libraries I have seen for Java are written in pure Java rather than native code (unlike the JAR, ZIP and GZIP support, which is backed by native code).

OTHER TIPS

If a parallel implementation of bzip2 in Java doesn't exist, you can resort to invoking pbzip2 from within your Java application.
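As a rough sketch of calling pbzip2 from Java, something like the following could work. It assumes GNU tar and pbzip2 are installed and on the PATH, and it relies on tar's --use-compress-program option to write a single tar.bz2 archive. The class name Pbzip2Tar and its methods are hypothetical and exist only for this example.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Pbzip2Tar {

    /** Builds: tar --use-compress-program=pbzip2 -cf <archive> <inputs...> */
    public static List<String> buildCommand(Path archive, List<Path> inputs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tar");
        cmd.add("--use-compress-program=pbzip2"); // assumes GNU tar
        cmd.add("-cf");
        cmd.add(archive.toString());
        for (Path p : inputs) {
            cmd.add(p.toString());
        }
        return cmd;
    }

    /** Runs the command and fails loudly if tar or pbzip2 reports an error. */
    public static void run(Path archive, List<Path> inputs)
            throws IOException, InterruptedException {
        Process proc = new ProcessBuilder(buildCommand(archive, inputs))
                .inheritIO() // forward the tools' stdout/stderr to this process
                .start();
        int exit = proc.waitFor();
        if (exit != 0) {
            throw new IOException("tar/pbzip2 exited with status " + exit);
        }
    }
}
```

Passing the arguments as a list avoids shell quoting problems with unusual file names. By default pbzip2 picks the number of processors itself, so no thread count is set here.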

Try the At4J implementation of BZip2OutputStream. According to the manual, it supports parallel compression: http://at4j.sourceforge.net/releases/current/pg/ch04.xhtml

Licensed under: CC-BY-SA with attribution