Question

I'm creating data dumps from my site for others to download and analyze. Each dump will be a giant XML file.

I'm trying to figure out the best compression algorithm that:

  • Compresses efficiently (CPU-wise)
  • Makes the smallest possible file
  • Is fairly common

I know the basics of compression, but haven't a clue as to which algo fits the bill. I'll be using MySQL and Python to generate the dump, so I'll need something with a good python library.


Solution

GZIP at its default compression level should be fine for most cases. Higher compression levels cost more CPU time for diminishing size gains. BZ2 compresses better but is also slower. There is always a trade-off between CPU time and compression ratio; any of the common algorithms at their default levels should serve you well.
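As a minimal sketch of the trade-off in Python's standard library (the XML payload here is a placeholder, not your real dump):

```python
import bz2
import gzip

# placeholder XML payload standing in for a real dump (assumption)
xml_data = b"<dump>" + b"<row>value</row>" * 1000 + b"</dump>"

# gzip: fast and ubiquitous; compresslevel ranges 1 (fastest) to 9 (smallest)
gz_bytes = gzip.compress(xml_data, compresslevel=6)

# bz2: typically smaller output at the cost of more CPU time
bz_bytes = bz2.compress(xml_data, compresslevel=9)

# both round-trip losslessly
assert gzip.decompress(gz_bytes) == xml_data
assert bz2.decompress(bz_bytes) == xml_data
```

For giant files, avoid loading everything into memory: `gzip.open()` and `bz2.open()` return file-like objects, so you can stream the dump through them (e.g. with `shutil.copyfileobj`) as you generate it.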

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow