Question

I'm hoping someone here can answer these definitively:

  • Does putting a VHD file in an NTFS-compressed folder on the host improve performance of the virtual machine, diminish performance, or neither?
  • What about using NTFS compression within the guest?
  • Does using compression on either the host or the guest lead to any problems like read or write errors?
  • If I were to put a VHD in a compressed folder on the host, would I benefit from compacting it?

I've seen references to using NTFS compression on quite a few VPC "tips and tricks" blog posts, and it seems like half of them say to never do it and the other half say that not only does it save disk space but it actually can improve performance if you have a fast CPU and your primary performance bottleneck is the disk.


Solution

This is all about tradeoffs. It takes the disk a certain amount of time to read a certain number of bytes. If compression shrinks the data enough that the time to read the smaller amount from disk, plus the time the CPU spends decompressing it, is less than the time it took to read the uncompressed data, then you come out ahead on performance.
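The tradeoff above can be written down as a back-of-the-envelope model. This is only a sketch: it assumes reading and decompressing happen one after the other (as described above), and every number in it (compression ratio, disk throughput, decompression throughput) is a made-up illustration, not a measurement of NTFS or of any real machine.

```python
def read_time_uncompressed(size_bytes, disk_bps):
    """Seconds to read size_bytes straight from disk."""
    return size_bytes / disk_bps


def read_time_compressed(size_bytes, ratio, disk_bps, decompress_bps):
    """Seconds to read the compressed data and then decompress it.

    ratio is uncompressed size divided by compressed size (2.0 = half size).
    decompress_bps is how many bytes of output the CPU produces per second.
    """
    disk_time = (size_bytes / ratio) / disk_bps
    cpu_time = size_bytes / decompress_bps
    return disk_time + cpu_time


def compression_wins(size_bytes, ratio, disk_bps, decompress_bps):
    """True if the compressed read is faster under this simple model."""
    compressed = read_time_compressed(size_bytes, ratio, disk_bps, decompress_bps)
    return compressed < read_time_uncompressed(size_bytes, disk_bps)


# Hypothetical numbers: 1 GB of data that compresses 2:1,
# CPU decompressing at 500 MB/s.
slow_disk = compression_wins(1_000_000_000, 2.0, 100_000_000, 500_000_000)
fast_disk = compression_wins(1_000_000_000, 2.0, 2_000_000_000, 500_000_000)
print("slow disk:", slow_disk, "fast disk:", fast_disk)
```

With these invented figures the slow disk benefits and the fast disk does not, which is the "fast CPU, slow disk" case the blog posts are talking about.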

The problem is that there are so many variables in this, and it is likely to be a close call either way. Your disk might read small blocks more slowly, your data might not be very compressible, your CPU might be really fast, and so on. The only way to know for sure whether it makes a difference is to try it and measure it. The answers you get for different data and machines are likely to be different.
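If you want to measure it, a minimal timing harness could look like the following. It is a sketch, not a proper benchmark: the function name and the 1 MiB block size are my own choices, and it only times one sequential pass over a file. The OS file cache will make a second run over the same file look much faster, so use files larger than RAM or reboot between runs.

```python
import time


def time_sequential_read(path, block_size=1024 * 1024):
    """Read the file at path once, front to back.

    Returns (bytes_read, elapsed_seconds).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, elapsed
```

Run it against one copy of the VHD in a compressed folder and another copy in an uncompressed folder, and compare the elapsed times. Timing a real workload inside the guest would tell you more than a sequential read of the whole file, but this gives a first impression.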

OTHER TIPS

You won't see disk errors, but you may see slowdowns over time, or during intensive disk writing by the VM. The reason you wouldn't compress a VM in place is the same reason you wouldn't compress a database in place: the virtual disk is a block device, and the VM addresses it using block offsets. It assumes all blocks are the same size. If they are compressed, that is no longer true. The host file system has to translate between the assumed block location and the real (compressed) block location. That would be fairly trivial overhead if the thing were read-only, but blocks change, and so does their compressibility. Rewriting a compressed block may mean that it no longer fits where it was. The host FS then has to move it, which is an extra step, and which fragments the virtual disk.
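A toy model of the "doesn't fit where it was" problem, for illustration only. It uses zlib as a stand-in compressor and an assumed 4096-byte block; NTFS uses its own algorithm and its own on-disk layout, so this shows the principle and nothing about the actual sizes involved.

```python
import random
import zlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for this illustration


def compressed_size(block):
    """Size in bytes of the block after zlib compression (stand-in compressor)."""
    return len(zlib.compress(block))


def rewrite_fits_in_place(old_block, new_block):
    """True if the new block, once compressed, fits in the space
    the old compressed block occupied."""
    return compressed_size(new_block) <= compressed_size(old_block)


# An all-zero block (e.g. untouched space in the guest) compresses very well.
zeros = bytes(BLOCK_SIZE)

# Pseudo-random bytes (e.g. already-compressed or encrypted guest data)
# barely compress at all. Seeded so the example is repeatable.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE))

print("zeros -> noisy fits in place:", rewrite_fits_in_place(zeros, noisy))
print("noisy -> zeros fits in place:", rewrite_fits_in_place(noisy, zeros))
```

When the guest overwrites a well-compressed block with poorly compressible data, the result is larger than the slot it came from, so in this model it has to be stored somewhere else. Going the other way leaves a gap instead.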

Of course, there is always fragmentation when your VM writes to new locations that don't yet have a physical location. You can only beat this by using (uncompressed) full size virtual disks (full of empty padding, i.e. no auto-grow), and defragging both host and guest.

On solid state storage, fragmentation matters far less, and compression can reduce the amount of data written, which lengthens the life of your disk. But you are still stuck with the CPU and memory overhead of block translation.

Also bear in mind that your disk controller may already be deduplicating and/or compressing data, so your OS-level efforts may be redundant.

For VMs that are largely reading, compression may turn out to be worthwhile. Unfortunately, modern OSs do so much housekeeping, logging and self-updating that they write continually, but you can confine this activity to a snapshot. So compress the base image in place, but not the snapshots. Write performance will be unaffected. Caveat: snapshots can easily grow to the size of the original disk. You will still need to merge or delete them frequently, so put them where you can see them.

Conclusion: keep it simple.

Licensed under: CC-BY-SA with attribution