Question

Working on a system that was set up with a macOS RAID array using a 256 KB chunk size for its members. The drive was originally intended for video and image editing and storage, but it has since become a multipurpose drive holding a lot of smaller files. How can I determine the amount of wasted space on the drive that might be caused by this large chunk size?

If the waste is considerable, I believe I'll move these files to another drive and recreate the array with a smaller chunk size commensurate with how it's used now.


Solution

The chunk size of your RAID array does not determine how much space on disk a single file uses. Therefore no space is actually wasted due to having a larger chunk size than optimal.

The amount of space wasted is instead determined by the file system allocation block size, which is independent of the RAID array chunk size. On macOS you're typically looking at APFS, which uses 4096-byte blocks, or HFS+, which uses 512-byte sectors that are grouped into allocation blocks of 4096 bytes (unless the volume is larger than 16 TB, in which case the allocation blocks are larger).

You can determine your allocation block size by running this command in the Terminal (change the device node to match your disk setup):

diskutil info /dev/disk2s1
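
If you still want to put a number on the allocation slack, the rough Python sketch below walks a folder and compares each file's logical size with the space actually allocated for it (st_blocks is reported in 512-byte units on macOS). The /Volumes/MyRAID path is just a placeholder for your mount point, and compressed or sparse files can skew the result:

#!/usr/bin/env python3
# Rough estimate of allocation slack ("wasted" space) under a directory.
import os, stat

ROOT = "/Volumes/MyRAID"   # hypothetical mount point - change to your volume

logical = allocated = nfiles = 0
for dirpath, _, filenames in os.walk(ROOT):
    for name in filenames:
        try:
            st = os.lstat(os.path.join(dirpath, name))
        except OSError:
            continue                          # unreadable entry, skip it
        if not stat.S_ISREG(st.st_mode):
            continue                          # only count regular files
        logical += st.st_size                 # bytes the file contains
        allocated += st.st_blocks * 512       # bytes reserved on disk
        nfiles += 1

print(f"{nfiles} files")
print(f"logical size : {logical / 1e9:.2f} GB")
print(f"allocated    : {allocated / 1e9:.2f} GB")
print(f"slack (waste): {(allocated - logical) / 1e9:.2f} GB")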

Unfortunately lots of myths and wrong information have circulated regarding RAID chunk sizes, as choosing the right size has been seen as a form of "dark arts". It is genuinely hard to pick the optimal chunk size from a long list of options without benchmarking against the actual data and the operations performed on it.

However, in your case you already have the type of setup you want. If you have many small files, you want a big chunk size on your RAID. If you have fewer, large files, you want a small chunk size on your RAID.

Unfortunately some have heard the opposite advice. That comes from the fact that on a single disk you want the opposite: for storing few, large files you want big blocks, and for storing many, smaller files you want small blocks. This is because with large files you want to minimize the number of block operations per second to optimize throughput, whereas with smaller files you want to optimize for latency by having smaller blocks and thus more operations per second.

However, on a RAID system with many disks, things are of course different. When dealing with large files, you want to distribute the workload evenly over many drives to optimize performance. This means relatively small chunks, so that you get many drives working for you at once, each with its own small chunk. On the other hand, when dealing with small files, you want most operations to be completed by a single drive, so you get the lowest latency possible. This means a large chunk size, ensuring that the data is contained in a single chunk that can be served by a single disk.
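
To make that concrete, here is a small illustrative sketch (the member count, chunk sizes and file sizes are all made up) that computes how many member disks a contiguous file would touch on a simple striped layout:

# Illustrative only: how many member disks a contiguous file touches
# on a simple striped layout, for two hypothetical chunk sizes.
def disks_touched(file_bytes: int, chunk_bytes: int, members: int) -> int:
    chunks = -(-file_bytes // chunk_bytes)    # ceiling division
    return min(chunks, members)

MEMBERS = 4                                   # invented member count
for chunk in (64 * 1024, 256 * 1024):
    for size in (32 * 1024, 100 * 1024, 8 * 1024 * 1024):
        n = disks_touched(size, chunk, MEMBERS)
        print(f"chunk {chunk // 1024:>3} KB, file {size // 1024:>5} KB -> {n} disk(s)")

With the 256 KB chunk the small files stay on a single member (one seek, lowest latency), while the 8 MB file still ends up spanning every disk either way.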

OTHER TIPS

How can I determine the amount of wasted space on the drive that might be caused by this large block size?

This is actually quite a difficult exercise, and there's a whole discipline dedicated to architecting storage solutions. An answer here simply cannot do it justice. However, the main issue is this condition:

The drive was originally to be used for video and image editing and storage, but has now become a multipurpose drive that has a lot of smaller files on it.

This means that the size of your files can vary wildly, making an accurate forecast darn near impossible. Not only do you have to know how the file sizes vary, you have to know how they vary over time as well. There has to be an understanding of all the "things" that generate these blocks: the OS, the file system, the applications, the array type and even the hardware all contribute to these factors.
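
If you at least want a feel for how your file sizes are distributed today, a quick survey along these lines can help (again, the /Volumes/MyRAID path is only a placeholder):

# Quick survey of the file-size distribution under a directory, bucketed
# by powers of two - handy for judging how much "lots of smaller files"
# dominates. The root path is a placeholder.
import os, stat, collections

ROOT = "/Volumes/MyRAID"   # hypothetical mount point - change to your volume

buckets = collections.Counter()
for dirpath, _, filenames in os.walk(ROOT):
    for name in filenames:
        try:
            st = os.lstat(os.path.join(dirpath, name))
        except OSError:
            continue
        if not stat.S_ISREG(st.st_mode):
            continue
        kb = max(st.st_size // 1024, 1)
        buckets[kb.bit_length()] += 1         # bucket index = power of two in KB

for b in sorted(buckets):
    print(f"<= {2 ** b} KB: {buckets[b]} files")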

For a broader perspective on this, HPE (Hewlett Packard Enterprise) has an excellent write-up: Busting the Myth of Storage Block Size


I don't know of any tools for the Mac (and I haven't done this type of exercise in a long, long, long time). However, I did use tools like Sun Microsystems' Swat (Sun StorageTek Workload Analysis Tool). Storage vendors will typically include this type of tool as a value-add to their storage solutions.

To get this type of analysis on the Mac, you'll definitely need something similar to assist you in analyzing how your storage is allocated.

Licensed under: CC-BY-SA with attribution
Not affiliated with apple.stackexchange