One way to do this is with an associative array. An associative array maps a series of keys to values, for example:
directory1 -> 10 GB
directory2 -> 12 MB
directory3 -> 40 KB
The keys in an associative array must be unique. That's great! The paths to our directories are also unique. Let's put them in an associative array. I will show how to do this in awk, but plenty of other languages have associative arrays (like Perl, which calls them hashes).
du | awk '{ val = $1; dir = $2; sizes[dir] = val }'
(I took out the arguments you pass to du for simplicity.)
What does this do? awk reads the output of du line by line; for each line, it adds an element to the associative array sizes with the directory name as the index and the size as the value. If our original input looked like this:
40GB folder1/subfolder1
15GB folder1/subfolder2
10GB folder2/subfolder1
our array would look like this:
sizes[folder1/subfolder1] -> 40GB
sizes[folder1/subfolder2] -> 15GB
sizes[folder2/subfolder1] -> 10GB
But in our final output we just want to see values for the subdirectories. awk has functions for string manipulation, so let's tweak our code to strip off leading directories:
du | awk '{ val = $1; dir = $2; sub(/^.*\//, "", dir); sizes[dir] = val }'
The sub function strips off everything from the beginning of the path up to and including the last / (the .* in the pattern is greedy, so it matches as far as it can). Now our array looks like this:
sizes[subfolder2] -> 15GB
sizes[subfolder1] -> 10GB
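To see the stripping step in isolation, here is a minimal sketch that runs the same sub call on a single made-up path. The variable names path and base are only for illustration and are not part of the original command.

```shell
# Hypothetical sample path; not taken from real du output.
path='folder1/subfolder1'

# sub() replaces the leftmost-longest match of /^.*\// (everything up to and
# including the last slash) with the empty string, modifying dir in place.
base=$(printf '%s\n' "$path" | awk '{ dir = $0; sub(/^.*\//, "", dir); print dir }')

echo "$base"   # prints: subfolder1
```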
Great! Now we only have values for the subdirectories. There's just one little problem. The values aren't totals. Since we had more than one subdirectory named subfolder1, we overwrote the first value (40GB) with the second one (10GB). When we run into an index that already exists in our array, what we really want to do is add its value to the existing value:
du | awk '{ val = $1; dir = $2; sub(/^.*\//, "", dir); sizes[dir] += val }'
(I changed sizes[dir] = val, which uses assignment, to sizes[dir] += val, which adds val to whatever is already in sizes[dir].)
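A quick way to watch += accumulate is to feed awk two made-up sizes and print the running total after each line. This is only a sketch: total and running are hypothetical names, and total starts out unset, which awk treats as 0 in arithmetic.

```shell
# Hypothetical sample sizes, standing in for two lines that share a directory name.
running=$(printf '40GB\n10GB\n' | awk '{ total += $1; print total }')

# Shows the total after the first line (40) and after the second line (50).
echo "$running"
```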
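awk takes care of some things for us automatically, like converting 15GB to the number 15: when a string is used in arithmetic, awk uses its leading numeric part and ignores the rest. Because the unit suffix is dropped, the totals are only meaningful when every size is in the same unit. Now our array looks like this: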
sizes[subfolder2] -> 15
sizes[subfolder1] -> 50
which shows us the totals we're looking for. Now, how do we display this? We can loop through the array and print out the keys and values like this:
du | awk '{ val = $1; dir = $2; sub(/^.*\//, "", dir); sizes[dir] += val } \
END { for (dir in sizes) print dir, sizes[dir], "GB" }'
and our results are
subfolder1 50 GB
subfolder2 15 GB
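If you want to try the whole pipeline without running du, the sketch below feeds the three sample lines from earlier through the same awk program. Two things here are my additions rather than part of the command above: the sample and totals shell variables, and the trailing sort, which is there because awk does not guarantee the order in which for (dir in sizes) visits the keys.

```shell
# Hypothetical sample input standing in for real du output.
sample='40GB folder1/subfolder1
15GB folder1/subfolder2
10GB folder2/subfolder1'

# Same awk program as above; sort makes the output order predictable.
totals=$(printf '%s\n' "$sample" |
  awk '{ val = $1; dir = $2; sub(/^.*\//, "", dir); sizes[dir] += val }
       END { for (dir in sizes) print dir, sizes[dir], "GB" }' |
  sort)

echo "$totals"
```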
EDIT: Here are the results I get using the du output in the updated question.
nr106_0060 2 GB
nr106_0050 3 GB
nr106_0045 2 GB
nr106_0040 2 GB
nr106_0035 2 GB
nr106_0030 2 GB
nr106_0020 2 GB
nr106_0010 2 GB