Question

First of all, I am a bash noob, so please be gentle :)

I am trying to sum the sizes of folders that are in different places but have the same name. It looks like this:

root
--- directory 1
------ folder 1
--------- subfolder 1
--------- subfolder 2
------ folder 2
--------- subfolder 3
--------- subfolder 4
------ folder 3
--------- subfolder 5
--------- subfolder 6
--- directory 2
------ folder 1
--------- subfolder 1
--------- subfolder 2
------ folder 2
--------- subfolder 3
--------- subfolder 4
------ folder 3
--------- subfolder 5
--------- subfolder 6

I am trying to sum the sizes of subfolders 1 to 6 and output the result to a .csv file.

At the moment I am simply outputting the sizes of the subfolders into two separate CSV files: one for directory 1 and one for directory 2.

Right now I use this command, run from wherever I need it, to output the sizes of the subfolders:

du -h --max-depth=1 --block-size=GB * | grep "[\/]" | sort -n -r > ~/lists/disks/RC_job.csv

The output looks like this:

40GB folder1/subfolder1
15GB folder1/subfolder2
10GB folder2/subfolder3
...

I have one output for directory 1 and one for directory 2. I would like to sum the sizes of the subfolders from directories 1 and 2 and get an output that looks like this:

60GB subfolder1
25GB subfolder2
10GB subfolder3

Where subfolder1 is directory1/folder1/subfolder1 + directory2/folder1/subfolder1

This is my first post here, so I do not know if this is enough info. I would be pleased to provide more if necessary. I am pretty sure this can be done with awk, but I haven't really used it yet.

Cheers!

Edit to answer a question in the comments:

Part of the output of du -h /net/rcq-rp/job/rcq/vault/image/film /net/rcq-rp/job/rcq/film --max-depth=1 --block-size=GB * is:

1GB /net/rcq-rp/job/rcq/vault/image/film/nr106/nr106_0010
1GB /net/rcq-rp/job/rcq/vault/image/film/nr106/nr106_0020
1GB /net/rcq-rp/job/rcq/vault/image/film/nr106/nr106_0030
1GB /net/rcq-rp/job/rcq/vault/image/film/nr106/nr106_0035
1GB /net/rcq-rp/job/rcq/vault/image/film/nr106/nr106_0040
1GB /net/rcq-rp/job/rcq/vault/image/film/nr106/nr106_0045
2GB /net/rcq-rp/job/rcq/vault/image/film/nr106/nr106_0050
1GB /net/rcq-rp/job/rcq/vault/image/film/nr106/nr106_0060
1GB /net/rcq-rp/job/rcq/film/nr106/nr106_0010
1GB /net/rcq-rp/job/rcq/film/nr106/nr106_0020
1GB /net/rcq-rp/job/rcq/film/nr106/nr106_0030
1GB /net/rcq-rp/job/rcq/film/nr106/nr106_0035
1GB /net/rcq-rp/job/rcq/film/nr106/nr106_0040
1GB /net/rcq-rp/job/rcq/film/nr106/nr106_0045
1GB /net/rcq-rp/job/rcq/film/nr106/nr106_0050
1GB /net/rcq-rp/job/rcq/film/nr106/nr106_0060

Ideally, the final output would be:

2GB nr106_0010
etc...

Solution

One way to do this is with an associative array. An associative array maps a series of keys to values, for example:

directory1 -> 10 GB
directory2 -> 12 MB
directory3 -> 40 KB

The keys in an associative array must be unique. That's great! The paths to our directories are also unique. Let's put them in an associative array. I will show how to do this in awk but plenty of other languages have associative arrays (like Perl, which calls them hashes).

du | awk '{ val = $1; dir = $2; sizes[dir] = val }'

(I took out the arguments you pass to du for simplicity)

What does this do? awk reads the output of du line by line; for each line, it adds an element to the associative array sizes with the directory name as the index and the size as the value. If our original input looked like this

40GB folder1/subfolder1
15GB folder1/subfolder2
10GB folder2/subfolder1

our array would look like this:

sizes[folder1/subfolder1] -> 40GB
sizes[folder1/subfolder2] -> 15GB
sizes[folder2/subfolder1] -> 10GB

But in our final output we just want to see values for the subdirectories. awk has functions for string manipulation, so let's tweak our code to strip off leading directories:

du | awk '{ val = $1; dir = $2; sub(/^.*\//, "", dir); sizes[dir] = val }'

The sub function strips off everything from the beginning of the path up to and including the last / (a quick standalone test is shown below the array). Now our array looks like this:

sizes[subfolder2] -> 15GB
sizes[subfolder1] -> 10GB
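
If you want to check what sub does on its own, here is a tiny standalone test using one of the made-up paths from above (without an explicit target, sub works on the whole input line):

echo "folder2/subfolder1" | awk '{ sub(/^.*\//, ""); print }'

which prints just subfolder1.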

Great! Now we only have values for the subdirectories. There's just one little problem. The values aren't totals. Since we had more than one subdirectory named subfolder1, we overwrote the first value (40GB) with the second one (10GB). When we run into an index that already exists in our array, what we really want to do is add its value to the existing value:

du | awk '{ val = $1; dir = $2; sub(/^.*\//, "", dir); sizes[dir] += val }'

(I changed sizes[dir] = val, which uses assignment, to sizes[dir] += val, which adds val to whatever is already in sizes[dir])
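
You can see that accumulation on its own by feeding awk two of the made-up lines from above instead of real du output:

printf '40GB folder1/subfolder1\n10GB folder2/subfolder1\n' |
  awk '{ val = $1; dir = $2; sub(/^.*\//, "", dir); sizes[dir] += val } END { print sizes["subfolder1"] }'

which prints 50.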

awk magically takes care of some things for us, like converting 15GB to the number 15. Now our array looks like this:

sizes[subfolder2] -> 15
sizes[subfolder1] -> 50

which shows us the totals we're looking for. Now, how do we display this? We can loop through the array and print out the keys and values like this:

du | awk '{ val = $1; dir = $2; sub(/^.*\//, "", dir); sizes[dir] += val } \
          END { for (dir in sizes) print dir, sizes[dir], "GB" }'

and our results are

subfolder1 50 GB
subfolder2 15 GB

EDIT: Here are the results I get using the du output in the updated question.

nr106_0060 2 GB
nr106_0050 3 GB
nr106_0045 2 GB
nr106_0040 2 GB
nr106_0035 2 GB
nr106_0030 2 GB
nr106_0020 2 GB
nr106_0010 2 GB
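
Since you mentioned writing the result to a .csv, one way to finish up is to have the END block print comma-separated fields and pipe the output through sort (the for (dir in sizes) loop visits the keys in no particular order, which is why the list above is unsorted). This is only a sketch, reusing the du invocation and the RC_job.csv path from your question; adjust both to your setup:

du -h --max-depth=1 --block-size=GB * |
  awk '{ val = $1; dir = $2; sub(/^.*\//, "", dir); sizes[dir] += val }
       END { for (dir in sizes) printf "%dGB,%s\n", sizes[dir], dir }' |
  sort -t, -k1,1 -n -r > ~/lists/disks/RC_job.csv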

Other tips

I am not sure how many csv files you will need in the end, but maybe this can help:

 du -h --block-size=GB ./* | awk -F "[: \t/]" '{size[$NF] += $1} END {for (dir in size) print size[dir], dir}' | sort -n -r

The expression size[$NF] += $1 sums the sizes (the first field), storing the result in the associative array indexed by the directory name (the last field).
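
To see how that -F "[: \t/]" separator splits one of the du lines from the question into fields, here is a small illustration:

printf '1GB\t/net/rcq-rp/job/rcq/vault/image/film/nr106/nr106_0010\n' |
  awk -F "[: \t/]" '{ print "first field: " $1; print "last field: " $NF }'

which prints 1GB as the first field and nr106_0010 as the last, so $1 is the size and $NF is the innermost directory name.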
