Question

When comparing the size of a directory with Unix tools and with Python, I get slightly different results: the Python total is about 5% smaller than the one from du ("disk usage"). Why? (All my subfolders are readable; I work under Mac OS X Mountain Lion, Python version 2.7.2.)

Here is my code:

import os, sys
from commands import getstatusoutput

def get_size(start_path = '.'):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size/1024

def get_size2(start_path = '.'):
    cmd = "du -ks "+start_path    # result in blocks of 1024 bytes
    code_err, output = getstatusoutput(cmd)
    return int(output.split()[0])

print get_size()
# 306789
print get_size2()
# 321328

Thanking you in advance for your answers,

Eric.


Solution

In general, du reports the amount of storage the data occupies on the disk, while most other ways of measuring (such as summing os.path.getsize) report the logical size of the data.
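The difference is easy to observe on a single file: os.stat exposes both the logical size (st_size) and the storage actually allocated (st_blocks, counted in 512-byte units on POSIX systems). A minimal demonstration, written in Python 3 syntax although the same attributes exist under 2.7:

```python
import os
import tempfile

# Write a 1-byte file: its logical size is 1 byte, but the file system
# still has to allocate at least one whole block for it.
fd, path = tempfile.mkstemp()
os.write(fd, b'x')
os.close(fd)

st = os.stat(path)
print("logical size (st_size):", st.st_size)               # 1
print("allocated (st_blocks * 512):", st.st_blocks * 512)  # e.g. 4096 on a 4 KiB-block file system
os.unlink(path)
```

du sums exactly these allocated blocks, which is why its total can exceed a plain sum of file sizes.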

Why is this not the same?

  • Sometimes data can be stored very efficiently and needs less space on storage than its logical size. This can happen if you have sparse files or hard links. While these two are common on Unix-ish file systems, there might be other mechanisms as well, depending on the peculiarities of your file system.
  • Sometimes data needs more space on disk than its logical size. This is normal, because file systems allocate file data in blocks, and data does not always come in multiples of the block size. Part of the last block is therefore typically wasted (i.e. allocated but not used).
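If the goal is to reproduce du's figure from Python, one option (a sketch, not part of the original answer; get_disk_usage is a made-up name) is to sum st_blocks instead of st_size, and to include the directories themselves, which du also charges for:

```python
import os

def get_disk_usage(start_path='.'):
    """Sum allocated 512-byte blocks, the way du counts, and return KiB.

    Differences from summing os.path.getsize:
      * st_blocks reflects allocated storage, so block rounding is included;
      * directories themselves are counted, as du does;
      * lstat is used so symlinks are not followed (du's default).
    """
    total_blocks = os.lstat(start_path).st_blocks
    for dirpath, dirnames, filenames in os.walk(start_path):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                total_blocks += os.lstat(path).st_blocks
            except OSError:
                pass  # unreadable entry or vanished file: skip it
    return total_blocks * 512 // 1024  # KiB, comparable to `du -ks`
```

One remaining difference: hard-linked files are counted once per link here, whereas du counts each inode only once, so the totals can still diverge on trees that use hard links.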
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow