Question

When I compare the size of a directory as reported by Unix and by Python, I get slightly different results (the Python figure is about 5% smaller than the one from du, "disk usage"). Why? (All my subfolders are readable; I am on Mac OS X Mountain Lion with Python 2.7.2.)

Here is my code:

import os, sys
from commands import getstatusoutput

def get_size(start_path = '.'):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size/1024

def get_size2(start_path = '.'):
    cmd = "du -ks "+start_path    # result in blocks of 1024 bytes
    code_err, output = getstatusoutput(cmd)
    return int(output.split()[0])

print get_size()
# 306789
print get_size2()
# 321328

Thanks in advance for your answers,

Eric.


Solution

In general, du reports the amount of storage the data occupies on disk, whereas most other ways of measuring, including os.path.getsize, report the size of the data itself.

Why is this not the same?

  • Sometimes data is stored more compactly than its nominal size. This happens with sparse files (holes occupy no blocks) and with hard links (several names share a single copy of the data). Both are common on Unix-like file systems, and other file systems may have further mechanisms of their own.
  • Sometimes data needs more space on disk than its nominal size. This is the normal case: file systems allocate space in whole blocks, and file sizes are rarely exact multiples of the block size, so part of the last block is typically wasted (i.e. occupied but not used). This rounding is the likely reason du reports the larger figure in the question.
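
To see both effects side by side, here is a minimal sketch that walks a tree once and sums the two quantities separately. It assumes a Unix-like system where os.lstat exposes st_blocks in 512-byte units (the case on OS X and Linux). The function name apparent_and_allocated is made up for this illustration. Only files are counted, so the second number will still differ a little from du, which also counts the directories themselves.

```python
import os

def apparent_and_allocated(start_path="."):
    """Return (apparent_bytes, allocated_bytes) for the files under start_path.

    apparent_bytes  : sum of st_size for every file name found
    allocated_bytes : sum of st_blocks * 512, each inode counted only once
    """
    apparent = 0
    allocated = 0
    seen = set()  # (device, inode) pairs already counted, to handle hard links

    for dirpath, dirnames, filenames in os.walk(start_path):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))  # do not follow symlinks

            # Logical size: every name counts, even if it is a hard link
            # to data that was already seen under another name.
            apparent += st.st_size

            # Space on disk: count each inode once, in whole blocks.
            # Assumption: st_blocks is expressed in 512-byte units.
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                allocated += st.st_blocks * 512

    return apparent, allocated
```

Calling apparent_and_allocated('.') and dividing both values by 1024 should give numbers comparable to get_size() and get_size2() above. If the second value is the larger one, block rounding dominates; if it is the smaller one, sparse files or hard links are probably involved.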
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow