Question

I am using the shutil.disk_usage() function to find the current disk usage of a particular path (amount available, used, etc.). As far as I can find, this is a wrapper around os.statvfs() calls. I'm finding that it is not giving the answers I'd expect, as comparing to the output of "du" in Linux.

I have obscured some of the paths below for company privacy reasons, but the output and code are otherwise undoctored. I am using Python 3.3.2 64-bit version.

#!/apps/python/3.3.2_64bit/bin/python3

# test of shutils.diskusage module
import shutil

BytesPerGB = 1024 * 1024 * 1024

(total, used, free) = shutil.disk_usage("/data/foo/")
print ("Total: %.2fGB" % (float(total)/BytesPerGB))
print ("Used:  %.2fGB" % (float(used)/BytesPerGB))

(total1, used1, free1) = shutil.disk_usage("/data/foo/utils/")
print ("Total: %.2fGB" % (float(total1)/BytesPerGB))
print ("Used:  %.2fGB" % (float(used1)/BytesPerGB))

Which outputs:

/data/foo/drivecode/me % disk_usage_test.py
Total: 609.60GB
Used:  291.58GB
Total: 609.60GB
Used:  291.58GB

As you can see, the main problem is I would expect the second amount for "Used" to be much smaller, as it is a subset of the first directory.

/data/foo/drivecode/me % du -sh /data/foo/utils
2.0G    /data/foo/utils

As much as I trust "du," I find it hard to believe the Python module would be incorrect either. So perhaps it is just my understanding of Linux filesystems that could be the issue. :)

I wrote a module (based heavily on someone's code here at SO) which recursively gets the disk_usage, which I was using until now. It appears to match the "du" output but is MUCH, much slower than the shutil.disk_usage() function, so I'm hoping I can make that one work.

Thanks much in advance.

Était-ce utile?

La solution

The problem is that shutil uses the statvfs system call underneath to determine the space used. This system call has no file-path granularity as far as I'm aware, only file-system granularity. What this means is that the path you provide it with only helps to identify the file system you want to query, not the path's.

In other words, you gave it the path /data/foo/utils and then it determined which file system backs this file path. Then it queried the file system. This becomes apparent when you consider how the used parameter is defined in shutil:

used = (st.f_blocks - st.f_bfree) * st.f_frsize

Where:

fsblkcnt_t     f_blocks;   /* size of fs in f_frsize units */
fsblkcnt_t     f_bfree;    /* # free blocks */
unsigned long  f_frsize;   /* fragment size */

This is why it's giving you the total space used on the entire file system.

Indeed, it seems to me like the du command itself also traverses the file structure and adds up the file sizes. Here is GNU coreutils du command's source code.

Autres conseils

The shutil.disk_usage returns the disk usage (i.e. the mount point which backs the path) and not actual file usage under that path. It is equivalent of running df /path/to/mount and not du /path/to/files. Notice that for both directories you got the exact same usage.

From the docs: "Return disk usage statistics about the given path as a named tuple with the attributes total, used and free, which are the amount of total, used and free space, in bytes."

Update for anyone stumbling upon this after 2013:


Depending on your Python version and OS, shutil.disk_usage might support files and directories for the path variable. Here's the breakdown:

Windows:

  • 3.3 - 3.5: only suports mountpoint/filesystem
  • 3.6 - 3.7: directory support
  • 3.8+: file & directory support

Unix:

  • 3.3 - 3.5: only suports mountpoint/filesystem
  • 3.6+: file & directory support
Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top