Question

I am looking for a fast way to find the number of files in a directory on Linux.

Any solution that takes linear time in the number of files in the directory is NOT acceptable (e.g. "ls | wc -l" and similar things), because it would take prohibitively long: there are tens or maybe hundreds of millions of files in the directory.

I'm sure the number of files in the directory must be stored as a simple number somewhere in the filesystem structure (in the inode, perhaps?), as part of the data structure used to store the directory entries. How can I get to this number?

Edit: The filesystem is ext3. If there is no portable way of doing this, I am willing to do something specific to ext3.

Solution

Why should the data structure contain the number? A tree doesn't need to know its size in O(1) unless that is a requirement, and providing it could require more locking and could become a performance bottleneck.

By "tree" I don't mean including the contents of subdirectories, just the entries at -maxdepth 1, supposing they are not really stored as a list.

Edit: ext2 stored them as a linked list.

Modern ext3 implements hashed B-trees (the dir_index feature).

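Having said that, /bin/ls does a lot more than counting: it reads every name into memory and sorts the list, and with some options (such as -l) it also calls stat() on every entry, which means reading every inode. Write your own C program or script using opendir() and readdir().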

from here:

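#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <dirent.h>

/* Count the entries in the current directory (files and subdirectories alike),
 * excluding "." and "..". */
int main(void)
{
        unsigned long count = 0;
        struct dirent *ent;
        DIR *d = opendir(".");          /* DIR is an opaque typedef from <dirent.h> */

        if (d == NULL) {
                perror("opendir");
                return 1;
        }
        while ((ent = readdir(d)) != NULL) {
                /* every directory contains "." and ".."; don't count them */
                if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0)
                        count++;
        }
        closedir(d);
        printf("%lu\n", count);
        return 0;
}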

OTHER TIPS

You can use inotify to track and record file create and unlink events in the monitored directory. That spreads the cost of maintaining the file count over time and lets you retrieve the current count instantaneously.

The inode for the directory does not store the number of files in it, since the file count is usually not needed separately from the list of names in the directory. The directory inode's link count does indirectly give the number of subdirectories: on ext3, st_nlink is the number of subdirectories plus two.

I think you have no choice except to read through the whole list of files in the directory. find might or might not be faster than ls.

This is an example of why large directories are a problem, even when the directory is implemented using a B-tree.

There's no portable way to do this. The low-level file primitives, e.g. readdir, work as if the directory were a linear list. Clearly that's an abstraction, and some filesystems might store a count; however, accessing it is inherently filesystem-specific.

If you are willing to jump through hoops, you could put each directory on its own filesystem, enable quotas, and get the inode counts with the "repquota" command.
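
A related trick that needs no quota setup (swapped in here, not part of the answer above): if the directory really is alone on its own filesystem, statvfs() returns that filesystem's total and free inode counts in O(1), the same counters df -i shows. Their difference tracks the file count, offset by a constant number of inodes the filesystem uses itself (root directory, lost+found, reserved inodes on ext3); everything on the filesystem counts, subdirectories included, and hard links share one inode. used_inodes is a hypothetical helper name.

```c
#include <sys/statvfs.h>

/* Hypothetical helper: inodes in use on the filesystem containing `path`,
 * read from the superblock counters in O(1). Returns -1 on error.
 * Calibrate the constant offset by calling it on the empty filesystem. */
long long used_inodes(const char *path)
{
    struct statvfs sv;
    if (statvfs(path, &sv) != 0)
        return -1;
    /* f_files = total inodes, f_ffree = free inodes */
    return (long long)sv.f_files - (long long)sv.f_ffree;
}
```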

Licensed under: CC-BY-SA with attribution