Question

In my program, I'm using file.open(path_to_file);.

On the server side, I have a directory that contains plenty of files, and I'm afraid the program will take longer to run as the directory grows bigger and bigger, because of the file.open() call.

    //code:
    #include <fstream>

    std::ofstream file;
    file.open("/mnt/srv/links/154"); // 154 is the link id; the directory /mnt/srv/links contains plenty of files
    // write to file
    file.close();

Question: can the time to execute file.open() vary according to the number of files in the directory?

I'm using debian, and I believe my filesystem is ext3.

Solution 2

Yes, it can. It depends entirely on the filesystem, not on the language. The times for opening/reading/writing/closing files are dominated by the corresponding syscalls; C++ adds relatively little overhead, though a given C++ implementation can still surprise you.

OTHER TIPS

I'm going to try to answer this - however, it is rather difficult, as it would depend on, for example:

  1. What filesystem is used - in some filesystems, a directory is an unsorted list of files, in which case the time to find a particular file is O(n) - so with 900000 files, it would be a long list to search. Others use a hash algorithm or a sorted list, allowing O(1) and O(log2(n)) lookups respectively - and of course, each component of a path has to be looked up individually. With 900k entries, O(n) is 900000 times slower than O(1), and O(log2(n)) is just under 20 comparisons - roughly 45000 times "faster" than the linear search. However, with 900k files, even a binary search may take some doing, because if each directory entry is around 100 bytes [1], we're talking about 85MB of directory data. So it will be several sectors to read in, even if we only touch 19 or 20 different places.

  2. The location of the file itself - a file located on my own hard disk will be much quicker to get to than a file on my colleague's file-server in Austin, TX, when I'm in England.

  3. The load of any file-server and comms links involved - naturally, if I'm the only one using a decent NFS or Samba server, it's going to be much quicker than using a file-server that is serving a cluster of 2000 machines that are all busy requesting files.

  4. The amount of memory and overall memory usage on the system holding the file, and/or the amount of memory available in the local machine. Most modern OSes keep a file-cache locally, and if you are using a server, also a file-cache on the server. More memory -> more space to cache things -> quicker access. In particular, the directory structure and contents may well be cached.

  5. The overall performance of your local machine. Although nearly all of the above factors are important, the simple effort of searching files may well be enough to make some difference with a huge number of files - especially if the search is linear.

[1] A directory entry will have, at least:

  • A date/time for access, creation and update. With 64-bit timestamps, that's 24 bytes.
  • Filesize - at least 64 bits, so 8 bytes
  • Some sort of reference to where the file is - another 8 bytes at least.
  • A filename - variable length, but one can assume an average of 20 bytes.
  • Access control bits, at least 6 bytes.

That comes to 66 bytes. But I feel that 100 bytes is probably more typical.

There are a lot of variables which might affect the answer to this, but the general answer is that the number of files will influence the time taken to open a file.

The biggest variable is the filesystem used. Modern filesystems use directory index structures, such as B-trees, so that looking up a known filename is a relatively fast operation. On the other hand, listing all the files in the directory or searching for subsets using wildcards can take much longer.

Other factors include:

  • Whether symlinks need to be traversed to identify the file
  • Whether the file is local or mounted over a network
  • Caching

In my experience, using a modern filesystem, an individual file can be located in a directory containing hundreds of thousands of files in well under a second.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow