Question

I have a program that produces a large number of small files (say, 10,000). After they are created, another script accesses them and processes them one by one.

Questions:

  • Does it matter, in terms of performance, how the files are organized (all in one directory vs. spread across multiple directories)?
  • If so, what is the optimal number of directories and files per directory?

I run Debian with the ext4 file system.

Solution

10k files inside a single directory is not a problem on ext4. The dir_index option should be enabled by default; it indexes directory contents with a hashed B-tree (HTree) structure, which keeps lookups fast even in large directories.
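If you want to convince yourself on your own machine, here is a minimal Python sketch of the scenario in the question; the directory name output and the file contents are placeholders, not anything from the original post:

```
import os
import time

TARGET = "output"      # hypothetical directory name
NUM_FILES = 10_000

os.makedirs(TARGET, exist_ok=True)

# Create 10,000 small files in a single directory.
start = time.perf_counter()
for i in range(NUM_FILES):
    with open(os.path.join(TARGET, f"file_{i:05d}.txt"), "w") as f:
        f.write("small payload\n")
print(f"created {NUM_FILES} files in {time.perf_counter() - start:.2f}s")

# Iterate over them the way a processing script might; os.scandir
# streams directory entries instead of building one huge list up front.
start = time.perf_counter()
count = sum(1 for entry in os.scandir(TARGET) if entry.is_file())
print(f"scanned {count} entries in {time.perf_counter() - start:.2f}s")
```

On a dir_index-enabled ext4 volume, both loops should finish in well under a second for 10k files.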

To sum up, unless you create millions of files or are using ext2/ext3, you shouldn't have to worry about filesystem performance here.
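If you ever do reach the millions-of-files range, a common pattern is to bucket files into subdirectories keyed by a hash prefix, so that no single directory grows without bound. A sketch of that idea; bucketed_path and the output root are made-up names for illustration:

```
import hashlib
import os

ROOT = "output"  # hypothetical root directory

def bucketed_path(root: str, filename: str, levels: int = 2) -> str:
    """Map a filename to root/xx/yy/filename using its hash prefix,
    spreading files evenly across up to 256**levels subdirectories."""
    digest = hashlib.md5(filename.encode()).hexdigest()
    parts = [digest[i * 2:(i + 1) * 2] for i in range(levels)]
    directory = os.path.join(root, *parts)
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, filename)

# Example: "report_00042.txt" might land in output/3f/a1/report_00042.txt
path = bucketed_path(ROOT, "report_00042.txt")
with open(path, "w") as f:
    f.write("payload\n")
```

For 10,000 files this is unnecessary, but it costs little and scales if the file count keeps growing.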

That being said, shell tools and commands don't like being handed a huge number of files as arguments (rm *, for example) and may fail with an error such as 'Argument list too long'. Look at this answer for what happens then.
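One way to sidestep that limit is to skip shell globbing entirely and remove (or process) the entries from a script. A small sketch, again assuming the hypothetical output directory from above:

```
import os

TARGET = "output"  # hypothetical directory holding the generated files

# Unlike `rm *`, which asks the shell to expand every name into one huge
# argument list (and can exceed the kernel's argument-size limit), this
# walks the directory and unlinks entries one by one.
entries = list(os.scandir(TARGET))   # materialize before deleting
removed = 0
for entry in entries:
    if entry.is_file():
        os.unlink(entry.path)
        removed += 1
print(f"removed {removed} files")
```

On the shell side, find output -maxdepth 1 -type f -delete, or piping filenames through xargs, avoids building a single oversized argument list in the same way.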

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow