Question

My PHP project will use thousands of pictures, and each needs only a single number for its storage name.

My initial idea was to put all of the pictures in a single directory and name the files "0.jpg", "1.jpg", "2.jpg", and so on, all the way to "4294967295.jpg".

Would it be better performance-wise to create a directory tree structure and name the files something like "429 / 496 / 7295.jpg"?

If the answer is yes, then the follow-up question would be: what is the optimal number of subdirs or files per level of depth? And what effect does the chosen filesystem have on this?

Each file will have a corresponding MySQL entry with an INT UNSIGNED id number.

Thank you.

Solution

It depends on which filesystem is being used. ext{2,3,4} have a dir_index option that can be set when the filesystem is created, which makes storing thousands or even millions of files in a single directory reasonably fast.

btrfs is not yet production ready, but it handles large directories natively as part of its basic design.

But if you're using the ext series without dir_index, or most other Unix filesystems, you will need to go for the more complex scheme of having several levels of directories. I would suggest you avoid that if you can. It just adds a lot of extra complication for something filesystems ought to be handling reasonably for you.

If you do use the more complex scheme, I would suggest encoding the number in hex and having 256 files/directories at each level. Filesystems that aren't designed to handle large numbers of files in each directory typically do linear scans, so the goal is to approximate a B-tree type structure on your own. Two hex digits at each level give you about half a 4 KiB disk block (a common size) per level with common means of encoding directories. That's about as good as you're going to get without a really complicated scheme like encoding your numbers in base 23 or base 24.
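To make the hex scheme concrete, here is a minimal sketch in Python (the same idea ports directly to PHP). The function name `sharded_path`, the `levels` parameter, and the `.jpg` extension are my own assumptions for illustration, not anything prescribed above. It pads the id to 8 hex digits and uses two digits per directory level, so no directory ever holds more than 256 entries.

```python
from pathlib import PurePosixPath


def sharded_path(image_id: int, levels: int = 3, ext: str = ".jpg") -> PurePosixPath:
    """Map a 32-bit id to a relative path with two hex digits per level.

    Hypothetical helper: `levels` directory levels are taken from the
    leading hex digits, and the remaining digits become the file name.
    """
    if not 0 <= image_id <= 0xFFFFFFFF:
        raise ValueError("id must fit in an unsigned 32-bit integer")
    if not 1 <= levels <= 3:
        raise ValueError("levels must be between 1 and 3 for an 8-digit id")

    hex_id = f"{image_id:08x}"  # zero-padded, e.g. 255 -> "000000ff"
    # Leading byte pairs become directory names: "12", "34", "56", ...
    dirs = [hex_id[i:i + 2] for i in range(0, 2 * levels, 2)]
    # Whatever digits are left over name the file itself.
    filename = hex_id[2 * levels:] + ext
    return PurePosixPath(*dirs, filename)


print(sharded_path(0x12345678))  # 12/34/56/78.jpg
print(sharded_path(4294967295))  # ff/ff/ff/ff.jpg
```

With the default of three levels, the last two hex digits name the file, so the leaf directories also top out at 256 files. If you only ever expect thousands of pictures, fewer levels keep the tree shallower at the cost of more files per leaf directory.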

OTHER TIPS

Yes; hard to say; quite a bit; perhaps you should use a database.

The conventional wisdom is "use a database", but using the filesystem is a reasonable plan for larger objects like images.

Some filesystems have limits on the number of directory entries. Some filesystems do not have any sort of data structure for filename lookups, but just do a linear scan of the directory.

Optimizations like the ones you are discussing only pay off in specific environments. Do you even know right now what future hardware your application will run on? Might it be a good idea to not stress the filesystem and make a nice, hierarchical directory structure? If you do that, it will run well on any filesystem or storage server.

Having several thousand files in one directory will slow things down considerably. I'd say a safe number is up to 1024 files per directory; 512 is even better.
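One simple way to enforce such a cap is to bucket files by integer division of the id. This is only a sketch: the name `bucket_path` and the single-level layout are assumptions of mine, and it is written in Python for brevity even though the same arithmetic works in PHP.

```python
def bucket_path(image_id: int, per_dir: int = 1024) -> str:
    """Return 'bucket/id.jpg' so each bucket holds at most `per_dir` files.

    Hypothetical helper: ids 0..1023 land in bucket 0, ids 1024..2047
    in bucket 1, and so on.
    """
    if image_id < 0:
        raise ValueError("id must be non-negative")
    if per_dir < 1:
        raise ValueError("per_dir must be at least 1")
    bucket = image_id // per_dir  # integer division picks the directory
    return f"{bucket}/{image_id}.jpg"


print(bucket_path(5000))  # 4/5000.jpg
```

Note that a single level only keeps the top directory under the same cap while the total number of ids stays below per_dir squared (about a million for 1024). Beyond that you would need another level.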

The answer, of course, is: It depends.

In particular, it depends on which file system you use. For example, the ext2 and ext3 file systems have limits on the number of files per directory. Those file systems would not be able to put all of your pictures in one directory!

You might look into something other than a file system. In the company I work for, because we needed to store lots of material, we moved from file-based storage to a database-based storage run on Apache Jackrabbit.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow