Question

We're creating an ASP.NET MVC site that will need to store 1 million+ pictures, all around 2 KB-5 KB in size. From previous research, it looks like a file server is probably better than a database (feel free to comment otherwise).

Is there anything special to consider when storing this many files? Are there any issues with Windows being able to find the photo quickly if there are so many files in one folder? Does a segmented directory structure need to be created, for example dividing them up by filename? It would be nice if the solution would scale to at least 10 million pictures for potential future expansion needs.


Solution

4 KB is the default cluster size for NTFS. You might tune this setting depending on your typical picture size: http://support.microsoft.com/kb/314878
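If you do decide to tune it, the cluster size is chosen when the data volume is formatted; for example (the drive letter and the 8 KB allocation unit are placeholders, so measure against your own picture mix first):

    format D: /FS:NTFS /A:8192 /Q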

I would build a tree of subdirectories, both so the store can be split or moved from one filesystem to another (see "How many files can I put in a directory?") and to avoid per-directory issues: http://www.frank4dd.com/howto/various/maxfiles-per-dir.htm

You can also keep associated pictures together in archives so they can be loaded with only one file open. Those archives might be compressed if the bottleneck is I/O, or uncompressed if it's CPU.
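As a rough sketch of that idea in C# with System.IO.Compression (the PictureArchiver class and its method names are my own illustration, not an existing API; the compress flag picks between stored and deflated entries):

    using System.Collections.Generic;
    using System.IO;
    using System.IO.Compression;

    static class PictureArchiver
    {
        // Packs a group of related pictures into one zip so they can later be
        // served through a single file open. NoCompression when I/O is the
        // bottleneck (JPEGs barely shrink anyway), Optimal when CPU is cheap.
        public static void Pack(string archivePath, IEnumerable<string> pictureFiles, bool compress)
        {
            CompressionLevel level = compress ? CompressionLevel.Optimal
                                              : CompressionLevel.NoCompression;

            using (ZipArchive archive = ZipFile.Open(archivePath, ZipArchiveMode.Create))
            {
                foreach (string file in pictureFiles)
                {
                    archive.CreateEntryFromFile(file, Path.GetFileName(file), level);
                }
            }
        }

        // Reads one picture back out of the archive by entry name.
        public static byte[] Read(string archivePath, string entryName)
        {
            using (ZipArchive archive = ZipFile.OpenRead(archivePath))
            using (Stream entry = archive.GetEntry(entryName).Open())
            using (var buffer = new MemoryStream())
            {
                entry.CopyTo(buffer);
                return buffer.ToArray();
            }
        }
    }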

A DB is easier to maintain but slower... so it's up to you!

OTHER TIPS

See also this Server Fault question for some discussion about directory structures.

The problem is not that the filesystem is unable to store that many files in a directory, but that opening such a directory in Windows Explorer takes forever. So if you will ever need to browse that folder manually, you should segment it, for example with a directory for each of the first 2-3 letters/numbers of the file name, or an even deeper structure.

Dividing that into 1k folders with 1k files each would be more than enough, and the code to do it is quite simple.
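A minimal sketch of that layout, assuming each picture is keyed by a numeric id and stored under a root folder of your choosing (both assumptions on my part, not anything from the question):

    using System.IO;

    static class PictureStore
    {
        // Maps a numeric picture id to a two-level folder layout:
        //   <root>\<id / 1000000>\<(id / 1000) % 1000>\<id>.jpg
        // i.e. at most 1000 files per leaf folder and 1000 leaf folders
        // per top-level folder.
        public static string GetPath(string root, long pictureId)
        {
            long top = pictureId / 1000000;
            long mid = (pictureId / 1000) % 1000;

            string dir = Path.Combine(root, top.ToString(), mid.ToString());
            Directory.CreateDirectory(dir);   // no-op if it already exists
            return Path.Combine(dir, pictureId + ".jpg");
        }
    }

For 10 million ids that comes out to 10 top-level folders, up to 1,000 subfolders in each, and at most 1,000 files per leaf folder.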

Assuming NTFS, there is a limit of 4 billion files per volume (2^32 - 1). That's the total limit for all folders on the volume, including operating system files, etc.

Large numbers of files in a single folder should not be a problem; NTFS uses a B+ tree for fast retrieval. Microsoft recommends that you disable short file name (8.3) generation (the feature that allows you to retrieve mypictureofyou.html as mypic~1.htm).
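Disabling short-name generation happens at the OS level rather than in your application, e.g. with fsutil from an elevated prompt (the drive letter is a placeholder, and the change only affects files created afterwards):

    rem per-volume setting (Windows 7 / Server 2008 R2 and later)
    fsutil 8dot3name set C: 1

    rem older, system-wide equivalent
    fsutil behavior set disable8dot3 1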

I don't know if there's any performance advantage to segmenting them into multiple directories; my guess is that there would not be an advantage, because NTFS was designed for performance with large directories.

If you do decide to segment them into multiple directories, use a hash function on the file name to get the directory name (rather than the directory name being the first letter of the file name for instance) so that each subdirectory has roughly the same number of files.
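A sketch of that approach, hashing the file name with MD5 (picked here only because it spreads keys evenly, not for security) and using the first two bytes of the digest as directory names:

    using System.IO;
    using System.Security.Cryptography;
    using System.Text;

    static class HashedPictureStore
    {
        // Hashes the file name and uses the first two bytes of the digest
        // as a two-level directory, giving 256 x 256 = 65,536 buckets with
        // a roughly even spread of files across them.
        public static string GetPath(string root, string fileName)
        {
            byte[] hash;
            using (var md5 = MD5.Create())
            {
                hash = md5.ComputeHash(Encoding.UTF8.GetBytes(fileName));
            }

            string dir = Path.Combine(root, hash[0].ToString("x2"), hash[1].ToString("x2"));
            Directory.CreateDirectory(dir);   // no-op if it already exists
            return Path.Combine(dir, fileName);
        }
    }

Even at 10 million files that averages out to roughly 150 files per leaf directory.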

I wouldn't rule out using a content delivery network. They are designed for this problem. I've had a lot of success with Amazon S3. Since you are using a Microsoft-based solution, perhaps Azure would be a good fit.

Is there some sort of requirement that prevents you from using a third-party solution?

Licensed under: CC-BY-SA with attribution