FAT: optimizing performance when retrieving a file

https://stackoverflow.com/questions/9757451

24-05-2021

Question

I have a database implementation that stores one file per record, and I have about 10000 records. I'm trying to optimize file access performance, and I have a small question.

Is splitting the files into folders better than keeping them all in a single folder, for quick access to the files? E.g. records 0 to 999 in folder 0, 1000 to 1999 in folder 1, etc.

What is better for this, FAT16 or FAT32?

Solution

If you access the files directly, you won't see any performance drop. If you have to search for a particular file on the disk, it will be faster to store the files in folders; that way the folders act like database indexes. But as @blow mentioned, why don't you use something like SQLite?
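With the folder layout from the question (records 0 to 999 in folder 0, 1000 to 1999 in folder 1, and so on), the record number maps to its folder by integer division, so a lookup only has to scan one directory of about 1000 entries instead of one holding all 10000. A minimal sketch, assuming a hypothetical `<folder>/<record>.DAT` naming scheme and `/` as the path separator of whatever FAT library is in use:

```python
def record_path(record_id: int, per_folder: int = 1000) -> str:
    """Map a record number to '<folder>/<record>.DAT' (hypothetical naming scheme)."""
    if record_id < 0 or per_folder <= 0:
        raise ValueError("record_id must be >= 0 and per_folder > 0")
    folder = record_id // per_folder  # 0..999 -> folder 0, 1000..1999 -> folder 1, ...
    return f"{folder}/{record_id}.DAT"
```

For example, `record_path(3042)` returns `"3/3042.DAT"`. Because the folder is computed rather than searched for, the only directory scan left is the one inside folder 3.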

Other tips

When you retrieve a file by filename, you most likely do a linear search in the directory containing that file: you skip directory entries one by one until you find the one that matches the given filename.
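The linear search described above can be sketched over the raw bytes of a directory. This assumes the directory's clusters have already been read into one `bytes` object (the disk reading is not shown), matches only 8.3 short names, and skips long-file-name fragments. The 32-byte entry layout (name at offset 0, attribute at 11, first-cluster halves at 20 and 26, size at 28) is the standard FAT on-disk format; `to_short_name` and `find_entry` are hypothetical helper names.

```python
import struct

ENTRY_SIZE = 32        # every FAT directory entry is 32 bytes
ATTR_VOLUME_ID = 0x08  # volume label bit
ATTR_LFN = 0x0F        # attribute combination that marks a long-file-name entry


def to_short_name(name: str) -> bytes:
    """Convert 'file.dat' to the padded 11-byte on-disk form b'FILE    DAT'."""
    base, _, ext = name.upper().partition(".")
    if not 1 <= len(base) <= 8 or len(ext) > 3:
        raise ValueError(f"{name!r} is not an 8.3 name")
    return base.ljust(8).encode("ascii") + ext.ljust(3).encode("ascii")


def find_entry(directory: bytes, name: str):
    """Scan raw directory bytes entry by entry; return (first_cluster, size) or None."""
    wanted = to_short_name(name)
    for off in range(0, len(directory) - ENTRY_SIZE + 1, ENTRY_SIZE):
        entry = directory[off:off + ENTRY_SIZE]
        if entry[0] == 0x00:  # 0x00: this and all following entries are unused
            return None
        if entry[0] == 0xE5:  # 0xE5: deleted entry, skip it
            continue
        attr = entry[11]
        if attr == ATTR_LFN or attr & ATTR_VOLUME_ID:
            continue  # skip LFN fragments and the volume label
        if entry[:11] == wanted:
            (hi,) = struct.unpack_from("<H", entry, 20)    # high 16 bits (used by FAT32)
            (lo,) = struct.unpack_from("<H", entry, 26)    # low 16 bits of first cluster
            (size,) = struct.unpack_from("<I", entry, 28)  # file size in bytes
            return (hi << 16) | lo, size
    return None
```

Every lookup costs one 32-byte comparison per entry in front of the match, which is exactly the per-file cost the next paragraph is about.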

This search can be slow if you repeat it for every file access, if there are many files in the directory, and if reads are slow (with a slow CPU you lose even more).

You may want to build some sort of index: a compact array of filename+location pairs, sorted by filename, which you can keep in memory to find files quickly without rereading the directory entries.
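A minimal sketch of such an index, assuming the "location" is the file's first cluster (an illustrative choice; a sector number or byte offset works the same way) and that the pairs were collected once by a single pass over the directories. The class name `FileIndex` is hypothetical. Names are kept sorted so a lookup is a binary search with `bisect` instead of a directory scan.

```python
import bisect
from typing import List, Optional, Tuple


class FileIndex:
    """In-memory index: sorted filenames plus a parallel list of locations."""

    def __init__(self, entries: List[Tuple[str, int]]):
        pairs = sorted(entries)  # sort once, by filename
        self._names = [name for name, _ in pairs]
        self._locations = [loc for _, loc in pairs]

    def lookup(self, name: str) -> Optional[int]:
        """Binary search: O(log n) comparisons instead of a linear directory scan."""
        i = bisect.bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            return self._locations[i]
        return None
```

For 10000 files a lookup needs at most 14 comparisons, all in RAM. The trade-off is that the index must be updated whenever a file is created, deleted, or moved.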

Things can be greatly simplified if there's a constant number of files and they all have the same length or are padded to the same length. In that case you don't need any search, because you can calculate the location of each file directly from its filename, provided, of course, that the order of the files is fixed.
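One way to read this, sketched under strong assumptions: the files are named by number (`N.DAT`, hypothetical), all have the same size, were written in order onto contiguous free space, and are never deleted or resized, so file N starts right after the clusters of files 0 to N-1. `base_cluster` is where file 0 starts; on FAT the first data cluster is numbered 2. Any fragmentation breaks this calculation.

```python
def clusters_per_file(record_size: int, cluster_size: int) -> int:
    """Ceiling division: a file always occupies whole clusters."""
    return -(-record_size // cluster_size)


def first_cluster_of(name: str, base_cluster: int,
                     record_size: int, cluster_size: int) -> int:
    """Hypothetical fixed layout: file 'N.DAT' starts after files 0..N-1."""
    n = int(name.split(".")[0])  # the record number is the filename stem
    return base_cluster + n * clusters_per_file(record_size, cluster_size)
```

With 5000-byte records and 4096-byte clusters, each file takes 2 clusters, so `first_cluster_of("3.DAT", 2, 5000, 4096)` gives cluster 8 without touching the directory at all.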

The only practical difference between FAT1x (FAT12/FAT16) and FAT32 in this context is the size of the file allocation table: the set of linked lists (cluster chains) that tells you which clusters are free or occupied by file/directory data, and which cluster follows a given one in a file or directory. In FAT32 the cluster chain entries are 32-bit (of which only the low 28 bits are used), twice the size of FAT16's 16-bit entries. If the number of used clusters is small (less than ~64K), you will read twice as much data from FAT32 as from FAT16 while traversing the cluster chains.

Also, finding a free cluster on FAT32 (when you create a new file/directory or grow an existing one) can be slow if there are many clusters on the disk: as far as I recall, there can be up to 2^28 on FAT32 vs 2^16 on FAT16. You don't want to start searching for a free cluster from the beginning of the FAT every time. Instead, keep a pointer to the place where the last search stopped, start the next search from there, and wrap around to the beginning of the FAT when you reach its end.
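Both points can be sketched over an in-memory copy of the FAT. Following a chain reads 2 bytes per hop on FAT16 and 4 on FAT32 (masked to 28 bits), and the free-cluster search remembers where it last stopped and wraps around to cluster 2 (clusters 0 and 1 are reserved). The names are hypothetical, and the sketch assumes the whole FAT fits in RAM, which a real embedded driver would instead read sector by sector. FAT32 volumes also keep a "next free" hint in the FSInfo sector for the same purpose.

```python
import struct

FAT16_BAD = 0xFFF7      # from here up: bad-cluster and end-of-chain markers
FAT32_BAD = 0x0FFFFFF7  # same for FAT32, which uses only the low 28 bits


def fat_entry(fat: bytes, cluster: int, fat32: bool) -> int:
    """Read one FAT entry: 2 bytes on FAT16, 4 bytes on FAT32."""
    if fat32:
        return struct.unpack_from("<I", fat, cluster * 4)[0] & 0x0FFFFFFF
    return struct.unpack_from("<H", fat, cluster * 2)[0]


def entry_count(fat: bytes, fat32: bool) -> int:
    return len(fat) // (4 if fat32 else 2)


def cluster_chain(fat: bytes, start: int, fat32: bool) -> list:
    """Follow the linked list of clusters that makes up one file or directory."""
    stop = FAT32_BAD if fat32 else FAT16_BAD
    limit = entry_count(fat, fat32)
    chain, cluster = [], start
    # stop on free/reserved values, bad/end-of-chain markers, values outside
    # the table, or a chain longer than the table (a corrupted, looping chain)
    while 2 <= cluster < min(stop, limit) and len(chain) < limit:
        chain.append(cluster)
        cluster = fat_entry(fat, cluster, fat32)
    return chain


class NextFitAllocator:
    """Find free clusters, resuming where the previous search stopped."""

    def __init__(self, fat: bytes, fat32: bool):
        self.fat, self.fat32 = fat, fat32
        self.count = entry_count(fat, fat32)
        self.hint = 2  # clusters 0 and 1 are reserved

    def find_free(self):
        """Return a free cluster number, or None if the table is full.

        The cluster is not marked as used here; the caller must still
        write the new chain link into the FAT.
        """
        span = self.count - 2  # number of data-cluster entries
        for step in range(span):
            # start at the hint, wrap around to cluster 2 after the last entry
            cluster = 2 + (self.hint - 2 + step) % span
            if fat_entry(self.fat, cluster, self.fat32) == 0:  # 0 = free
                self.hint = cluster + 1  # next search starts just after this one
                return cluster
        return None
```

Encoding the same eight-entry table both ways makes the size difference concrete: 16 bytes as FAT16, 32 bytes as FAT32, so each hop along the same chain reads twice as much FAT data on FAT32.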

Split them across directories (the split number depending on your cluster size) and, if you can, do not use LFN (long file names), because they will slow down your operations. I also work on embedded systems. I didn't have to access thousands of files like you, but I avoided LFN (especially for royalty reasons).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow