Question

My question is about file allocation methods on NTFS Fs.

I have three main questions:

  1. When I create a file on NTFS, is it stored contiguously on the physical hard disk?
  2. If not, is there a way to create a file such that when I write to it, the data is stored contiguously on the hard disk? Something like extents in a database.
  3. If such a file exists, is there a way to read data from it (using the C read system call) in bunches/blocks? What is the maximum bunch size I can use?

I am trying to make a simple file-based DB for small applications and would like to keep my DB in a file. For performance reasons I need to keep my data in contiguous order on the disk and read it in bunches. (I plan to mmap this file in my application.)


Solution 3

OK, so let's answer point by point...

Question 1: When I create a file on NTFS, is it stored contiguously on the physical hard disk?

The question makes little sense as posed. When you create a file, NTFS allocates space in the MFT for the metadata it needs to track it. Small files may fit entirely inside the file's MFT record; such resident files are, by definition, contiguous. If a file won't fit inside the MFT, blocks of space are allocated as needed, and they may or may not be contiguous.

Generally speaking, NTFS does not know how big your file will be or how much space to preallocate for it, so it allocates space as necessary. You can give it a hint by calling the SetEndOfFile function, but that is only a hint, not a guarantee that the file data will be stored in a contiguous area of the disk. In fact, it should be trivial to convince yourself that even if a filesystem performed real-time defragmentation, it could never guarantee that free space will always be available as a single, contiguous run of disk addresses.


Question 2: If not, is there a way to create a file such that when I write to it, the data is stored contiguously on the hard disk? Something like extents in a database.

Why do you think this is an important concern? You generally shouldn't care how the filesystem stores your data; you should only care that it does store the data. You may think that accessing a file that isn't stored contiguously would be slower, but that is not necessarily the case: advanced caching and prefetching by the O/S will often eliminate any slowdown completely. If your concern is performance, do you have actual hard data showing that filesystem fragmentation is an issue? If so, the correct approach is to use either a different filesystem or no filesystem at all.


Question 3: If such a file exists, is there a way to read data from it (using the C read system call) in bunches/blocks? What is the maximum bunch size I can use?

The C library functions (like fread) don't know about NTFS, fragmentation, "bunches", or blocks. All they know how to do is read the requested number of bytes from the specified file handle into a buffer that you supply. You can request any size you want; under the hood, the C library calls O/S- and filesystem-specific APIs that read data in multiples of the block size, which is implementation defined.

Other Tips

According to this superuser answer, you can call SetEndOfFile to provide the system with a file size hint, which will allow NTFS to allocate contiguous storage for the entire file.

Another important point for multi-tasking or multi-user operating systems is that even if a file is stored contiguously, the drive might be called upon by another task to read or write in the middle of your file access. This will cause the drive to seek somewhere else entirely. On a busy system, this could be happening constantly.

The operating system drivers can use algorithms such as scatter-gather or an elevator algorithm that attempts to schedule reads or writes to or from the various tasks' buffers in the order the data appear on the disk, so the head can sweep sequentially from inner to outer tracks -- or vice-versa, picking up or dropping off data along the way.

Elevator algorithms are so named because real elevators have to choose the most efficient loading and unloading pattern based on requests from passengers on the various floors. They cannot afford to waste time and energy going up and down inefficiently. Disk drive head positioning is not much different.

  1. It might be, but you cannot guarantee that it is stored contiguously on the physical hard disk.

  2. You can, with low-level raw access to the hard disk. Some big database systems do not use any file system at all; they read and write the hard disk directly, and the on-disk data format is defined by the database system itself.

  3. No matter how the file is stored physically, you can read it in blocks in C. I do not think there is a "maximum bunch size", but there is a "good bunch size": a multiple N of the file system's block size.

The reiserfs file system is said to be good for storing tons of small files, but I have never tested it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow