Pergunta

How do we design an on-disk data structure? For ex: look at ext3 inode structure, we have attribute placement. What are the things to be kept in mind especially in terms of memory usage/alignment, padding etc.?

Also, is there any special mechanism to write these data structures to disk in terms of page/block alignment or you just simply write like a file write?

Foi útil?

Solução

Normally, handling of this kind of structures is done in C because of its relative low-level characteristics. For example, when you define an struct you can always know in advance how many bytes it will take and even the alignment of each field. C will never introduce in the struct control or other kind of fields for its own use. This allows to read and write those structs directly from and to disk.

The link included in the question seems to correspond to a Linux EXT2 (Extension 2) filesystem, save 2 things:

  • It says to correspond to EXT3. However, it does not show any of the EXT3-specific structures that were added to EXT2. A good understanding of EXT2 should be enough for your current purposes. I strongly recommend you to read Wikipedia's article on EXT2 or similar.
  • It shows the last entry of the block (pointer) array as a single indirection indirection one, and it doesn't mention double or triple indirection entries. In EXT2 (and EXT3), the block (pointer) array contains 15 entries. The first 12 are direct, number 13 is with single indirection, number 14 is with double indirection, and number 15 triple. This simplification could have been done for the depiction only, or also as a simplification of the whole work you will be doing.

These structures are the origin of the concept of file. You can't read them as files because there are no files yet. You do need to interact with the "block device driver" representing the hard drive. As many other Linux objects, devices (and implicitly their drivers) are represented as elements in the filesystem (no, not in your filesystem that doesn't exist yet, but in the root filesystem that any Linux machine has upon startup). You will need to open the corresponding filesystem object and use the ioctl function to send requests to it.

In order to understand alignment, it's important to differentiate 2 things:

  • The input/output unit (sector), which is the basic amount of bytes you send/receive from the device. Normally, it's 512 bytes.
  • The allocation unit (block), which is the basic amount of bytes you assign to files in the filesystem. Usually 1, 2, 4, 8, 16 sectors or another power of 2 (but the same amount for all blocks in the disk).

Thus, disk are divided in blocks, which are assigned to files to hold their contents. When a file is created with size 0, it doesn't have any block. When the first byte is written, the first block is assigned. When the second byte is written, no new block is assigned because the first block should still have a lot of space available (say 4,095 bytes). When byte number 4,096 is written, it still fits in the first block, but when byte 4,097 is written, a second block is assigned to the file and the new byte is written in its first position. And so on. On average, half of a block is always wasted per each file, specifically in its last block.

The elementary parts of the on-disk data structures should be of size 512 bytes, so they can be read and written with no waste at all. Of course, multiple instances of those elementary parts can be stored contiguously, but elementary information should not span from one sector to the next, but be completely contained in one of them. More than a 512-byte alignment, this is 512-byte size requirement.

However, these sectors will eventually be read to RAM, and for efficiency reasons they should not be subject to any manipulation before use. On the other hand, many CPU architectures operate faster on shorts, which are of size 2 bytes, when their alignment is 2 (i.e. a memory address multiple of 2), on ints, which are 4 bytes, when their alignment is 4, and so on, usually up to size 16. If on-disk data items obey these alignments, and the sectors where they are stored are always read into a memory block with the biggest possible required alignment (16 to be safe), then the data items will be correctly aligned in memory too.

In mode of a summary:

  • Make basic elements of on-disk data structures exactly 1 sector big (even if repetitions of them are going to be stored contiguously).
  • Align data items within those sectors/basic elements according to their size, and
  • Read on-disk data structures (and perhaps all sectors) in 16-byte-aligned memory blocks (or whatever the biggest elementary data item is).

Finally, to answer some questions expressed in different comments:

  • Now that you mention Java, for the first time I understand why it's so important to you this distinction between "on-disk" and "in-memory" data structures. See below.
  • In Java you don't have control on object size, or data item alignment. If you want to comply with the criteria established above, every time you want to write a Java DS to disk, you will need to encode it as a byte[512], and write this. Every time you read a DS from disk, you will need to read a byte[512] and decode it into the corresponding Java DS. This violates the principle of no data transformation between in-memory and in-disk data structures.
  • Also, in Java you usually don't have access to devices. You can emulate block devices through files, though.
  • You can call "on-disk data structures" to the byte[512] data elements that are read/written to disk, and to the source/resulting disk sectors (emulated or physical) themselves.
    • In this vein, you can call "in-memory data structures" to the Java objects to be encoded into byte[512]s and then written into disk, or decoded from byte[512]s that have been read from disk.
  • Correct, as simple byte[512], the "on-disk" data structures will essentially be "pure data".
  • And yes, as Java objects the "in-memory" data structures will belong to classes, which in turn contain functions, constants, data types/auxiliary classes, etc.
  • For more details, please indicate if EXT2 is to be implemented exactly as it is publicly specified, or just as a general model of which you will define details.
  • inode was organized as a table (with horizontal and vertical components) for aesthetic reasons only. Think of it as unidimensional (horizontal or vertical) data only.

Outras dicas

An on-disk data structure is any data structure specially suited to hard disks or other types of block-access storage devices (as opposed to RAM).

Example: A (linked) binary tree is more suited to RAM because the basic storage (and search) unit is small: single nodes with one and only one key of, say, 100 bytes. A b-tree is suited to hard disk because its basic storage and search unit uses several keys (say 20) for a total size of 2000 bytes. This is much closer to the storage and access units of hard disks, meaning that when one node is stored in a disk storage unit little or nothing is wasted, and that when a disk access operation is performed, all the transferred information is really needed and used. In fact, b-trees accept a varying number of keys per node, which can be adjusted to suit specific hard drives.

On a power outage, RAM data structures are immediately lost. It doesn't make any sense to attempt any recovery. However, hard drives are not. The ability of this kind of data structures to be recovered (partially or totally) is usually an important criterion when selecting them.

It could also refer to the data structures required to manage a block-access memory system, usually in form of files and directories (and their attributes, including security ones).

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top