Question

I'd like to understand how (modern) filesystems are implemented and am having trouble fully understanding superblocks and their backups. I reference ext4 and btrfs, but the questions may also apply to other filesystems.

Ext4 stores several superblocks (my fs, for example, has one primary and seven backup SBs). I do understand that, since the superblock defines important characteristics of the filesystem, an inline backup makes sense. But I don't get why there are so many backups, so:

  • Why store so many superblocks? What's the benefit of having 7 backup SBs versus e.g just one?

According to the ext4 documentation, ext4 stores a "write time" within the SB (seconds since the epoch of the last write). That would imply that every write transaction also includes a write to the SB. Given my system, with 7 backup SBs, each write transaction would consist of 8 SB writes. That seems like a ridiculous number of non-sequential metadata writes for a single transaction, leading to the question:

  • Am I correct that on ext4 SBs are written that often?

The same questions basically apply to btrfs, where SBs are distributed among static address blocks (primary block at 0x10000) and, in SSD mode (due to wear-leveling concerns), only one is written per commit.

  • Is there a benefit for btrfs to store the primary superblock at 0x10000 instead of 0x0?

The documentation of btrfs also states: Note that btrfs only recognizes disks with a valid 0x10000 superblock; otherwise, there would be confusion with other filesystems. This is even more confusing, since a broken SB at 0x10000 would lead to an invalid filesystem even if there are other valid SBs at other locations.

  • How does btrfs benefit from superblock backups, if the filesystem is invalid on a broken primary superblock?

Solution

This is diving a little deep into the specific underlying structures of filesystems, which are very complicated and idiosyncratic. But here goes...

First, superblocks are not one thing. They differ, filesystem to filesystem, in what information they contain. The two filesystems you mention, extN and btrfs, are much evolved and not all that similar to grandpa Unixes. And most of the grandpa Unixes like AIX, HP-UX, IRIX, and Solaris are at least some distance removed from the BSD "Fast File System" and original Nth Edition Unix file systems.

But in all of the implementations I've seen, the different variations on superblocks may sit at the top of the filesystem metadata hierarchy, but they are not the primary metadata blocks of the file system, nor the data structure that is most often updated. They are not updated on every write, for example, nor even on every metadata update or directory modification. The reason they are so heavily replicated / well-protected is that, sitting at the top of the file system structure, they define the filesystem parameters (some of which are semantically critical to proper operation) and knit together all of the other metadata structures. Without them, even finding all of the other metadata structures would be difficult, much less getting a perfect, as-originally-intended interoperation of them.
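To make "defines the filesystem parameters" concrete, here's a minimal sketch in Python of reading a few fixed-offset fields from an ext2/3/4 superblock. The offsets (superblock at 1024 bytes into the partition, magic 0xEF53 at offset 56 within it) are the documented on-disk layout; the image here is a synthetic in-memory buffer rather than a real device, and the field values patched into it are just illustrative.

```python
import struct

EXT4_SB_OFFSET = 1024  # primary superblock sits 1 KiB into the partition
EXT4_MAGIC = 0xEF53    # s_magic, little-endian u16 at offset 56 of the SB

def parse_superblock(image: bytes):
    """Pull a few fixed-offset fields out of an ext2/3/4 superblock."""
    sb = image[EXT4_SB_OFFSET:EXT4_SB_OFFSET + 1024]
    inodes_count, blocks_count = struct.unpack_from("<II", sb, 0)
    log_block_size, = struct.unpack_from("<I", sb, 24)
    magic, = struct.unpack_from("<H", sb, 56)
    if magic != EXT4_MAGIC:
        raise ValueError("not an ext2/3/4 filesystem")
    return {
        "inodes": inodes_count,
        "blocks": blocks_count,
        "block_size": 1024 << log_block_size,  # 0 -> 1 KiB, 2 -> 4 KiB
    }

# Synthetic image: 4 KiB of zeros with a plausible superblock patched in.
img = bytearray(4096)
struct.pack_into("<II", img, EXT4_SB_OFFSET + 0, 65536, 262144)  # inodes, blocks
struct.pack_into("<I", img, EXT4_SB_OFFSET + 24, 2)              # log block size
struct.pack_into("<H", img, EXT4_SB_OFFSET + 56, EXT4_MAGIC)

print(parse_superblock(bytes(img)))
# → {'inodes': 65536, 'blocks': 262144, 'block_size': 4096}
```

Everything else (block group descriptors, inode tables, bitmaps) is located relative to values in this one structure, which is why losing all copies of it is so damaging.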

Over the years, the ability of filesystem check-and-repair tools (fsck is the great grandpappy of them all) to repair catastrophic failures (like "superblock loss") has improved. But it's ugly, slow, and heuristic (i.e. potentially imperfect = possible data loss) on almost all of them. And the things all Unix implementations have striven for over the years include higher reliability, more robust storage, and faster boot times. Having a plodding check-and-repair process is an impediment to all of these. Thus the move to N-way replication.

Because superblocks (or more generally, all of the blocks/data structures defining the filesystem as a whole) are not the most active ("hottest") metadata in the system (only the highest-level), and because their loss is so damaging, designers conclude that 2-way or greater replication is affordable. Superblocks tend to have their data flushed to disk only periodically (e.g. every 30 seconds), so their performance/update overhead is minimal.
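This also touches the "why so many copies" question: on ext4 the number of backups grows very slowly with filesystem size. With the (default) sparse_super feature, backup superblocks are placed only in block group 1 and in groups whose number is a power of 3, 5, or 7. A small sketch of that placement rule (the rule itself is documented ext4 behavior; the example group count is just illustrative):

```python
def backup_sb_groups(group_count: int):
    """Block groups holding a backup superblock under sparse_super:
    group 1 plus every power of 3, 5, and 7 below group_count."""
    groups = set()
    if group_count > 1:
        groups.add(1)
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base
    return sorted(groups)

# e.g. a ~16 GiB fs with 4 KiB blocks has 128 block groups of 128 MiB each
print(backup_sb_groups(128))
# → [1, 3, 5, 7, 9, 25, 27, 49, 81, 125]
```

So even a large filesystem carries only a handful of copies, spread across the disk so that a localized failure is unlikely to take out all of them at once.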


I haven't used btrfs or tried to recover it from failure, so I have no particular insight into its peculiar restrictions on superblock placement, or what special wear-leveling constraints are typically enabled for SSD operation. But like other filesystem designs, its management tool (btrfsck) has a specific option (-u) to manually select a secondary or tertiary superblock. In extfs, the corresponding option is -b. Every other Unix filesystem has similarly idiosyncratic controls/commands for repair procedures in its corresponding tool.
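One thing worth noting on the btrfs placement question: the superblock copies live at fixed device offsets (64 KiB, 64 MiB, 256 GiB per the on-disk format), so the number of copies a given device holds falls out of its size, and starting the primary at 64 KiB rather than 0 leaves the first sectors of the device untouched for boot code and partitioning structures. A sketch of that layout (the device sizes below are just examples):

```python
# Fixed btrfs superblock mirror offsets (on-disk format): 64 KiB, 64 MiB, 256 GiB.
BTRFS_SB_OFFSETS = [0x10000, 0x4000000, 0x4000000000]
BTRFS_SB_SIZE = 4096

def sb_copies(device_bytes: int):
    """Offsets of the superblock copies that fit on a device of this size."""
    return [off for off in BTRFS_SB_OFFSETS
            if off + BTRFS_SB_SIZE <= device_bytes]

for size_gib in (1, 8, 512):
    offs = sb_copies(size_gib * 2**30)
    print(f"{size_gib:4d} GiB device: {len(offs)} superblock copies at {offs}")
```

So a small device only ever has two copies, and only devices of 256 GiB or more get the third.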

Licensed under: CC-BY-SA with attribution