Can file systems be designed and implemented in an OS-portable way?

https://softwareengineering.stackexchange.com/questions/344807

08-01-2021

Question

Given the interfaces that major OSes (Windows, macOS/OS X/Mac OS X, Linux) provide to file systems, can file systems be designed and implemented in a way that is largely independent of OS?

I'm not at all an expert on file systems and I've never implemented one but I am an enthusiastic user of file systems. My naive expectation is that almost all file system logic is independent of operating system. The OS queries a file system about paths in a C-compatible way, the file system answers in a C-compatible way, and no deep interdependencies are necessary. In practice, this seems to not be the case. For example, simply reading from an ext2/3/4 file system when using Windows or macOS takes significant effort from a user perspective.

Are there technical reasons that file systems seem deeply entwined with operating systems? Do the major OSes interact with file systems in fundamentally incompatible ways? Alternatively, is this mostly due to legal issues? Or maybe the premise of the question is flawed and there just isn't any demand for portable file systems?

(I realize that most OSes incorporate core file system drivers within the OS kernel itself but most also allow loadable OS modules to describe new file systems so the question remains: are there/why are there not OS modules for file systems that are largely OS-independent?)

As a non-expert, I might be wrong on some specifics. For example, maybe the tools/extensions that allow Windows and macOS to read and write ext2/3/4 file systems do share plenty of code with the Linux kernel. Even in that case, a good answer should still explain the apparent difficulties and why other OS + file system combinations are not as easy.

EDIT: To be clear, I'm not asking about incompatibilities at the ABI level. Many particularities can create issues at the ABI level and most are not specific to this question. This question is about whether the APIs provided to file systems by the major OSes are fundamentally incompatible.

Solution

I will give you an example that shows that the answer to your question is both "Yes" and "No": FUSE.

FUSE stands for Filesystem in Userspace. FUSE is a Linux kernel filesystem driver that doesn't actually implement a filesystem; instead, it provides an abstraction layer, an API, and an ABI that exports a protocol for interacting with the FUSE kernel driver from userspace. The second component of FUSE is a userspace library that implements an API wrapper for that protocol.

The end result of all that is that you can write a filesystem driver in userspace by linking against the libfuse library and implementing the required callbacks. There is also a set of language bindings for almost every language imaginable, so that you can, for example, write a filesystem driver in Python, Ruby, PHP, ECMAScript, Java, C♯, Scala, or whatever other language you can think of.
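To make "implementing the required callbacks" a bit more concrete, here is a minimal sketch in plain Python. It only imitates the shape of FUSE-style callbacks (getattr, readdir, read); the class name `HelloFS` and the dictionary-based attribute format are assumptions made for illustration, and the code does not use libfuse or any real binding.

```python
import errno
import stat


class HelloFS:
    """Toy read-only filesystem written as FUSE-style callbacks.

    Hypothetical sketch: it mimics the shape of the callbacks only and
    is not wired to libfuse or to any language binding.
    """

    def __init__(self, files):
        # files maps absolute paths such as "/hello.txt" to bytes.
        # Only a flat root directory is supported.
        self.files = dict(files)

    def getattr(self, path):
        """Return minimal metadata, or raise ENOENT for unknown paths."""
        if path == "/":
            return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2}
        if path in self.files:
            return {
                "st_mode": stat.S_IFREG | 0o444,
                "st_nlink": 1,
                "st_size": len(self.files[path]),
            }
        raise OSError(errno.ENOENT, "no such file or directory", path)

    def readdir(self, path):
        """List the root directory; nothing else is a directory here."""
        if path != "/":
            raise OSError(errno.ENOTDIR, "not a directory", path)
        return [".", ".."] + sorted(name[1:] for name in self.files)

    def read(self, path, size, offset):
        """Return up to `size` bytes of the file starting at `offset`."""
        if path not in self.files:
            raise OSError(errno.ENOENT, "no such file or directory", path)
        return self.files[path][offset:offset + size]


fs = HelloFS({"/hello.txt": b"Hello, world!\n"})
print(fs.readdir("/"))
print(fs.read("/hello.txt", 5, 7))
```

Nothing in this class depends on the host OS, which is exactly the part that FUSE makes portable; everything that connects such callbacks to a real kernel lives outside it.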

Now, what does that have to do with this question? Well, the FUSE kernel-userspace protocol and the FUSE library have been implemented on other operating systems as well, and code written against some subset of the FUSE library API is mostly portable across operating systems. So, in that sense, the answer to your question is "Yes, filesystem drivers can be implemented in a portable manner".

But! Here's the catch. Or, actually two catches. First off, FUSE itself is rather complex. For example, the Linux kernel filesystem driver implementing the kernel-userspace protocol is 9000 lines. The macOS kernel filesystem driver is over 17000 lines. And the userspace library which implements the API for userspace filesystem drivers to write against is another 10000 lines. So, that's 36000 lines of code, just to smooth over the differences between Linux and macOS. Remember: those 36000 lines don't actually do anything. They only implement the common API so that FUSE userspace filesystem drivers written against the API can run on macOS and Linux.

And, for example, there are no ports for Windows. There are, however, similar projects for Windows, and some of them provide a compatibility layer for FUSE. But those compatibility layers usually aren't perfect, because the architecture of Windows filesystem drivers is just too different from that of Unix-style filesystem drivers (the model that both Linux and macOS follow).

The second catch is that FUSE filesystems tend to be slow. Part of that is because they are implemented in userspace, and crossing the kernel-userspace boundary is expensive. But part of it is also because FUSE abstracts away a lot of the nitty-gritty OS-specific machinery such as the block layer, the I/O scheduler, the filesystem cache, the various zero-copy implementations, and so on. And because FUSE abstracts them away, FUSE userspace filesystems can't take advantage of them.

So, to recap: Yes, FUSE (or rather the various filesystems implemented using FUSE) is an existence proof that it is possible to design an API that abstracts away the differences between OSs. But no, filesystem drivers are not "almost all logic independent of the operating system". Looking at FUSE, you need between ~10000 and ~20000 lines of kernel code for each and every operating system, plus an additional ~10000 lines of code for the userspace library, to create an abstraction layer that lets you write filesystems in an OS-independent manner. The actual filesystems themselves are usually much smaller. For example, the FUSE-based sshfs is only ~3000 lines of code. In other words: ~3000 lines of OS-independent code vs. ~10000–20000 lines of OS-dependent code. And another "no" to practicality: you cannot write a high-performance filesystem implementation this way.

ZFS is another interesting case study. Most Unix-like OSs implement filesystems with a similar layering: physical block devices on the bottom, logical block devices (RAID) in the middle, filesystems on top. One filesystem lives on exactly one logical block device, and one logical block device lives on one or more physical block devices. ZFS has a completely different layering: it has only two layers, storage pools and filesystems. The boundary is somewhere in the middle of what would be the middle layer in a traditional layering scheme. Filesystems dynamically allocate space from a storage pool; one storage pool can span multiple physical devices and can host multiple filesystems. This required extensive changes to the Solaris kernel. If ZFS required extensive changes even to the OS for which it was designed, you can probably imagine that you cannot just port it to Linux or Windows without any changes.
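The pooled allocation idea can be sketched as a toy model. This is only an illustration under simplifying assumptions: the names `StoragePool` and `PooledFilesystem` are hypothetical, blocks are handed out first-come, first-served, and none of it reflects how ZFS is actually implemented.

```python
import errno


class StoragePool:
    """Hypothetical pool that aggregates the blocks of several devices."""

    def __init__(self, device_blocks):
        # device_blocks maps a device name to the number of blocks it holds.
        self.free = [
            (device, index)
            for device, count in device_blocks.items()
            for index in range(count)
        ]

    def allocate(self):
        """Hand out one free block, whichever device it happens to be on."""
        if not self.free:
            raise OSError(errno.ENOSPC, "storage pool is full")
        return self.free.pop(0)


class PooledFilesystem:
    """Hypothetical filesystem with no fixed size; it grows on demand."""

    def __init__(self, name, pool):
        self.name = name
        self.pool = pool
        self.blocks = []

    def grow(self, nblocks):
        for _ in range(nblocks):
            self.blocks.append(self.pool.allocate())


pool = StoragePool({"disk0": 2, "disk1": 2})
home = PooledFilesystem("home", pool)
var = PooledFilesystem("var", pool)
home.grow(3)  # spills over from disk0 onto disk1
var.grow(1)   # shares the same pool
print(home.blocks, var.blocks)
```

In a traditional layering, each filesystem would instead be bound to one pre-sized logical device, so the OS's block layer never has to know which filesystem a block belongs to.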

I think your main problem is that your view of filesystems is much too simplistic:

"The OS queries a file system about paths in a C-compatible way, the file system answers in a C-compatible way, and no deep interdependencies are necessary."

Umm. That's a pretty boring filesystem. Why? Well, you didn't write anything about the filesystem accessing the disk. In the real world, filesystems need to read from and write to a disk or a RAID, or the network in the case of a network filesystem. They have to interact with the OS's filesystem cache. They have to interact with the OS's block cache. They have to interact with the OS's I/O scheduler. They have to interact with the OS's virtual memory system (for stuff like mmap). In Linux, they need to interact with and fit into the Linux Virtual Filesystem Switch (VFS). There are users and permissions and roles and attributes and security contexts and domains and access restrictions and ACLs and policies and such. These work completely differently between Unix and Windows, for example.
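One way to picture that dependency is to write the portable logic against a narrow block-device interface and let each OS supply the other side. The sketch below is an assumption-laden toy: `BlockDevice`, `MemoryBlockDevice`, `CachedBlockDevice`, `store_record` and `load_record` are all hypothetical names, and the write-through cache merely stands in for an OS block cache.

```python
from abc import ABC, abstractmethod


class BlockDevice(ABC):
    """Hypothetical minimal interface the portable logic is written against."""

    block_size = 512

    @abstractmethod
    def read_block(self, index):
        """Return exactly block_size bytes."""

    @abstractmethod
    def write_block(self, index, data):
        """Store exactly block_size bytes."""


class MemoryBlockDevice(BlockDevice):
    """Stand-in for a disk; a real port would call into the host OS here."""

    def __init__(self, nblocks):
        self.blocks = [bytes(self.block_size)] * nblocks

    def read_block(self, index):
        return self.blocks[index]

    def write_block(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = bytes(data)


class CachedBlockDevice(BlockDevice):
    """Stand-in for an OS block cache (write-through, never evicts)."""

    def __init__(self, backing):
        self.backing = backing
        self.block_size = backing.block_size
        self.cache = {}
        self.hits = 0

    def read_block(self, index):
        if index in self.cache:
            self.hits += 1
        else:
            self.cache[index] = self.backing.read_block(index)
        return self.cache[index]

    def write_block(self, index, data):
        self.backing.write_block(index, data)
        self.cache[index] = bytes(data)


def store_record(dev, index, payload):
    """Portable logic: pad a payload with NUL bytes to fill one block."""
    if len(payload) > dev.block_size:
        raise ValueError("payload does not fit in one block")
    dev.write_block(index, payload.ljust(dev.block_size, b"\0"))


def load_record(dev, index):
    """Portable logic: read a block and drop the trailing NUL padding."""
    return dev.read_block(index).rstrip(b"\0")


dev = CachedBlockDevice(MemoryBlockDevice(4))
store_record(dev, 1, b"inode table")
print(load_record(dev, 1), dev.hits)
```

The two functions at the bottom are the easy, portable part. The classes above them are where a real driver meets the OS's own cache, scheduler and memory management, and that glue differs for every kernel.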

"For example, maybe the tools/extensions that allow Windows and macOS to read and write ext2/3/4 file systems do share plenty of code with the Linux kernel."

Not with the Linux kernel, no, they don't. However, the Linux Ext developers actually develop two implementations of Ext in lock-step: the Linux kernel filesystem driver, and libext2fs (part of e2fsprogs), which is a userspace library that implements (mostly read-only) support for ext2/3/4 (and is in turn used by tools such as debugfs, e2fsck, tune2fs and mke2fs). The userspace tools available for Windows and macOS could use this library if they wanted to.

OTHER TIPS

Yes, they can. The reason they usually aren't is simply that the people in control of the various OSs often are not interested in making things compatible with other platforms. Linux has drivers for a wide variety of filesystems, including NTFS, and there are ext2/3 drivers for Windows as well.

Microsoft created NTFS because they wanted a comprehensive security capability through access control lists, a function FAT32 did not have. They implemented it, plugged it into Windows, and were done. They had no reason to concern themselves with the needs of any competing operating system. Eventually, other people wrote NTFS drivers for other operating systems, and they are in use on many systems today.

Your suspicion:

"Or maybe the premise of the question is flawed and there just isn't any demand for portable file systems?"

is mostly what's going on.

Ultimately, your disks can be viewed as a massive string of arbitrary bytes, and reading and writing those bytes is fundamentally just like reading and writing the bytes of an individual file. There isn't anything in NTFS that requires Windows, just knowledge of the structure of the data so you can write a driver to provide the common file access methods. Same with any other filesystem.
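As a small illustration of "just knowledge of the structure of the data", the sketch below reads a few fields of an ext2 superblock from a raw byte string, with no help from Linux. The field offsets follow the published ext2 on-disk layout (superblock at byte 1024, little-endian fields); the function name `parse_ext2_superblock` and the returned dictionary are choices made for this example only.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # the superblock starts 1024 bytes into the volume
EXT2_MAGIC = 0xEF53


def parse_ext2_superblock(image):
    """Extract a few superblock fields from the raw bytes of a volume."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 58:
        raise ValueError("image too small to contain a superblock")
    # s_inodes_count and s_blocks_count: two 32-bit fields at offset 0.
    inodes_count, blocks_count = struct.unpack_from("<II", sb, 0)
    # s_log_block_size at offset 24: block size is 1024 << value.
    (log_block_size,) = struct.unpack_from("<I", sb, 24)
    # s_magic: 16-bit field at offset 56.
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    return {
        "inodes_count": inodes_count,
        "blocks_count": blocks_count,
        "block_size": 1024 << log_block_size,
    }


# Build a synthetic image so the example runs without a real disk.
superblock = bytearray(1024)
struct.pack_into("<II", superblock, 0, 128, 1024)
struct.pack_into("<I", superblock, 24, 2)
struct.pack_into("<H", superblock, 56, EXT2_MAGIC)
image = bytes(SUPERBLOCK_OFFSET) + bytes(superblock)
print(parse_ext2_superblock(image))
```

The same function would work on bytes read from an image file opened in binary mode on any OS. Reading metadata is the easy part, though; a full driver also has to handle directories, allocation, journaling and writes.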

Secrecy of a filesystem specification can make writing those drivers more difficult, but that isn't a technical limitation.

Your question is the computer equivalent of "why can't I put a GM water pump in my Toyota vehicle?" There's really nothing fundamentally different between GM and Toyota water pumps, it's just that each company doesn't care about being compatible with their competitor's parts. They sometimes actively avoid it, but indifference is usually sufficient.

It is possible, but it is not always fully desirable. POSIX (Portable Operating System Interface) defines the behavior expected from a filesystem and its API so that they are OS-portable.

The problem is that when designing a file system you usually have a set of workloads you want to optimize for, and sometimes the POSIX semantics make this optimization challenging.

Consider a distributed file system: multiple users access the same file over the network, and you want each client to cache the file locally while it is open. In a POSIX-compliant file system, any change one user makes to the file must be visible to all other users on their subsequent reads. Multiple users accessing the same file at the same time is very unusual in most workloads, so you may want to relax the POSIX semantics to allow this optimization and/or simplify the system.
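The trade-off in the distributed case can be shown with a toy simulation. Everything here is hypothetical: `Server` and `Client` are illustrative names, a "fetch" stands for one network round trip, and the relaxed mode is only roughly the idea sometimes called close-to-open consistency, not a model of any particular file system.

```python
class Server:
    """Hypothetical file server holding a single file's contents."""

    def __init__(self, data=b""):
        self.data = data
        self.fetches = 0  # number of simulated network round trips

    def fetch(self):
        self.fetches += 1
        return self.data

    def store(self, data):
        self.data = data


class Client:
    """Hypothetical client with a local cache of the open file."""

    def __init__(self, server, strict):
        self.server = server
        self.strict = strict
        self.cached = None

    def open(self):
        # Both modes fetch the current contents when the file is opened.
        self.cached = self.server.fetch()

    def read(self):
        if self.strict:
            # Strict mode: revalidate on every read, one round trip each.
            self.cached = self.server.fetch()
        # Relaxed mode: serve whatever was cached at open time.
        return self.cached

    def write(self, data):
        self.cached = data
        self.server.store(data)


server = Server(b"v1")
writer = Client(server, strict=True)
strict_reader = Client(server, strict=True)
relaxed_reader = Client(server, strict=False)
for client in (writer, strict_reader, relaxed_reader):
    client.open()

writer.write(b"v2")
print(strict_reader.read())   # sees the new contents, pays a round trip
print(relaxed_reader.read())  # still sees the contents cached at open
```

The relaxed reader only catches up after it reopens the file. That is cheaper and simpler, but it is no longer the behavior a POSIX application is entitled to expect.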

When creating a file system, several such trade-offs appear, tempting you to change the interface or semantics of some operation in particular cases, either to make the expected workload run faster or to simplify your architecture.

There actually are filesystems supported by (almost) any OS. UDF is one example. Another is the FAT family, which is supported on many platforms, even in cameras.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange