Question

I'm designing an application that will be appending blobs to a file on disk (local filesystem) and I'm currently thinking of how to deal with consistency issues that could occur if:

  • The application suddenly crashes
  • The whole system stops, e.g. due to a power outage

The goal is that, when the file is later read, the application processing the blobs should be able to tell whether a blob has been corrupted (and thus avoid processing it).

My current idea is to write the following on disk for each blob and flush after each one:

[Size of blob] (4 bytes) [CRC-32 of blob] (4 bytes, mostly there to detect issues as the file ages over time) [actual blob bytes]
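
In code, the append would look roughly like the sketch below (illustrative only; the little-endian packing, the file name and the fsync after the flush are choices I made for the sketch, not settled decisions):

    import os
    import struct
    import zlib

    def append_blob(f, blob: bytes) -> None:
        # One record: [size (4 bytes)][CRC-32 (4 bytes)][blob bytes]
        f.write(struct.pack("<II", len(blob), zlib.crc32(blob)) + blob)
        f.flush()                # empty Python's user-space buffer
        os.fsync(f.fileno())     # ask the OS to push its cache towards the disk

    # a fresh file per run, as described in the considerations below
    with open("blobs.dat", "wb") as f:
        append_blob(f, b"example payload")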

Here come the questions:

  • Does this guarantee that, should any of the above conditions occur, the file will contain either only valid data, or n valid blobs plus some extra bytes, where interpreting the first four extra bytes as a size will clearly indicate that there are not enough remaining bytes in the file for a proper blob (or there are fewer than 4 extra bytes, not even enough to hold a size)? The reader sketch after this list shows the check I have in mind.
  • Could a power loss corrupt bytes that have previously been written to the file on disk?
  • Could a power loss corrupt the file so that it appears much bigger than it should be (and thus contains assorted garbage at the end)?
  • Can various filesystems lead to strange behaviors in this regard? The application will be cross-platform and I'm trying to avoid writing platform-specific code for this.
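
The reading side I have in mind is roughly the following (a sketch only): walk the file sequentially and stop at the first record whose header or payload cannot be read in full, or whose CRC does not match.

    import struct
    import zlib

    def read_blobs(path):
        # Yield valid blobs; stop at a truncated or corrupted tail.
        with open(path, "rb") as f:
            while True:
                header = f.read(8)
                if len(header) < 8:
                    return        # fewer than 8 bytes left: truncated tail
                size, crc = struct.unpack("<II", header)
                blob = f.read(size)
                if len(blob) < size:
                    return        # the size claims more bytes than the file holds
                if zlib.crc32(blob) != crc:
                    return        # corrupted blob (or corrupted size/CRC field)
                yield blob

    for blob in read_blobs("blobs.dat"):
        ...                       # process the blob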

Some other considerations:

  • Blobs will be relatively small (around a few kB, < 100 kB)
  • Losing a few blobs should a sudden stop occur is acceptable
  • When the application is restarted, it will create a new, empty file, not append to an already existing one
  • Only one thread of one process will be doing the appending
  • Reading the file before it is closed will not be allowed
  • Should a power outage occur, a consistency check will be performed on the filesystem after rebooting.

Solution

Unfortunately, your simple scheme does not protect you against disk failure or data corruption.

The weakness is the unprotected size, which you need in order to read the file sequentially and find the next blob. So in case of a write failure that hits a size field, you can lose everything from the first bad size to the end of the file. This can happen in two variations (a short simulation follows the list):

  • creation of a new object: you write the data properly (say a blob of m bytes), and then possibly write several other blobs. Imagine that the OS writes the size to disk with an undetected corruption. When you later read the file back, you will read a wrong size n. There is a high probability that the CRC will flag the inconsistency, but it will do so on the n bytes that follow (even though the blob itself was fully correct). Worse, those n bytes will be discarded as a bad blob, and from then on your code will try to read the next blob at the wrong place (offset + n instead of offset + m).
  • file maintenance outside the application: for example, your blob file is copied from one server to a newer, more performant one. If a blob payload is corrupted during the transfer, only that blob is lost. However, if a size field gets corrupted during the copy, you lose all the subsequent blobs.
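
To make the first variation concrete, here is a small self-contained simulation (my own illustration, not part of the original answer): a single flipped bit in one size field typically costs every record from that point on, even though their bytes are intact on disk.

    import io
    import struct
    import zlib

    def write_record(out, blob: bytes) -> None:
        out.write(struct.pack("<II", len(blob), zlib.crc32(blob)) + blob)

    def count_valid(data: bytes) -> int:
        # Sequential reader that trusts each size field to find the next record.
        f, good = io.BytesIO(data), 0
        while True:
            header = f.read(8)
            if len(header) < 8:
                return good
            size, crc = struct.unpack("<II", header)
            blob = f.read(size)
            if len(blob) < size:
                return good
            if zlib.crc32(blob) == crc:
                good += 1
            # on a mismatch the blob is skipped, but the bogus size has already
            # moved the read position to the wrong offset for the next record

    buf = io.BytesIO()
    for i in range(10):
        write_record(buf, bytes([i]) * 1000)   # ten 1000-byte blobs

    intact = buf.getvalue()
    corrupted = bytearray(intact)
    corrupted[0] ^= 0x04                       # one bit flip in the first size field

    print(count_valid(intact))                 # 10
    print(count_valid(bytes(corrupted)))       # usually 0: all later reads are misaligned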

In a similar way, errors could also affect data that has already been written. For example, on an SSD a hardware error could cause a bit to flip; a hardware defect could also affect the disk cache memory (e.g. row-hammer-like effects); some filesystems (or even the hardware) may rewrite an allocation unit if it appears to sit on a defective hard drive sector; and so on. But these are issues that affect most data structures, not only yours. One way to reduce them is to read back the data you have written and cross-check its consistency.
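
Such a read-back cross-check could look roughly like the sketch below (it assumes the record layout from the question; note that the re-read may still be served from the OS page cache rather than from the physical medium, so it is only a partial check):

    import os
    import struct
    import zlib

    def append_and_verify(f, blob: bytes) -> bool:
        # Append one [size][CRC-32][blob] record, then re-read and compare it.
        offset = f.tell()
        f.write(struct.pack("<II", len(blob), zlib.crc32(blob)) + blob)
        f.flush()
        os.fsync(f.fileno())
        f.seek(offset)
        size, crc = struct.unpack("<II", f.read(8))
        data = f.read(size)
        f.seek(0, os.SEEK_END)        # restore the position for the next append
        return size == len(blob) and crc == zlib.crc32(data) and data == blob

    with open("blobs.dat", "w+b") as f:   # "w+b": fresh file, read and write
        assert append_and_verify(f, b"example payload")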

Other tips

Hrm... could you simply write your blobs to a temp file first, then rename the file once it's ready? That would significantly limit your write failures, and you could easily see which files never made the change.
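
For illustration only (the .tmp naming convention and the helper name are assumptions, not something this answer specifies), the write-then-rename pattern could look like this:

    import os

    def publish_blob_file(final_path, blobs):
        # Write everything under a temporary name, flush it, then rename.
        # os.replace() is an atomic rename on POSIX; exact crash-durability
        # guarantees still depend on the filesystem.
        tmp_path = final_path + ".tmp"
        with open(tmp_path, "wb") as f:
            for blob in blobs:
                f.write(blob)          # or the full [size][CRC-32][blob] records
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, final_path)  # readers only ever see a finished file

    publish_blob_file("blobs.dat", [b"blob one", b"blob two"])

A reader can then treat any leftover .tmp file after a crash as incomplete and simply ignore or delete it.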

Licensed under: CC-BY-SA with attribution