Recommended techniques for field updating embedded Linux safely

https://stackoverflow.com/questions/287331

08-07-2019

Question

Embedded Linux based devices often require a mechanism to update applications and system files. For example, a (non-networked) lab instrument with a USB port can get software updates from a USB stick.

It would be a simple matter to run a script to copy files into place on the device's internal flash memory. However, there is the danger that the device would lose power in the middle of the update, and end up a brick.

The situation for application files is a bit easier, since there is room to duplicate the application directory, update one copy, and then quickly swap the old and new directories, which minimizes the failure window.
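One common way to shrink that window to a single step is to keep the live directory behind a symlink and replace the symlink with a rename. The following is only a minimal sketch, assuming versioned directories such as `app-1` and `app-2` sitting next to a link named `app`; the function name and the directory names are hypothetical, not something from our setup.

```python
import os


def activate_release(link_path, new_target):
    """Point the symlink at link_path to new_target in one rename step."""
    tmp_link = link_path + ".tmp"
    # Remove a stale temporary link left by an earlier, interrupted attempt.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_target, tmp_link)
    # rename(2) swaps the directory entry atomically on POSIX filesystems,
    # so readers see either the old link or the new one, never a missing one.
    os.replace(tmp_link, link_path)
```

Calling `activate_release("/opt/app", "app-2")` would repoint `/opt/app` at the new copy. For power-loss safety the containing directory would still need to be synced to flash afterwards, which this sketch leaves out.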

Things are dicier for kernel and system files since they are spread out throughout the file system.

We have used hard and soft links in the file system to identify critical files. We use hashes on files and archives to verify file integrity. We have considered using an emergency ramfs in the kernel to provide a fallback if starting from the updated file system fails.
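To illustrate the hash check mentioned above, here is a rough sketch that compares files against a manifest of expected SHA-256 digests. The manifest format (a dict mapping relative paths to hex digests) and the function names are assumptions made for the example; the choice of SHA-256 is also just an assumption.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large images do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_files(root, manifest):
    """Return the relative paths that are missing or whose hash differs."""
    bad = []
    for rel_path, expected in manifest.items():
        full = os.path.join(root, rel_path)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            bad.append(rel_path)
    return bad
```

An updater could refuse to activate the new files unless `failed_files(...)` returns an empty list.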

What are your approaches to this requirement?

Solution

I would take the same approach as with the application files: put the critical files on a partition of their own, link to them, and duplicate that partition. At the very start of your init, check that all the links point into the same partition; if they do not, reset them (to the partition where a designated file has the newest date). To update, copy everything onto the spare partition, and if everything checks out (CRCs OK), loop over the files and repoint each link from the old filesystem to the new one.

This way your critical files should always be in a sane state.

Scenarios:

Update fails while copying files onto the new partition

No problem, because the links still point to the old, working files.

Update fails while relinking

No problem, because all the new files are valid and already copied (otherwise the relink step would not have started); the check at startup corrects the links.
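The startup check could look roughly like the sketch below. It assumes the links are absolute symlinks collected in one directory, that each partition is mounted at its own path, and that a file called `VERSION` serves as the "certain file" whose date decides which partition is newest. All of these names are hypothetical.

```python
import os


def repair_links(link_dir, slots, stamp_name="VERSION"):
    """Make every symlink in link_dir point into one slot.

    Returns the slot the links were reset to, or None if they already agreed.
    """
    # Drop temporary links left behind by an interrupted relink.
    for name in os.listdir(link_dir):
        if name.endswith(".tmp"):
            os.remove(os.path.join(link_dir, name))
    names = sorted(n for n in os.listdir(link_dir)
                   if os.path.islink(os.path.join(link_dir, n)))
    used = {os.path.dirname(os.readlink(os.path.join(link_dir, n)))
            for n in names}
    if len(used) <= 1:
        return None  # all links already point into the same slot
    # Pick the slot whose stamp file carries the newest modification time.
    newest = max(slots,
                 key=lambda s: os.path.getmtime(os.path.join(s, stamp_name)))
    for n in names:
        link = os.path.join(link_dir, n)
        tmp = link + ".tmp"
        os.symlink(os.path.join(newest, n), tmp)
        os.replace(tmp, link)  # swap each link in a single rename
    return newest
```

Because the repair only runs when the links disagree, an update that failed during the copy phase (links all still on the old partition) is left alone.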

OTHER TIPS

If you must ensure reliability, you can have two flash partitions (or even chips), one with the current working configuration and one with the new configuration. Then use a hardware watchdog that resets the unit and switches the active boot flash partition back to the "last known good" configuration when the new one fails to come up.
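The decision logic behind "last known good" can be sketched as a small state machine. In a real device this would live in the bootloader and the state would sit in persistent storage; the Python below is only a model of the logic, and the state fields, the attempt limit of 3 and the function names are all assumptions for illustration.

```python
MAX_ATTEMPTS = 3  # assumed limit of unconfirmed boots before falling back


def choose_slot(state):
    """Pick the slot to boot and return (slot, updated_state).

    state holds 'active', 'known_good' and 'attempts' (unconfirmed boots).
    """
    trying_new = state["active"] != state["known_good"]
    if trying_new and state["attempts"] >= MAX_ATTEMPTS:
        # The new slot never confirmed itself: revert to the last known good.
        good = state["known_good"]
        return good, {"active": good, "known_good": good, "attempts": 0}
    if trying_new:
        # Count this boot; a watchdog reset will bring us back here.
        return state["active"], dict(state, attempts=state["attempts"] + 1)
    return state["active"], state


def confirm_boot(state):
    """Called by the application once it is up: mark the active slot good."""
    return {"active": state["active"], "known_good": state["active"],
            "attempts": 0}
```

If the new configuration hangs, the watchdog resets the unit, `choose_slot` runs again, and after the assumed three unconfirmed attempts it switches back to the old partition.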

Have at least two partitions. I'd suggest four:

boot
alternate boot
program data backup
program volatile data

Use GRUB fallback booting to boot the alternate partition if the primary boot fails.

So if the update fails, the alternate works.

NEVER update the boot loader.

If the data partition is toasted, reformat and copy over the backup data partition.

Now you can't fail unless the flash disk dies. If you are using COTS hardware and the main disk is, say, CompactFlash, you could keep a physically isolated backup on, say, a small USB key.

IMHO any update that is not atomic can break the system or make the consistency check quite difficult. I agree that updating the boot loader must be avoided, because it is not power-off safe. Generally, a manufacturer wants an update from firmware x.x.x to version y.y.y, without caring whether the kernel and/or a single file was updated. Updating single files can become a nightmare for service, because it is very difficult to know what is actually running on the customer's hardware.

Maybe you are mixing a dual-copy approach (the application is redundant) with a single-copy approach. I think this does not help much, because the integrity of the system is determined by the weakest component in the chain. If an update of the root filesystem fails, it does not matter that the application is duplicated.

A dual-copy approach can guarantee an update without downtime, if you need that. But it requires a lot of resources, because all components must be duplicated. Personally, I use a fallback approach, where a small rootfs in RAM is started if the main application fails or if the last update was not successful. This fallback system, started automatically by the bootloader if anything goes wrong, updates the system from a USB pen drive (if a local update is required).

I had never found an OSS project addressing these issues, so I recently started a new one, based on my previous experience. I have several products running it and my customers are happy with it.

Maybe you can take a look at it. You can find sources for "swupdate" (the name of the project) at github.com/sbabic/swupdate.

Stefano

I think what you are trying to achieve here is atomicity of the update process. Atomicity is critical for embedded devices. Power loss is one of the reasons highlighted, but there could be others, such as hardware or network issues. A definition I use for atomicity in the context of updates is:

An update is always either completed fully, or not at all
No software component besides the updater ever sees a half installed update
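At the level of a single file, both properties can be approximated with the usual write-to-temporary, sync, then rename pattern. This is a minimal sketch assuming a POSIX filesystem; the function name and the `.tmp` suffix are made up for the example, and a whole-system update needs more than this.

```python
import os


def write_atomically(path, data):
    """Replace the file at path with data so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # force the new contents onto the storage device
    os.replace(tmp, path)     # the switch itself is a single rename
    # Sync the directory so the rename itself survives a power cut.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Until the rename happens, other components only ever see the old file; afterwards they only see the complete new one.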

For Embedded Linux there are several software components that you might want to update and different designs to choose from; there is a paper on this here: https://mender.io/user/pages/04.resources/_white-papers/Software%20Updates.pdf

Licensed under: CC-BY-SA with attribution
