Backwards compatibility in .NET with BinaryFormatter

https://stackoverflow.com/questions/3583694

01-10-2019

Question

We use BinaryFormatter in a C# game to save user game progress, game levels, etc. We are running into the problem of backwards compatibility.

The aims:

Level designer creates a campaign (levels & rules), we change the code, and the campaign should still work fine. This can happen every day during development before release.
User saves a game, we release a game patch, and the user should still be able to load the game.
The invisible data-conversion process should work no matter how far apart the two versions are. For example, a user can skip our first 5 minor updates and get the 6th directly, and their saved games should still load fine.

The solution needs to be completely invisible to users and level designers, and minimally burden coders who want to change something (e.g. rename a field because they thought of a better name).

Some object graphs we serialize are rooted in one class, some in others. Forward compatibility is not needed.

Potentially breaking changes (and what happens when we serialize the old version and deserialize into the new):

add field (gets default-initialized)
change field type (failure)
rename field (equivalent to removing it and adding a new one)
change property to field and back (equivalent to a rename)
change auto-implemented property to use backing field (equivalent to a rename)
add superclass (equivalent to adding its fields to the current class)
interpret a field differently (e.g. was in degrees, now in radians)
for types implementing ISerializable, we may change our implementation of the ISerializable methods (e.g. start using compression within the ISerializable implementation for some really large type)
rename a class, rename an enum value

I have read about:

Version Tolerant Serialization
IDeserializationCallback
[OptionalField(VersionAdded)]
[OnDeserializing], [OnDeserialized], [OnSerializing], [OnSerialized].
[NonSerialized]
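
As a rough illustration of how the attributes listed above fit together, here is a minimal sketch. The type PlayerState and its members are hypothetical, and the default of 3 lives is only an assumption for the example.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical save type, for illustration only.
[Serializable]
public class PlayerState
{
    public string Name;

    // Added in a later version: streams written before this field existed
    // can still be deserialized because the field is marked optional.
    [OptionalField(VersionAdded = 2)]
    public int Lives;

    // Derived data that is never written to the stream.
    [NonSerialized]
    public string DisplayName;

    // Runs before the fields are populated. The formatter does not run
    // constructors or field initializers, so defaults are assigned here.
    [OnDeserializing]
    private void SetDefaults(StreamingContext context)
    {
        Lives = 3;
    }

    // Runs after this object's fields have been populated.
    [OnDeserialized]
    private void RebuildDerived(StreamingContext context)
    {
        DisplayName = Name + " (" + Lives + ")";
    }
}
```

With this shape, an old save that lacks Lives ends up with the value assigned in SetDefaults, while a newer save overwrites it with the stored value.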

My current solution:

We make as many changes as possible non-breaking, using things like the OnDeserializing callback.
We schedule breaking changes for once every 2 weeks, so there's less compatibility code to keep around.
Every time before we make a breaking change, we copy all the [Serializable] classes we use into a namespace/folder called OldClassVersions.VersionX (where X is the next ordinal number after the last one). We do this even if we aren't going to make a release soon.
When writing to file, what we serialize is an instance of this class: class SaveFileData { int version; object data; }
When reading from file, we deserialize the SaveFileData and pass it to an iterative "update" routine that does something like this:

for (int i = loadedData.version; i < CurrentVersion; i++)
{
    // Update() takes an instance of OldClassVersions.VersionX.TheClass
    // and returns an instance of OldClassVersions.VersionXPlus1.TheClass
    loadedData.data = Update(loadedData.data, i);
}

For convenience, the Update() implementation can use a CopyOverlappingPart() function that uses reflection to copy as much data as possible from the old version to the new version. This way, Update() only has to handle what actually changed.
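
A minimal sketch of what such a routine could look like, assuming the copy works on fields that have the same name and a compatible type in both versions. SaveUpgrader, LevelV0, LevelV1 and LevelUpgrades are hypothetical names invented for this example, and the list of step functions stands in for the Update(data, i) call above.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class SaveUpgrader
{
    // Copies every instance field that exists under the same name, with an
    // assignable type, from source to target. All other fields are left
    // untouched. Private fields declared on base classes are not visited.
    public static void CopyOverlappingPart(object source, object target)
    {
        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        foreach (FieldInfo from in source.GetType().GetFields(flags))
        {
            FieldInfo to = target.GetType().GetField(from.Name, flags);
            if (to != null && to.FieldType.IsAssignableFrom(from.FieldType))
            {
                to.SetValue(target, from.GetValue(source));
            }
        }
    }

    // steps[i] converts version-i data into version-(i + 1) data.
    public static object Upgrade(
        object data, int version, IReadOnlyList<Func<object, object>> steps)
    {
        for (int i = version; i < steps.Count; i++)
        {
            data = steps[i](data);
        }
        return data;
    }
}

// Hypothetical versions of one class: V1 stores the angle in radians.
public class LevelV0 { public string Name; public double AngleDegrees; }
public class LevelV1 { public string Name; public double AngleRadians; }

public static class LevelUpgrades
{
    public static object V0ToV1(object old)
    {
        var source = (LevelV0)old;
        var result = new LevelV1();
        SaveUpgrader.CopyOverlappingPart(source, result); // copies Name
        result.AngleRadians = source.AngleDegrees * Math.PI / 180.0; // the real change
        return result;
    }
}
```

Calling SaveUpgrader.Upgrade(data, 0, steps) with a list containing LevelUpgrades.V0ToV1 would run the single step; each later breaking change appends one more step to the list.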

Some problems with that:

the deserializer deserializes to class Foo rather than to class OldClassVersions.Version5.Foo, because class Foo is what was serialized
almost impossible to test or debug
requires keeping old copies of a lot of classes around, which is error-prone, fragile and annoying
I don't know what to do when we want to rename a class

This should be a really common problem. How do people usually solve it?

Solution

Tough one. I would dump binary and use XML serialization (easier to manage, and tolerant of changes that are not too extreme, like adding or removing fields). In more extreme cases it is easier to write a transform (XSLT, perhaps) from one version to another and keep the classes clean. If opacity and a small disk footprint are requirements, you can compress the data before writing it to disk.
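
To illustrate the tolerance to added and removed fields, here is a minimal sketch using XmlSerializer. SaveGame and XmlSaves are hypothetical names, and the default values are assumptions for the example.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical save type. XmlSerializer needs a public type with a public
// parameterless constructor and only handles public fields and properties.
public class SaveGame
{
    public string PlayerName = "";
    public int Level = 1;
    public int Lives = 3; // added in a later version of the game
}

public static class XmlSaves
{
    public static string Save(SaveGame game)
    {
        var serializer = new XmlSerializer(typeof(SaveGame));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, game);
            return writer.ToString();
        }
    }

    public static SaveGame Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(SaveGame));
        using (var reader = new StringReader(xml))
        {
            // Elements missing from the XML keep the values assigned by the
            // field initializers; unknown elements are skipped by default.
            return (SaveGame)serializer.Deserialize(reader);
        }
    }
}
```

Loading an old file such as `<SaveGame><PlayerName>Ann</PlayerName><Level>4</Level><Score>10</Score></SaveGame>` should therefore succeed: Lives keeps its initializer value and the removed Score element is ignored. Renames and type changes are not covered by this and would still need a transform.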

OTHER TIPS

We had the same problem in our application when storing user profile data (grid column arrangement, filter settings, ...).

In our case the problem was the AssemblyVersion.

To solve this, I created a SerializationBinder which reads the actual version of the assemblies (all assemblies get a new version number on each deployment) with Assembly.GetExecutingAssembly().GetName().Version.

In the overridden BindToType method, the type info is created with the new assembly version.
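
A binder along these lines might look like the sketch below. It is not the answerer's actual code: CurrentAssemblyBinder is a hypothetical name, and the sketch assumes that the serialized types live in the same assembly as the binder. Instead of rebuilding a versioned type name, it looks the type up directly in the currently loaded assembly, which has the same effect of ignoring the version stored in the stream.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

// Resolves serialized types against the currently loaded assembly and
// ignores the assembly version recorded in the stream.
public sealed class CurrentAssemblyBinder : SerializationBinder
{
    public override Type BindToType(string assemblyName, string typeName)
    {
        // Keep only the simple name, e.g. "Game" from
        // "Game, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null".
        string simpleName = new AssemblyName(assemblyName).Name;

        Assembly current = Assembly.GetExecutingAssembly();
        if (simpleName == current.GetName().Name)
        {
            // Look the type up by full name, whatever version wrote it.
            // Limitation: generic type arguments that embed an old version
            // in typeName are not rewritten by this sketch.
            return current.GetType(typeName);
        }

        // Returning null lets the formatter fall back to its default lookup.
        return null;
    }
}
```

The binder is attached through the formatter's Binder property before calling Deserialize. The same hook can also map an old type name to a renamed class. Note that BinaryFormatter itself is marked obsolete in recent .NET versions, so this only applies on runtimes where it is still available.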

The deserialization is implemented 'by hand', which means:

deserialize via the normal BinaryFormatter
get all fields which have to be deserialized (annotated with our own attribute)
fill the object with data from the deserialized object

This has worked with all our data for three or four releases.

This is a really old question, but it needs an up-to-date answer anyway. Today, in 2019, I would suggest that anyone reading this seriously consider using Protobuf instead of BinaryFormatter. It has most of the advantages of a binary format (which it is) but fewer of the disadvantages.

It works between different languages and technology stacks with ease (Java, .NET, C++, Go, Python)
It has a well-thought-through strategy for handling breaking changes (adding/removing fields, etc.) that makes it much easier for "version x" of your software to handle data generated by "version y", and the other way around. Yes, this is actually true: an older version of your app will be able to handle data serialized with a newer version of the Protobuf .proto interface definition. (Fields that the older version does not know about are simply ignored when deserializing.)

By comparison, when running a newer version of the code and deserializing old data, fields that are not present in the data are set to their type-specific default value. In that sense, handling old data is not "fully automatic", but it is still a lot simpler than with the default binary serialization libraries included with platforms like Java and .NET.
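
The behaviour in both directions can be sketched in C#. This example is an assumption-laden illustration: it uses the third-party protobuf-net NuGet package with attribute-based contracts rather than classes generated from a .proto file, and SaveV1, SaveV2 and ProtoDemo are hypothetical names.

```csharp
using System.IO;
using ProtoBuf; // third-party package "protobuf-net", assumed to be installed

// Contract as it looked in an older release.
[ProtoContract]
public class SaveV1
{
    [ProtoMember(1)] public string PlayerName { get; set; }
    [ProtoMember(2)] public int Level { get; set; }
}

// Contract in a newer release: field number 3 was added. Existing
// field numbers must never be reused for a different meaning.
[ProtoContract]
public class SaveV2
{
    [ProtoMember(1)] public string PlayerName { get; set; }
    [ProtoMember(2)] public int Level { get; set; }
    [ProtoMember(3)] public int Lives { get; set; }
}

public static class ProtoDemo
{
    // Writes a value with one contract and reads it back with another,
    // simulating two versions of the application sharing a file.
    public static TTo Convert<TFrom, TTo>(TFrom value)
    {
        using (var stream = new MemoryStream())
        {
            Serializer.Serialize(stream, value);
            stream.Position = 0;
            return Serializer.Deserialize<TTo>(stream);
        }
    }
}
```

Converting a SaveV1 to a SaveV2 leaves Lives at its default of 0, and converting a SaveV2 to a SaveV1 drops the unknown field 3 while keeping PlayerName and Level. Any non-zero default for a new field still has to be applied by the application itself.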

If you prefer a non-binary format, JSON is often a suitable choice. For RPC and similar scenarios, though, Protobuf is better, and it is even officially mentioned/endorsed by Microsoft nowadays: Introduction to gRPC on ASP.NET Core. (gRPC is a technology stack built on top of Protobuf.)

Licensed under: CC-BY-SA with attribution
