Question

Persistent data structures depend on the sharing of structure for efficiency. For an example, see here.

How can I preserve the structure sharing when I serialize the data structures and write them to a file or database? If I just naively traverse the datastructures, I'll store the correct values, but I'll lose the structure sharing. I'd like to be able to save data-structures with shared components to a file, restore them, and still have most of the structure shared in the restored data.

Was it helpful?

Solution

You want some form of hash-consing. This problem has been well studied. Andrew Kennedy's paper on pickler combinators explains in detail how to serialize and unserialize while preserving sharing.

OTHER TIPS

There are two obvious methods I can think of, and they're related.

  1. Don't serialize the structures, serialize the nodes. So, you'd store a serialized record for each of the nodes in the example tree you gave, and you'd convert all node references to a database key name for the node. This gives you sharing automatically, but has the cost of having to do multiple lookups chasing the references in order to load a structure.
  2. Color your nodes by ownership, like in your example. Have a concept of which structure a given node 'belongs' to and only serialize the nodes in a structure that belong to that structure. Links to nodes in other structures get replaced by a reference to that structure and the node in question. This allows you to load an entire structure at once, but can cause you to have to load ALL related structures if they are highly interlinked.

Choosing between these options depends on what you are trying to optimize for, and what sorts of linkage you expect to see in practice.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top