Question

We're getting StackOverflowErrors from Java's serialization library. The problem is that the default serialization implementation is recursive, the depth of which is bounded only by the longest path through a network of references.

We realize we could override the default methods but we have hundreds of richly-connected classes in our project so we're not enthusiastic about the override approach. We're more interested if there is a generalized solution that is non-recursive (or at least moves the recursion from the stack to the heap).

I googled this topic and found only lots of people bitterly complaining about the same thing but most of these complaints were from many years ago. Has the situation improved? If not and we write a generalized implementation, do you have any advice? We're presuming there is some reason (not yet obvious to us) why no one has cracked this nut. In theory, doing it 'right' sounds like it should be feasible.

Was it helpful?

Solution

I had this problem a while back. For richly connected classes, even if you are able to complete the serialization without a stack overflow, the serialization is dog slow. When we solved this problem, we had a handful of classes, so we just created our own serialization format that packed the data into a set of integer object ids, with integer fields ids for each field and described their connections through a series of object id, field id, other object id mappings. This custom approach was very fast and extremely light on memory, but really only works if you have a small set of classes you want to serialize.

The general case is much harder and the demand for serialization of richly connected classes is not that strong, so I would guess that's why no ones solved it.

You're basically onto the problem already though, you will always need a stack depth equal to the maximum height of a Depth First Search tree and so any time your graph is deeper than that, you'll get a stack overflow. It is fundamentally a recursive problem, so you're going to either need to use recursion or fake recursion by moving the stack allocations to a Stack object you put on the heap. I would take a look at the OpenJDK implementation:

http://hg.openjdk.java.net/jdk6/jdk6-gate/jdk/file/tip/src/share/classes/java/io/ObjectOutputStream.java

You already have a DebugTraceInfoStack, I would create a second Stack field for the current object you are writing and change the writeObject0 method to push the Object onto the stack, something like this:

stack.push(obj);
while(!stack.empty()) {
    obj = stack.pop();
    ...

Then you just change all calls to writeObject0(x); to stack.push(x);. Simple, standard conversion between recursion and iteration, except the class is almost 2500 lines and there are probably tons of gotchas.

If you do end up building it though, I would advocate submitting at as a patch to the next version of java as it would be useful, something like IterativeObjectOutputStream for use on deep object graphs.

OTHER TIPS

Proof that JDK 6 serialization can handle recursive object graphs:

public static void main(String[] args) throws Exception {
    Foo foo = new Foo("bob");
    foo.setBar(new Bar("fred", foo));
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(baos);
    out.writeObject(foo);
    out.close();
    ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(baos.toByteArray()));
    Object o = in.readObject();
    System.out.println(o);
}

static class Foo implements Serializable {
    String name;
    Bar bar;

    Foo(String name) {
        this.name = name;
    }

    void setBar(Bar bar) {
        this.bar = bar;
    }

    @Override
    public String toString() {
        return "Foo{" +
                "name='" + name + '\'' +
                ", bar=" + bar +
                '}';
    }
}

static class Bar implements Serializable {
    String name;
    Foo foo;

    Bar(String name, Foo foo) {
        this.name = name;
        this.foo = foo;
    }

    @Override
    public String toString() {
        return "Bar{" +
                "name='" + name + '\'' +
                '}';
    }
}

Output:

Foo{name='bob', bar=Bar{name='fred'}}

It appears I did not read the question very well. You seem to be interested in serializing properties that may contain circular references. If this assumption is incorrect, and you would be fine with NOT serializing these objects which contain circular references, please refer to my original answer below.

New Answer

I think you would need to keep track of which objects have been serialized, and I cannot see this happening, unless you do it yourself. It shouldn't be too difficult though.

On these objects that contain circular references you can keep a transient boolean representing whether or not the object has been serialized already. Then, you must override the default serializing behavior, but this can be done with just a few lines.

public void writeExternal(ObjectOutput out) {
    if(!out.serialized) {
        out.serializeMethod();
    }
    out.serialized = true;
}

Original Answer

Take a look at the transient keyword

I would imagine that most serialization libraries would honor the transient keyword. If a member is transient it is meant to be excluded from serialization.

class Something {
    private Dog dog; // I will be serialized upon serialization.
    private transient SomethingElse somethingElse; // I will not be serialized upon serialization.
}

class SomethingElse {
    private Cat cat; // I will be serialized upon serialization.
    private transient Something something; // I will not be serialized upon serialization.
}

If you have recursive members similar to the above scenario, you would want to mark one or the other (or both) as transient so that this overflow does not happen.

GWT RPC serialization is basically equivalent to JVM serialization, and both use a stack/recursive technique. Unfortunately this doesn't work well with slicing work into chunks (which you need to do if you're working in the browser, i.e. with GWT), so here's a non-recursive approach: https://github.com/nevella/alcina/blob/d3e37df57709620f7ad54d3d59b997e9c4c7d883/extras/rpc/client/src/com/google/gwt/user/client/rpc/impl/ClientSerializationStreamReader.java

Essentially, convert the serialization into three passes: * instantiate objects * set properties (via linking) * populate collections

Two tricks: some objects need properties on instantiation (e.g. Date) and you need to populate collections last since they may need the hashes of their members.

This allows non-recursive de-serialization - but in fact the non-recursive serialization is even simpler (as long as there's no custom writeReplace/readResolve), just within writeObject maintain two queues of objects-not-serialized, properties-not-serialized-for-current-object and have serialize-object-property push to the stack with a marker rather than make a recursive call.

There's a very basic example of that here: http://www.w3.org/2006/02/Sierra10022006.pdf

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top