Question

As I understand, Kryo creates a className<->numberID map in every writeObject. This map is too narrow. Because in your object model, instances tend to belong to the same classes, next writeObject will build and serialize a similar map again (and again, and again, and again). I know that the map can be shared by registering the classes manually but this is tedious manual hardcoding. I would like the map would be started by the first object write, as it does normally, but all subsequent writes in the session would reuse and extend it. This way, registration would occur automatically at runtime, with no additional runtime overhead and more frequently used objects would naturally receive the low id numbers. The map can be stored afterwards, separately, in attachment, as a decryption key. Deserializer will start by loading this map. How do you like the idea, and how can it be implemented?

My question is similar to this Strategy for registering classes with kryo where user could combine all writes under single writeObject using a List. It would be even simpler than storing a map separately, as I propose. But, it seems he does not wish to do so. In my case, such combination is not even possible because of large java model I avoid keeping it wholly in memory by serializing in pieces. In my scenario, user opens a project, makes changes and flushes them. So, the project could maintain a map of classes and use it for all serializations.

Update! I have realized that there are class/object registrators and autoReset. They seem to be created right for the task. However, I do not see how these things solve it. Autoreset=false does make the second write much smaller indeed. However, I fail to deserialize objects in this case. As you see in example, second deserialization fails:

public class A {
    String f1;
    A(String a) {
        f1 = a;
    }
    List list = new ArrayList();
    public String toString() {
        return "f1 = " + f1 + ":" + f1.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        test(true);
        test(false);
    }


    static void write(String time, Kryo kryo, ByteArrayOutputStream baos, Object o) {
        Output output = new Output(baos); 
        kryo.writeClassAndObject(output, o); 
        output.close();
        System.err.println(baos.size() + " after " + time + " write");
    }

    private static void test(boolean autoReset) {
        Kryo kryo = new Kryo();
        kryo.setAutoReset(autoReset);
        kryo.setInstantiatorStrategy(new StdInstantiatorStrategy());
        System.err.println("-------\ntesting autoreset = " + autoReset);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        A a = new A("a"), b = new A("b");
        write("first", kryo, baos, a);
        write("second", kryo, baos, b);
        A o1 = restore("first", baos, kryo);
        A o2 = restore("second", baos, kryo); // this fails
        System.err.println((o1.f1.equals(o2.f1)) ? "SUCCESS" : "FAILURE");

    }

    private static A restore(String time, ByteArrayOutputStream baos, Kryo k) {
        ByteArrayInputStream in = new ByteArrayInputStream(baos.toByteArray());
        Input input = new Input(in);
        A o = (A) k.readClassAndObject(input);
        System.err.println("reading object " + time + " time, got " + o);
        return o;
    }

Output is

-------
testing autoreset = true
41 after first write
82 after second write
reading object first time, got f1 = a:String
reading object second time, got f1 = a:String
SUCCESS
-------
testing autoreset = false
41 after first write
52 after second write
reading object first time, got f1 = a:String
reading object second time, got null
Exception in thread "main" java.lang.NullPointerException
    at kryo_test.AutoresetDemo.test(AutoresetDemo.java:40)
    at kryo_test.AutoresetDemo.main(AutoresetDemo.java:18)

Update2 It may also happen that autoReset=false records object references in addition to class names. It is worth to autoreset indeed.

Update3 I have discovered that it is difficult to serialize the map of classes (i.e., class -> registration) because registrations contain serializers that refer a kryo object and keep some state. It is difficult to share the map between many kryo objects then.

Was it helpful?

Solution

Ok, here is a solution for kryo-2.20

public class GlobalClassKryo extends Kryo {

    public static class ExternalizableClassResolver implements ClassResolver {

        //local serializers
        final Map<Class, Registration> fromClass = new HashMap();
        final Map<Integer, Registration> fromId = new HashMap();

        public static class GlobalRegistration { int id; Class type; Class<? extends Serializer> serializer; }

        public final Map<Integer, GlobalRegistration> globalIds; 
        public final Map<Class, GlobalRegistration> globalClasses; 

        // I synchronize because I have one reader and one writer thread and 
        // writer may break the reader when adds something into the map. 
        public ExternalizableClassResolver() {this (
                Collections.synchronizedMap(new HashMap()), 
                Collections.synchronizedMap(new HashMap())
            ) ;}

        public ExternalizableClassResolver(Map<Integer, GlobalRegistration> ids, Map<Class, GlobalRegistration> classes) {
            globalIds = ids;
            globalClasses = classes;
        }

        public ExternalizableClassResolver (DataInput in) throws ClassNotFoundException, IOException {
            this();
            int id;
            while ((id = in.readInt()) != -1) {
                GlobalRegistration e = new GlobalRegistration();
                globalIds.put(e.id = id, e);
                e.type = Class.forName(in.readUTF());
                e.serializer = (Class<? extends Serializer>) Class.forName(in.readUTF());
                globalClasses.put(e.type, e);
            }
        }

        public void save(DataOutput out) throws IOException {
            for (GlobalRegistration entry : globalIds.values()) {
                    out.writeInt(entry.id);
                    out.writeUTF(entry.type.getName());
                    out.writeUTF(entry.serializer.getName());
            }
            out.writeInt(-1);
        }

        static final boolean TRACE = false;
        void log(String msg) {
            System.err.println(kryo != null ? Utils.fill(kryo.getDepth(), ' ')  + msg : msg);
        }
        @Override
        public Registration writeClass(Output output, Class type) {
            if (type == null) {output.writeInt(0, true); return null;}
            Registration registration = kryo.getRegistration(type);
            output.writeInt(registration.getId(), true);
            return registration;
        }
        @Override
        public Registration readClass(Input input) {
            int classID = input.readInt(true);
            if (classID == 0) return null;
            Registration registration = fromId.get(classID);
            if (registration == null) {
                registration = tryGetFromGlobal(globalIds.get(classID), classID + "");
            }
            if (registration == null) throw new KryoException("Encountered unregistered class ID: " + classID);
            return registration;
        }

        public Registration register(Registration registration) {
            throw new KryoException("register(registration) is not allowed. Use register(type, serializer)");
        }

        public Registration getRegistration(int classID) {
            throw new KryoException("getRegistration(id) is not implemented");
        }

        Registration tryGetFromGlobal(GlobalRegistration globalClass, String title) {
            if (globalClass != null) {
                Serializer serializer = kryo.newSerializer(globalClass.serializer, globalClass.type);
                Registration registration = register(globalClass.type, serializer, globalClass.id, "local");
                if (TRACE) log("getRegistration(" + title + ") taken from global => " + registration);
                return registration;
            } else
                if (TRACE) log("getRegistration(" + title + ") was not found");
            return null;
        }
        public Registration getRegistration(Class type) {
            Registration registration = fromClass.get(type);
            if (registration == null) {
                registration = tryGetFromGlobal(globalClasses.get(type), type.getSimpleName());
            } else
                if (TRACE) log("getRegistration(" + type.getSimpleName() + ") => " + registration);

            return registration;
        }

        Registration register(Class type, Serializer serializer, int id, String title) {
            Registration registration = new Registration(type, serializer, id);
            fromClass.put(type, registration);
            fromId.put(id, registration);

            if (TRACE) log("new " + title + " registration, " + registration);

            //why dont' we put into fromId?
            if (registration.getType().isPrimitive()) fromClass.put(getWrapperClass(registration.getType()), registration);
            return registration;
        }

        int primitiveCounter = 1; // 0 is reserved for NULL
        static final int PRIMITIVE_MAX = 20;

        //here we register anything that is missing in the global map. 
        // It must not be the case that something available is registered for the second time, particularly because we do not check this here
        // and use registered map size as identity counter. Normally, check is done prior to callig this method, in getRegistration
        public Registration register(Class type, Serializer serializer) {

            if (type.isPrimitive() || type.equals(String.class))
                return register(type, serializer, primitiveCounter++, "primitive");

            GlobalRegistration global = globalClasses.get(type);

            if (global != null )
                    throw new RuntimeException("register(type,serializer): we have " + type + " in the global map, this method must not be called");

            global = new GlobalRegistration();
            globalIds.put(global.id = globalClasses.size() + PRIMITIVE_MAX, global); 
            globalClasses.put(global.type = type, global);
            global.serializer= serializer.getClass();

            return register(global.type, serializer, global.id, "global");
        }

        public Registration registerImplicit(Class type) {
            throw new RuntimeException("registerImplicit is not needed since we register missing automanically in getRegistration");
        }

        @Override
        public void reset() {
            // super.reset(); //no need to reset the classes
        }

        Kryo kryo;
        public void setKryo(Kryo kryo) {
            this.kryo = kryo;
        }
    }




    public ExternalizableClassResolver ourClassResolver() {
        return (ExternalizableClassResolver) classResolver;
    }

    public GlobalClassKryo(ClassResolver resolver) {
        super(resolver, new MapReferenceResolver()); 
        setInstantiatorStrategy(new StdInstantiatorStrategy());
        this.setRegistrationRequired(true);
    }
    public GlobalClassKryo() {
        this(new ExternalizableClassResolver());
    }

    @Override
    public Registration getRegistration (Class type) {
        if (type == null) throw new IllegalArgumentException("type cannot be null.");

        if (type == memoizedClass) return memoizedClassValue;
        Registration registration = classResolver.getRegistration(type);
        if (registration == null) {
            if (Proxy.isProxyClass(type)) {
                // If a Proxy class, treat it like an InvocationHandler because the concrete class for a proxy is generated.
                registration = getRegistration(InvocationHandler.class);
            } else if (!type.isEnum() && Enum.class.isAssignableFrom(type)) {
                // This handles an enum value that is an inner class. Eg: enum A {b{}};
                registration = getRegistration(type.getEnclosingClass());
            } else if (EnumSet.class.isAssignableFrom(type)) {
                registration = classResolver.getRegistration(EnumSet.class);
            }
            if (registration == null) {
                //registration = classResolver.registerImplicit(type);
                return register(type, getDefaultSerializer(type));
            }
        }
        memoizedClass = type;
        memoizedClassValue = registration;
        return registration;
    }

    public Registration register(Class type, Serializer serializer) {
        return ourClassResolver().register(type, serializer);}

    public Registration register(Registration registration) {
        throw new RuntimeException("only register(Class, Serializer) is allowed");}

    public Registration register(Class type) {
        throw new RuntimeException("only register(Class, Serializer) is allowed");}

    public Registration register(Class type, int id) {
        throw new RuntimeException("only register(Class, Serializer) is allowed");}

    public Registration register(Class type, Serializer serializer, int id) {
        throw new RuntimeException("only register(Class, Serializer) is allowed");
    }

    static void write(String title, Kryo k, ByteArrayOutputStream baos, Object obj) {
        Output output = new Output(baos);
        k.writeClassAndObject(output, obj);
        output.close();
        System.err.println(baos.size() + " bytes after " + title + " write");
    }
    static class A {
        String field = "abcABC";
        A a = this;
        //int b = 1; // adds 1 byte to serialization
        @Override
        public String toString() {
            return field 
                    + " " + list.size() 
                    //+ ", " + b
                    ;
        }

        // list adds 3 bytes to serialization, two 3-byte string items add additionally 10 bytes in total
        ArrayList list = new ArrayList(100); // capacity is trimmed in serialization
        {
            list.add("LLL");
            list.add("TTT");

        }
    }

    private static void test() throws IOException, ClassNotFoundException {
        GlobalClassKryo k = new GlobalClassKryo();

        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        write("first", k, baos, new A()); // write takes 24 byts

        //externalize the map
        ByteArrayOutputStream mapOut = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(mapOut);
        k.ourClassResolver().save(dataOut);
        dataOut.close();

        //deserizalize the map
        DataInputStream serialized = new DataInputStream(new ByteArrayInputStream(mapOut.toByteArray()));
        ExternalizableClassResolver resolver2 = new ExternalizableClassResolver(serialized);

        //use the map
        k = new GlobalClassKryo(resolver2);
        write("second", k, baos, new A()); // 24 bytes

        Input input = new Input(new ByteArrayInputStream(baos.toByteArray()));
        Object read = k.readClassAndObject(input);
        System.err.println("output " + read);
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Kryo k = new Kryo();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        write("first", k, baos, new A()); // write takes 78 bytes
        write("second", k, baos, new A()); // +78 bytes
        System.err.println("----------------");

        test();
    }
}

Resulting stream is clean of class names. Unfortunately, Kryo turns out to be too slow compared to default java serialization (2x or more), though it's resulting stream is much more dense. Kryo alone makes my sample serializations almost 10x smaller. You see that the solution presented in this answer adds additional factor of 3x. But, in the fields, where I serialize megabytes, I am getting only 2x compression with respect to java serialization and 2x slowdown when storing to disk with Kryo.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top