Question

I have a third-party class that I am trying to use in Hadoop, so I need to make it implement Writable. The problem is that Hadoop uses a Writable by first creating an empty object, o = new SomeObject(), and then calling o.readFields(in) to deserialize it. In my situation I cannot create the empty object:

public abstract class Cube {
    protected final int size;
    protected Cube(int size) { this.size = size; }
}

Note size is final.

public class RealCube extends Cube {
    public RealCube(int size) { super(size); }
}

Here RealCube has only one super constructor to call, and that constructor sets the final variable in the abstract superclass.

public class RealCubeWritable extends RealCube implements Writable {
    public void readFields(DataInput in) {
        /* yikes! need to set the size */
    }
}

When we get down to trying to implement RealCubeWritable, I cannot have a RealCubeWritable() constructor, and I cannot know the actual size until the DataInput stream is examined.

So it seems like the only way to do this in Hadoop is to use a wrapper. What I am wondering is whether there is a way to use a wrapper but still have RealCubeWritable behave like RealCube. I've looked into using dynamic proxy classes, but I'm not sure if this will work (or how to actually do it).

Thanks!


Solution

If you genuinely have no control over the Cube class, then I'm not sure you have many (pleasant) options:

  • I'm not sure I understand what you mean by a wrapper or proxy object. Either way, final is final, so you'd need to create a copy of the class without the final modifiers.
  • You might be able to use a nasty reflection hack to un-final the size field and then set its value, also through reflection. That may leave the object in an inconsistent state if Cube initialises other variables from size in the constructor.
  • You could write your own Serialization class, which lets you create a new instance of RealCube for each object rather than relying on Hadoop's usual object reuse (not the most efficient, but it will work).
  • Is the domain of size relatively small (i.e. can it only take a limited set or range of values)? If so, you could create one instance of RealCube for each valid size and, again using a custom Serialization implementation, pick the right Cube instance based on the size read from the input stream.
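
As a rough sketch of the third option, the core idea is to read size from the stream first and only then call the constructor, so the final field is never touched after construction. The snippet below uses only java.io so that it runs on its own. RealCubeCodec and its methods are hypothetical names, and Cube and RealCube are stand-ins for the third-party classes. It also assumes the codec can read size, here through same-package access to the protected field. In a real job this logic would sit inside implementations of Hadoop's org.apache.hadoop.io.serializer.Serialization and Deserializer interfaces, registered through the io.serializations property.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-ins for the third-party classes from the question.
abstract class Cube {
    protected final int size;
    protected Cube(int size) { this.size = size; }
}

class RealCube extends Cube {
    RealCube(int size) { super(size); }
}

// Hypothetical helper: builds a fresh RealCube per record instead of
// refilling an existing instance the way Writable.readFields() would.
final class RealCubeCodec {
    private RealCubeCodec() { }

    // Assumption: size is the only state that has to be persisted.
    static void write(RealCube cube, DataOutputStream out) throws IOException {
        out.writeInt(cube.size);
    }

    // Read the size first, then construct; the final field stays final.
    static RealCube read(DataInputStream in) throws IOException {
        int size = in.readInt();
        return new RealCube(size);
    }

    // Convenience for trying it out: serialize to bytes and read back.
    static RealCube roundTrip(RealCube cube) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                write(cube, out);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return read(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling RealCubeCodec.roundTrip(new RealCube(7)) returns a new object rather than a reused one, which is the cost the answer mentions: one allocation per record in exchange for never needing an empty RealCube.
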
Licensed under: CC-BY-SA with attribution