Domanda

It seems to me that a org.apache.hadoop.io.serializer.Serialization could be written to serialize the java types directly in the same format that the wrapper classes serialize the type into. That way the Mappers and Reducers don't have to deal with the wrapper classes.

È stato utile?

Soluzione

There is nothing stopping you changing the serialization to use a different mechanism such as java Serializable interface or something like thrift, protocol buffers etc.

In fact, Hadoop comes with an (experimental) Serialization implementation for Java Serializable objects - just configure the serialization factory to use it. The default serialization mechanism is WritableSerialization, but this can be changed by setting the following configuration property:

io.serializations=org.apache.hadoop.io.serializer.JavaSerialization

Bear in mind however that anything that expects a Writable (Input/Output formats, partitioners, comparators) etc will need to be replaced by versions that can be passed a Serializable instance rather than a Writable instance.

Some more links for the curious reader:

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top