There is nothing stopping you changing the serialization to use a different mechanism such as java Serializable interface or something like thrift, protocol buffers etc.
In fact, Hadoop comes with an (experimental) Serialization implementation for Java Serializable objects - just configure the serialization factory to use it. The default serialization mechanism is WritableSerialization
, but this can be changed by setting the following configuration property:
io.serializations=org.apache.hadoop.io.serializer.JavaSerialization
Bear in mind however that anything that expects a Writable (Input/Output formats, partitioners, comparators) etc will need to be replaced by versions that can be passed a Serializable
instance rather than a Writable
instance.
Some more links for the curious reader:
- http://www.tom-e-white.com/2008/07/rpc-and-serialization-with-hadoop.html
- What are the connections and differences between Hadoop Writable and java.io.serialization? - Which seems to be a similar question to what you're asking, and Tariq has a good link to a thread in which Doug Cutting explains the rationale behind using Writables over Serializables