Serialization in Hadoop - Writable

https://stackoverflow.com/questions/17228400

01-06-2022
Question

This is the class that implements Writable:

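import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

// AtomicWritable is the asker's own Writable; it is assumed to be in the same package.
public class Test implements Writable {
    List<AtomicWritable> atoms = new ArrayList<AtomicWritable>();

    // Writes the element count first, then each element in order.
    public void write(DataOutput out) throws IOException {
        IntWritable size = new IntWritable(atoms.size());
        size.write(out);
        for (AtomicWritable atom : atoms)
            atom.write(out);
    }

    // Reads the element count, then rebuilds the list in the same order.
    public void readFields(DataInput in) throws IOException {
        atoms.clear();
        IntWritable size = new IntWritable();
        size.readFields(in);
        int n = size.get();
        while (n-- > 0) {
            AtomicWritable atom = new AtomicWritable();
            atom.readFields(in);
            atoms.add(atom);
        }
    }
}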

I would really appreciate it if someone could help me understand how to invoke the write and readFields methods. Basically, I'm failing to understand how to construct a Test object in this case. Once the object has been written to a DataOutput, how do we restore it from a DataInput? This may sound silly, but I am a newbie to Hadoop and have been assigned a project that uses it. Please help.

Thanks!!!

Solution

"Basically, I'm failing to understand how to construct a Test object in this case."

Yup, you're missing the point. If you need to construct an instance of Test and populate atoms, then you need to add a constructor to Test:

public Test(ArrayList<AtomicWritable> atoms) {
    this.atoms = atoms;
}

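Alternatively, use the default constructor and add a method or setter that lets you add items to atoms or replace its value. The latter is quite common in the Hadoop framework: a default constructor plus a set method, as in Text.set. If you do add the constructor above, keep an explicit no-argument constructor as well, because Hadoop creates Writable instances reflectively before calling readFields on them.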
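A minimal sketch of the default-constructor-plus-setter style might look like the following. The names TestWithSetters, set, add and size are made up for illustration, and AtomicWritable here is a hypothetical stand-in with an assumed int payload, since the question does not show the real class. The write and readFields methods are left out because they would stay as in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: the question does not show AtomicWritable's fields.
class AtomicWritable {
    int value;                        // assumed payload, for illustration only

    AtomicWritable() { }              // no-arg constructor, as used in readFields

    AtomicWritable(int value) { this.value = value; }
}

// Same idea as the question's Test; write/readFields are omitted here.
class TestWithSetters {
    private final List<AtomicWritable> atoms = new ArrayList<>();

    // Explicit no-arg constructor, so reflective instantiation keeps working.
    TestWithSetters() { }

    // Convenience constructor that copies the caller's list.
    TestWithSetters(List<AtomicWritable> initial) { set(initial); }

    // Replaces the contents, in the spirit of Text.set.
    void set(List<AtomicWritable> newAtoms) {
        atoms.clear();
        atoms.addAll(newAtoms);
    }

    // Appends a single element.
    void add(AtomicWritable atom) { atoms.add(atom); }

    int size() { return atoms.size(); }
}

class SetterDemo {
    public static void main(String[] args) {
        TestWithSetters t = new TestWithSetters();
        t.add(new AtomicWritable(1));
        t.add(new AtomicWritable(2));
        System.out.println(t.size());   // prints 2
    }
}
```

Copying the list in set, rather than storing the caller's reference, is a design choice in this sketch: it keeps later changes to the caller's list from silently changing the object.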

In a MapReduce job you normally don't call readFields and write yourself; the Hadoop framework calls them whenever it needs to serialize and deserialize the keys and values going into and out of map and reduce.
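If you want to see the round trip outside of a job, for example in a unit test, you can drive the two methods by hand with plain java.io streams. The sketch below makes several assumptions so that it runs without Hadoop on the classpath: it declares a local Writable interface with the same two methods as org.apache.hadoop.io.Writable, it uses a hypothetical AtomicWritable with an int payload, and it renames the question's class to TestWritable. The size is written with writeInt, which is what IntWritable.write does internally. RoundTrip, serialize and deserialize are illustrative names, not Hadoop APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical AtomicWritable: the question does not show its fields.
class AtomicWritable implements Writable {
    int value;
    AtomicWritable() { }
    AtomicWritable(int value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

// The question's Test class under an assumed name.
class TestWritable implements Writable {
    final List<AtomicWritable> atoms = new ArrayList<>();

    public void write(DataOutput out) throws IOException {
        out.writeInt(atoms.size());              // element count first
        for (AtomicWritable atom : atoms) atom.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        atoms.clear();                           // instances may be reused
        int n = in.readInt();
        while (n-- > 0) {
            AtomicWritable atom = new AtomicWritable();
            atom.readFields(in);
            atoms.add(atom);
        }
    }
}

class RoundTrip {
    // Writes w into an in-memory buffer and returns the bytes.
    static byte[] serialize(Writable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Fills an existing (typically empty) instance from the bytes.
    static void deserialize(Writable w, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TestWritable original = new TestWritable();
        original.atoms.add(new AtomicWritable(7));
        original.atoms.add(new AtomicWritable(42));

        byte[] data = serialize(original);

        TestWritable restored = new TestWritable();  // create an empty instance first
        deserialize(restored, data);                 // then let readFields fill it
        System.out.println(restored.atoms.size());   // prints 2
    }
}
```

The pattern in main mirrors what the framework does: create an empty instance through the no-argument constructor, then call readFields on it with the stream positioned at the bytes that write produced.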

Licensed under: CC-BY-SA with attribution
