Question

I am implementing a MapReduce job, which means I am dealing with key-value pairs.

I have the variable

Iterable<FreqDataWritable> values

FreqDataWritable is an object that contains several pieces of information, but for now I am only concerned with one of them: a String accessed through getFilename().

I have the following loop:

ArrayList<String> filenames = new ArrayList<String>();

for(FreqDataWritable i : values) {
    filenames.add(i.getFilename());
}

Now all I want to do is print the values in the array list filenames.

for(int i = 0; i < filenames.size(); i++) {
    System.out.println(filenames.get(i));
}

However, when I do this, every entry in filenames is the same. The only output is a single filename printed multiple times.

My original code is more complex than this, but I simplified it for this question. Does anyone know how to fix this?

Thanks


Solution

I figured it out. To save memory, Hadoop reuses the same value object while iterating over the values, so my loop was just adding the same object over and over again to the ArrayList.

Instead I need to do this:

for(FreqDataWritable i : values) {
    filenames.add(new String(i.getFilename()));
}
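To see why copying helps, here is a minimal sketch that runs without Hadoop. Everything in it is hypothetical: FreqData is a stand-in for FreqDataWritable, and reusingValues() only imitates the object reuse by handing back one shared instance on every call to next(). Whether the new String(...) copy is strictly required depends on how getFilename() is implemented, which the post does not show; the point is only that data must be copied out of the value during the loop rather than kept by reference.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ReuseDemo {
    // Hypothetical stand-in for FreqDataWritable: a mutable holder with one field.
    static class FreqData {
        private String filename;
        void set(String filename) { this.filename = filename; }
        String getFilename() { return filename; }
    }

    // Imitates the reducer's values: every call to next() returns the SAME
    // FreqData instance, overwritten with the next record's contents.
    static Iterable<FreqData> reusingValues(String... names) {
        FreqData shared = new FreqData();
        return () -> new Iterator<FreqData>() {
            private int pos = 0;
            @Override public boolean hasNext() { return pos < names.length; }
            @Override public FreqData next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(names[pos++]);
                return shared;
            }
        };
    }

    // Buggy: keeps references to the reused object, so every stored entry
    // ends up showing whatever was written into it last.
    static List<String> viaStoredReferences(Iterable<FreqData> values) {
        List<FreqData> kept = new ArrayList<>();
        for (FreqData v : values) {
            kept.add(v);
        }
        List<String> out = new ArrayList<>();
        for (FreqData v : kept) {
            out.add(v.getFilename());
        }
        return out;
    }

    // Fixed: copies the filename out while the object still holds this record.
    static List<String> viaCopies(Iterable<FreqData> values) {
        List<String> out = new ArrayList<>();
        for (FreqData v : values) {
            out.add(new String(v.getFilename()));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [c.txt, c.txt, c.txt]
        System.out.println(viaStoredReferences(reusingValues("a.txt", "b.txt", "c.txt")));
        // prints [a.txt, b.txt, c.txt]
        System.out.println(viaCopies(reusingValues("a.txt", "b.txt", "c.txt")));
    }
}
```

The first list repeats the last filename because all three stored entries point at the one shared object, which matches the symptom in the question. The second list is correct because each value is copied before the iterator overwrites the object.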

OTHER TIPS

for(String filename : filenames) {
  System.out.println(filename);
}

Let me know if this helps.

Have you tried an iterator-based method?

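// Requires: import java.util.Iterator;
Iterator<FreqDataWritable> it = values.iterator();
if (it.hasNext()) {
    filenames.add(it.next().getFilename());
}
while (it.hasNext()) {
    // Call next() only once per pass; each call advances the iterator.
    String name = it.next().getFilename();
    // Skip a name equal to the one added last.
    if (!name.equals(filenames.get(filenames.size() - 1))) {
        filenames.add(name);
    }
}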
Licensed under: CC-BY-SA with attribution