Question

I am trying to write the name of a file into Accumulo. I am using accumulo-core-1.4.3.

For some reason, certain files seem to be written into Accumulo with trailing \x00 characters at the end of the name. The upload comes through a Java servlet (using the jQuery File Upload plugin). In the servlet, I check the file name with a System.out.println and it looks normal, and I even tried unescaping the string with

org.apache.commons.lang.StringEscapeUtils.unescapeJava(...);

The actual write to Accumulo looks like this:

Mutation mut = new Mutation(new Text(checkSum)); 
Value val = new Value(new Text(filename).getBytes());
long timestamp = System.currentTimeMillis();
mut.put(new Text(colFam), new Text(EMPTY_BYTES), timestamp, val);

but nothing unusual showed up there (perhaps \x00 isn't escaped?). But if I then do a scan on my table in Accumulo, there are one or more \x00 characters in the file name.

The problem this seems to cause: when I retrieve a list of files, I return that string within XML (where the extra characters show up) and pass it back to the browser, and the XSL that is supposed to render the information in the XML no longer works when these extra characters are present (I'm not sure why that is, either).

In Chrome, in the response for these calls, I see three red dots after the file name, and when I hover over them, \u0 pops up (which I think is another representation of 0/null?).

Anyway, I'm just trying to figure out why this happens, or at the very least, how I can filter out \x00 characters before returning the file name in Java. Any ideas?


Solution

You are likely misusing the Hadoop Text class; this is not a bug in Accumulo. Specifically, the mistake is in this line from your example:

Value val = new Value(new Text(filename).getBytes());

You must honor the length reported by the Text class: getBytes() returns the raw backing byte array, and only the data up to getLength() is valid, so any bytes past that point are leftover padding (often \x00). See the Text javadoc for more information. If you're using Hadoop 2.2.0, you can use the copyBytes method provided on Text. If you're on an older version of Hadoop where this method doesn't exist yet, you can use something like the ByteBuffer class or the System.arraycopy method to get a copy of the byte[] with the proper limits enforced.
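For example, here is a minimal sketch of the corrected write, reusing the checkSum, filename, colFam, and EMPTY_BYTES variables from the question and trimming the array with Arrays.copyOf (which enforces the same limit you would get from System.arraycopy):

import java.util.Arrays;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

Text nameText = new Text(filename);
// getBytes() exposes the backing array, which may be longer than the
// encoded string; only the first getLength() bytes are valid.
byte[] exactBytes = Arrays.copyOf(nameText.getBytes(), nameText.getLength());
// On Hadoop 2.2.0+, the one-liner nameText.copyBytes() does the same thing.

Mutation mut = new Mutation(new Text(checkSum));
Value val = new Value(exactBytes);
mut.put(new Text(colFam), new Text(EMPTY_BYTES), System.currentTimeMillis(), val);

With the byte array trimmed to the correct length, no trailing \x00 bytes ever reach the table, so there is nothing left to filter out when you build the XML response.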

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow