Accumulo-Pig error - Connector info for AccumuloInputFormat can only be set once per job

StackOverflow https://stackoverflow.com/questions/20640642

  •  18-09-2022

Question

Versions:
Accumulo 1.5
Pig 0.10

Attempted:
Read data from and write data into Accumulo from Pig, using accumulo-pig.
Encountered an error - any insight into getting past this error is greatly appreciated.
Switching to Accumulo 1.4 is not an option as we are using the Accumulo Thrift Proxy in our C# codebase.

Impact:
This is currently a roadblock in our project.

Source reference:
Source code - https://git-wip-us.apache.org/repos/asf/accumulo-pig.git

Error:
When attempting to read a dataset in Accumulo from Pig, I get the following error:

org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Connector info for AccumuloInputFormat can only be set once per job

Code snippet:

DATA = LOAD 'accumulo://departments?instance=indra&user=root&password=xxxxxxx&zookeepers=cdh-dn01:2181' using org.apache.accumulo.pig.AccumuloStorage() AS (row, cf, cq, cv, ts, val);
dump DATA;

Solution

Try using the ACCUMULO-1783-1.5 branch from the same repository. The way that Pig sets up the InputFormat doesn't play nicely with how Accumulo sets up InputFormats (notably, Accumulo makes a funny assertion that you never call the same static method more than once for a Configuration).
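To make the "set once" assertion concrete, here is a minimal, self-contained sketch of the pattern. It is not the Accumulo or accumulo-pig source: the class name, the configuration keys, and the method `setLocationGuarded` are hypothetical, and a plain `Map` stands in for the Hadoop `Configuration`. It only illustrates why a second call to a static setter fails, and one possible way a loader could tolerate being set up more than once. Whether the ACCUMULO-1783-1.5 branch takes this approach is not stated in this answer.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a simplified stand-in for a "set once per job" guard.
class SetOnceSketch {
    // Hypothetical key; stands in for a flag stored in the job configuration.
    static final String CONNECTOR_SET_KEY = "sketch.connectorInfo.isSet";

    // Returns true if connector info was already stored in this configuration.
    static boolean isConnectorInfoSet(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.get(CONNECTOR_SET_KEY));
    }

    // Static setter that refuses to run twice against the same configuration.
    static void setConnectorInfo(Map<String, String> conf, String user, String password) {
        if (isConnectorInfoSet(conf)) {
            throw new IllegalStateException("Connector info can only be set once per job");
        }
        conf.put(CONNECTOR_SET_KEY, "true");
        conf.put("sketch.user", user);
        conf.put("sketch.password", password);
    }

    // Hypothetical loader hook: safe to call repeatedly because it checks the flag first.
    static void setLocationGuarded(Map<String, String> conf, String user, String password) {
        if (!isConnectorInfoSet(conf)) {
            setConnectorInfo(conf, user, password);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Repeated guarded calls are harmless; the second one is a no-op.
        setLocationGuarded(conf, "root", "secret");
        setLocationGuarded(conf, "root", "secret");

        // An unguarded second call trips the assertion.
        try {
            setConnectorInfo(conf, "root", "secret");
        } catch (IllegalStateException e) {
            System.out.println("Unguarded second call failed: " + e.getMessage());
        }
    }
}
```

The guard keeps the setup idempotent, so the framework can call the hook as often as it likes without changing the outcome after the first call.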

I have been using Pig 0.12. I doubt there's a difference in how 0.10 sets up the InputFormats as opposed to 0.12, but I'm not positive, so YMMV.

I just pushed a fix to the above branch that gets rid of the previously mentioned limitation on Hadoop version.

Licensed under: CC-BY-SA with attribution