PIG: Cannot cast java.lang.String to org.apache.avro.util.Utf8 with AvroStorage inside STORE

StackOverflow https://stackoverflow.com/questions/22488609

  •  16-06-2023

Question

I am using Apache PIG to reduce data originally stored in CSV format and want to write the output in Avro. Part of my PIG script calls a Java UDF that appends a few fields to the input Tuple and passes the modified Tuple back. To account for the new fields, I modify the output PIG schema using:

Schema outSchema = new Schema(input).getField(1).schema;
Schema recSchema = outSchema.getField(0).schema;
recSchema.add(new FieldSchema("aircrafttype", DataType.CHARARRAY));

This code sits inside the public Schema outputSchema(Schema input) method of my UDF.

Within the exec method, I append java.lang.String values to the input Tuple and return the edited Tuple to the PIG script. This and all subsequent operations work fine. If I output to CSV format using PigStorage(','), there are no problems. When I attempt to output using

STORE records INTO '$out_dir' USING org.apache.pig.piggybank.storage.avro.AvroStorage('
{
"schema":{ 
  "type":"record", "name":"my new data",
  "fields": [
    {"name":"fld1", "type":"long"},
    {"name":"fld2", "type":"string"}
  ]}
}');

I get the following error:

java.io.IOException: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8

I have attempted appending the character fields to the Tuple (within my UDF) as char[] and Utf8 types, but that makes PIG angry before I even get to trying to write out data. I have also attempted modifying my Avro schema to allow for null types in every field.

I'm using PIG v0.11.1 and Avro v1.7.5. Any help is much appreciated.


Solution

This was a PIG version issue. My UDF was built into a jar-with-dependencies that bundled PIG v0.8.1, while the script ran under PIG v0.11.1. The mix of PIG versions 0.8.1 and 0.11.1 was causing the problem; Avro had nothing to do with it.
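One way to spot this kind of conflict is to ask the JVM which jar a class was actually loaded from. The following is a minimal sketch using only the standard library; the class name ClassOriginCheck and its helper methods are made up for illustration, and the default class names in main are just the Pig and Avro classes most relevant to this question. It would need to run with the same classpath as the job for the result to mean anything.

```java
import java.security.CodeSource;
import java.security.ProtectionDomain;

// Hypothetical helper (not part of Pig or Avro): reports the jar or
// directory a class was loaded from, to help spot duplicate library versions.
final class ClassOriginCheck {

    /** Describes the code source (jar or directory) of an already loaded class. */
    static String describeOrigin(Class<?> cls) {
        ProtectionDomain domain = cls.getProtectionDomain();
        CodeSource source = (domain == null) ? null : domain.getCodeSource();
        if (source == null || source.getLocation() == null) {
            // JDK classes loaded by the bootstrap loader have no code source.
            return cls.getName() + " -> (bootstrap or unknown location)";
        }
        return cls.getName() + " -> " + source.getLocation();
    }

    /** Looks a class up by name without running its static initializers. */
    static String describeOrigin(String className) {
        try {
            Class<?> cls = Class.forName(
                    className, false, ClassOriginCheck.class.getClassLoader());
            return describeOrigin(cls);
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> (could not be loaded)";
        }
    }

    public static void main(String[] args) {
        // Default names are an assumption about what is worth checking here.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.pig.EvalFunc",
            "org.apache.pig.data.Tuple",
            "org.apache.avro.util.Utf8"
        };
        for (String name : names) {
            System.out.println(describeOrigin(name));
        }
    }
}
```

If the PIG classes resolve to the UDF's own jar-with-dependencies rather than to the installed PIG jar, an old PIG version is probably shadowing the runtime one. A common remedy is to keep PIG out of the bundled jar, for example by giving the dependency Maven's provided scope.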

Licensed under: CC-BY-SA with attribution