Question

I need to customize how a file is loaded in Pig using AvroStorage.

For example, I have an Avro file with the following schema:

{"namespace": "avroColorCount",
 "type": "record",
 "name": "User2",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "content", "type" :  "bytes" }
 ]
}

If I use the command below, it works fine:

x = load 'sample.avro' USING AvroStorage() AS (name: chararray, content: bytearray);

But if I want to load only 'content' (the second column), how can I do that?

If I use

x = load 'sample.avro' USING AvroStorage() AS (content: bytearray);

I get this error:

ERROR 1031: Incompatable schema: left is "content:bytearray", right is "name: chararray, content: bytearray"

I know this can be done by FILTER.

But our requirement is to get the second column alone in a single step.

Is this possible?

Thanks in advance...


Solution

The code below solved it. It passes AvroStorage a schema that declares only the content field:

x = LOAD 'sample.avro' USING AvroStorage('{"type":"record","name":"User2","fields":[{"name":"content","type":"bytes"}]}');
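
For comparison, here is a rough sketch of the two-step approach the question wanted to avoid. In Pig, FILTER selects rows, while a column is normally projected with FOREACH ... GENERATE. The sketch assumes the same sample.avro file and the AvroStorage loader from the question. The alias y is only illustrative.

```pig
-- Load every field declared in the file's schema (same statement as in the question).
x = LOAD 'sample.avro' USING AvroStorage() AS (name: chararray, content: bytearray);

-- Project the second column only; y is a hypothetical alias.
y = FOREACH x GENERATE content;

-- Print the schema of the projected relation to confirm that only content remains.
DESCRIBE y;
```

Passing a reduced schema to AvroStorage, as in the solution, avoids the separate projection statement.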
Licensed under: CC-BY-SA with attribution