Question
I need to customize how a file is loaded in Pig using AvroStorage.
For example, I have an Avro file with the following schema:
{"namespace": "avroColorCount",
"type": "record",
"name": "User2",
"fields": [
{"name": "name", "type": "string"},
{"name": "content", "type" : "bytes" }
]
}
If I use the command below, it works fine:
x = load 'sample.avro' USING AvroStorage() AS (name: chararray, content: bytearray);
But if I want to load only 'content' (the second column), how can I do that?
If I use
x = load 'sample.avro' USING AvroStorage() AS (content: bytearray);
it gives me this error:
ERROR 1031: Incompatable schema: left is "content:bytearray", right is "name: chararray, content: bytearray"
I know this can be done in a second step by projecting the column after the load (FOREACH ... GENERATE).
But our requirement is to get the second column alone in a single step.
Is this possible?
Thanks in advance.
Solution
The code below solved it. Passing AvroStorage a schema that contains only the 'content' field loads just that column:
x = LOAD 'sample.avro' USING AvroStorage('{"type":"record","name":"User2","fields":[{"name":"content","type":"bytes"}]}');
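For comparison, here is a minimal sketch of the two-step alternative mentioned in the question. It loads the full record and then projects the column. The alias `raw` is a name made up for illustration, and the sketch assumes the AvroStorage that ships with Pig, so no REGISTER statement is shown.

```pig
-- Load every field using the schema stored in the Avro file itself.
raw = LOAD 'sample.avro' USING AvroStorage();

-- Keep only the second column; 'name' is dropped from the relation.
x = FOREACH raw GENERATE content;

-- Print the resulting schema to confirm that only 'content' remains.
DESCRIBE x;
```

This takes two statements instead of one, so it does not meet the single-step requirement, but it is a handy way to check that both approaches return the same data.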