Question

My MapReduce job writes an Avro file with the AvroKeyValueOutputFormat, but I'm having trouble importing this file into Hive.

How do I have to define my schema in Hive to get this working?


Solution

You have to use the AvroSerDe described at

http://goo.gl/TwsRTd

or you have to transform your output into the row format used by your defined Hive table (again, using another MapReduce job).
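One detail worth knowing: AvroKeyValueOutputFormat wraps each pair in an Avro record with two fields named "key" and "value", so the Hive schema has to mirror that wrapper record. A minimal sketch, assuming a long key and a string value (the table name, field types, and HDFS path are placeholders for your own):

```sql
-- Hypothetical table over files written by AvroKeyValueOutputFormat.
-- The schema declares the wrapper record's "key" and "value" fields.
CREATE EXTERNAL TABLE kv_data
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='
{
    "type": "record",
    "name": "KeyValuePair",
    "namespace": "org.apache.avro.mapreduce",
    "fields": [
        {"name": "key",   "type": "long"},
        {"name": "value", "type": ["null", "string"], "default": null}
    ]
}
')
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/hdfs/path/to/mapreduce/output';
```

If in doubt, dump the actual schema from one of the output files (e.g. with avro-tools getschema) and paste that into avro.schema.literal verbatim.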

Regards

Martin

Other tips

The format expected by the AvroSerDe can be a little tricky, but as long as you know the Avro schema it tends to work wonders. Hopefully this example helps.

CREATE EXTERNAL TABLE HIVEDATA
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='
{
    "namespace": "originalname",
    "name": "feature_value",
    "type": "record",
    "fields": [
        {"name": "acct_id", "type": "long"},
        {"name": "feature_name", "type": ["null","string"], "default": null},
        {"name": "namespace", "type": ["null","string"], "default": null},
        {"name": "feature_value", "type": ["null","double"], "default": null}
    ]
}
')
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/hdfs/location'
;
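If keeping the full schema inline gets unwieldy, the AvroSerDe also accepts an avro.schema.url property pointing at an .avsc file, and Hive 0.14+ supports the shorter STORED AS AVRO syntax. A sketch of the same table in that style (the schema file path is a placeholder):

```sql
-- Hypothetical: same table, schema kept in an external .avsc file (Hive 0.14+)
CREATE EXTERNAL TABLE HIVEDATA_EXT
STORED AS AVRO
LOCATION '/hdfs/location'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/feature_value.avsc');
```

Keeping the schema in one file avoids the literal drifting out of sync when the MapReduce job's schema evolves.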
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow