Question

I am trying to load JSON files into Hive using JSON Serde. I am able to get it working for one JSON file at a time, but I was wondering whether it's possible to have more than one record in a JSON file at a time and get them loaded in one shot. To give an idea, my JSON file looks like this:

File 1

{"styles": {"style": "Deep House"}, "genres": {"genre": "Electronic"}}

File 2

{"styles": {"style": "Rock"}, "genres": {"genre": "Techno Rock"}}

I combined them to make one JSON file as follows:

{"styles": {"style": "Deep House"}, "genres": {"genre": "Electronic"}},{"styles": {"style": "Rock"}, "genres": {"genre": "Techno Rock"}}

When I load this file, only the first record is loaded. My table DDL is as below:

create table json_data (
styles struct<style: string>,
genres struct<genre: string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

I use the standard LOAD command.

LOAD DATA LOCAL INPATH '/home/user/json_data' INTO TABLE json_data;

When I query the table, there's only one record inserted.

select * from json_data;
    {"style":"Deep House"}  {"genre":"Electronic"}
    Time taken: 0.76 seconds

Am I doing something wrong here with the JSON file creation? Or is it not possible to have two records in one JSON file? Any help would be really appreciated.

Thanks, TM


Solution

You can load multiple JSON records into a Hive table, provided each record is on its own line, i.e. records are separated by a newline character rather than a comma.

Contents of json_data file:

{"styles": {"style": "Deep House"}, "genres": {"genre": "Electronic"}}
{"styles": {"style": "Rock"}, "genres": {"genre": "Techno Rock"}}

select * from json_data;
OK
{"style":"Deep House"}  {"genre":"Electronic"}
{"style":"Rock"}        {"genre":"Techno Rock"}

The reason is that the JsonSerDe implementation expects the data in this format. The source is on GitHub:

https://github.com/rcongiu/Hive-JSON-Serde/blob/develop/src/main/java/org/openx/data/jsonserde/JsonSerDe.java
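If you already have a file where the records were joined with commas, as in the question, something like the following could convert it to one record per line before running LOAD DATA. This is only a rough sketch in Python, not part of Hive or the SerDe; the function names `split_json_records` and `to_json_lines` are made up for illustration, and it assumes the whole file fits in memory.

```python
import json


def split_json_records(text):
    """Yield each top-level JSON value in text, skipping the commas
    and whitespace that sit between consecutive records."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # Skip separators between records (commas, spaces, newlines).
        while pos < end and (text[pos].isspace() or text[pos] == ","):
            pos += 1
        if pos >= end:
            break
        # raw_decode parses one JSON value starting at pos and returns
        # the value together with the index where parsing stopped.
        record, pos = decoder.raw_decode(text, pos)
        yield record


def to_json_lines(text):
    """Return the records in text as newline-delimited JSON."""
    return "".join(json.dumps(rec) + "\n" for rec in split_json_records(text))


if __name__ == "__main__":
    # The combined record string from the question.
    combined = (
        '{"styles": {"style": "Deep House"}, "genres": {"genre": "Electronic"}},'
        '{"styles": {"style": "Rock"}, "genres": {"genre": "Techno Rock"}}'
    )
    print(to_json_lines(combined), end="")
```

Running it on the combined string from the question prints each record on its own line, which is the layout shown in the json_data file above. You would write that output to a file and point LOAD DATA LOCAL INPATH at it.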

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow