
I am trying to load JSON files into Hive using the JSON SerDe. I can get it working for one JSON record per file, but I was wondering whether it's possible to put more than one record in a single JSON file and load them all in one shot. To give an idea, my JSON files look like this:

File 1

{"styles": {"style": "Deep House"}, "genres": {"genre": "Electronic"}}

File 2

{"styles": {"style": "Rock"}, "genres": {"genre": "Techno Rock"}}

I combined them to make one JSON file as follows:

{"styles": {"style": "Deep House"}, "genres": {"genre": "Electronic"}},{"styles": {"style": "Rock"}, "genres": {"genre": "Techno Rock"}}

When I load this file, only the first record is loaded. My table DDL is as below:

create table json_data (
styles struct<style: string>,
genres struct<genre: string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

I use the standard LOAD command.

LOAD DATA LOCAL INPATH '/home/user/json_data' INTO TABLE json_data;

When I query the table, there's only one record inserted.

select * from json_data;
    {"style":"Deep House"}  {"genre":"Electronic"}
    Time taken: 0.76 seconds

Am I doing something wrong here with the JSON file creation? Or is it not possible to have two records in one JSON file? Any help would be really appreciated.

Thanks, TM



You can load multiple JSON records into a Hive table from one file, but each JSON record must be on its own line, separated from the next by a newline character rather than a comma.

Contents of json_data file:

{"styles": {"style": "Deep House"}, "genres": {"genre": "Electronic"}}
{"styles": {"style": "Rock"}, "genres": {"genre": "Techno Rock"}}

select * from json_data;
{"style":"Deep House"}  {"genre":"Electronic"}
{"style":"Rock"}        {"genre":"Techno Rock"}

The reason is that the JSON SerDe expects this layout: each line of the file is treated as one complete JSON record. See the JsonSerDe project on GitHub for details.
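
If you already have a file in the comma-separated form from the question, a small script can rewrite it with one record per line before you run LOAD DATA. The following is only a sketch in Python, not part of Hive or the SerDe: the function name to_json_lines is made up for illustration, and it assumes the input consists of well-formed JSON objects separated only by commas and/or whitespace.

```python
import json


def to_json_lines(text):
    """Rewrite concatenated JSON objects as one JSON record per line.

    Assumes records are separated only by commas and/or whitespace.
    """
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while pos < len(text):
        # Skip the separators (whitespace or commas) between records.
        if text[pos] in " \t\r\n,":
            pos += 1
            continue
        # raw_decode parses one JSON value starting at pos and
        # returns the value plus the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        records.append(json.dumps(obj))
    return "".join(record + "\n" for record in records)


combined = (
    '{"styles": {"style": "Deep House"}, "genres": {"genre": "Electronic"}},'
    '{"styles": {"style": "Rock"}, "genres": {"genre": "Techno Rock"}}'
)
print(to_json_lines(combined), end="")
```

Write the result to a file and point LOAD DATA LOCAL INPATH at that file instead of the original one.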


License: CC-BY-SA with attribution
Not affiliated with StackOverflow