Question

I am trying to export the Hive results to a file located on Amazon s3.

But the result file contains unrecognized characters (rendered as squares, etc.).

The content type of the result file is binary/octet-stream and not CSV.

I do not understand why it is not able to create a CSV file.

The Hive version used is hive-0.8.1.

The steps I followed are given below.

By the way, Hive is being used from an instance launched by Amazon EMR.

 create table test_csv(employee_id bigint, employee_name string, employee_designation string) row format delimited fields terminated by ',' lines terminated by '\n' stored as textfile;

  insert overwrite table test_csv select employee_id , employee_name , employee_designation from employee_details;

  INSERT OVERWRITE DIRECTORY 's3n://<path_to_s3_bucket>' SELECT * from test_csv;

Can you please let me know what could be the cause of this?


Solution 2

As far as I know, INSERT OVERWRITE DIRECTORY always uses Ctrl-A ('\001') as the field delimiter. Directly copying the file that holds your table data would be the best solution. Good luck.
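For example, since test_csv is already stored as comma-delimited text, you could copy its backing file straight to S3 instead. This is only a sketch: the warehouse location /user/hive/warehouse/test_csv and the file name 000000_0 are assumptions and may differ on your EMR cluster, and <path_to_s3_bucket> is the same placeholder as in your question.

  # copy the already comma-delimited table file to the bucket as-is
  hadoop fs -cp /user/hive/warehouse/test_csv/000000_0 's3n://<path_to_s3_bucket>/test_csv.csv'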

OTHER TIPS

You can export data from Hive via the command line:

hive -e 'select * from foo;' > foo.tsv

You could probably pipe the output through sed or something similar to transform the tabs into commas; we just used TSVs for everything.
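A rough sketch of that, assuming GNU sed (which understands \t) and using foo purely as a placeholder table name:

  # dump the query result (tab-separated by default) and turn tabs into commas
  hive -e 'select * from foo;' | sed 's/\t/,/g' > foo.csv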

Did you try opening your output in the Hive warehouse directory in HDFS to check how the data is stored there?
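For instance, assuming the default warehouse location /user/hive/warehouse and a single output file, something like the following would show how the rows of test_csv were actually written:

  # list the files Hive created for the table
  hadoop fs -ls /user/hive/warehouse/test_csv
  # peek at the first few rows to check the delimiter
  hadoop fs -cat /user/hive/warehouse/test_csv/000000_0 | head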

I don't think this line needs to be executed:

INSERT OVERWRITE DIRECTORY 's3n://<path_to_s3_bucket>' SELECT * from test_csv;

Instead, you can directly do a "dfs -get".
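A minimal sketch of that approach, again assuming the default warehouse path and a single output file named 000000_0:

  # copy the comma-delimited table file from HDFS to the local filesystem
  hadoop dfs -get /user/hive/warehouse/test_csv/000000_0 test_csv.csv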
