Question

I would like to copy a Hive table from Hive to HDFS. Please suggest the steps. Later I would like to use this HDFS file for Mahout machine learning.

I have created a Hive table from data stored in HDFS, then transformed a few variables in that data set and created a new table from the result. Now I would like to dump the Hive table from Hive to HDFS so that Mahout can read it.

When I run this:

hadoop fs -ls -R /user/hive/

I can see the list of tables I have created:

drwxr-xr-x   - hdfs supergroup          0 2014-04-25 17:00 /user/hive/warehouse/telecom.db/telecom_tr
-rw-r--r--   1 hdfs supergroup    5199062 2014-04-25 17:00 /user/hive/warehouse/telecom.db/telecom_tr/000000_0

I tried to copy the file from Hive to HDFS:

hadoop fs -cp /user/hive/warehouse/telecom.db/telecom_tr/* /user/hdfs/tele_copy

Here I was expecting tele_copy to be a CSV file stored in HDFS.

But when I run hadoop fs -tail /user/hdfs/tele_copy I get the result below.

7.980.00.00.0-9.0-30.00.00.670.00.00.00.06.00.06.670.00.670.00.042.02.02.06.04.0198.032.030.00.03.00.01.01.00.00.00.01.00.01.01.00.00.00.01.00.00.00.00.00.00.06.00.040.09.990.01.01
32.64296.7544.990.016.00.0-6.75-27.844.672.3343.334.671.3331.4725.05.3386.6754.07.00.00.044.01.01.02.02.0498.038.00.00.07.01.00.00.00.01.00.00.01.00.00.00.00.00.01.01.01.00.01.00.00.03.00.010.029.991.01.01
30.52140.030.00.250.00.0-42.0-0.520.671.339.00.00.034.6210.677.3340.09.332.00.00.040.02.02.01.01.01214.056.050.01.05.00.00.00.00.00.00.01.00.01.01.00.00.01.01.00.00.01.00.00.00.06.00.001.00.00.01.01
60.68360.2549.990.991.250.038.75-10.692.331.6715.670.00.0134.576.00.0102.6729.674.00.00.3340.02.01.08.03.069.028.046.00.05.00.01.00.00.00.00.00.01.01.01.00.00.00.01.00.00.01.00.00.00.02.00.020.0129.990.01.01

This is not comma-separated.

I received the same result after running this command:

INSERT OVERWRITE DIRECTORY '/user/hdfs/data/telecom' SELECT * FROM telecom_tr;

When I run -ls:

drwxr-xr-x   - hdfs supergroup          0 2014-04-29 17:34 /user/hdfs/data/telecom
-rw-r--r--   1 hdfs supergroup    5199062 2014-04-29 17:34 /user/hdfs/data/telecom/000000_0

When I cat the file, the result is not a CSV.

Solution

What you're really asking is to have Hive store the file as a CSV file. By default, Hive writes text-format tables using the non-printing Ctrl-A character (\001) as the field delimiter, which is why the values appear to run together when you tail or cat the file. To get comma-separated output, declare the table with ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; see the Hive documentation on Row Format, Storage Format, and SerDe.
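One way to do this, assuming the database and table names from your listing (telecom.db / telecom_tr), is to create a comma-delimited copy of the table and let Hive write the new file. This is a sketch, not tested against your schema; the name telecom_tr_csv is made up:

```sql
-- Sketch: create a comma-delimited text copy of the table.
-- Hive will write the backing file as plain CSV under its
-- warehouse directory (e.g. .../telecom.db/telecom_tr_csv/).
CREATE TABLE telecom_tr_csv
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
AS SELECT * FROM telecom_tr;
```

You can then copy the resulting file with hadoop fs -cp as you did before. On Hive 0.11 and later, INSERT OVERWRITE DIRECTORY also accepts a ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' clause, so you can export straight to a directory without creating an intermediate table.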

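If you just want to convert the file you already dumped, you can translate the default delimiter to commas on the way out. A minimal illustration, using printf to stand in for a Hive-written row (the values are made up; with real data you would pipe hadoop fs -cat through the same tr command):

```shell
# Hive's default text-format delimiter is Ctrl-A (\001), a
# non-printing character, so fields look glued together.
# Translating \001 to ',' turns the row into a CSV line.
printf '7.98\0010.0\0016.67\00142.0\n' | tr '\001' ','
```
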
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow