Question

I have a bunch of zip files containing CSVs that I want to create a Hive table from. I'm trying to figure out the best way to do so.

  • Unzip the files, upload them to HDFS.
  • Is there a way to copy the files to HDFS, unzip the
  • Or is there any other better / recommended way?

Solution

It's common practice to convert CSV files to a tab-separated, Ctrl-A, or Ctrl-B delimited format and then upload them to Hadoop/Hive.
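As a rough illustration of that conversion, here is a minimal Python sketch that rewrites a CSV stream as Ctrl-A delimited text. The function name `csv_to_ctrl_a`, the `skip_header` option, and the choice to replace embedded delimiters and line breaks with a space are all assumptions made for this example, not something the answer prescribes.

```python
import csv

CTRL_A = "\x01"  # Ctrl-A, the default field delimiter for Hive text tables


def csv_to_ctrl_a(src, dst, skip_header=True):
    """Copy rows from a CSV text stream to dst, joined by Ctrl-A.

    Returns the number of data rows written.
    """
    reader = csv.reader(src)
    if skip_header:
        next(reader, None)  # drop the header row, if there is one
    count = 0
    for row in reader:
        if not row:
            continue  # skip completely blank lines
        # An embedded delimiter or line break would corrupt the row layout,
        # so replace it with a space (an assumption; adjust as needed).
        cleaned = [
            field.replace(CTRL_A, " ").replace("\r", " ").replace("\n", " ")
            for field in row
        ]
        dst.write(CTRL_A.join(cleaned) + "\n")
        count += 1
    return count
```

The `csv` module handles quoted fields such as `"Smith, Ann"`, which is the main reason to convert first: a plain comma-delimited Hive text table would split that value in two.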

To upload files to HDFS, you can use the following command:

hadoop fs -put file_to_upload hdfs_path
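If you want to call the same command from a script, something like the following sketch could work. The helper names `build_put_command` and `upload` are hypothetical, and it assumes the `hadoop` executable is on the `PATH` of the machine running the script.

```python
import subprocess


def build_put_command(local_path, hdfs_path, overwrite=False):
    """Return the argument list for `hadoop fs -put`."""
    cmd = ["hadoop", "fs", "-put"]
    if overwrite:
        cmd.append("-f")  # -f overwrites an existing destination file
    cmd += [local_path, hdfs_path]
    return cmd


def upload(local_path, hdfs_path, overwrite=False):
    """Run the upload; raises CalledProcessError if hadoop exits non-zero."""
    subprocess.run(build_put_command(local_path, hdfs_path, overwrite), check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.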

I assume you would like to automate this. In that case, the following steps will be helpful.

  1. Create a Hive table with columns mapping to the CSV files' fields (you can drop unnecessary fields at this step). Choose your delimiter in the Hive CREATE TABLE statement.

  2. Convert the CSV files to the delimited format (Ctrl-A or Ctrl-B).

  3. Upload the files to the Hive table location.
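For step 1, the CREATE TABLE statement can be generated from a list of column names. The sketch below is only one possible shape: the function name `build_create_table` is hypothetical, every column is declared `STRING`, the table is declared `EXTERNAL` with an explicit `LOCATION`, and the column names are assumed to already be valid Hive identifiers.

```python
def build_create_table(table, columns, location):
    """Return a HiveQL CREATE TABLE statement for Ctrl-A delimited text.

    Every column is typed STRING in this sketch; adjust the types to
    match the real data.
    """
    cols = ",\n".join(f"  {name} STRING" for name in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{cols}\n"
        ")\n"
        # '\001' is the octal escape Hive uses for Ctrl-A
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\001'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )
```

Files uploaded to the directory given as `location` (step 3) then become visible to queries on the table without a separate load step.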

You can automate the above steps using Python batch processing scripts or a framework.
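Since the source files are zipped, a batch script can read the CSVs straight out of each archive instead of unzipping them first. The following is a minimal sketch under several assumptions: the function name `convert_zips` and the output naming scheme are made up for illustration, the CSVs are assumed to be UTF-8, header rows are copied through unchanged, and embedded delimiters inside fields are not sanitized.

```python
import csv
import io
import zipfile
from pathlib import Path


def convert_zips(zip_dir, out_dir, delimiter="\x01"):
    """Convert every .csv member of every .zip in zip_dir to delimited text.

    Returns the list of output files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for zip_path in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(zip_path) as archive:
            for member in archive.namelist():
                if not member.lower().endswith(".csv"):
                    continue  # ignore non-CSV members
                # Name the output after the archive and the member to avoid clashes.
                target = out_dir / f"{zip_path.stem}_{Path(member).stem}.txt"
                with archive.open(member) as raw, \
                        open(target, "w", encoding="utf-8", newline="") as dst:
                    # Decode the member as text without translating newlines,
                    # as the csv module expects.
                    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                    for row in csv.reader(text):
                        dst.write(delimiter.join(row) + "\n")
                written.append(target)
    return written
```

The files returned by `convert_zips` are what you would then upload to the table location with `hadoop fs -put`.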

For further reading: http://wiki.apache.org/hadoop/Hive/GettingStarted

Licensed under: CC-BY-SA with attribution