Question

When importing data from MySQL into Hadoop Hive, I need to add an additional 'timestamp' field to the new table that Hive creates.

Input: MySQL table fields: Name, e-mail, address
Output: Hive table fields: Name, e-mail, address, timestamp

Questions:

  1. How do I tell Sqoop to add a 'timestamp' field to the new table that it creates to store the imported data in Hive?
  2. How do I modify the schema (add / delete fields) of an existing Hive table?

Solution

You can customize the data that you are selecting from the RDBMS by using the --query parameter in Sqoop. Your command might look something like this:

sqoop import \
--connect jdbc:mysql://host:port/db \
--query 'SELECT name, email, address, NOW() AS timestamp FROM table WHERE $CONDITIONS' \
--split-by name \
--hive-import \
--hive-table table \
--target-dir location
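
For the second question, Hive can change an existing table's schema with ALTER TABLE statements. Below is a minimal sketch; the table name (customers) and column names (import_ts, name, email, address) are placeholders, so substitute your own:

-- Inspect the current schema of the table (placeholder name 'customers')
DESCRIBE customers;

-- Add a new column to the end of the column list
ALTER TABLE customers ADD COLUMNS (import_ts TIMESTAMP COMMENT 'time of import');

-- Redefine the full column list; any column you omit is effectively dropped.
-- Both statements only change the table metadata, the underlying data files
-- are not rewritten, so make sure the new schema still matches the stored data.
ALTER TABLE customers REPLACE COLUMNS (name STRING, email STRING, address STRING);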