What does Hive do behind the scenes when I load data?

https://stackoverflow.com/questions/20455737

30-08-2022

Question

I know that Hive saves data as partitions in the Hadoop file system. However, how exactly does the process work when I run LOAD DATA in Hive?

I appreciate your answer!

Solution

"I know that Hive saves data as partitions in the Hadoop file system."

Hive itself doesn't store the data; the data lives in HDFS. Hive can be thought of as a higher-level abstraction on top of the MapReduce computing model.

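Assuming that the data is already in HDFS and the table has been created in Hive, the LOAD DATA command does not parse or transform the data. It moves the files as they are into the table's directory in HDFS (or copies them when LOCAL is used), so the table simply maps onto files in HDFS. That mapping, the table definition and its location, is stored in the Hive metastore database, which is an embedded Derby database by default. Here is an article about the types of Hive metastore and how to configure them.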

On the Hive side, it's just a matter of inserting or updating a couple of rows in the metastore database; no job runs over the data itself, which is why the LOAD DATA command is fast.
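To make the idea concrete, here is a minimal toy model in Python; it is not Hive's actual code. It assumes a local temporary directory standing in for HDFS and an in-memory SQLite database standing in for the metastore. The function names `create_table`, `load_data` and `demo` are made up for this illustration. The point it tries to show is that a load of a file already in the file system is roughly a metadata lookup plus a file move, and the file contents are never read.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def create_table(metastore, warehouse, name):
    """Register a table: one metastore row pointing at a warehouse directory."""
    location = warehouse / name
    location.mkdir(parents=True)
    metastore.execute(
        "INSERT INTO tables (name, location) VALUES (?, ?)", (name, str(location))
    )
    return location


def load_data(metastore, source, table):
    """Toy stand-in for: LOAD DATA INPATH '<source>' INTO TABLE <table>."""
    row = metastore.execute(
        "SELECT location FROM tables WHERE name = ?", (table,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown table: {table}")
    target = Path(row[0]) / Path(source).name
    # The file is relocated as-is; its contents are never opened or parsed.
    shutil.move(str(source), str(target))
    return target


def demo():
    root = Path(tempfile.mkdtemp())  # stands in for the HDFS root
    warehouse = root / "warehouse"
    staging = root / "staging"
    staging.mkdir()

    # Stand-in for the metastore database (Derby by default in Hive).
    metastore = sqlite3.connect(":memory:")
    metastore.execute("CREATE TABLE tables (name TEXT PRIMARY KEY, location TEXT)")

    create_table(metastore, warehouse, "events")
    source = staging / "events.csv"
    source.write_text("1,click\n2,view\n")

    target = load_data(metastore, source, "events")
    result = (source.exists(), target.exists(), target.read_text())
    shutil.rmtree(root)  # clean up the temporary directory
    return result
```

Calling `demo()` returns a tuple saying whether the staged file still exists, whether it arrived in the table directory, and the unchanged file contents. The cost of `load_data` in this sketch does not depend on how many rows the file holds, which mirrors why the real command returns quickly.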

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow