Question

I know that Hive saves data as partitions in the Hadoop file system. However, how exactly does the process work when I run LOAD DATA in Hive?

I appreciate your answer!

Solution

"I know that Hive saves data as partitions in the Hadoop file system."

Hive itself doesn't store the data; the data lives in HDFS. Hive can be thought of as a higher-level abstraction on top of the MapReduce computing model.

Assuming that the data is already in HDFS and the table has been created in Hive, the LOAD DATA command does not parse or transform the data. It moves the files as they are into the table's (or partition's) directory in HDFS and maps them to the table created in Hive. The mapping is stored in the Hive metastore database, which is an embedded Derby database by default. The metastore can also be configured in other modes (local or remote), typically backed by a database such as MySQL.

Because the contents of the files are never read or rewritten, it's mostly a matter of a file move within HDFS plus inserting or updating a few rows in the metastore database, which is why the LOAD DATA command is fast.
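
To make the idea concrete, here is a toy model in Python of what a load amounts to. This is only a sketch and not how Hive is implemented: the `load_data` function, the dict standing in for the metastore, and the local temp directory standing in for HDFS are all made up for illustration. The only things taken from Hive are the general behaviour (files are moved, not parsed) and the `key=value` naming of partition directories.

```python
import os
import shutil
import tempfile


def load_data(metastore, warehouse, table, src_path, partition=None):
    """Toy stand-in for LOAD DATA INPATH (hypothetical, not Hive code)."""
    # Target directory: <warehouse>/<table>[/<key>=<value>]
    target_dir = os.path.join(warehouse, table)
    if partition:
        key, value = partition
        target_dir = os.path.join(target_dir, f"{key}={value}")
    os.makedirs(target_dir, exist_ok=True)

    # The file is moved as-is; its contents are never opened or parsed.
    dest = shutil.move(
        src_path, os.path.join(target_dir, os.path.basename(src_path))
    )

    # The "metastore" (a plain dict here) only records where the data lives.
    metastore[(table, partition)] = target_dir
    return dest


# A local temp directory plays the role of HDFS in this sketch.
root = tempfile.mkdtemp()
warehouse = os.path.join(root, "warehouse")
incoming = os.path.join(root, "incoming")
os.makedirs(incoming)

src = os.path.join(incoming, "views.tsv")
with open(src, "w") as f:
    f.write("u1\t/home\n")

metastore = {}
dest = load_data(
    metastore, warehouse, "page_views", src, partition=("dt", "2013-01-01")
)
print(dest)
print(metastore)
```

After the call, the file is gone from its original location and sits unchanged under the partition directory, and the only other thing that changed is one entry in the dict. That is the sense in which the load is cheap: its cost does not grow with the number of rows in the file.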

License: CC-BY-SA with attribution
Not affiliated with StackOverflow