Domanda

I have 2 tables-

bpm_agent_data  - 40 Million records , 5 Columns
bpm_loan_data  - 20 Million records, 5 Columns

Now I ran a query in Hive-

select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber) from bpm_agent_data JOIN bpm_loan_data where bpm_loan_data.id = bpm_agent_data.id;

which is taking long long time to complete. What should be the ideal way to write the query in HIVE so that Reducer must not take so much time.

È stato utile?

Soluzione

Found the solution for the above query, replaced where with ON

select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber) from bpm_agent_data JOIN bpm_loan_data ON( bpm_loan_data.id = bpm_agent_data.id);
Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top