Question

I have two tables:

bpm_agent_data - 40 million records, 5 columns
bpm_loan_data - 20 million records, 5 columns

I ran this query in Hive:

select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber) from bpm_agent_data JOIN bpm_loan_data where bpm_loan_data.id = bpm_agent_data.id;

It takes a very long time to complete. What is the best way to write this query in Hive so that the reducer does not take so long?

Solution

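I found the solution: replace WHERE with ON, so that the join condition is part of the JOIN clause itself. Without an ON clause, Hive (older versions in particular) can treat the JOIN as a cross product of the two tables and apply the WHERE condition only afterwards as a filter, which puts a huge load on the reducer.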

select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber) from bpm_agent_data JOIN bpm_loan_data ON( bpm_loan_data.id = bpm_agent_data.id);
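
One way to check whether the condition is really used as the join key is to compare the plans of both versions with EXPLAIN. This is only a sketch: the plan text differs between Hive versions and execution engines, so no output is shown here.

```sql
-- Plan for the original query (join condition in WHERE).
EXPLAIN
select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber)
from bpm_agent_data JOIN bpm_loan_data
where bpm_loan_data.id = bpm_agent_data.id;

-- Plan for the rewritten query (join condition in ON).
EXPLAIN
select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber)
from bpm_agent_data JOIN bpm_loan_data
ON (bpm_loan_data.id = bpm_agent_data.id);
```

In the plan for the ON version, the join operator should show id as the join key for both tables. If the plan for the WHERE version shows a join without keys followed by a separate filter step, that query is being run as a cross product.
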
Licensed under: CC-BY-SA with attribution