Problem

I have two tables:

bpm_agent_data: 40 million records, 5 columns
bpm_loan_data: 20 million records, 5 columns

I ran the following query in Hive:

select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber) from bpm_agent_data JOIN bpm_loan_data where bpm_loan_data.id = bpm_agent_data.id;

It takes a very long time to complete. How should I write this query in Hive so that the reducers do not take so long?


Solution

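I found the solution: replace WHERE with ON so that the join condition is part of the JOIN clause. With the condition only in WHERE, older Hive versions can plan the JOIN as a cross product of the two tables and apply the filter afterwards, which would explain the long reducer time.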

select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber) from bpm_agent_data JOIN bpm_loan_data ON( bpm_loan_data.id = bpm_agent_data.id);
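
One way to check how Hive plans the join is to run EXPLAIN on the query. The following is a minimal sketch that assumes the table and column names from the question; the aliases a and l are introduced here only for readability. How the WHERE form is planned depends on the Hive version, so it is worth running EXPLAIN on both forms on your own cluster and comparing the join stage.

```sql
-- Sketch only: table and column names are taken from the question,
-- the aliases a and l are illustrative.
-- EXPLAIN prints the execution plan without running the job.
EXPLAIN
SELECT count(a.AgentID), count(l.LoanNumber)
FROM bpm_agent_data a
JOIN bpm_loan_data l
  ON (l.id = a.id);  -- join condition in ON, not in WHERE
```

If the plan for the ON form shows the id columns as the join keys while the WHERE form shows a join with no keys, the difference in run time most likely comes from the cross product.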
License: CC-BY-SA with attribution