Question
I have two tables:
bpm_agent_data - 40 million records, 5 columns
bpm_loan_data - 20 million records, 5 columns
I ran the following query in Hive:
select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber) from bpm_agent_data JOIN bpm_loan_data where bpm_loan_data.id = bpm_agent_data.id;
It takes a very long time to complete. What is the ideal way to write this query in Hive so that the reducer does not take so long?
Solution
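I found the solution: put the join condition in an ON clause instead of WHERE. With the condition in WHERE, Hive can run the JOIN as a cross product of the two tables and filter the rows afterwards, which is what overloads the reducer.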
select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber) from bpm_agent_data JOIN bpm_loan_data ON( bpm_loan_data.id = bpm_agent_data.id);
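To check which form Hive actually runs, one option is to compare the two query plans with EXPLAIN. This is only a sketch: the plan output differs between Hive versions and execution engines, and newer versions may optimize the WHERE form on their own.

```sql
-- Plan for the original query (join condition in WHERE)
EXPLAIN
SELECT count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber)
FROM bpm_agent_data JOIN bpm_loan_data
WHERE bpm_loan_data.id = bpm_agent_data.id;

-- Plan for the rewritten query (join condition in ON)
EXPLAIN
SELECT count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber)
FROM bpm_agent_data JOIN bpm_loan_data
ON (bpm_loan_data.id = bpm_agent_data.id);
```

In the second plan the join operator should list id as the join key for both tables. If the first plan shows a join with no keys followed by a filter, the condition is being applied only after a cross product.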