Question

I have two tables:

bpm_agent_data: 40 million records, 5 columns
bpm_loan_data: 20 million records, 5 columns

I ran this query in Hive:

select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber) from bpm_agent_data JOIN bpm_loan_data where bpm_loan_data.id = bpm_agent_data.id;

It takes a very long time to complete. What is the ideal way to write this query in Hive so that the reducer does not take so long?


Solution

I found the solution for the query above: replace the WHERE clause with an ON clause, so that the equality on id is used as the join condition.

select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber) from bpm_agent_data JOIN bpm_loan_data ON (bpm_loan_data.id = bpm_agent_data.id);
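
A likely explanation, offered as an assumption rather than something stated in the answer: when the JOIN has no ON clause, some Hive versions evaluate it as a cross product of the two tables and apply the WHERE filter only afterwards, which is very expensive at 40 million by 20 million rows. One way to check this on your own cluster is to compare the plans of both forms with EXPLAIN. The sketch below uses only the tables and columns from the question; the exact plan output depends on the Hive version and optimizer settings.

```sql
-- Plan for the original form: no ON clause, the equality appears only in WHERE.
-- Assumption: on some Hive versions this shows a join without a join key,
-- followed by a filter on id.
EXPLAIN
SELECT count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber)
FROM bpm_agent_data JOIN bpm_loan_data
WHERE bpm_loan_data.id = bpm_agent_data.id;

-- Plan for the rewritten form: the equality is the join condition,
-- so the join should be keyed on id.
EXPLAIN
SELECT count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber)
FROM bpm_agent_data JOIN bpm_loan_data
  ON (bpm_loan_data.id = bpm_agent_data.id);
```

If both plans show the same keyed join, your Hive version already rewrites the WHERE form, and the slowdown probably has another cause, such as skew in the id values.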
Licensed under: CC-BY-SA with attribution