Question

I have 95904 rows (transactions) in a Hive table named transaction_records.

There are 9999 distinct customers, and I want to find the top ten customers by the total amount they spent. The table has the following fields:

txnno INT
txndate STRING
custno INT
amount DOUBLE
category STRING
product STRING

I tried using:

select custno, sum(amount) from (select txno, custno, amount, category, product from transaction_records group by custno);
FAILED: ParseException line 1:112 mismatched input '<EOF>' expecting Identifier near ')' in subquery source

This doesn't work. I am new to HiveQL; do you know the right query for this?

Solution

You don't need a subquery. Group by custno directly, sort by the summed amount, and keep the first ten rows:

select custno, sum(amount) s from transaction_records group by custno order by s desc limit 10;
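
For reference, the ParseException in the question most likely comes from the missing alias: Hive expects an identifier after the closing parenthesis of every subquery in a FROM clause. The GROUP BY also has to sit where the aggregate is computed, not inside a subquery that selects non-aggregated columns. A minimal sketch of the subquery form with both points fixed, where the alias t and the column name total_spent are only illustrative:

```sql
-- Subquery form of the same top-ten query.
-- "t" is the alias Hive requires on a subquery in FROM (name is arbitrary).
SELECT t.custno, SUM(t.amount) AS total_spent
FROM (
  SELECT custno, amount
  FROM transaction_records
) t
GROUP BY t.custno
ORDER BY total_spent DESC
LIMIT 10;
```

This returns the same rows as the direct query; the subquery adds nothing here, so the shorter form is preferable.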

OTHER TIPS

Try the 'collect_max' UDF in Brickhouse (http://github.com/klout/brickhouse). It avoids a final sort over all of your customers. In your case that sort is cheap, since there are only about 10,000 customers, but for larger datasets it could become a problem.

SELECT collect_max(custno, amount, 10)
FROM (
   SELECT custno, sum(amount) AS amount
   FROM transaction_records
   GROUP BY custno
) tr;
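
collect_max is not built into Hive, so the Brickhouse jar has to be registered in the session before the query above will run. A rough sketch of that setup: the jar path is a placeholder, and the class name is an assumption based on the Brickhouse package layout, so check both against the function declarations shipped with the project before relying on them.

```sql
-- Placeholder path: point this at the Brickhouse jar you built or downloaded.
ADD JAR /path/to/brickhouse.jar;

-- Assumed class name; verify it against the Brickhouse sources.
CREATE TEMPORARY FUNCTION collect_max AS 'brickhouse.udf.collect.CollectMaxUDAF';
```

A temporary function only lasts for the current Hive session, so these two statements need to run again in each new session.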
Licensed under: CC-BY-SA with attribution