Question
I have 95,904 rows (transactions) in a Hive table named transaction_records.
There are 9,999 distinct customers, and I want to find the top ten customers by the total amount they spent. The table has the following fields:
txnno INT
txndate STRING
custno INT
amount DOUBLE
category STRING
product STRING
I tried using:
select custno, sum(amount) from (select txno, custno, amount, category, product from transaction_records group by custno);
FAILED: ParseException line 1:112 mismatched input '<EOF>' expecting Identifier near ')' in subquery source
This doesn't work. I am new to HiveQL; do you know the right query for this?
Solution
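You don't need a subquery here. The ParseException comes from the subquery in the FROM clause having no alias, which Hive requires. Group by customer, sum the amounts, sort by the total in descending order, and keep the first ten rows. Try this: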
select custno, sum(amount) s from transaction_records group by custno order by s desc limit 10;
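If you do want to keep the subquery form from the question, a minimal sketch of a corrected version might look like the following. It assumes the same transaction_records table; the alias t and the column name total_spent are illustrative names, not part of the original query. The inner query selects only the columns that are needed, the subquery gets an alias, and the GROUP BY moves to the outer query where the aggregate is computed.

```sql
-- Sketch only: "t" and "total_spent" are illustrative names.
SELECT t.custno, SUM(t.amount) AS total_spent
FROM (
  -- Inner query: keep just the columns the aggregation needs.
  SELECT custno, amount
  FROM transaction_records
) t                          -- Hive requires an alias on a FROM subquery.
GROUP BY t.custno
ORDER BY total_spent DESC    -- Largest spenders first.
LIMIT 10;
```

This should return the same rows as the single-level query above; the subquery adds nothing here beyond showing where the alias belongs.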
OTHER TIPS
Try the 'collect_max' UDF in Brickhouse (http://github.com/klout/brickhouse). It avoids a final sort over all of your customers. In your case the sort is probably not a concern, since there are only about 10,000 customers, but for larger datasets it could be a problem.
SELECT collect_max(custno, amount, 10)
FROM (
  SELECT custno, sum(amount) AS amount
  FROM transaction_records
  GROUP BY custno
) tr;