Find outliers in Hive - SemanticException
16-10-2019
Question
I'm trying to find outliers in my database using Hive, and I'm using the standard deviation technique. My query is:
SELECT ID
FROM data
WHERE ID < (AVG(ID) + STDDEV(ID))
AND ID > (AVG(ID) - STDDEV(ID));
When I run this code I'm getting the following error:
Error while compiling statement: FAILED: SemanticException [Error 10128]: Line 3:12 Not yet supported place for UDAF 'AVG'
How can I solve this problem? Many thanks!
Solution
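Hive doesn't let you use aggregate functions (UDAFs) such as AVG or STDDEV in a WHERE clause, because WHERE filters individual rows before any aggregation happens. You can solve this by computing the aggregates in a subquery.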
SELECT d.id
FROM data d
CROSS JOIN (SELECT AVG(id) AS avg_id, STDDEV(id) AS stddev_id FROM data) s
WHERE d.id < s.avg_id + s.stddev_id AND d.id > s.avg_id - s.stddev_id;
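Note that this filter, like the one in the question, keeps the rows that lie within one standard deviation of the mean. If the goal is to return the outliers themselves, the condition has to be inverted. A minimal sketch, assuming the same table and column names as in the question and a cutoff of 3 standard deviations (the value 3 is an assumption, not something the question specifies):

```sql
-- Sketch: return rows that fall OUTSIDE mean +/- 3 * stddev.
-- The multiplier 3 is an assumed threshold; adjust it to your data.
SELECT d.id
FROM data d
CROSS JOIN (
  -- One-row subquery holding the global statistics.
  SELECT AVG(id) AS avg_id, STDDEV(id) AS stddev_id
  FROM data
) s
WHERE d.id < s.avg_id - 3 * s.stddev_id
   OR d.id > s.avg_id + 3 * s.stddev_id;
```

The subquery yields a single row, so the cross join only attaches the two statistics to every row of data. Depending on your configuration, Hive may reject Cartesian products in strict mode, in which case that check needs to be relaxed for this query.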
Licensed under: CC-BY-SA with attribution