Question

I'm trying to find outliers in my database using Hive, based on the standard deviation. My query is:

SELECT ID
FROM data
WHERE ID < (AVG(ID) + STDDEV(ID))
  AND ID > (AVG(ID) - STDDEV(ID));

When I run this code I'm getting the following error:

 Error while compiling statement: FAILED: SemanticException [Error 10128]: Line 3:12 Not yet supported place for UDAF 'AVG'

How can I solve this problem? Many thanks!

Solution

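Hive doesn't let you use an aggregate function (UDAF) such as AVG or STDDEV directly in a WHERE clause, because WHERE filters individual rows before any aggregation happens. You can solve this by computing the aggregates in a subquery and filtering on them in the outer query.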

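SELECT id
FROM (
    -- OVER () attaches the overall mean and standard deviation to every row
    SELECT id, AVG(id) OVER () AS avg_id, STDDEV(id) OVER () AS stddev_id
    FROM data
) t  -- alias the subquery; Hive expects a name for each table in FROM
WHERE id < avg_id + stddev_id
  AND id > avg_id - stddev_id;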
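
Note that the filter above keeps the rows that lie within one standard deviation of the mean, so it returns the typical values, not the outliers. If the goal is to list the outliers themselves, the condition has to be inverted. Here is a minimal sketch, assuming a Hive version with windowing support (0.11 or later) and a cutoff of 3 standard deviations. The cutoff and the alias t are illustrative choices, not something taken from the question.

```sql
-- Sketch: return rows lying more than 3 standard deviations from the mean.
-- Assumptions: windowing functions are available; 3 is an arbitrary cutoff.
SELECT id
FROM (
    SELECT id,
           AVG(id)    OVER () AS avg_id,     -- overall mean, repeated on every row
           STDDEV(id) OVER () AS stddev_id   -- overall standard deviation
    FROM data
) t
WHERE id > avg_id + 3 * stddev_id
   OR id < avg_id - 3 * stddev_id;
```

As far as I know, STDDEV in Hive is the population standard deviation (the same as STDDEV_POP). If you want the sample version, STDDEV_SAMP should work as a drop-in replacement.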
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange