SQL Server Nested Query Issue

https://stackoverflow.com/questions/16303886

13-04-2022

Question

I am currently trying to use nested subqueries to filter rows at each level of nesting. I am doing this because the queries are built by the user at the application level, and the user chooses the order in which the filters are applied.

Here is an example of a query that could be created using the interface:

SELECT AVG(value) As Average, STDEV(value) As Standard_Deviation, DATEPART(mm,date) As Month 
FROM sqlTable
WHERE value IN 
    (SELECT TOP 2000 STDEV(value) FROM sqlTable WHERE value IN
        (SELECT TOP 10000 AVG(value) FROM sqlTable ORDER BY AVG(value))
    ORDER BY STDEV(value) Desc)
GROUP BY column_name1, column_name2, DATEPART(mm, Date);

There is only one table, sqlTable, and the only relevant columns are value, column_name1, column_name2, and date.

If the user decides, they can move the STDEV function higher up in the hierarchy so that the rows are filtered by STDEV first (that is, it is moved to the innermost nested query). This query isn't currently returning any results, and when I add value to the select list of the nested queries I get an error saying it is an invalid column.

Errors:

is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.

Any help is greatly appreciated. Thank you!

EDIT: The filtering is on a single table with millions of records for a financial project, and it lets users see which commodities (column_name1 and column_name2) have been the most profitable and least risky (amongst other functions). I'm using nested queries because getting the TOP 10000 based on AVG and then returning the TOP 2000 of that result based on STDEV is different from getting the TOP 10000 based on STDEV and then returning the TOP 2000 of that result based on AVG. I want the user to be able to order the calculations however they wish, with more levels of nesting than just these two.

SELECT AVG(value) As Average, STDEV(value) As Standard_Deviation, DATEPART(mm,Date) As Month 
FROM sqlTable
WHERE value IN 
    (SELECT TOP 2000 STDEV(value) FROM sqlTable WHERE value IN
        (SELECT TOP 10000 column_name1, column_name2, value, AVG(value) FROM sqlTable
        GROUP BY column_name1, column_name2, value ORDER BY AVG(value))
    GROUP BY column_name1, column_name2, value
    ORDER BY STDEV(value))
GROUP BY column_name1, column_name2, DATEPART(mm, Date);

This query returns the second error above.

Solution

When you use GROUP BY, every column in the SELECT list must either be wrapped in an aggregate function (such as MAX, MIN, or AVG) or appear in the GROUP BY clause.
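As a rough illustration of that rule, and of the second error quoted in the question, here is a minimal T-SQL sketch. It reuses the table and column names from the question; the choice of column_name1 as the value tested by IN is only an assumption made for the example.

```sql
-- Fails with the first error: value is neither aggregated nor in GROUP BY.
-- SELECT column_name1, column_name2, value, AVG(value)
-- FROM sqlTable
-- GROUP BY column_name1, column_name2;

-- Works: every selected column is either grouped or aggregated.
SELECT column_name1, column_name2, AVG(value) AS Average
FROM sqlTable
GROUP BY column_name1, column_name2;

-- A subquery introduced with IN may expose only one column.
-- Listing several columns there raises the second error.
SELECT column_name1, column_name2, AVG(value) AS Average
FROM sqlTable
WHERE column_name1 IN
    (SELECT TOP (10000) column_name1   -- exactly one column (assumed choice)
     FROM sqlTable
     GROUP BY column_name1
     ORDER BY AVG(value) DESC)
GROUP BY column_name1, column_name2;
```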

OTHER TIPS

The query in your IN clause does not make sense:

WHERE value IN 
    (SELECT TOP 2000 STDEV(value) FROM sqlTable WHERE value IN
        (SELECT TOP 10000 AVG(value) FROM sqlTable ORDER BY AVG(value))
    ORDER BY STDEV(value) Desc)

The subquery uses an aggregate function (STDEV()) but has no corresponding GROUP BY, so it returns only one row. The TOP 2000 is therefore unnecessary.
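To make the one-row behaviour concrete, the following sketch contrasts the subquery as written with a grouped variant. Grouping by column_name1 and column_name2 is an assumption based on the question's description of commodities, not something the original query does.

```sql
-- No GROUP BY: the whole table is treated as a single group,
-- so this returns one row no matter what TOP says.
SELECT TOP (2000) STDEV(value) AS Standard_Deviation
FROM sqlTable
ORDER BY STDEV(value) DESC;

-- With GROUP BY (assumed grouping): one row per commodity pair,
-- so TOP (2000) now has something to limit.
SELECT TOP (2000) column_name1, column_name2,
       STDEV(value) AS Standard_Deviation
FROM sqlTable
GROUP BY column_name1, column_name2
ORDER BY STDEV(value) DESC;
```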

Perhaps you are expecting to get the STDEV() from the outer GROUP BY, but it doesn't work that way: the WHERE clause is processed before the GROUP BY. If you want to filter on the aggregated results, then you would want a HAVING clause. But the HAVING clause could not compare against value itself.
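The difference between the two clauses can be sketched as follows. The thresholds (0 and 5.0) are hypothetical values chosen purely for illustration.

```sql
SELECT column_name1, column_name2,
       AVG(value)   AS Average,
       STDEV(value) AS Standard_Deviation
FROM sqlTable
WHERE value > 0               -- filters individual rows, before grouping (hypothetical threshold)
GROUP BY column_name1, column_name2
HAVING STDEV(value) < 5.0;    -- filters whole groups, after aggregation (hypothetical threshold)
```

Here WHERE can refer to value because it sees raw rows, while HAVING can only refer to grouped columns and aggregates.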

I would suggest that you ask another question that explains what the filtering and nesting are meant to do. As written, I can't see a particular use for this query. It is highly unlikely that a value will appear in a list of standard deviations. Even if the values were really, really close, minuscule numerical differences would keep the IN from matching.
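If the intent described in the question's EDIT is to rank commodities rather than individual rows, one possible shape for the layered filter is sketched below. This is only a guess at the requirement. It assumes that a commodity is a (column_name1, column_name2) pair with no NULLs in either column, that "most profitable" means highest AVG, and that "least risky" means lowest STDEV. The CTE names by_avg and by_stdev are made up for the example.

```sql
WITH by_avg AS (
    -- Layer 1: the 10000 commodity pairs with the highest average value.
    SELECT TOP (10000) column_name1, column_name2
    FROM sqlTable
    GROUP BY column_name1, column_name2
    ORDER BY AVG(value) DESC
),
by_stdev AS (
    -- Layer 2: of those, the 2000 pairs with the lowest standard deviation.
    SELECT TOP (2000) t.column_name1, t.column_name2
    FROM sqlTable AS t
    JOIN by_avg AS a
      ON a.column_name1 = t.column_name1
     AND a.column_name2 = t.column_name2
    GROUP BY t.column_name1, t.column_name2
    ORDER BY STDEV(t.value) ASC
)
-- Final report: monthly statistics for the surviving pairs only.
SELECT t.column_name1, t.column_name2,
       DATEPART(mm, t.[date]) AS [Month],
       AVG(t.value)   AS Average,
       STDEV(t.value) AS Standard_Deviation
FROM sqlTable AS t
JOIN by_stdev AS s
  ON s.column_name1 = t.column_name1
 AND s.column_name2 = t.column_name2
GROUP BY t.column_name1, t.column_name2, DATEPART(mm, t.[date]);
```

Because each layer passes on group keys instead of aggregate values, swapping the order of the filters only means swapping the two ORDER BY expressions and TOP counts, and no floating-point value is ever compared for equality.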

Licensed under: CC-BY-SA with attribution
