Select statement with "where in" returns more rows than subquery

https://stackoverflow.com/questions/17752890

03-06-2022

Question

Today we discovered some strange behavior with SQL Server 2012 SP1. We know how to recreate the problem, but we're not sure what's causing it. Here's a very generalized example:

Suppose you have a database, test, that has a table, table1, with the following data:

column1
-
1
2
3
4

You run the following query:

USE [test]

SELECT [column1]
FROM [test].[dbo].[table1]
WHERE [column1] IN
(SELECT TOP 5 [column1]
FROM [table1])

5 rows are returned, just as expected. Now run the following query:

USE [test]

SELECT [column1]
FROM [test].[dbo].[table1]
WHERE [column1] IN
(SELECT TOP 5 [test].[dbo].[table1].[column1]
FROM [table1])

10 rows are returned this time, even though SELECT TOP 5 [test].[dbo].[table1].[column1] FROM [table1] returns 5 rows.

Looking at the execution plan, I can tell that the server is doing something different, but I can't tell why.

The solution seems to be to make sure we aren't being overly explicit without cause, but now that we've run into it, we want to make sure no one causes this behavior again.

Solution

The context of the query is what makes the difference.

In the following query, the subquery's select list actually references a column of the outer table, not of the [table1] in the subquery's own FROM clause.

SELECT [column1]
FROM [test].[dbo].[table1]
WHERE [column1] IN
(SELECT TOP 5 [test].[dbo].[table1].[column1]
FROM [table1])
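
To make that binding visible, here is a sketch of the same query written with explicit aliases. The aliases o and i are hypothetical, added only for illustration; the point is that the subquery's select list uses the outer row's value rather than a column of its own table:

```sql
-- Sketch only: o and i are illustrative aliases, not part of the original query.
SELECT o.[column1]
FROM [test].[dbo].[table1] AS o
WHERE o.[column1] IN
    (SELECT TOP 5 o.[column1]          -- outer column, not i.[column1]
     FROM [test].[dbo].[table1] AS i); -- i only controls how many copies come back
```

For each outer row, the subquery returns up to five copies of that row's own value, so every outer row with a non-NULL column1 passes the IN test as long as the inner table is not empty.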

If you give the outer table an alias, you will notice different results:

SELECT [column1]
FROM [test].[dbo].[table1] table1A
WHERE [column1] IN
(SELECT TOP 5 [test].[dbo].[table1].[column1]
FROM [table1])

Also, when you run the query SELECT TOP 5 [test].[dbo].[table1].[column1] FROM [table1] on its own, there is no outer query, so there is no confusion about which table you are selecting from.

Your query is equivalent to:

Select column1 from table1
where column1 in 
(select column1)

As long as the other table returns rows, the value you are selecting is always going to be equal to itself in the subquery. It's pretty much like saying: give me everything in the set where a column value equals itself when added to another set. As long as the other set is not empty, the column value will always equal itself.
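
To keep this from happening again, one option is to alias both table references and qualify every column with its alias, so the subquery cannot silently bind to the outer table. A minimal sketch, assuming the same test database and table; the aliases o and i and the ORDER BY are additions for illustration, not part of the original query:

```sql
USE [test];

-- Sketch only: o and i are illustrative aliases.
SELECT o.[column1]
FROM [dbo].[table1] AS o
WHERE o.[column1] IN
    (SELECT TOP 5 i.[column1]      -- explicitly the inner table's column
     FROM [dbo].[table1] AS i
     ORDER BY i.[column1]);        -- assumed ordering; without ORDER BY, TOP picks arbitrary rows
```

With the inner column tied to i, the subquery no longer depends on the outer row, so the outer query can only return rows whose value appears among the five selected ones.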

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow