Join to SELECT vs. Join to Tableset

Question 1

The database engine has an optimizer to figure out the best way to execute a query. There is more under the hood than you probably imagine. For instance, when SQL Server is doing a join, it has a choice of at least four join algorithms:

Nested Loop
Index Lookup
Merge Join
Hash Join

(not to mention the multi-threaded versions of these.)

It is not important that you understand how each of these works. You just need to understand two things: different algorithms are best under different circumstances and SQL Server does its best to choose the best algorithm.

The choice of join algorithm is only one thing the optimizer does. It also has to figure out the ordering of the joins, the best way to aggregate results, whether a sort is needed for an order by, how to access the data (via indexes or directly), and much more.

When you break the query apart, you are making an assumption about optimization. In your case, you are making the assumption that the first best thing is to do a select on a particular table. You might be right. If so, your result with multiple queries should be about as fast as using a single query. Well, maybe not. When in a single query, SQL Server does not have to buffer all the results at once; it can stream results from one place to another. It may also be able to take advantage of parallelism in a way that splitting the query prevents.

In general, the SQL Server optimizer is pretty good, so you are best letting the optimizer do the query all in one go. There are definitely exceptions, where the optimizer may not choose the best execution path. Sometimes fixing this is as easy as being sure that statistics are up-to-date on tables. Other times, you can add optimizer hints. And other times you can restructure the query, as you have done.

For instance, one place where loading data into a local table is useful is when the table comes from a different server. The optimizer may not have full information about the size of the table to make the best decisions.

In other words, keep the query as one statement. If you need to improve it, then focus on optimization after it works. You generally won't have to spend much time on optimization, because the engine is pretty good at it.

Question 2

This would give the same result?

SELECT t.*, s.*
FROM dbo.TestTable AS t
JOIN dbo.TestRefTable AS s ON t.id = s.id AND s.id = 1123

Basically, this is a cross join of all records from TestTable and TestRefTable with id = 1123.

Question 3

Joining to table variables will also result in bad cardinality estimates by the optimizer. Table variables are always assumed by the optimizer to contain only a single row. The more rows it actually has the worse that estimate becomes. This causes the optimizer to assume the wrong number of rows for the table itself, but in other places, for operators that might then join to that result, it can result in wrong estimations of the number executions for that operation.

Personally I think Table parameters should be used for getting data into and out of the server conveniently using client apps (C# .Net apps make good use of them), or for passing data between Stored Procs, but should not be used too much within the proc itself. The importance of getting rid of them within the Proc code itself increases with the expected number of rows to be carried by the parameter.

Sub Selects will perform better, or immediately copying into a temp table will work well. There is overhead for copying into the temp table, but again, the more rows you have the more worth it that overhead becomes because the estimates by the optimizer get worse and worse.

Question 4

In general a derived table in the query is probably going to be faster than joining to a table variable because it can make use of indexes and they are not available in table variables. However, temp tables can also have indexes creted and that might solve the potential performance difference.

Also if the number of table variable records is expected to be small, then indexes won't make a great deal of difference anyway and so there would be little or no differnce.

As alawys you need to test on your own system as number of records and table design and index design havea great deal to do with what works best.

Question 5

I'd expect the direct Table join to be faster than the Table to TableVariable, and use less resources.