Question

I am reading the book "Inside Microsoft SQL Server 2008: T-SQL Querying", which says, using an example, that for any join between two tables a Cartesian product is formed first, then filtered by the ON condition, and then processed according to the join type (RIGHT, LEFT or FULL).

Here is an example from that book:

SELECT C.customerid, COUNT(O.orderid) AS numorders
FROM dbo.Customers AS C
  LEFT OUTER JOIN dbo.Orders AS O
    ON C.customerid = O.customerid
GROUP BY C.customerid;

The Customers table has 4 rows and Orders has 7. So, first, a Cartesian product generates 4 * 7 = 28 rows, which are then filtered by the ON clause, and then the LEFT OUTER join adds back the unmatched customers.
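To make the logical steps described above concrete, here is a small Python model of them. It is only a sketch of the *logical* order of operations, not of what SQL Server executes, and the sample data is hypothetical (4 customers, 7 orders, with customer 4 having no orders), not the book's tables. The per-customer count assumes the query groups by customerid, which COUNT per customer implies.

```python
# Hypothetical sample data (not the book's tables): 4 customers, 7 orders.
customers = [1, 2, 3, 4]                       # customerid values
orders = [(101, 1), (102, 1), (103, 2), (104, 2),
          (105, 2), (106, 3), (107, 3)]        # (orderid, customerid)

def logical_left_join(left, right, on):
    """Model of the logical phases of LEFT OUTER JOIN, not how an engine runs it."""
    # Phase 1: Cartesian product, every left row paired with every right row.
    product = [(i, l, r) for i, l in enumerate(left) for r in right]
    # Phase 2: ON filter, keep only pairs where the predicate is true.
    matched = [(i, l, r) for (i, l, r) in product if on(l, r)]
    # Phase 3: add outer rows, preserved left rows with no match, padded with None (NULL).
    hit = {i for (i, _, _) in matched}
    outer = [(i, l, None) for i, l in enumerate(left) if i not in hit]
    return len(product), [(l, r) for (_, l, r) in matched + outer]

n_product, joined = logical_left_join(customers, orders, lambda c, o: c == o[1])

# GROUP BY customerid with COUNT(O.orderid): a None (NULL) order contributes 0.
numorders = {}
for c, o in joined:
    numorders[c] = numorders.get(c, 0) + (o is not None)

print(n_product)   # 28
print(numorders)   # {1: 2, 2: 3, 3: 2, 4: 0}
```

The 28-row intermediate list exists only because this model builds it on purpose; the answers below explain why a real engine does not.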

Does that mean that, whatever type of join I use, a Cartesian product is always computed between the tables? If so, why do we see performance differences between different joins?


Solution

SQL Server certainly doesn't compute the Cartesian product for every join and then filter it. Instead, it takes your SQL statement, with whatever join type you have specified (INNER, LEFT, RIGHT, ...), and the query optimizer decides, based on the statistics available on the tables, which physical join operator to use.

There are three physical join operators:

  • Nested loops join
  • Merge Join
  • Hash Join
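The first operator in the list above can be sketched in a few lines of Python. This is an illustrative model, not SQL Server's implementation: it shows that nested loops evaluates the join predicate pair by pair without ever storing a product, and that when the inner input has an index on the join column (modelled here by a hypothetical dict named `index`), each outer row only touches its matching inner rows. The sample data is hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: customerid values and (orderid, customerid) rows.
customers = [1, 2, 3, 4]
orders = [(101, 1), (102, 1), (103, 2), (104, 2),
          (105, 2), (106, 3), (107, 3)]

def nested_loops_join(outer, inner, on):
    """Naive nested loops: rescan the whole inner input for every outer row."""
    result = []
    for o in outer:               # outer input, read once
        for i in inner:           # inner input, rescanned per outer row
            if on(o, i):          # predicate tested pair by pair, nothing materialized
                result.append((o, i))
    return result

def index_nested_loops_join(outer, inner_index):
    """Nested loops where the inner side is accessed through an index on the join key."""
    result = []
    for o in outer:
        for i in inner_index.get(o, []):   # "seek": only the matching inner rows
            result.append((o, i))
    return result

# Build the stand-in "index" on Orders.customerid (hypothetical structure).
index = defaultdict(list)
for row in orders:
    index[row[1]].append(row)

naive = nested_loops_join(customers, orders, lambda c, o: c == o[1])
seek = index_nested_loops_join(customers, index)
print(len(naive), naive == seek)   # 7 True
```

Because it tests an arbitrary predicate per pair, nested loops also handles non-equality joins; the indexed form is what makes it cheap when the outer input is small.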

Each of the three has scenarios where it works best (I'm not going to explain them here; there are plenty of articles on each of them). Which one is chosen mostly depends on the optimizer's cardinality estimates: how many rows it expects from each input of the join, and how many it expects the join to return.
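As one example of an operator with a clear "ideal scenario", here is a minimal Python sketch of an inner hash join, under the assumption of an equality predicate and hypothetical sample data. It is not SQL Server's implementation (which also handles memory grants and spilling), but it shows the two phases: build a hash table on one input, then probe it with the other, so each input is read once. It also shows why a hash join needs an equality predicate: the lookup is by exact key.

```python
from collections import defaultdict

# Hypothetical sample data: customerid values and (orderid, customerid) rows.
customers = [1, 2, 3, 4]
orders = [(101, 1), (102, 1), (103, 2), (104, 2),
          (105, 2), (106, 3), (107, 3)]

def hash_join(build, probe, build_key, probe_key):
    """Sketch of an inner equi-join using a build phase and a probe phase."""
    # Build phase: hash the (ideally smaller) build input on its join key.
    table = defaultdict(list)
    for row in build:
        table[build_key(row)].append(row)
    # Probe phase: each probe row looks only in the bucket for its own key.
    result = []
    for row in probe:
        for match in table.get(probe_key(row), []):
            result.append((match, row))
    return result

pairs = hash_join(customers, orders, build_key=lambda c: c, probe_key=lambda o: o[1])
print(len(pairs))   # 7
```

Hash joins tend to suit large, unsorted inputs without useful indexes; the build side is usually the input the optimizer estimates to be smaller.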

Craig Freedman has a great series of blog posts on how joins work in SQL Server; they are all collected here:

Joins - Craig Freedman

I would recommend the bottom five articles in that list, which cover an introduction to joins, a summary of join properties, and then reasonably in-depth information on each physical join operator.
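For the remaining operator, here is a minimal Python sketch of an inner merge join. It assumes both inputs are already sorted on the join key and that the left key is unique (a one-to-many join, as with customerid as a primary key); a many-to-many merge needs extra work to revisit duplicate rows, which this sketch deliberately omits. The data is hypothetical.

```python
# Hypothetical sample data, both already sorted on customerid.
customers = [1, 2, 3, 4]
orders = [(101, 1), (102, 1), (103, 2), (104, 2),
          (105, 2), (106, 3), (107, 3)]

def merge_join(left, right, left_key, right_key):
    """Inner one-to-many merge join over inputs sorted on the join key."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1            # left row has no (more) matches, move on
        elif lk > rk:
            j += 1            # right row has no match, move on
        else:
            result.append((left[i], right[j]))
            j += 1            # left key is unique, so only the right side advances
    return result

pairs = merge_join(customers, orders, left_key=lambda c: c, right_key=lambda o: o[1])
print(len(pairs))   # 7
```

Each input is read once in key order, which is why merge join is attractive when both inputs already arrive sorted (for example, from indexes on the join column).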

OTHER TIPS

The statement in the book, namely

any joins between two tables first the Cartesian Product happen between them then it is getting filtered with the ON condition then by "RIGHT", "LEFT" or "FULL" join type.

is only a logical model of what happens. The result will always be the same as if those steps had been followed, but the actual implementation differs depending on which indexes you have and what data is in the tables.
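The point that the logical description and the physical implementation agree on the result can be checked with a toy Python comparison on hypothetical data: one function follows the logical definition literally (product, then filter), the other uses a hash lookup. The results are compared after sorting, since a SQL result set has no guaranteed order unless ORDER BY is used.

```python
from collections import defaultdict

# Hypothetical sample data: customerid values and (orderid, customerid) rows.
customers = [1, 2, 3, 4]
orders = [(101, 1), (102, 1), (103, 2), (104, 2),
          (105, 2), (106, 3), (107, 3)]

def join_by_product(left, right):
    # Logical definition: Cartesian product, then filter on the ON predicate.
    return [(c, o) for c in left for o in right if c == o[1]]

def join_by_hash(left, right):
    # One possible physical implementation: hash the left input, probe with the right.
    buckets = defaultdict(list)
    for c in left:
        buckets[c].append(c)
    return [(c, o) for o in right for c in buckets.get(o[1], [])]

same = sorted(join_by_product(customers, orders)) == sorted(join_by_hash(customers, orders))
print(same)   # True
```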

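Run SET SHOWPLAN_TEXT ON (or SET SHOWPLAN_XML ON) and then your query: instead of executing it, SQL Server returns the plan, showing how the data is actually accessed and which join operator is used. Hopefully the book explains this as you get further into it.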
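The SQL Server commands cannot be run here, so as an analogy only, the sketch below uses SQLite through Python's built-in `sqlite3` module, whose `EXPLAIN QUERY PLAN` plays a similar role. This is SQLite, not SQL Server: the plan text differs between the two and between SQLite versions, the `dbo.` schema prefix is dropped because SQLite does not use it, and the index name `idx_orders_customerid` is hypothetical. The plan should list per-table access steps (scans or index searches) rather than a materialized product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customerid INTEGER PRIMARY KEY);
    CREATE TABLE Orders (orderid INTEGER PRIMARY KEY, customerid INTEGER);
    CREATE INDEX idx_orders_customerid ON Orders(customerid);
""")
# Hypothetical sample data: 4 customers, 7 orders (customer 4 has none).
conn.executemany("INSERT INTO Customers VALUES (?)", [(c,) for c in (1, 2, 3, 4)])
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(101, 1), (102, 1), (103, 2), (104, 2), (105, 2), (106, 3), (107, 3)])

query = """
    SELECT C.customerid, COUNT(O.orderid) AS numorders
    FROM Customers AS C
      LEFT OUTER JOIN Orders AS O
        ON C.customerid = O.customerid
    GROUP BY C.customerid
"""
# Each plan row has four columns; the last one is a human-readable description.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)

result = conn.execute(query).fetchall()
print(sorted(result))   # [(1, 2), (2, 3), (3, 2), (4, 0)]
```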

Saying that a Cartesian product happens first and is then filtered is VERY misleading. If that were the case, it would be virtually impossible to join two tables of a million rows each, because you would first be building a trillion-row intermediate result and only then filtering it. Not many SQL Server installations could handle THAT one.
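The arithmetic behind that argument, as a short Python sketch. The second function is a deliberately rough model (an assumption, not SQL Server's cost formula) of a join that reads each input once, as a hash or merge join does; it ignores the cost of producing matched rows and of any spilling to disk.

```python
def cartesian_rows(n_left: int, n_right: int) -> int:
    """Rows a literal 'product first, filter afterwards' plan would have to produce."""
    return n_left * n_right

def single_pass_reads(n_left: int, n_right: int) -> int:
    """Rows read if each input is scanned once (rough model of a hash or merge join)."""
    return n_left + n_right

million = 1_000_000
print(f"{cartesian_rows(million, million):,}")     # 1,000,000,000,000
print(f"{single_pass_reads(million, million):,}")  # 2,000,000
```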

So, no: for a well-written query, a Cartesian product is NOT the first step of the process. For a poorly written query, all bets are off. It IS possible to force SQL Server into that choice, but that is almost certainly a simple programmer error.
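One common form of that programmer error is an old-style comma join with the join predicate forgotten. The sketch below demonstrates it with SQLite via Python's `sqlite3` module, purely as a stand-in for SQL Server, on hypothetical data with 4 customers and 7 orders: without the predicate every customer is paired with every order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customerid INTEGER PRIMARY KEY);
    CREATE TABLE Orders (orderid INTEGER PRIMARY KEY, customerid INTEGER);
""")
conn.executemany("INSERT INTO Customers VALUES (?)", [(c,) for c in (1, 2, 3, 4)])
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(101, 1), (102, 1), (103, 2), (104, 2), (105, 2), (106, 3), (107, 3)])

# Programmer error: comma join with no join predicate, i.e. a Cartesian product.
bad = conn.execute("SELECT COUNT(*) FROM Customers AS C, Orders AS O").fetchone()[0]

# Intended query: the ON predicate restricts the result to matching pairs.
good = conn.execute(
    "SELECT COUNT(*) FROM Customers AS C "
    "JOIN Orders AS O ON C.customerid = O.customerid"
).fetchone()[0]

print(bad, good)   # 28 7
```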

Licensed under: CC-BY-SA with attribution