Question

If I run the following SQL query

SELECT * 
FROM A
LEFT JOIN B
ON A.foo=B.foo
WHERE A.date = "Yesterday"

Does the WHERE statement get evaluated before or after the JOIN?

If after, what would be a better way to write this statement so that only rows in A from "Yesterday" are joined to B?

Solution

It depends on the database.

On SQL Server, run SET SHOWPLAN_ALL ON and then run the query. Instead of executing it, SQL Server returns the estimated execution plan, which gives you an idea of what happens when it runs.

OTHER TIPS

Your idea of "evaluation" is not correct, as SQL is a declarative language: the query describes the result you want, not the order of the steps used to produce it.

BTW, you can look at the query execution plan. In MySQL, prefix your query with the keyword DESCRIBE (a synonym for EXPLAIN) to see it.

Semantically, after the JOIN. But in this case it makes no difference whether the filter is applied before or after, because it is on the table on the left side of the LEFT JOIN.

As you already have it, only rows in A from "Yesterday" are joined to B.

The optimizer is free to reorganize its order of operations depending on the equivalences in the relational algebra.
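
The equivalence can be checked on toy data. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in database (an assumption, since the question does not name one), and the table contents and the note column are made up for illustration. It compares the filter written in WHERE with the same filter applied to A in a derived table before the join.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (foo INTEGER, date TEXT);
    CREATE TABLE B (foo INTEGER, note TEXT);
    INSERT INTO A VALUES (1, 'Yesterday'), (2, 'Yesterday'), (3, 'Today');
    INSERT INTO B VALUES (1, 'b1'), (3, 'b3');
""")

# Filter written after the join, as in the question.
filter_in_where = conn.execute("""
    SELECT A.foo, A.date, B.note
    FROM A
    LEFT JOIN B ON A.foo = B.foo
    WHERE A.date = 'Yesterday'
    ORDER BY A.foo
""").fetchall()

# Filter applied to A first, via a derived table, then joined.
filter_in_subquery = conn.execute("""
    SELECT A.foo, A.date, B.note
    FROM (SELECT * FROM A WHERE date = 'Yesterday') AS A
    LEFT JOIN B ON A.foo = B.foo
    ORDER BY A.foo
""").fetchall()

print(filter_in_where)                        # [(1, 'Yesterday', 'b1'), (2, 'Yesterday', None)]
print(filter_in_where == filter_in_subquery)  # True
```

Both forms return the same rows here, which is why the optimizer may pick either order internally.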

This returns only the rows of A where A.date="Yesterday", joined to B wherever a match on foo exists:

SELECT * FROM A
LEFT JOIN B
    ON A.foo=B.foo
WHERE A.date="Yesterday"

This returns all rows of A regardless of any criteria, and joins B only where A.date="Yesterday" AND there is a match on foo:

SELECT * FROM A
LEFT JOIN B
    ON A.foo=B.foo
    AND A.date="Yesterday"
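
To make the difference between the two queries concrete, here is a small sketch using Python's built-in sqlite3 module. SQLite, the sample rows, and the note column are assumptions chosen for illustration. Single quotes are used for the string literal, since double quotes denote identifiers in standard SQL.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (foo INTEGER, date TEXT);
    CREATE TABLE B (foo INTEGER, note TEXT);
    INSERT INTO A VALUES (1, 'Yesterday'), (2, 'Yesterday'), (3, 'Today');
    INSERT INTO B VALUES (1, 'b1'), (3, 'b3');
""")

# Date filter in WHERE: rows of A that fail the filter are dropped.
where_rows = conn.execute("""
    SELECT A.foo, A.date, B.note
    FROM A
    LEFT JOIN B ON A.foo = B.foo
    WHERE A.date = 'Yesterday'
    ORDER BY A.foo
""").fetchall()

# Date filter in ON: every row of A is kept; B is attached only
# when both the foo match and the date condition hold.
on_rows = conn.execute("""
    SELECT A.foo, A.date, B.note
    FROM A
    LEFT JOIN B ON A.foo = B.foo AND A.date = 'Yesterday'
    ORDER BY A.foo
""").fetchall()

print(where_rows)  # [(1, 'Yesterday', 'b1'), (2, 'Yesterday', None)]
print(on_rows)     # [(1, 'Yesterday', 'b1'), (2, 'Yesterday', None), (3, 'Today', None)]
```

Note that the row with foo = 3 has a match in B, but in the second query its B columns come back NULL because the date condition in ON fails.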

The order of operations to satisfy a query is determined by the whim of the particular database's query optimizer. A query optimizer tries to produce a good "query plan" (set of operations) based on what it can glean from the query and whatever statistics it has on hand about the database (which could include the cardinality of tables and certain distributions of data).

In your case, the answer may depend on whether you have a secondary index on A.date
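
As a rough illustration of how an index can change the plan, the sketch below asks SQLite (used here only as a convenient stand-in, which is an assumption) for its plan before and after creating an index on A.date. The index name idx_a_date and the helper plan_details are hypothetical, and the exact wording of the plan text varies between SQLite versions, so no output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (foo INTEGER, date TEXT);
    CREATE TABLE B (foo INTEGER, note TEXT);
""")

QUERY = """
    SELECT * FROM A
    LEFT JOIN B ON A.foo = B.foo
    WHERE A.date = 'Yesterday'
"""

def plan_details(connection):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]

plan_without_index = plan_details(conn)

# Hypothetical secondary index on the filtered column.
conn.execute("CREATE INDEX idx_a_date ON A(date)")
plan_with_index = plan_details(conn)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Without the index the plan has to scan A; with it, the plan for A should mention idx_a_date. Other databases expose the same information through their own commands.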

Query optimization is a fairly rich topic. The documentation for whatever database you're using will have a lot more to say about it.

Depends on indexes and statistics.

You should look at the execution plan of the query to determine where (if anywhere) optimizations should be applied.

In SQL Server:

As a general rule of thumb, JOIN clauses are evaluated before WHERE clauses.

For complex joins that need filters in the join part, I either write the filters along with the join or filter the table in a subquery first:

SELECT *
FROM A
LEFT JOIN B
    ON A.Foo1 = B.Foo1
    And A.Date = 'Yesterday'
OUTER JOIN C
    ON B.Foo2 = C.Foo2
JOIN D
    ON B.Foo3 = D.Foo3

SELECT *
FROM (SELECT * FROM A WHERE Date = 'Yesterday') A
LEFT JOIN B 
    ON A.Foo1 = B.Foo1 
OUTER JOIN C 
    ON B.Foo2 = C.Foo2 
JOIN D 
    ON B.Foo3 = D.Foo3 
Licensed under: CC-BY-SA with attribution