How to infer the result of a query?

https://softwareengineering.stackexchange.com/questions/277670

07-10-2020

Question

To understand what a query does, I have always assumed the following procedure for evaluating it:

Form the table as specified in the FROM clause.
Pick the rows from that table as specified in the WHERE clause.
Show the columns of the resulting rows, as specified in the SELECT clause.

The problem is that everywhere I look, SQL is described as a declarative language. That is, there is no guarantee about how a particular query will be executed. Given that, my question is:

Is it wrong to reason about the result of a query the way I do? If so, how am I supposed to reason about it? In other words, if I cannot reason step by step, how am I supposed to predict what the result of a particular query will be?

This is not a big problem for trivial queries, but as queries get more complex, for example by incorporating several correlated subqueries, I can't see how I am supposed to understand what a query does without thinking about it as a step-by-step execution. So I would like to learn how to reason about SQL queries.

Solution

Reasoning about a query's meaning (i.e. the results it should produce) using the naive procedural approach you describe is perfectly fine. As you say, with complex queries it's often the only easy way to work out what's going on.
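As a rough illustration of that point, here is a minimal sketch that runs the three-step FROM / WHERE / SELECT procedure by hand in Python and compares it with what an actual engine returns. The `items` table, its columns, and the sample rows are all made up for the example, and SQLite (via Python's standard `sqlite3` module) is just a convenient stand-in for "some database".

```python
import sqlite3

# Hypothetical sample data; the table and column names are assumptions.
rows = [
    {"id": 5, "name": "ann", "city": "Oslo"},
    {"id": 12, "name": "bob", "city": "Rome"},
    {"id": 20, "name": "cy", "city": "Lima"},
]

def naive_select(table):
    # Step 1 (FROM): start from every row of the table.
    candidates = list(table)
    # Step 2 (WHERE): keep only the rows that satisfy the predicate.
    kept = [r for r in candidates if r["id"] > 10]
    # Step 3 (SELECT): project the requested columns.
    return [(r["id"], r["name"]) for r in kept]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT, city TEXT)")
conn.executemany("INSERT INTO items VALUES (:id, :name, :city)", rows)
engine_result = conn.execute(
    "SELECT id, name FROM items WHERE id > 10"
).fetchall()

# Row order is unspecified without ORDER BY, so compare sorted lists.
naive_result = naive_select(rows)
print(sorted(naive_result) == sorted(engine_result))
```

The comparison only says the results agree. It says nothing about whether the engine took the same steps to get there.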

The problem would be if you used this reasoning to infer properties about the query execution (such as time complexity or memory usage). That's unwise because the actual database probably uses indexes and temporary tables and query planners containing all sorts of black magic optimizations you don't and shouldn't know about. As long as you never make assumptions like "Nested subqueries use lots of memory because they construct temporary tables" without any hard evidence, you should be fine.

If you have any background in math, it might help to think of declarative statements like SQL queries as definitions of mathematical sets. For instance, the result of this query:

SELECT id, name FROM table WHERE id > 10

is arguably equivalent to a set like this:

S = { (id, name) | ∃row (row ∈ table ∧ row(0) = id ∧ row(1) = name) ∧ id > 10 }

Since set-builder notation is not normally intended as pseudocode for set construction algorithms, this formulation puts some mental distance between the query/set definition and the naive algorithm it happens to imply.
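If it helps to see the set reading next to the query, a Python set comprehension is close to a literal transcription of the set-builder form. This is only a sketch: the sample rows and the `items` table name are assumptions for the example (the word `table` itself is reserved in SQL, so the query needs some other name), and a real SQL result is a bag that may contain duplicate rows, which a set would collapse.

```python
import sqlite3

# Hypothetical rows of (id, name); stands in for `table` in the query above.
table = {(5, "ann"), (12, "bob"), (20, "cy")}

# Set-builder reading: every (id, name) pair taken from a row whose id exceeds 10.
S = {(row[0], row[1]) for row in table if row[0] > 10}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", sorted(table))
query_result = set(conn.execute("SELECT id, name FROM items WHERE id > 10"))

# Both descriptions pick out the same members.
print(S == query_result)
```

Neither the comprehension nor the query says in which order the rows are visited, which is the whole point of reading them as definitions.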

Other tips

You are indeed able to precisely determine what a query will return by using logical reasoning as you described.

What SQL does not guarantee is how the engine will find this result. It will give the same result as if it used the logical steps you describe, but how it actually finds the result depends on a number of factors and implementation details.

For example, when you say "Pick the rows from that table..." the engine might not need to look at the table at all, if there is an index that includes all the columns you ask for. Of course, this means the engine has to look at the columns first to determine which tables or indexes it should use. So the engine will not actually follow the steps you describe, but the result will be the same.
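You can watch this happen in a small experiment. The sketch below uses SQLite through Python's `sqlite3` module; the `person` table, the index name, and the data are invented for the example. It runs the same query before and after creating an index and prints what `EXPLAIN QUERY PLAN` reports each time. The exact wording of the plan differs between SQLite versions, and other engines have their own plan output, so treat the printed text as informational only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO person (name, city) VALUES (?, ?)",
    [("ann", "Oslo"), ("bob", "Rome"), ("cy", "Lima")],
)

query = "SELECT name FROM person WHERE name > 'b'"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable detail string.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before_rows = sorted(conn.execute(query).fetchall())
before_plan = plan(query)

# The index holds every column the query asks for, so the table is not needed.
conn.execute("CREATE INDEX idx_person_name ON person (name)")

after_rows = sorted(conn.execute(query).fetchall())
after_plan = plan(query)

print(before_plan)
print(after_plan)
print(before_rows == after_rows)
```

The plan changes once the index exists, while the rows returned stay the same.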

With correlated subqueries, the result will look as if the subquery were executed once per row of the main query, but in reality the query optimizer might transform the query into a join, which can be much faster to execute.
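To make the equivalence concrete, here is a sketch with a correlated subquery and a hand-written join that mean the same thing. The `customer` and `purchase` tables are hypothetical, and the join is written by hand here. Whether a particular optimizer performs this rewrite on its own is an implementation detail that this example does not demonstrate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customer VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO purchase VALUES (1, 1, 10), (2, 1, 25), (3, 2, 7);
""")

# Reads as: "for each customer, run the inner query once".
correlated = """
    SELECT c.name,
           (SELECT COUNT(*) FROM purchase p WHERE p.customer_id = c.id) AS n
    FROM customer c
    ORDER BY c.name
"""

# Same meaning expressed as an outer join plus grouping.
joined = """
    SELECT c.name, COUNT(p.id) AS n
    FROM customer c
    LEFT JOIN purchase p ON p.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
"""

correlated_rows = conn.execute(correlated).fetchall()
joined_rows = conn.execute(joined).fetchall()
print(correlated_rows == joined_rows)
```

Reasoning "once per row" about the first form gives the right answer, and the engine is free to evaluate it like the second form because both define the same result.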

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange