Do subqueries add expressive power to SQL queries?
-
16-10-2019 - |
Question
Does SQL need subqueries?
Imagine a sufficiently generalized implementation of the structured query language for relation databases. Since the structure of the canonical SQL SELECT
statement is actually pretty important for this to make sense, I don't appeal directly to relational algebra, but you could frame this in those terms by making appropriate restrictions on the form of expressions.
An SQL SELECT
query generally consists of a projection (the SELECT
part) some number of JOIN
operations (the JOIN
part), some number of SELECTION
operations (in SQL, the WHERE
clauses), and then set-wise operations (UNION
, EXCEPT
, INTERSECT
, etc.), followed by another SQL SELECT
query.
Tables being joined can be the computed results of expressions; in other words, we can have a statement such as:
SELECT t1.name, t2.address
FROM table1 AS t1
JOIN (SELECT id, address
FROM table2 AS t3
WHERE t3.id = t1.id) AS t2
WHERE t1.salary > 50,000;
We will refer to the use of a computed table as part of an SQL query as a subquery. In the example above, the second (indented) SELECT
is a subquery.
Can all SQL queries be written in such a way as to not use subqueries? The example above can:
SELECT t1.name, t2.address
FROM table1 AS t1
JOIN table2 AS t2
ON t1.id = t2.id
WHERE t1.salary > 50,000;
This example is somewhat spurious, or trivial, but one can imagine instances where considerably more effort might be required to recover an equivalent expression. In other words, is it the case that for every SQL query $q$ with subqueries, there exists a query $q'$ without subqueries such that $q$ and $q'$ are guaranteed to produce the same results for the same underlying tables? Let us limit SQL queries to the following form:
SELECT <attribute>,
...,
<attribute>
FROM <a table, not a subquery>
JOIN <a table, not a subquery>
...
JOIN <a table, not a subquery>
WHERE <condition>
AND <condition>
...
AND <condition>
UNION
-or-
EXCEPT
-or-
<similar>
SELECT ...
And so on. I think left and right outer joins don't add much, but if I am mistaken, please feel free to point that out... in any event, they are fair game as well. As far as set operations go, I guess any of them are fine... union, difference, symmetric difference, intersection, etc... anything that is helpful. Are there any known forms to which all SQL queries can be reduced? Do any of these eliminate subqueries? Or are there some instances where no equivalent, subquery-free query exists? References are appreciated... or a demonstration (by proof) that they are or aren't required would be fantastic. Thanks, and sorry if this is a celebrated (or trivial) result of which I am painfully ignorant.
Solution
There is some terminology confusion; the query block within parenthesis
SELECT t1.name, t2.address
FROM table1
JOIN (SELECT id, address
FROM table2 AS t3
WHERE t3.id = t1.id)
is called inner view. A subquery is query block within either WHERE or SELECT clause, e.g.
select deptno from dept
where 3 < (select count(1) from emp
where dept.deptno=emp.deptno)
In either case, inner view or subquery can be unnested into "flat" project-restrict-join. Correlated subquery with aggregation unnests into inner views with grouping, which then unnests into flat query.
select deptno from dept d
where 3 < (select avg(sal) from emp e
where d.deptno=e.deptno)
select d.deptno from dept d, (
select deptno from emp e
group by deptno
having avg(sal) > 3
) where d.deptno=e.deptno
select d.deptno from dept d, emp e
where d.deptno=e.deptno
group by d.deptno
having avg(sal) > 3
As for algebraic rules for query optimization, relational algebra is known to be axiomatized into Relational Lattice which simplifies query transformations as demonstrated here and there.
OTHER TIPS
To translate your statement into the relational algebra, I think it asks:
Can we rewrite $\sigma_A(A)\bowtie \sigma_B(B)$ as $\sigma_A(\sigma_B(A\bowtie B))$?
(Where $\sigma$ is select and $\bowtie$ is join.)
The answer is "Yes," and it's a standard query optimization. To be honest, I'm not sure how to prove this in a non-question-begging manner - it's just a property of selection and join. You can argue inductively to add however many layers of nested queries you want.
Additionally, you might ask:
Can every sequence of joins be written as $A\bowtie B\bowtie C\bowtie\dots$ [As opposed to, say, $(A\bowtie B)\bowtie(C\bowtie D)$]?
Again the answer is yes, because join is associative. Similar statements can be made about projection as well.
One notable type of "subquery" which I think cannot be "flattened out" is with
. One way to see this is to note that if you have a with
statement then you can have a recursive function, which cannot be written without using subqueries.
So to sum up: in the specific case you mentioned, no, SQL does not need subqueries, and you can prove it inductively. In general though, there are features which require subqueries.
"Do subqueries add expressive power to SQL queries?"
They did, at least before the introduction of EXCEPT in the SQL language.
Prior to the introduction of EXCEPT, there was no way what so ever to express a relational difference or semidifference in SQL without resorting to subqueries.
These days, all of the "typical" primitive operators of "the" relational algebra can be expressed without subqueries :
NATURAL JOIN can be done through NATURAL JOIN, or JOIN ON
UNION can be done through UNION
MINUS can be done through EXCEPT
PROJECT/RENAME/EXTEND can be done throug SELECT
RESTRICT can be done through WHERE
relational literals can be done through VALUES
transitive closures can be done through recursive WITH