Why is SQL's grammar inside-out?

https://stackoverflow.com/questions/959591

12-09-2019
|

Question

In just about any formally structured set of information, you start reading either from the start towards the end, or occasionally from the end towards the beginning (street addresses, for example.) But in SQL, especially SELECT queries, in order to properly understand its meaning you have to start in the middle, at the FROM clause. This can make long queries very difficult to read, especially if it contains nested SELECT queries.

Usually in programming, when something doesn't seem to make any sense, there's a historical reason behind it. Starting with the SELECT instead of the FROM doesn't make sense. Does anyone know the reason it's done that way?

Solution

The SQL Wikipedia entry briefly describes some history:

During the 1970s, a group at IBM San Jose Research Laboratory developed the System R relational database management system, based on the model introduced by Edgar F. Codd in his influential paper, "A Relational Model of Data for Large Shared Data Banks". Donald D. Chamberlin and Raymond F. Boyce of IBM subsequently created the Structured English Query Language (SEQUEL) to manipulate and manage data stored in System R. The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company.

The original name explicitly mentioned English, explaining the syntax.

Digging a little deeper, we find the FLOW-MATIC programming language.

FLOW-MATIC, originally known as B-0 (Business Language version 0), is possibly the first English-like data processing language. It was invented and specified by Grace Hopper, and development of the commercial variant started at Remington Rand in 1955 for the UNIVAC I. By 1958, the compiler and its documentation were generally available and being used commercially.

FLOW-MATIC was the inspiration behind the Common Business Oriented Language, one of the oldest programming languages still in active use. Keeping with that spirit, SEQUEL was designed with English-like syntax (1970s is modern, compared with 1950s and 1960s).

In perspective, "modern" programming systems still access databases using the age old ideas behind

MULTIPLY PRICE BY QUANTITY GIVING COST.

OTHER TIPS

I think the way in which a SQL statement is structured makes logical sense as far as English sentences are structured. Basically

I WANT THIS
FROM HERE
WHERE WHAT I WANT MEETS THESE CRITERIA

I don't think it makes much sense, In English at least, to say

FROM HERE
I WANT THIS
WHERE WHAT I WANT MEETS THESE CRITERIA

I must disagree. SQL grammar is not inside-out.

From the very first look you can tell whether the query will SELECT, INSERT, UPDATE, or DELETE data (all the rest of SQL, e.g. DDL, omitted on purpose).

Back to your SELECT statement confusion: The aim of SQL is to be declarative. Which means you express WHAT you want and not HOW you want it. So it makes every sense to first state WHAT YOU WANT (list of attributes you're selecting) and then provide the DBMS with some additional info on where that should be looked up FROM.

Placing the WHERE clause at the end makes great sense too: Imagine a funnel, wide at the top, narrow at the bottom. By adding a WHERE clause towards the end of the statement, you are choking down the amount of resulting data. Applying restrictions to your query any place else than at the bottom would require the developer to turn their head around.

ORDER BY clause at the very end: once the data has gone through the funnel, sort it.

JOINS (JOIN criteria) really belong into the FROM clause.

GROUPING: basically running data through a funnel before it gets into another funnel.

SQL sytax is sweet. There's nothing inside out about it. Maybe that's why SQL is so popular even after so many decades. It's rather easy to grasp and to make sense out of. (Although I have once faced a 7-page (A4-size) SQL statement which took me quite a while to get my head around.)

It's designed to be English like. I think that's the primary reason.

As a side note, I remember the initial previews of LINQ were directly modeled after it (select ... from ...). This was changed in later previews to be more programming language like (so that the scope goes downwards). Anders Hejlsberg specifically mentioned this weird fact about SQL (which makes IntelliSense harder and doesn't match C# scope rules) as the reason they made this decision.

Anyhow, good or bad, it's what it is and it's too late to change anything.

The order of the clauses in SQL is absolutely logical. Remember that SQL is a declarative language, where you declare what you want and the system figures out how best to get it for you. The first clause is the select clause where you list the columns that you want in the result table. This is the primary purpose of the query. Having stated what you want the result to look like, you next state where the data should come from. The where clause limits the amount of data being returned. There is no point in thinking about how to limit your data unless you know where it comes from, so it goes after the from clause. The group by clause works with the aggregation operators in the select clause and could go anywhere after the from clause however it is better to think about aggregation on the filtered data, so it comes after the where clause. The having clause has to come after the group by clause. The order by clause is about how the data is presented and could go anywhere after the select.

It's consistent with the rest of SQL's syntax of having every statement start with a verb (CREATE, DROP, UPDATE, etc.).

The major disadvantage of having the column list first is that it's inconvenient for auto-complete (as Hejlsberg has mentioned), but this wasn't a concern when the syntax was designed in the 1970s.

We could have had the best of both worlds with a syntax like SELECT FROM SomeTable: ColumnA, ColumnB, but it's too late to change it now.

Anyhow, SQL's SELECT statement order isn't unique. It exactly matches that of Python list comprehensions:

[(rec.a, rec.b) for rec in data where rec.a > 0]

History of the language aside (although it is fascinating) I think the thing you are missing is that SQL isn't about telling the system what to do, so much as what end result you want (and it figures out how to do it)

saying 'go over there to that rack, pick up the hats with hatbands, blue hats first, then green, then red, and bring them to me' is very much telling the system how to do what you want. it's programmer think where we presume the worker is very stupid and needs minutely detailed instructions.

SQL is starting with the end result first, the data you want, the order of the columns, etc.. it's very much the perspective of someone who is building a report. "I want firstname, lastname, then age, then....." That is after all the purpose of making the request. So it starts with that, the format of the results you want. Then it goes into where you expect it to find the data, what criteria to look for, the order to present it, etc.

So as an alternative to specifying in minute detail what you want the worker to do, SQL presumes the system knows how to do that, and centers more on what you want.

So instead of pedantically telling your worker to go here, get this, bring it over there.. it's more like saying "I want hats, from rack 12, which have hatbands, and please sort them by color."

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow