Question

I'm reading CJ Date's SQL and Relational Theory: How to Write Accurate SQL Code, and he makes the case that positional queries are bad — for example, this INSERT:

INSERT INTO t VALUES (1, 2, 3)

Instead, you should use attribute-based queries like this:

INSERT INTO t (one, two, three) VALUES (1, 2, 3)

Now, I understand that the first query is out of line with the relational model since tuples (rows) are unordered sets of attributes (columns). I'm having trouble understanding where the harm is in the first query. Can someone explain this to me?


Solution

The first query breaks pretty much any time the table schema changes. The second query accommodates any schema change that leaves its columns intact and doesn't add NOT NULL columns without a default.
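To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The extra column `four` is made up for illustration, and other engines may word the error differently, but the shape of the failure should be similar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (one INTEGER, two INTEGER, three INTEGER)")

# Both forms work against the original three-column table.
conn.execute("INSERT INTO t VALUES (1, 2, 3)")
conn.execute("INSERT INTO t (one, two, three) VALUES (1, 2, 3)")

# Schema change: a fourth column (hypothetical) with a default is added.
conn.execute("ALTER TABLE t ADD COLUMN four INTEGER DEFAULT 0")

# The named form still works; the new column receives its default.
conn.execute("INSERT INTO t (one, two, three) VALUES (4, 5, 6)")

# The positional form now supplies three values for four columns.
try:
    conn.execute("INSERT INTO t VALUES (7, 8, 9)")
    positional_ok = True
except sqlite3.OperationalError as exc:
    positional_ok = False
    print("positional INSERT failed:", exc)

rows = conn.execute(
    "SELECT one, two, three, four FROM t ORDER BY one"
).fetchall()
print(rows)
```

After the schema change only the named INSERT gets through, so the table ends up with the two original rows plus (4, 5, 6, 0).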

People who do SELECT * queries and then rely on positional notation for extracting the values they're concerned about are software maintenance supervillains for the same reason.
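A rough sketch of the SELECT * problem, again with Python's sqlite3 module. The `person` table, its columns, and the helper functions are invented for this example; the two databases stand in for the same table before and after someone rebuilds it with a different column order.

```python
import sqlite3

def make_db(column_defs):
    """Create an in-memory person table with the given column order."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows can be indexed by column name
    conn.execute(f"CREATE TABLE person ({column_defs})")
    conn.execute(
        "INSERT INTO person (id, name, email) "
        "VALUES (1, 'Ann', 'ann@example.com')"
    )
    return conn

old = make_db("id INTEGER, name TEXT, email TEXT")
new = make_db("id INTEGER, email TEXT, name TEXT")  # same columns, reordered

def name_by_position(conn):
    # Fragile: assumes the name is always the second column of SELECT *.
    return conn.execute("SELECT * FROM person").fetchone()[1]

def name_by_column(conn):
    # Robust: asks for the column by name and reads it by name.
    return conn.execute("SELECT name FROM person").fetchone()["name"]

print(name_by_position(old), name_by_position(new))
print(name_by_column(old), name_by_column(new))
```

The positional lookup quietly returns the email address once the columns are reordered, while the named lookup returns 'Ann' in both cases.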

OTHER TIPS

While the order of columns is defined in the schema, it should generally not be regarded as important because it's not conceptually important.

Also, it means that anyone reading the first version has to consult the schema to find out what the values mean. Admittedly this is just like using positional arguments in most programming languages, but somehow SQL feels slightly different in this respect - I'd certainly understand the second version much more easily (assuming the column names are sensible).

I don't really care about theoretical concepts in this regard (as in practice, a table does have a defined column order). The primary reason I would prefer the second one to the first is an added layer of abstraction. You can modify columns in a table without screwing up your queries.

You should try to make your SQL queries depend on the exact layout of the table as little as possible.

The first query relies on the table having exactly three fields, in that exact order. Adding, removing, or reordering a field will either break the query or make it put values into the wrong fields.

The second query only relies on there being those three fields in the table, and the order of the fields is irrelevant. You can change the order of fields in the table without breaking the query, and you can even add fields as long as they allow null values or have a default value.

Although you don't rearrange the table layout very often, adding more fields to a table is quite common.

Also, the second query is more readable. You can tell from the query itself what the values put in the record mean.

Something that hasn't been mentioned yet is that you will often have a surrogate key as your PK, with auto_increment (or something similar) to assign a value. With the first form, you'd have to specify something there, but what value can you specify if it isn't going to be used? NULL might be an option, but that doesn't really fit, considering the PK would be declared NOT NULL.
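Here is a small sketch of the surrogate-key case using Python's sqlite3 module. SQLite's `INTEGER PRIMARY KEY AUTOINCREMENT` stands in for auto_increment, and the `id` column is an assumption added for the example. Whether an engine accepts a placeholder such as NULL in the positional form is engine-specific, so the sketch only shows the two forms from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE t (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        one   INTEGER,
        two   INTEGER,
        three INTEGER
    )
    """
)

# Named form: simply leave the key out and let the database assign it.
cur = conn.execute("INSERT INTO t (one, two, three) VALUES (1, 2, 3)")
first_id = cur.lastrowid

# Positional form: three values no longer match the four columns.
try:
    conn.execute("INSERT INTO t VALUES (1, 2, 3)")
    positional_ok = True
except sqlite3.OperationalError as exc:
    positional_ok = False
    print("positional INSERT failed:", exc)

rows = conn.execute("SELECT id, one, two, three FROM t").fetchall()
print(first_id, rows)
```

With the column list, the key never has to appear in the statement at all, which sidesteps the question of what placeholder to pass.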

But apart from that, the whole "locked to a specific schema" is a much more important reason, IMO.

SQL gives you syntax for specifying the name of the column for both INSERT and SELECT statements. You should use this because:

  • Your queries are stable to changes in the column ordering, so that maintenance takes less work.
  • Column names map better to how people think than column positions do, so the query is more readable. It's clearer to think of a column as the "Name" column rather than the 2nd column.

I prefer to use the UPDATE-like syntax (a MySQL extension, not standard SQL):

INSERT t SET one = 1 , two = 2 , three = 3

This is far easier to read and maintain than either of the examples in the question.

Long term, if you add one more column to your table, your INSERT will not work unless you explicitly specify a list of columns. If someone changes the order of columns, your INSERT may silently succeed while inserting values into the wrong columns.
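The silent failure is the nastier of the two, so here is a hedged sketch of it with Python's sqlite3 module. The "reordered" table simulates someone rebuilding t with the last two columns swapped; the helper name `insert_and_read` is made up for this example.

```python
import sqlite3

def insert_and_read(create_sql, insert_sql):
    """Run one INSERT against a fresh table and read the row back by name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    conn.execute(insert_sql)
    return conn.execute("SELECT one, two, three FROM t").fetchone()

ORIGINAL = "CREATE TABLE t (one INTEGER, two INTEGER, three INTEGER)"
REORDERED = "CREATE TABLE t (one INTEGER, three INTEGER, two INTEGER)"

POSITIONAL = "INSERT INTO t VALUES (1, 2, 3)"
NAMED = "INSERT INTO t (one, two, three) VALUES (1, 2, 3)"

before = insert_and_read(ORIGINAL, POSITIONAL)
after_positional = insert_and_read(REORDERED, POSITIONAL)
after_named = insert_and_read(REORDERED, NAMED)

print(before)            # (1, 2, 3)
print(after_positional)  # (1, 3, 2): no error, but two and three are swapped
print(after_named)       # (1, 2, 3)
```

Nothing raises in the reordered case, which is exactly why this kind of bug can go unnoticed until the data is already wrong.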

I'm going to add one more thing: the second query is less prone to error from the start, even before any tables are changed. Why do I say that? Because with the second form you can (and should, when you write the query) visually check that the columns in the insert list and the data in the VALUES clause or SELECT clause are in fact in the right order to begin with. Otherwise you may end up putting the Social Security Number in the Honoraria field by accident and paying speakers their SSN instead of the amount they should make for a speech (example not chosen at random, except we did catch it before it actually happened thanks to that visual check!).

Licensed under: CC-BY-SA with attribution