Why do I need to cast NULL to column type?

https://dba.stackexchange.com/questions/228046

21-01-2021
|

Question

I've got a helper that's generating some code to do bulk updates for me and generates SQL that looks like this:

(Both the active and core fields are of type boolean)

UPDATE fields as t set "active" = new_values."active","core" = new_values."core"
FROM (values 
(true,NULL,3419),
(false,NULL,3420)
) as new_values("active","core","id") WHERE new_values.id = t.id;

However it fails with:

ERROR: column "core" is of type boolean but expression is of type text

I can get it to work by adding ::boolean to the nulls, but that just seems odd, why is NULL considered of type TEXT?

Also it's a bit tricky to cast because it would require quite a bit of a rework of the code for it to know what type it should cast NULLs to (the list of columns and values is currently being autogenerated from a simple array of JSON objects).

Why is this necessary and is there a more elegant solution that doesn't require the generating code to know the type of NULLs?

If it's relevant, I'm using sequelize over Node.JS to do this, but am also getting the same result in the Postgres command line client.

Solution

This is an interesting finding. Normally, a NULL has no assumed data type, as you can see here:

SELECT pg_typeof(NULL);

 pg_typeof 
───────────
 unknown

This changes when a VALUES table comes into the picture:

SELECT pg_typeof(core) FROM (
    VALUES (NULL)
) new_values (core);

 pg_typeof 
───────────
 text

This behaviour is described in the source code at https://doxygen.postgresql.org/parse__coerce_8c.html#l01373:

 /*
  * If all the inputs were UNKNOWN type --- ie, unknown-type literals ---
  * then resolve as type TEXT.  This situation comes up with constructs
  * like SELECT (CASE WHEN foo THEN 'bar' ELSE 'baz' END); SELECT 'foo'
  * UNION SELECT 'bar'; It might seem desirable to leave the construct's
  * output type as UNKNOWN, but that really doesn't work, because we'd
  * probably end up needing a runtime coercion from UNKNOWN to something
  * else, and we usually won't have it.  We need to coerce the unknown
  * literals while they are still literals, so a decision has to be made
  * now.
  */

(Yes, PostgreSQL source code is relatively easy to understand and most places, thanks to excellent comments.)

The way out, however, might be the following. Let's say you are always generating VALUES that match all columns of a given table (see the second note below for other cases). From your example, a small trick could possibly help:

SELECT (x).* FROM (VALUES ((TRUE, NULL, 1234)::fields)) t(x);

 active │ core │  id  
────────┼──────┼──────
 t      │      │ 1234

Here you use row expressions casted to the table's type, and then extracting them back to a table.

Based on the above, your UPDATE could look like

UPDATE fields AS t set active = (x).active, core = (x).core
FROM ( VALUES
           ((true, NULL, 3419)::fields),
           ((false, NULL, 3420)::fields)
     ) AS new_values(x) WHERE (x).id = t.id;

Notes:

I removed the double quotes for better human readability, but you can keep them as they help when generating (column) names.
if you need only a subset of the columns, you can create custom types for this purpose. Use them the same way as you would above (where I use the type automatically created with the table, holding the row structure of the latter).

Look at the whole thing working on dbfiddle.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange