I would like to count the number of NULLs present per row in a table without enumerating column names. For example:

WITH t as (VALUES
  (NULL ,'hi',2,NULL,'null'),
  ('' ,'hi',2,3,'test'),
  (NULL ,'hi',2,3,'null')
)
SELECT countnulls(t)
FROM t;

Would result in:

numnulls
2
0
1

The closest I can get is with the following hack of row_to_json():

select 
(CHAR_LENGTH(row_to_json(t)::text)
 - CHAR_LENGTH(REPLACE(row_to_json(t)::text, 'null', '')))/4 from t;

Which is... quite the hack (not in a good way). It works, sort of, but it counts the string 'null' as a NULL whenever it appears in the actual data or in the column names. So it is incorrect in the above case.


Solution

1. You know the column names ...

For Postgres 9.6 or later, use num_nulls()

WITH t(a, b, c, d, e) AS (
   VALUES
     (NULL ,'hi',2,NULL,'null')
   , ('' ,'hi',2,3,'test')
   , (NULL ,'hi',2,3,'null')
   )
SELECT num_nulls(a,b,c,d,e)
FROM   t;

Returns your desired result exactly and works for any mix of data types.

The manual:

num_nulls(VARIADIC "any") ... returns the number of null arguments

For Postgres 9.5 or older, cast each column to text, collect them into an array, strip NULLs with array_remove(arr, null) and use the remaining array length for an exact count:

SELECT 5 - cardinality(array_remove(ARRAY[a::text,b::text,c::text,d::text,e::text], null))
FROM   t;

Any type can be cast to text. The cast is redundant for text columns, of course.

array_remove() requires Postgres 9.3 or later.
cardinality() requires Postgres 9.4 or later. Substitute with array_length(arr, 1) in older versions.
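For Postgres 9.3, the same idea can be sketched with array_length() in place of cardinality(). One caveat worth hedging: array_length() returns NULL for an empty array, so a COALESCE is needed for rows where every column is NULL:

```sql
-- Sketch for Postgres 9.3: array_length(arr, 1) returns NULL (not 0)
-- for an empty array, so COALESCE defends against all-NULL rows.
SELECT 5 - COALESCE(
         array_length(
            array_remove(ARRAY[a::text, b::text, c::text, d::text, e::text], null)
          , 1)
       , 0)
FROM   t;
```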

2. You don't know column names, but Postgres does

When building on actual tables (or other registered objects like a view or a materialized view), we can retrieve column names from the system catalog pg_attribute to fully automate with dynamic SQL. Like:

CREATE OR REPLACE FUNCTION f_num_nulls(_tbl regclass)
  RETURNS SETOF int AS
$func$
BEGIN
   RETURN QUERY EXECUTE format(
      'SELECT num_nulls(%s) FROM %s'
    , (SELECT string_agg(quote_ident(attname), ', ')  -- column list
       FROM   pg_attribute
       WHERE  attrelid = _tbl
       AND    NOT attisdropped    -- no dropped (dead) columns
       AND    attnum > 0)         -- no system columns
    , _tbl
   );
END
$func$  LANGUAGE plpgsql;

Call:

SELECT * FROM f_num_nulls('myschema.tbl');

Returns the count for each row in current physical order. Nothing else, to be absolutely generic.

We could also pass each row to a polymorphic function to return a single count for it.
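A minimal sketch of such a polymorphic function (the name f_num_nulls_row() and the jsonb detour are my own choices here, assuming Postgres 9.5 or later for to_jsonb()):

```sql
-- Hypothetical helper: accepts any row type via anyelement and
-- counts the NULL values in it by converting the row to jsonb.
CREATE OR REPLACE FUNCTION f_num_nulls_row(_row anyelement)
  RETURNS bigint AS
$func$
   SELECT count(*) FILTER (WHERE j.value = jsonb 'null')
   FROM   jsonb_each(to_jsonb(_row)) j
$func$  LANGUAGE sql STABLE;
```

Call: `SELECT f_num_nulls_row(t) FROM myschema.tbl t;` — note that, like the jsonb demos below, this cannot tell a genuine NULL from a JSON null produced by the conversion of special cases.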

3. You don't know anything: anonymous records

In the unlikely event that column names are unknown even to Postgres (like from a VALUES expression in your example), convert to a document type (json, jsonb, xml, hstore) to get a handle, as demonstrated by ypercube (comment deleted by now) and Evan.

But anonymous records have no primary key or any other unique attribute by definition. Count within each LATERAL subquery to defend against false aggregates. Demo with jsonb:

SELECT *
FROM  (
   VALUES
      (NULL ,'hi',2,NULL,'null')
    , (NULL ,'hi',2,NULL,'null')       -- duplicate row !!!
    , ('' ,'hi',2,3,'test')
    , (NULL ,'hi',2,3,'null')
   ) t                                 -- column names unknown
, LATERAL (
   SELECT count(*) FILTER (WHERE j.value = jsonb 'null') AS num_nulls
   FROM   jsonb_each(to_jsonb(t)) j
   ) c;

Or with json: probably a bit faster because the conversion is cheaper.
Demonstrating 3 different ways:

SELECT *
FROM  (
   VALUES
      (NULL ,'hi',2,NULL,'null')
    , (NULL ,'hi',2,NULL,'null')
    , ('' ,'hi',2,3,'test')
    , (NULL ,'hi',2,3,'null')
   ) t   -- column names unknown
, to_json(t) j
, LATERAL (
   SELECT count(*) FILTER (WHERE j1.value::text = 'null') AS num_nulls1
   FROM   json_each(to_json(t)) j1
   ) c1
, LATERAL (
   SELECT count(*) FILTER (WHERE j->>k IS NULL) AS num_nulls2
   FROM   json_object_keys(j) k
   ) c2
, LATERAL (
   SELECT count(*) - count(j->>k) AS num_nulls3
   FROM   json_object_keys(j) k
   ) c3;

db<>fiddle here

Other tips

You can do something like this using the jsonb features, but this is a really horrible idea.

SELECT t,
  count(*) FILTER (WHERE jsonb_typeof(jrow->key) = 'null')
FROM ( VALUES
  (NULL ,'hi',2,NULL,'null'),
  ('' ,'hi',2,3,'test'),
  (NULL ,'hi',2,3,'null')
) AS t(a,b,c,d,e)
CROSS JOIN LATERAL to_jsonb(t) AS j1(jrow)
CROSS JOIN LATERAL jsonb_object_keys(jrow) AS j2(key)
GROUP BY t;
        t         | count 
------------------+-------
 ("",hi,2,3,test) |     0
 (,hi,2,3,null)   |     1
 (,hi,2,,null)    |     2
(3 rows)

You could also expand it and run it all as CASE expressions inside a sum(), which would be fast:

SELECT t,
SUM(
  CASE WHEN a IS NULL THEN 1 ELSE 0 END
  +
  CASE WHEN b IS NULL THEN 1 ELSE 0 END
  +
  CASE WHEN c IS NULL THEN 1 ELSE 0 END
  + 
  CASE WHEN d IS NULL THEN 1 ELSE 0 END
  +
  CASE WHEN e IS NULL THEN 1 ELSE 0 END
)
FROM ( VALUES
  (NULL ,'hi',2,NULL,'null'),
  ('' ,'hi',2,3,'test'),
  (NULL ,'hi',2,3,'null')
) AS t(a,b,c,d,e)
GROUP BY t;
        t         | sum 
------------------+-----
 ("",hi,2,3,test) |   0
 (,hi,2,3,null)   |   1
 (,hi,2,,null)    |   2
(3 rows)
Licensed under: CC BY-SA with attribution