Find unmatched rows between two tables dynamically

https://dba.stackexchange.com/questions/120644

29-09-2020
|

Question

I have a function here that is supposed to take two tables as arguments and check if they are the same.

create or replace function testing.equal_tables(
    varchar,
    varchar)
    returns void as
$$
begin

    execute 'select *
     from
     (select * from ' || $1 ||'
     except
     select * from ' || $2 || ') a
     union
     (select * from ' || $2 || '
     except
     select * from ' || $1 || ');'
    ;

end;
$$ language plpgsql;

When I call it with these two tables, one with 20 rows and one with 10 rows, I get only the empty set, which is not the correct result:

select testing.equal_tables('ee1', 'ee2');

When I modified the function to return a string of the statement, it came back correctly, but that still doesn't help because I'd like to execute the return string using a function, prepared statement, or something. Is there any way to make this function work?

Solution

I derive from your example that you only want to match tables with the same structure (compatible row types).

Base query

First of all, your base query is needlessly complex and possibly incorrect. Consider instead:

(TABLE a EXCEPT ALL TABLE b)  -- ALL keeps duplicate rows and is faster
UNION ALL
(TABLE b EXCEPT ALL TABLE a);

I doubt you want to fold duplicates. Typically, tables do not hold complete duplicates to begin with, and omitting the ALL key word would instruct Postgres to try and fold duplicates in every step, which is just a waste of time. Details:

Select rows which are not present in other table

Even if completely duplicate rows are possible, it might be best to return all of them to avoid misleading answers. If you want to fold duplicates in the result anyway, a single UNION does the job:

(TABLE a EXCEPT ALL TABLE b)
UNION
(TABLE b EXCEPT ALL TABLE a);

However, a query with NATURAL FULL OUTER JOIN is more efficient, doing almost the same:

SELECT *
FROM   a NATURAL FULL OUTER JOIN b
WHERE  a IS NULL OR b IS NULL;

There are two subtle, exotic corner cases:

This returns rows that are all NULL to begin with. It would be an odd table design that allows such rows, though.
This does not return duplicate rows that are not matched with the same number of duplicate rows in the other table: a single match is enough to eliminate all - which is subtly different from folding duplicates in the result! Have a look at the fiddle below. Again, it would be an odd table design that allows completely duplicate rows.

Function

What you have so far does not work for several reasons.

To return actual table rows dynamically (and not just a count or a text representation), you need to use a polymorphic type.

Since the second table is bound to have a compatible row type (as per my assumption) it's enough to hand in just the table name for that.

CREATE OR REPLACE FUNCTION f_tbl_diff(_tbl1_type ANYELEMENT, _tbl2 text)
  RETURNS SETOF ANYELEMENT AS
$func$
BEGIN
   RETURN QUERY EXECUTE format('
      SELECT *
      FROM   %1$s NATURAL FULL OUTER JOIN %2$I
      WHERE  %1$s IS NULL OR %2$I IS NULL'
    , pg_typeof(_tbl1_type), _tbl2);
END
$func$  LANGUAGE plpgsql;

Call:

SELECT * FROM f_tbl_diff(NULL::a, 'b');

Note the special syntax for the first argument! We hand in an actual row defining the return type, not just a table name. Related answer on SO with ample details (scroll to the last chapter):

Refactor a PL/pgSQL function to return the output of various SELECT queries

This returns all rows that have no exact, complete match in the other table

SQL Fiddle also demonstrating exotic corner cases.

About the dynamic SQL:

SQL injection in Postgres functions vs prepared queries

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange