Question

In Postgres, bytea values are automatically converted to text when inserted into text/varchar columns, based on the bytea_output setting.

I am working with some program code which automatically converts certain values (binary strings) in the program to a bytea format. The problem is that users might accidentally try to insert these values into a text column. By default, Postgres allows this, but it cannot always work smoothly, for example if there are non-ASCII bytes. I think users may not realise that the strange insert behaviour is due to their use of a binary string in the calling program. Therefore, if a bytea to text conversion happens, I want Postgres to raise an exception.

I am aware of CREATE CAST, but as I understand it, this is a system-wide action. I do not want to change the system behaviour for other connections. I could do CREATE CAST followed by DROP CAST, but this seems dirty to me as it is still not contained within the connection.

How do I make (implicit) casts from bytea to text throw an exception only within the current connection?

The SQL is issued automatically, so I can add a preceding SQL statement before every statement, that's no problem.

I was a little surprised by this behaviour because Postgres usually errs on the side of strictness, which I like.

This question follows from my previous question:

Solution

for example if there are non-ASCII bytes it cannot work smoothly

Why would you think so? bytea is coerced to either hex (default) or escape format when assigned to a text column. Non-ASCII characters are encoded automatically. Should work "smoothly" at all times - except that you don't want to allow it.
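
To see what actually lands in a text column, here is a small sketch of the two output formats. It only uses the bytea_output setting mentioned above; the column aliases are just illustrative names.

```sql
BEGIN;

SET LOCAL bytea_output = 'hex';
SELECT '\000\377'::bytea::text AS hex_form;     -- \x00ff

SET LOCAL bytea_output = 'escape';
SELECT '\000\377'::bytea::text AS escape_form;  -- \000\377

ROLLBACK;  -- SET LOCAL only lasts until the end of the transaction anyway
```

Either way the result is plain ASCII text, which is why the conversion itself does not fail on non-ASCII bytes.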

I could do CREATE CAST followed by DROP CAST but this seems dirty to me as it still not contained within the connection.

True, but it is contained within the transaction if you create and drop the cast within: DDL commands are fully transactional in Postgres, so only your session (within that current transaction) will ever get to see the cast.

How do I make (implicit) casts from bytea to text throw an exception only within the current connection?

... I can add a preceding SQL statement before every statement, that's no problem.

Solution with custom cast

Currently (all versions incl. Postgres 13) the cast from bytea to text has no explicit entry in the system catalog pg_cast. It is provided by basic input/output functions of the respective types. This behavior can be overruled with an explicit entry, created with CREATE CAST.
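
If you want to check this for your own installation, a query along these lines should do. It is a sketch against the standard pg_cast catalog; on a stock database I would expect it to return no rows, and one row with castcontext = 'a' (assignment) while the custom cast described below is active.

```sql
-- Look for an explicit catalog entry for the bytea -> text cast
SELECT castfunc::regprocedure AS cast_function
     , castcontext            -- 'e' = explicit, 'a' = assignment, 'i' = implicit
     , castmethod             -- 'f' = function, 'i' = I/O conversion, 'b' = binary coercible
FROM   pg_cast
WHERE  castsource = 'bytea'::regtype
AND    casttarget = 'text'::regtype;
```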

You need to be the owner of the involved types, so this basically means you have to be superuser to install it.

Create this casting function once per database:

CREATE FUNCTION public.text(bytea, int, bool) 
  RETURNS text
  LANGUAGE plpgsql STABLE STRICT PARALLEL SAFE AS
$func$
BEGIN
   IF $3 THEN  -- true if the cast is an explicit cast, false otherwise
      -- no infinite loop because we do the cast manually
      -- honors current setting for bytea_output, hence function not IMMUTABLE
      RETURN textin(byteaout($1));
   ELSE
      RAISE EXCEPTION 'Assignment cast from bytea to text forbidden by custom cast rules in this database!';
      RETURN textin(byteaout($1));  -- we should *never* get here!
   END IF;
END
$func$;

To let unprivileged roles create / drop the special cast, add wrapper functions. Do this as superuser (or as a dedicated daemon role):

CREATE FUNCTION public.f_create_cast_bytea2text() 
  RETURNS void
  LANGUAGE sql SECURITY DEFINER AS
'CREATE CAST (bytea AS text) WITH FUNCTION public.text(bytea, int, bool) AS ASSIGNMENT;';

CREATE FUNCTION public.f_drop_cast_bytea2text() 
  RETURNS void
  LANGUAGE sql SECURITY DEFINER AS
'DROP CAST IF EXISTS (bytea AS text);';
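
New functions are executable by PUBLIC by default, so you may want to restrict who can call the wrappers. A minimal sketch, assuming a role named app_writer (a hypothetical name, substitute your own):

```sql
-- app_writer is a placeholder role name, not something from the setup above
REVOKE ALL ON FUNCTION public.f_create_cast_bytea2text() FROM PUBLIC;
REVOKE ALL ON FUNCTION public.f_drop_cast_bytea2text()   FROM PUBLIC;

GRANT EXECUTE ON FUNCTION public.f_create_cast_bytea2text() TO app_writer;
GRANT EXECUTE ON FUNCTION public.f_drop_cast_bytea2text()   TO app_writer;
```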

Now you can do what you asked for:

BEGIN;
SELECT public.f_create_cast_bytea2text();  -- optionally activate your casting rule

INSERT INTO tbl(txt_col)
VALUES ('\000'::bytea::text);    -- explicit cast still works!

INSERT INTO tbl(txt_col)
VALUES ('\000'::bytea); -- but assignment cast forbidden! -> ERROR

SELECT public.f_drop_cast_bytea2text();  -- deactivate your casting rule
END;

db<>fiddle here -- second half does not execute due to missing privileges.

Extended test case

Test table:

CREATE TABLE test(id int, txt_col text, note text);
INSERT INTO test(id, txt_col, note) VALUES
  (-1, 'foo', 'plain text input')
, ( 0, '\000'::bytea, 'default bytea_output: ' || current_setting('bytea_output'));

No exception raised:

BEGIN;
SELECT public.f_create_cast_bytea2text();

SET LOCAL bytea_output = 'hex';
INSERT INTO test(id, txt_col, note)
VALUES (1, '\000'::bytea::text, 'local bytea_output: hex');    -- explicit cast still works

SET LOCAL bytea_output = 'escape';
INSERT INTO test(id, txt_col, note)
VALUES (2, '\000'::bytea::text, 'local bytea_output: escape'); -- explicit cast still works

SELECT public.f_drop_cast_bytea2text();
END;

Also no exception:

BEGIN;
SELECT public.f_create_cast_bytea2text();

SELECT '\000'::bytea || text 'foo'; -- implicit cast still works

SELECT public.f_drop_cast_bytea2text();
END;

Exception raised:

BEGIN;
SELECT public.f_create_cast_bytea2text();

INSERT INTO test(id, txt_col, note)
VALUES (3, '\000'::bytea, 'must fail!'); -- assignment cast forbidden!

SELECT public.f_drop_cast_bytea2text();
END;
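
Note that the failed INSERT aborts the transaction, so the drop call after it is not executed and END rolls everything back. That also removes the cast, since DDL is transactional. If your generated SQL needs to carry on in the same transaction, a savepoint is one way to do it. A sketch, with before_insert as an arbitrary savepoint name:

```sql
BEGIN;
SELECT public.f_create_cast_bytea2text();

SAVEPOINT before_insert;

INSERT INTO test(id, txt_col, note)
VALUES (3, '\000'::bytea, 'must fail!');  -- raises the custom exception

-- the client catches the error, then restores the transaction:
ROLLBACK TO SAVEPOINT before_insert;

SELECT public.f_drop_cast_bytea2text();   -- cast removed explicitly this time
COMMIT;
```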

Client side problem?

Your comment seems to reveal a rabbit hole:

Input values must be prepared differently depending on whether they are strings with encoding or binary strings. Even if we make use of the currently active encoding. Assuming that binary strings will go into bytea and strings with encoding will go into text.

The solution cannot work at all once you incorrectly assume data type text on the client side. The cast is only invoked if you hand in typed bytea values. That is, use a function or prepared statement with an explicit data type, or append an explicit cast to string literals handed to an INSERT command, as demonstrated above: '\000'::bytea.
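
To illustrate the difference, here is a sketch that reuses the test table and wrapper function from above. The statement name ins_typed is made up for the example, and it assumes the default standard_conforming_strings = on.

```sql
BEGIN;
SELECT public.f_create_cast_bytea2text();

-- Untyped literal: resolved directly as text, the bytea cast is never consulted
INSERT INTO test(id, txt_col, note)
VALUES (4, '\000', 'untyped literal');    -- no exception

-- Typed parameter: $1 is declared bytea, so the assignment cast applies
PREPARE ins_typed(bytea) AS
INSERT INTO test(id, txt_col, note)
VALUES (5, $1, 'typed parameter');

EXECUTE ins_typed('\000');                -- expected to raise the custom exception

ROLLBACK;
-- prepared statements are session-scoped; DEALLOCATE ins_typed when no longer needed
```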

Once you pass untyped literals, Postgres has no way to know that it should really be bytea. And how can you (incorrectly) prepare bytea strings for text input and then still (correctly?) add an explicit cast to bytea?

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange