Microsoft SQL Server has what I consider a remarkably sensible function, try_cast(), which returns NULL if the cast is unsuccessful rather than raising an error.

This makes it possible to fall back on a CASE expression or coalesce(). For example:

SELECT coalesce(try_cast(data as int),0);

The question is, does PostgreSQL have something similar?

I'm asking partly to fill in some gaps in my knowledge, but there is also a general principle at play: some of us prefer a less dramatic reaction to certain user errors. A NULL is more easily taken in one's stride in SQL than an error, e.g. SELECT * FROM data WHERE try_cast(value) IS NOT NULL;. In my experience, user errors are sometimes better handled when there is a plan B.


Solution

If casting from one specific type to another specific type is enough, you can do this with a PL/pgSQL function:

create function try_cast_int(p_in text, p_default int default null)
   returns int
as
$$
begin
  begin
    return p_in::int;
  exception
    when others then
       -- the cast failed: fall back to the supplied default (NULL if none given)
       return p_default;
  end;
end;
$$
language plpgsql;

Then

select try_cast_int('42'), try_cast_int('foo', -1), try_cast_int('bar');

Returns

try_cast_int | try_cast_int | try_cast_int
-------------+--------------+-------------
          42 |           -1 |             

If this is only for numbers, another approach would be to use a regular expression to check if the input string is a valid number. That would probably be faster than catching exceptions when you expect many incorrect values.
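For example, a minimal sketch along those lines (the function name, the pattern, and the overflow caveat are mine, not part of the original answer):

create function try_cast_int_re(p_in text, p_default int default null)
   returns int
as
$$
  select case when p_in ~ '^\s*[+-]?[0-9]+\s*$'
              then p_in::int      -- note: can still fail on integer overflow
              else p_default
         end;
$$
language sql
immutable;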

Other tips

Rationale

It's hard to wrap something like SQL Server's TRY_CAST into a generic PostgreSQL function. Input and output can be any data type, but SQL is strictly typed and Postgres functions demand that parameter and return types are declared at creation time.

Postgres has the concept of polymorphic types, but function declarations accept at most one polymorphic type. The manual:

Polymorphic arguments and results are tied to each other and are resolved to a specific data type when a query calling a polymorphic function is parsed. Each position (either argument or return value) declared as anyelement is allowed to have any specific actual data type, but in any given call they must all be the same actual type.

CAST ( expression AS type ) would seem like an exception to this rule, taking any type and returning any (other) type. But cast() only looks like a function; under the hood it is an SQL syntax element. The manual:

[...] When one of the two standard cast syntaxes is used to do a run-time conversion, it will internally invoke a registered function to perform the conversion.

There is a separate function for each combination of input and output type. (You can create your own with CREATE CAST ...)
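To see this registration for one concrete pair, you can query the catalog; on a stock installation the following should show something like int4(bigint) (query shown for illustration):

SELECT castfunc::regprocedure
FROM   pg_cast
WHERE  castsource = 'int8'::regtype
AND    casttarget = 'int4'::regtype;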

Function

My compromise is to use text as input since any type can be cast to text. The extra cast to text means extra cost (though not much). Polymorphism also adds a bit of overhead. But the moderately expensive parts are the dynamic SQL we need, the involved string concatenation and, most of all, exception handling.

That said, this little function can be used for any combination of types, including array types. (But type modifiers, like the 20 in varchar(20), are lost):

CREATE OR REPLACE FUNCTION try_cast(_in text, INOUT _out ANYELEMENT) AS
$func$
BEGIN
   EXECUTE format('SELECT %L::%s', $1, pg_typeof(_out))
   INTO  _out;
EXCEPTION WHEN others THEN
   -- do nothing: _out already carries default
END
$func$  LANGUAGE plpgsql;

The INOUT parameter _out serves two purposes:

  1. declares the polymorphic type
  2. also carries the default value for error cases

You wouldn't call it like in your example:

SELECT coalesce(try_cast(data as int),0);

.. where COALESCE also eliminates genuine NULL values from the source (!!), probably not as intended. But simply:

SELECT try_cast(data, 0);

.. which returns NULL on NULL input, or 0 on invalid input.
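A few illustrative calls (not from the original answer) showing that behavior:

SELECT try_cast('123', 0);  -- 123
SELECT try_cast('foo', 0);  -- 0
SELECT try_cast(NULL, 0);   -- NULL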

The short syntax works as long as data is a character type (like text or varchar), and because 0 is a numeric literal that is implicitly typed as integer. In other cases you may have to be more explicit:

Example calls

Untyped string literals work out of the box:

SELECT try_cast('foo', NULL::varchar);
SELECT try_cast('2018-01-41', NULL::date);   -- returns NULL
SELECT try_cast('2018-01-41', CURRENT_DATE); -- returns current date

Typed values that have a registered implicit cast to text work out of the box, too:

SELECT try_cast(name 'foobar', 'foo'::varchar);
SELECT try_cast(my_varchar_column, NULL::numeric);

Comprehensive list of data types with registered implicit cast to text:

SELECT castsource::regtype
FROM   pg_cast
WHERE  casttarget = 'text'::regtype
AND    castcontext = 'i';

All other input types require an explicit cast to text:

SELECT try_cast((inet '192.168.100.128/20')::text, NULL::cidr);
SELECT try_cast(my_text_array_column::text, NULL::int[]);

We could easily make the function body work for any input type, but function type resolution fails without the explicit cast to text.
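For example, passing an inet value without that explicit cast should already fail at function resolution, roughly like this:

SELECT try_cast(inet '192.168.100.128/20', NULL::cidr);
-- ERROR:  function try_cast(inet, cidr) does not exist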

Here's a generic try_cast(); it's probably very slow.

CREATE OR REPLACE FUNCTION try_cast(p_in text, type regtype, OUT result text)
RETURNS text AS $$
  BEGIN
    EXECUTE format('SELECT %L::%s;', $1, $2)
      INTO result;
  EXCEPTION
    WHEN others THEN
      result := NULL;   -- the cast failed: return NULL (as text)
  END;
$$ LANGUAGE plpgsql;

 SELECT try_cast('2.2','int')::int as "2.2"
   ,try_cast('today','int')::int as "today"
   ,try_cast('222','int')::int as "222";

 SELECT try_cast('2.2','date')::date as "2.2"
   ,try_cast('today','date')::date as "today"
   ,try_cast('222','date')::date as "222";

 SELECT try_cast('2.2','float')::float as "2.2"
   ,try_cast('today','float')::float as "today"
   ,try_cast('222','float')::float as "222";

This won't accept types like varchar(20) (though we could add another parameter to accept a "typemod" such as the 20).

This function returns text because PostgreSQL functions must have a fixed return type, so you may need an explicit cast outside of the function to coerce the result to the type you want.

In PostgreSQL, when a coercion fails it internally raises an error via ereport; there is no way to recover from that inside the coercion itself.
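For example, a plain cast simply aborts the statement (error text may vary slightly by version):

SELECT 'a'::int;
-- ERROR:  invalid input syntax for type integer: "a"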

Sample Data

Let's assume an error rate of 1/5:

CREATE TABLE foo AS
  SELECT CASE WHEN x%5=0 THEN 'a' ELSE x::text END AS x
  FROM generate_series(0,1e6) AS gs(x);

Black-listing bad data

The normal solution to this problem is to accept liberally when you create types; that's pretty much how things work now. If you need to protect against some kind of bogus input, rather than catching the failure, set the offending value to NULL before the cast happens.

SELECT NULLIF(x, 'a')::int
FROM ( VALUES ('6'),('a'),('7') ) AS t(x);

If you want, you can wrap that in an IMMUTABLE SQL function too.

CREATE FUNCTION safer_but_not_totally_safe_coercion( i text )
RETURNS int AS $$
  SELECT NULLIF(i, 'a')::int;
$$ LANGUAGE sql
IMMUTABLE;

-- Inlined and fast.
SELECT safer_but_not_totally_safe_coercion(x)
FROM ( VALUES ('6'),('a'),('7') ) AS t(x);

You can also use regexes and whatever else you want as far as verification is concerned.
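For instance, an illustrative regex guard (not from the original answer) that stays inlinable:

SELECT CASE WHEN x ~ '^[0-9]+$' THEN x::int END AS x_int
FROM ( VALUES ('6'),('a'),('7') ) AS t(x);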

EXPLAIN ANALYZE SELECT safer_but_not_totally_safe_coercion(x) FROM foo;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..21925.02 rows=1000001 width=5) (actual time=0.025..210.685 rows=1000001 loops=1)
 Planning time: 0.173 ms
 Execution time: 240.462 ms
(3 rows)

Try/Catch

This method is way slow.

EXPLAIN ANALYZE SELECT try_cast_int(x) FROM foo;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..264425.26 rows=1000001 width=5) (actual time=0.104..7069.281 rows=1000001 loops=1)
 Planning time: 0.056 ms
 Execution time: 7151.917 ms
(3 rows)

If you need it then by all means use it; however, it wouldn't be the first tool I'd reach for.
