SQL Server's isNumeric() equivalent in amazon redshift

Question 1

It seems that redshift doesn't support any of the following:

where left(field1,4) like '[0-9][0-9][0-9][0-9]' 
where left(field1,4) ~ '^[0-9]{4}'
where left(field1,4) like '^[0-9]{4}'

what does seem to work is:

where left(field1,4) between 0 and 9999

this returns all rows that start with four numeric characters.

it seems that even though field1 is type string, the 'between' function interprets left(field1,4) as a single integer when the string characters are numeric (and does not give an error when they are not numeric). I'll follow up if I find a problem. For instance I don't deal with anything less than 1000, so I assume, but am not sure, that 0001 is interpreted as 1.

Question 2

Try something like:

where field1 ~ '^[0-9]{4}'

It will match any string, that starts with 4 digits.

Question 3

Although long time has passed since this question was asked I have not found an adequate response. So I feel obliged to share my solution which works fine on my Redshift cluster today (March 2016).

The UDF function is:

create or replace function isnumeric (aval VARCHAR(20000))
  returns bool
IMMUTABLE 
as $$
    try:
       x = int(aval);
    except:
       return (1==2);
    else:
       return (1==1);
$$ language plpythonu;

Usage would be:

select isnumeric(mycolumn), * from mytable
    where isnumeric(mycolumn)=false

Question 4

It looks like the code you are looking for the is the similar to function:

where left(field,4) similar to '[0-9]{4}'

Redshift doc

Question 5

Per Amazon, the posix style ~regex style expressions are slow... https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions.html

Using their own REGEXP_* functions seems to be faster. https://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html

For checking just a true/false for integers I've been using the following with success. REGEXP_COUNT(my_field_to_check, '^[0-9]+$') > 0

this returns 1 if only numeric, 0 if anything else

Question 6

where regexp_instr(field1,'^[0-9]{4}') = 0

will remove rows starting with 4 digits (the above regexp_instr will return 1 for the rows with field1 starting with 4 digits)

Question 7

We have tried the following and worked for most of our scenarios:

columnn ~ '^[-]{0,1}[0-9]{1,}[.]{0,1}[0-9]{0,}$'

This will positive, negative, integer and float numbers.

Question 8

redshift should support similar to.

WHERE field1 SIMILAR TO '[0-9]{4}%'

This reads as where field1 starts with 4 characters in the range of 0-9, then anything else.