Question

I have a lot of rows with city and direction fields. But from an old import, city and direction are mixed in direction field. Like:

dir number,  extra data, CITY,  AL 111111
dir number, CITY,  AL 111111
number, dir, number, CITY, dir number, CITY,  AL 111111

The important thing is that 'CITY' always comes before a US postal code and I want to extract it and save it in city field with an UPDATE (using regex?). Is it possible?

Something like:

update TABLE set city = SOME_REGEX_MAGIC_FROM_DIRECTION_FIELD
where direccion ~ 'REGEX_MAGIC'

The working SQL statement:

update TABLE
set city = substring(direction FROM '([^,]+),[^,]+$')
where direction like '%,  __ _____';
Was it helpful?

Solution

If you want the part before the last comma, one way (of many) is with a plain substring() call (regexp variant):

substring(direction FROM ',([^,]+),[^,]+$') AS city

db<>fiddle here

Your UPDATE statement might look like this:

UPDATE tbl
SET    city = substring(direction FROM ',([^,]+),[^,]+$')
WHERE  direction ~ ', *\D\D \d{5}$'

OTHER TIPS

From your data I would gather that you'll need something like this:

SELECT regexp_matches('direction_field', '([^,]+) \d{5}');

Get a substring from a regex in Redshift:

SELECT REGEXP_SUBSTR(
   'hello_uuid_092bab12-8d8b-40ad-b8b7-bc9f05e52c9c_something_else',
   '([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})'
)

Result: 092bab12-8d8b-40ad-b8b7-bc9f05e52c9c

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top