Question

I'm trying to load some data into Vertica. One of the columns is a floating-point number in the range 0.0 to 10.0, for example:

5.9, 3.7, 1.0, 3.2 etc.

But not: 5.93, 3.71214, 1 ...

So it is always rounded to one decimal place, and that decimal is never omitted, even when it is zero.

The problem is that the decimal separator can be practically ANYTHING. Mostly it's "." or ",", but I have seen "/" or even ":". After numerous attempts I decided to get rid of it entirely (think of it as "multiplying the numbers by 10") in order to get: 59, 37, 10, 32 etc.
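The "multiply by 10" idea can be sanity-checked outside Vertica with a small Python model of the transformation. This is not Vertica code: Python's `re.sub` stands in for `REGEXP_REPLACE`, the function name `strip_separator` is hypothetical, and the sample strings only reuse the separators mentioned above.

```python
import re

def strip_separator(value: str) -> int:
    """Remove every non-digit character (i.e. the decimal separator),
    so a one-decimal value such as "5.9" becomes 59, the value times 10."""
    return int(re.sub(r"[^0-9]", "", value))

# Separators seen in the data: ".", ",", "/" and ":".
samples = ["5.9", "3,7", "1/0", "3:2", "10:0"]
print([strip_separator(s) for s in samples])  # [59, 37, 10, 32, 100]
```

Because every valid value has exactly one decimal digit, the separator character does not matter: only its removal does.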

This is what I've written in the COPY command for this particular column:

number_source FILLER VARCHAR(4),
number as (REGEXP_REPLACE(number_source, '[^0-9]', '', 1, 0, 'b'))::NUMERIC,

It doesn't work, though. After several minutes of loading data, Vertica spits out this:

vsql:load.sql:83: ERROR 3682:  Invalid input syntax for numeric: ""

If I try to use ::INTEGER instead of ::NUMERIC, I get this:

vsql:load.sql:83: ERROR 2827:  Could not convert "" to an int8

It is probably because some value is not numeric at all, so REGEXP_REPLACE removes every character and leaves an empty string, which cannot be cast. Or it may be something else I'm not aware of. I had to use the 'b' modifier because some garbled rows contain non-UTF-8 characters, and the load fails on those lines too.
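The empty-string hypothesis can be reproduced with a rough Python model of the column expression. Again this is only an illustration, not Vertica: the regex runs on raw bytes to loosely mirror the 'b' (octet) modifier, `int()` plays the role of the `::INTEGER` cast, and `copy_expression` is a hypothetical name.

```python
import re

def copy_expression(raw: bytes) -> int:
    # Operate on raw bytes, loosely like the 'b' modifier: invalid UTF-8
    # sequences are just non-digit bytes and get removed with the rest.
    digits = re.sub(rb"[^0-9]", b"", raw)
    # Like the ::INTEGER cast, int() rejects an empty input.
    return int(digits)

print(copy_expression(b"5.9"))       # 59
print(copy_expression(b"\xff3,7"))   # 37, the non-UTF-8 byte is stripped
try:
    copy_expression(b"n/a")          # a value with no digits at all
except ValueError as err:
    print("conversion failed:", err)
```

A value with no digits collapses to an empty string, and the cast then fails, which matches the shape of the `Invalid input syntax for numeric: ""` error.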

I'm perfectly fine with losing these broken rows, but Vertica always rolls back in this case and I'm left with nothing, even though I did not specify "ABORT ON ERROR". Is there any way to fix this other than pre-processing the dataset before loading it into Vertica?


Solution

I was able to work around the empty-string issue with this:

number as CASE WHEN REGEXP_REPLACE(number_source, '[^0-9]', '', 1, 0, 'b') = '' THEN 0 ELSE (REGEXP_REPLACE(number_source, '[^0-9]', '', 1, 0, 'b'))::INTEGER END,

Basically, I check whether REGEXP_REPLACE returns an empty string; if it does, I insert a zero, otherwise I insert the REGEXP_REPLACE result cast to INTEGER. Not a perfect solution, but it works.
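One side effect worth keeping in mind: a row with no digits ends up as 0, the same value a genuine "0.0" produces. The Python model below (not Vertica code; both function names are hypothetical) compares the CASE logic from this answer with a variant that returns `None`, standing in for SQL NULL, so broken rows stay distinguishable.

```python
import re
from typing import Optional

def with_zero_default(raw: bytes) -> int:
    # Mirrors the CASE above: an empty result becomes 0, anything else is cast.
    digits = re.sub(rb"[^0-9]", b"", raw)
    return 0 if digits == b"" else int(digits)

def with_null_default(raw: bytes) -> Optional[int]:
    # Hypothetical variant: an empty result becomes None (i.e. NULL).
    digits = re.sub(rb"[^0-9]", b"", raw)
    return None if digits == b"" else int(digits)

for raw in [b"5.9", b"0.0", b"n/a"]:
    print(raw, with_zero_default(raw), with_null_default(raw))
```

In the COPY statement itself, the NULL variant would presumably be written as `NULLIF(REGEXP_REPLACE(number_source, '[^0-9]', '', 1, 0, 'b'), '')::INTEGER`; this is untested here, and it only helps if the target column allows NULLs.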

Licensed under: CC-BY-SA with attribution