Redshift throws Value too long for character type character varying(100) even when values are within the range

dba.stackexchange https://dba.stackexchange.com/questions/243277

06-02-2021

Question

I am aware of what the error message Value too long for character type character varying(100) means, so I usually look for the rows that cause the trouble and fix them as the requirement dictates.

But today I encountered an odd case where the error occurs even though there is no offending row.

Failing insert query:

INSERT INTO training.archive_temp1 (id, booking, email, pcd_temp, property_id)
WITH x_pcd AS (
    SELECT e.id,
        e.booking,
        e.email,
        CASE
            WHEN LENGTH(e.pch) > 0 THEN (e.pch || ':' || e.pcd)
            ELSE e.pcd
        END AS pcd_temp,
        e.pcd
    FROM public.extracts_temp AS e WHERE e.id BETWEEN 274939128 AND 275083166
)
SELECT x.id,
    x.booking,
    x.email,
    x.pcd_temp,
    COALESCE(c2.property_id, c.property_id)
FROM x_pcd AS x
         LEFT JOIN public.property_codes AS c ON x.pcd_temp = c.code
         LEFT JOIN public.property_codes AS c2 ON x.pcd = c2.code
WHERE COALESCE(c2.property_id,c.property_id, 0) <> 0;

If I change x.email to x.email::varchar(100) it works.

Here's the catch.

SELECT max(length(email)) FROM training.archive_temp1;
-- returns 64

Weird. So I checked

SELECT max(length(email)) FROM (
SELECT e.id,
        e.booking,
        e.email,
        CASE
            WHEN LENGTH(e.pch) > 0 THEN (e.pch || ':' || e.pcd)
            ELSE e.pcd
        END AS pcd_temp,
        e.pcd
    FROM public.extracts_temp AS e WHERE e.id BETWEEN 274939128 AND 275083166
) AS sub;
-- returns 66

If no row is crossing the character limit of 100, why does it throw the error? What is happening here?

If you need me to share results from any queries, please let me know. Since there are on the order of 100,000 rows, I can't share the entire data set here, and if I could produce a minimal verifiable example, I wouldn't be asking this question.


Solution

Redshift can store multi-byte strings in a VARCHAR column, but the length in VARCHAR(100) does not mean 100 characters; it means 100 bytes. So if every character in the string is a two-byte character, the column can hold at most 50 characters.

From the documentation,

Use a VARCHAR or CHARACTER VARYING column to store variable-length strings with a fixed limit. These strings are not padded with blanks, so a VARCHAR(120) column consists of a maximum of 120 single-byte characters, 60 two-byte characters, 40 three-byte characters, or 30 four-byte characters.

The problem is that the LENGTH function returns the number of characters (excluding trailing blanks), not the number of bytes, so LENGTH on a single multi-byte character still returns 1. This is documented here.

The alternative, OCTET_LENGTH, returns the number of bytes rather than the number of characters.
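As a minimal illustration (assuming a UTF-8 client encoding), the two functions disagree as soon as a multi-byte character appears:

SELECT LENGTH('café') AS chars,        -- returns 4
       OCTET_LENGTH('café') AS bytes;  -- returns 5

A VARCHAR(100) column rejects any value whose byte count exceeds 100, even when its character count is well under the limit.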

Running OCTET_LENGTH revealed the troublemaker, and it has been fixed.
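For reference, a query along these lines can locate the offending rows; it reuses the table and id range from the failing INSERT and assumes email is the column that overflowed:

SELECT e.id,
       e.email,
       LENGTH(e.email)       AS chars,
       OCTET_LENGTH(e.email) AS bytes
FROM public.extracts_temp AS e
WHERE e.id BETWEEN 274939128 AND 275083166
  AND OCTET_LENGTH(e.email) > 100;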

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange