Create integer id columns from existing string columns (integer coding?)
06-02-2021
Question
I have a PostgreSQL server with an existing table that has two non-unique string columns (fixed-width values, though the size can vary), like this:
| ID_STRING_A | ID_STRING_B |
| 'AAAA' | 'BBBB' |
| 'BBBB' | 'CCCC' |
| 'AAAA' | 'DDDD' |
Now I want to compute an integer representation for the values in both columns, so that the same string gets the same integer in either column, and store the integers in additional columns. The result should look like this:
| ID_STRING_A | ID_STRING_B | ID_INT_A | ID_INT_B |
| 'AAAA' | 'BBBB' | 1 | 2 |
| 'BBBB' | 'CCCC' | 2 | 3 |
| 'AAAA' | 'DDDD' | 1 | 4 |
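To make the target encoding concrete, here is a minimal Python sketch (not part of the original question). It assumes the integers are handed out in sorted order starting at 1, which matches the example table; the names `rows`, `codes`, and `encoded` are illustrative only.

```python
# Rows of (ID_STRING_A, ID_STRING_B), taken from the example table.
rows = [("AAAA", "BBBB"), ("BBBB", "CCCC"), ("AAAA", "DDDD")]

# Collect the distinct strings from BOTH columns, so a string gets the
# same integer no matter which column it appears in.
distinct = sorted({s for row in rows for s in row})

# Assumption: codes are assigned in sorted order, starting at 1.
codes = {s: n for n, s in enumerate(distinct, start=1)}

# Attach the two integer codes to every row.
encoded = [(a, b, codes[a], codes[b]) for a, b in rows]
for line in encoded:
    print(line)
```

The key point is that the code table is built from the union of both columns before any row is encoded.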
My first approach, based on the answers, is shown below.
Unfortunately, the update part seems to be highly inefficient, although there are indices on ID_STRING_A/B. While the query itself finishes in minutes, the update does not seem to finish. Here's the code:
ALTER TABLE mytable ADD COLUMN ID_INT_B integer;
ALTER TABLE mytable ADD COLUMN ID_INT_A integer;
UPDATE mytable
SET ID_INT_A = g.ID_INT_A,
    ID_INT_B = g.ID_INT_B
FROM (
    WITH T (n, s) AS (
        SELECT ROW_NUMBER() OVER (ORDER BY s), s
        FROM (
            SELECT ID_STRING_A FROM mytable
            UNION
            SELECT ID_STRING_B FROM mytable
        ) AS X (s)
    )
    SELECT m.ctid AS id_,
           m.ID_STRING_A AS ID_STRING_A,
           m.ID_STRING_B AS ID_STRING_B,
           T1.n AS ID_INT_A,
           T2.n AS ID_INT_B
    FROM mytable AS m
    JOIN T AS T1 ON m.ID_STRING_A = T1.s
    JOIN T AS T2 ON m.ID_STRING_B = T2.s
) AS g
WHERE mytable.ctid = g.id_;
Solution
I guess you can use the ASCII function on the string columns:
SELECT ID_STRING_A, ID_STRING_B
, ASCII(ID_STRING_A) - 64 AS ID_INT_A
, ASCII(ID_STRING_B) - 64 AS ID_INT_B
FROM ...
Perhaps the intention is clearer with:
, ASCII(ID_STRING_A) - ASCII('A') + 1 AS ID_INT_A
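One caveat with the ASCII approach: PostgreSQL's ascii() returns the code of the first character of its argument only, so two strings that start with the same letter get the same integer. A small Python sketch of the same arithmetic shows this; `ascii_code` is a hypothetical helper name, and 'AABB' is an extra value added here for illustration.

```python
def ascii_code(s: str) -> int:
    # Mirrors ASCII(s) - ASCII('A') + 1: only the first character counts.
    return ord(s[0]) - ord("A") + 1

for s in ["AAAA", "BBBB", "CCCC", "DDDD", "AABB"]:
    print(s, ascii_code(s))
```

'AAAA' and 'AABB' both map to 1 here, so this only works while the first character alone identifies the string.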
EDIT: since the question was changed, something like this is possible:
WITH T (n, s) as (
SELECT row_number() over (order by s), s
FROM (
SELECT ID_STRING_A FROM mytable
UNION
SELECT ID_STRING_B FROM mytable
) as X (s)
)
SELECT m.ID_STRING_A, m.ID_STRING_B, T1.n, T2.n
FROM mytable as m
JOIN T as T1
ON m.ID_STRING_A = T1.s
JOIN T as T2
ON m.ID_STRING_B = T2.s
EDIT: updating the table
I have a gut feeling that this can be done in a simpler way, but I cross joined the cte with itself and filtered with WHERE to update both columns at once:
ALTER TABLE mytable
ADD ID_INT_A INT;
ALTER TABLE mytable
ADD ID_INT_B INT;
WITH cte (n, s) as (
SELECT row_number() over (order by s), s
FROM (
SELECT ID_STRING_A FROM mytable
UNION
SELECT ID_STRING_B FROM mytable
) as X (s)
), cte2 (n1,s1,n2,s2) as (
SELECT c1.n, c1.s, c2.n, c2.s
FROM cte c1
CROSS JOIN cte c2
)
UPDATE mytable
SET ID_INT_A = cte2.n1
, ID_INT_B = cte2.n2
FROM cte2
WHERE mytable.ID_STRING_A = cte2.s1
AND mytable.ID_STRING_B = cte2.s2
;
Note that this is a one-time operation. If you decide to add a value such as AABB later on, the enumeration will be wrong.
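A minimal Python sketch of why the numbering is not stable, assuming the codes are recomputed with the same sorted row numbering after a new value arrives (`enumerate_sorted` is a hypothetical helper that imitates ROW_NUMBER() OVER (ORDER BY s) on distinct values):

```python
def enumerate_sorted(values):
    """Number the distinct values 1..n in sorted order."""
    return {s: n for n, s in enumerate(sorted(set(values)), start=1)}

before = enumerate_sorted(["AAAA", "BBBB", "CCCC", "DDDD"])
after = enumerate_sorted(["AAAA", "BBBB", "CCCC", "DDDD", "AABB"])

# Strings whose integer changed once 'AABB' was added.
changed = {s: (before[s], after[s]) for s in before if before[s] != after[s]}
print(changed)
```

Every string that sorts after the new value is shifted by one, so integers stored earlier no longer agree with a fresh enumeration.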
OTHER TIPS
CREATE TEMP TABLE map (
id serial PRIMARY KEY,
str text NOT NULL
);
INSERT INTO map (str)
SELECT DISTINCT id_string_a
FROM mytab;
ALTER TABLE mytab ADD id_int_a integer;
UPDATE mytab
SET id_int_a = map.id
FROM map
WHERE mytab.id_string_a = map.str;
DROP TABLE map;
id_string_b is left as an exercise to the reader.
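One possible way to cover both columns with a single mapping table is sketched below. To keep it runnable on its own, it uses SQLite through Python's sqlite3 module instead of PostgreSQL, so `INTEGER PRIMARY KEY` stands in for `serial` and correlated subqueries stand in for `UPDATE ... FROM`. The sample rows come from the question; treat the whole thing as an illustration, not a tested PostgreSQL script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE mytab (id_string_a TEXT, id_string_b TEXT);
    INSERT INTO mytab VALUES
        ('AAAA', 'BBBB'), ('BBBB', 'CCCC'), ('AAAA', 'DDDD');

    -- One shared map, filled from BOTH columns, so the same string
    -- gets the same id wherever it appears.
    CREATE TEMP TABLE map (id INTEGER PRIMARY KEY, str TEXT NOT NULL UNIQUE);
    INSERT INTO map (str)
    SELECT id_string_a FROM mytab
    UNION
    SELECT id_string_b FROM mytab
    ORDER BY 1;

    ALTER TABLE mytab ADD COLUMN id_int_a INTEGER;
    ALTER TABLE mytab ADD COLUMN id_int_b INTEGER;

    -- Look up each column's string in the shared map.
    UPDATE mytab
    SET id_int_a = (SELECT id FROM map WHERE map.str = mytab.id_string_a),
        id_int_b = (SELECT id FROM map WHERE map.str = mytab.id_string_b);

    DROP TABLE map;
""")

result = cur.execute(
    "SELECT id_string_a, id_string_b, id_int_a, id_int_b "
    "FROM mytab ORDER BY rowid"
).fetchall()
for row in result:
    print(row)
```

Building the map from the union of both columns is what makes 'BBBB' receive the same integer in id_int_a and id_int_b, as the question requires.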