Create a unique integer ID column for an existing string column
06-02-2021
Question
I have a PostgreSQL server with an existing table that has a fixed-width, non-unique string column like this:
| ID_STRING |
| 'ABCDEFG' |
| 'HIJKLMN' |
Now I want to compute integer ids for each element and store them into an additional column. The result should look like this:
| ID_STRING | ID_INT |
| 'ABCDEFG' | 1      |
| 'HIJKLMN' | 2      |
| 'ABCDEFG' | 1      |
| 'HIJKLMN' | 2      |
Is there an easy way to achieve this?
Solution
To add the new column use:
ALTER TABLE the_table ADD COLUMN id_int integer;
To populate the new column, you need an UPDATE statement. dense_rank() gives identical strings the same number and numbers the distinct strings 1, 2, 3, and so on in sort order, without gaps.
I am assuming your table has a primary key column named pk_column; replace it with the actual primary key column of your table.
update the_table
set id_int = t.rn
from (
select pk_column,
dense_rank() over (order by id_string) as rn
from the_table
) t
where the_table.pk_column = t.pk_column;
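The two steps (add the column, then fill it from the ranked subquery joined on the primary key) can be tried locally without a PostgreSQL server. The following is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for PostgreSQL; it assumes the bundled SQLite is 3.25 or newer (the first release with window functions) and reuses the placeholder names the_table, pk_column, id_string and id_int from the answer. Instead of PostgreSQL's UPDATE ... FROM, it runs the derived table on its own and joins the ranks back with one parameterized UPDATE per row.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL table from the
# question (assumption: SQLite >= 3.25, which supports dense_rank()).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (pk_column INTEGER PRIMARY KEY, id_string TEXT)")
conn.executemany(
    "INSERT INTO the_table (id_string) VALUES (?)",
    [("ABCDEFG",), ("HIJKLMN",), ("ABCDEFG",), ("HIJKLMN",)],
)

# Step 1: add the new column; existing rows get NULL.
conn.execute("ALTER TABLE the_table ADD COLUMN id_int INTEGER")

# Step 2a: the derived table "t" from the answer, evaluated on its own.
ranks = conn.execute(
    "SELECT pk_column, dense_rank() OVER (ORDER BY id_string) AS rn FROM the_table"
).fetchall()

# Step 2b: join the ranks back on the primary key. This replaces
# PostgreSQL's UPDATE ... FROM with one parameterized UPDATE per row.
conn.executemany(
    "UPDATE the_table SET id_int = ? WHERE pk_column = ?",
    [(rn, pk) for pk, rn in ranks],
)
conn.commit()

rows = conn.execute(
    "SELECT id_string, id_int FROM the_table ORDER BY pk_column"
).fetchall()
print(rows)
```

On PostgreSQL itself, the single UPDATE ... FROM statement in the answer does the same work in one round trip and is the better choice.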
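If your table really has no primary key (why?), you can use the built-in ctid system column instead. ctid identifies the physical location of a row version, so it can change when the row is updated; that is fine for a one-off statement like this, but it should not be kept as a permanent row identifier: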
update the_table
set id_int = t.rn
from (
select ctid as id_,
dense_rank() over (order by id_string) as rn
from the_table
) t
where the_table.ctid = t.id_;
Other suggestions
Your requirement is a little difficult to understand. It seems you want one integer ID per distinct string value rather than one per row: if ABCDEF appears multiple times in the data set, every occurrence gets the same integer.
If so, you can use the DENSE_RANK() window function to produce consecutive integer IDs grouped by the non-unique strings. Example below:
CREATE TABLE DataTable (NonUniqueString VARCHAR(25));
INSERT INTO DataTable
VALUES ('ABCDEF'), ('GHIJKL'), ('ABCDEF'), ('GHIJKL'), ('ABCDEF');
SELECT NonUniqueString,
DENSE_RANK() OVER (ORDER BY NonUniqueString) AS "Group"
FROM DataTable
ORDER BY NonUniqueString;
Results:
NonUniqueString  Group
---------------  -----
ABCDEF           1
ABCDEF           1
ABCDEF           1
GHIJKL           2
GHIJKL           2
NOTE: The example was written for MS SQL Server, but DENSE_RANK() behaves the same way in PostgreSQL and uses the same syntax.
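To make the numbering rule concrete, here is a minimal Python sketch of what DENSE_RANK() OVER (ORDER BY ...) assigns to each row. The function name dense_rank is hypothetical, and the sketch assumes there are no NULL values and uses Python's code-point string ordering, which can differ from a database collation.

```python
def dense_rank(values):
    """Return the 1-based dense rank of each value, in input order.

    Mirrors DENSE_RANK() OVER (ORDER BY value): equal values share a rank
    and the ranks 1, 2, 3, ... have no gaps. Assumes no None values and
    uses Python's code-point ordering, not a database collation.
    """
    # Number the distinct values in ascending order, starting at 1.
    rank_of = {value: rank for rank, value in enumerate(sorted(set(values)), start=1)}
    # Look up each original value's rank, preserving the input order.
    return [rank_of[value] for value in values]


data = ["ABCDEF", "GHIJKL", "ABCDEF", "GHIJKL", "ABCDEF"]
print(dense_rank(data))  # [1, 2, 1, 2, 1]
```

Unlike RANK(), which would skip numbers after ties, the dense variant keeps the IDs consecutive, which is why it suits this use case.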