What are Identity Columns?

https://dba.stackexchange.com/questions/158988

05-10-2020

Question

I was reviewing the commit-fest scheduled for 7/01 for PostgreSQL and I saw that Pg is likely going to get "identity columns" sometime soon.

I found some mention in information_schema.columns but nothing much

is_identity         yes_or_no         Applies to a feature not available in PostgreSQL
identity_generation character_data    Applies to a feature not available in PostgreSQL
identity_start      character_data    Applies to a feature not available in PostgreSQL
identity_increment  character_data    Applies to a feature not available in PostgreSQL
identity_maximum    character_data    Applies to a feature not available in PostgreSQL
identity_minimum    character_data    Applies to a feature not available in PostgreSQL
identity_cycle      yes_or_no         Applies to a feature not available in PostgreSQL

The Wikipedia Page doesn't say much either

An identity column differs from a primary key in that its values are managed by the server and usually cannot be modified. In many cases an identity column is used as a primary key; however, this is not always the case.

But, I don't see anything else on them. How do identity columns work? Do they provide any new functionality or is this just a standard method to create sequences? Any breakdown of the new feature and how it works?

Solution 2

How they're actually implemented in PG 10

You can see how they're actually implemented now using the test suite's expected output.

Some key takeaways:

You can specify the start value and the increment with a clause at table creation or through ALTER TABLE, for example START 7 INCREMENT BY 5.
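As a rough illustration of those options, here is a minimal sketch for PostgreSQL 10 or later. The table name demo_start and its columns are made up for the example.

```sql
-- Hypothetical table: the identity column starts at 7 and steps by 5.
CREATE TABLE demo_start (
    a integer GENERATED BY DEFAULT AS IDENTITY (START WITH 7 INCREMENT BY 5),
    b text
);

-- No value is supplied for "a", so the identity sequence provides it.
INSERT INTO demo_start (b) VALUES ('first'), ('second'), ('third');

SELECT a, b FROM demo_start ORDER BY a;  -- a is 7, 12, 17
```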

An INSERT into a table with an identity column can now specify OVERRIDING USER VALUE, which makes the server ignore the value supplied for the identity column and use the sequence-generated value instead:

INSERT INTO t OVERRIDING USER VALUE VALUES (10, 'xyz');

-- With a serial column the explicit value is always used as given,
-- so the second INSERT below fails with a duplicate-key error.
CREATE TABLE t ( a serial PRIMARY KEY, b text );
INSERT INTO t (a,b) VALUES (1,'foo');
INSERT INTO t (a,b) VALUES (1,'bar');
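A small sketch of OVERRIDING USER VALUE on a GENERATED BY DEFAULT column, assuming PostgreSQL 10 or later. The table demo_user is hypothetical.

```sql
-- Hypothetical table with a BY DEFAULT identity column.
CREATE TABLE demo_user (
    a integer GENERATED BY DEFAULT AS IDENTITY,
    b text
);

-- BY DEFAULT accepts an explicit value; the sequence is not advanced.
INSERT INTO demo_user VALUES (10, 'explicit');

-- Here the supplied 10 is ignored and the sequence value is used instead.
INSERT INTO demo_user OVERRIDING USER VALUE VALUES (10, 'xyz');

SELECT a, b FROM demo_user ORDER BY a;  -- rows (1, 'xyz') and (10, 'explicit')
```

This is mainly handy when loading rows from another source whose key values should be discarded.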

You can specify GENERATED ALWAYS to guarantee that values are generated by the system. An INSERT that supplies a value for such an identity column then fails with the error below unless it also specifies OVERRIDING SYSTEM VALUE.
```
ERROR:  cannot insert into column "a"
DETAIL:  Column "a" is an identity column defined as GENERATED ALWAYS.
HINT:  Use OVERRIDING SYSTEM VALUE to override.
```
Identity columns are implicitly NOT NULL.
Permissions follow the table: the backing sequence is managed internally, so it needs no separate grants.
The identity sequence can be reset to a different starting point with RESTART.
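The GENERATED ALWAYS and RESTART behaviour described above can be sketched as follows, assuming PostgreSQL 10 or later. The table demo_always is made up for the example.

```sql
-- Hypothetical table with an ALWAYS identity column.
CREATE TABLE demo_always (
    a integer GENERATED ALWAYS AS IDENTITY,
    b text
);

INSERT INTO demo_always (b) VALUES ('generated');          -- a = 1

-- Supplying a value without the override raises the error quoted above:
-- INSERT INTO demo_always (a, b) VALUES (42, 'rejected');

-- With the override, the supplied value is accepted.
INSERT INTO demo_always (a, b)
    OVERRIDING SYSTEM VALUE VALUES (42, 'forced');         -- a = 42

-- Reset the identity sequence to a new starting point.
ALTER TABLE demo_always ALTER COLUMN a RESTART WITH 100;
INSERT INTO demo_always (b) VALUES ('after restart');      -- a = 100
```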

You can read more about these in the PostgreSQL 10 docs for

ALTER TABLE

ALTER [ COLUMN ] column_name ADD GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_options ) ]
ALTER [ COLUMN ] column_name DROP IDENTITY [ IF EXISTS ]
ALTER [ COLUMN ] column_name { SET GENERATED { ALWAYS | BY DEFAULT } | SET sequence_option | RESTART [ [ WITH ] restart ] } [...]

CREATE TABLE

GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_options ) ]

CREATE SEQUENCE providing the sequence_option mentioned above
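To make the ALTER TABLE forms above concrete, here is a hedged sketch that adds, changes, and drops identity on an existing column. The table items is hypothetical; note that the column is declared NOT NULL before identity is added.

```sql
-- Hypothetical pre-existing table without any generation.
CREATE TABLE items (
    id integer NOT NULL,
    name text
);

-- Turn the existing column into an identity column.
ALTER TABLE items
    ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY (START WITH 100);

-- Switch generation mode and adjust a sequence option.
ALTER TABLE items ALTER COLUMN id SET GENERATED ALWAYS;
ALTER TABLE items ALTER COLUMN id SET INCREMENT BY 10;

-- Remove the identity property again; the column itself stays.
ALTER TABLE items ALTER COLUMN id DROP IDENTITY IF EXISTS;
```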

OTHER TIPS

This is to implement the feature found in the standard (copied from a draft, date: 2011-12-21):

4.15.11 Identity columns

The columns of a base table BT can optionally include not more than one identity column. The declared type of an identity column is either an exact numeric type with scale 0 (zero), INTEGER for example, or a distinct type whose source type is an exact numeric type with scale 0 (zero). An identity column has a start value, an increment, a maximum value, a minimum value, and a cycle option. ...
... The definition of an identity column may specify GENERATED ALWAYS or GENERATED BY DEFAULT.

It is a column property that says the column's values are supplied by the DBMS rather than by the user, in a specified manner and under specified restrictions (increasing or decreasing, bounded by minimum and maximum values, optionally cycling when a bound is reached).

Sequence generators (usually called just "sequences") are a related SQL standard feature: it's a mechanism that provides such values - and can be used for identity columns.

Note the subtle difference: a SEQUENCE is a standalone object that can supply values to one or more columns, or be queried on its own at will.
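As an illustration of that difference, the sketch below uses one standalone PostgreSQL sequence from two tables and from a plain query. All object names are invented for the example, and the commented values assume a single session running the statements in order.

```sql
-- A sequence is an independent object...
CREATE SEQUENCE shared_id_seq;

-- ...that several columns can draw from through their defaults.
CREATE TABLE invoices (
    id bigint NOT NULL DEFAULT nextval('shared_id_seq'),
    note text
);
CREATE TABLE receipts (
    id bigint NOT NULL DEFAULT nextval('shared_id_seq'),
    note text
);

INSERT INTO invoices (note) VALUES ('first');   -- id = 1
INSERT INTO receipts (note) VALUES ('second');  -- id = 2

-- It can also be called directly, outside of any table.
SELECT nextval('shared_id_seq');                -- 3
```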

The various DBMS have so far implemented similar features in different ways and with different syntax (MySQL: AUTO_INCREMENT, SQL Server: IDENTITY (seed, increment), PostgreSQL: serial using SEQUENCE, Oracle: sequences with triggers, etc) and only recently added the standard features (sequence generators in SQL Server 2012, identity columns in Oracle 12c).

Up to now Postgres has implemented sequence generators (which can be used to provide values for a column, either with the special macros serial and bigserial or with the nextval() function) but has not yet implemented the syntax for identity columns as it is in the standard.

Defining identity columns (and their slight differences from serial columns) and the related syntax (e.g. GENERATED ALWAYS, NEXT VALUE FOR) from the SQL standard is what this feature is about. The implementation of sequences may also need some changes or improvements, since identity columns will use sequences.
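For context, this is roughly what the serial macro stands for, following the expansion described in the PostgreSQL documentation. The table and column names are placeholders.

```sql
-- Writing
--     CREATE TABLE t_serial (id serial);
-- is roughly shorthand for the three statements below.
CREATE SEQUENCE t_serial_id_seq AS integer;

CREATE TABLE t_serial (
    id integer NOT NULL DEFAULT nextval('t_serial_id_seq')
);

-- Tie the sequence's lifetime to the column so it is dropped with it.
ALTER SEQUENCE t_serial_id_seq OWNED BY t_serial.id;
```

Because the result is just an ordinary column default plus a separate sequence, the sequence keeps its own permissions and can be detached from the column.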

If you follow the link identity columns (from the page you saw), you'll find:

identity columns

From: Peter Eisentraut
To: pgsql-hackers
Subject: identity columns
Date: 2016-08-31 04:00:42
Message-ID: 6adbacbf-73bc-dd1a-2033-63409180fd18@2ndquadrant.com

Here is another attempt to implement identity columns. This is a standard-conforming variant of PostgreSQL's serial columns. It also fixes a few usability issues that serial columns have:

need to set permissions on sequence in addition to table (*)

CREATE TABLE / LIKE copies default but refers to same sequence

cannot add/drop serialness with ALTER TABLE

dropping default does not drop sequence

slight weirdnesses because serial is some kind of special macro

(*) Not actually implemented yet, because I wanted to make use of the NEXT VALUE FOR stuff I had previously posted, but I have more work to do there.
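One of the listed issues, CREATE TABLE / LIKE sharing a sequence, can be sketched like this on PostgreSQL 10 or later. The table names are invented, and the commented values assume the statements run in order in one session.

```sql
-- serial: the copied default still points at the original sequence.
CREATE TABLE s1 (id serial, v text);
CREATE TABLE s2 (LIKE s1 INCLUDING DEFAULTS);

INSERT INTO s1 (v) VALUES ('a');  -- id = 1
INSERT INTO s2 (v) VALUES ('b');  -- id = 2, drawn from s1's sequence

-- identity: the copy gets a new sequence of its own.
CREATE TABLE i1 (id integer GENERATED BY DEFAULT AS IDENTITY, v text);
CREATE TABLE i2 (LIKE i1 INCLUDING IDENTITY);

INSERT INTO i1 (v) VALUES ('a');  -- id = 1
INSERT INTO i2 (v) VALUES ('b');  -- id = 1, independent sequence
```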

...

Update 2017, September: seems like the feature will be in Postgres 10, which is to be released in a few days/weeks: What's New In Postgres 10: Identity Columns

Oracle has also implemented identity columns, in version 12c (it already had sequences). The syntax follows the standard, as far as I checked:
Identity Columns in Oracle Database 12c Release 1 (12.1)

The 12c database introduces the ability to define an identity clause against a table column defined using a numeric type. The syntax is shown below.
GENERATED
[ ALWAYS | BY DEFAULT [ ON NULL ] ]
AS IDENTITY [ ( identity_options ) ]

SQL 2003 Summary

For a good review of the spec on this, here is a column from SIGMOD entitled "SQL:2003 Has Been Published":

Identity Columns

While sequence generators put SQL in charge of generating unique values, users are still burdened with tasks such as creating a sequence generator and invoking the NEXT VALUE FOR function at appropriate times. SQL:2003 provides another new feature, identity columns, that provides a more convenient mechanism by making it unnecessary for users to perform these additional tasks. Identity columns are columns designated with the special keyword IDENTITY, as shown below:
CREATE TABLE PARTS (
PARTNUM INTEGER GENERATED ALWAYS
AS IDENTITY (START WITH 1
INCREMENT BY 1
MINVALUE 1
MAXVALUE 10000
NO CYCLE),
DESCRIPTION VARCHAR (100),
QUANTITY INTEGER )
As can be seen from the above example, identity columns share the same attributes as sequence generators. This is because a sequence generator that inherits the identity column attributes gets associated conceptually with each identity column. Note that at most one column in a table can be designated as an identity column. Users do not need to specify a value for an identity column whenever a new row is inserted into a table containing that identity column. The value for such a column is generated automatically by invoking the NEXT VALUE FOR function implicitly under the covers. For example, the following INSERT statement:
INSERT INTO PARTS
(DESCRIPTION, QUANTITY)
VALUES ('WIDGET', 30)
adds a new part named WIDGET to the PARTS table. The value for the PARTNUM column is generated automatically, following the same rules that are used for generating values of sequence generators. That is, the value of PARTNUM column for the first row would be the START WITH value specified for that column, and the values for subsequent rows would follow the formula we described previously for sequence generators. What we said above is true if the user had specified GENERATED ALWAYS for the identity column, which is the case for PARTNUM column in our example. SQL:2003 provides another option, GENERATED BY DEFAULT. If the user chooses this option, automatic generation takes place only when values are not provided in the VALUES clause. This feature is very useful for making copies of tables with identity columns.
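The table-copy use case mentioned at the end of that excerpt might look like this in PostgreSQL 10 or later. This is only a sketch: parts_copy and the sample rows are invented, and the setval step is one possible way to move the copy's sequence past the copied keys.

```sql
-- Source table, modelled on the PARTS example above.
CREATE TABLE parts (
    partnum integer GENERATED ALWAYS AS IDENTITY,
    description varchar(100),
    quantity integer
);
INSERT INTO parts (description, quantity)
    VALUES ('WIDGET', 30), ('GADGET', 12);   -- partnum 1 and 2

-- BY DEFAULT lets the copy keep the original key values.
CREATE TABLE parts_copy (
    partnum integer GENERATED BY DEFAULT AS IDENTITY,
    description varchar(100),
    quantity integer
);
INSERT INTO parts_copy
    SELECT partnum, description, quantity FROM parts;

-- Explicit values do not advance the copy's sequence, so move it forward.
SELECT setval(pg_get_serial_sequence('parts_copy', 'partnum'),
              (SELECT max(partnum) FROM parts_copy));

INSERT INTO parts_copy (description, quantity)
    VALUES ('GIZMO', 5);                     -- partnum = 3
```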

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange