Identity Columns or UDF that explicitly generates a unique id?

https://dba.stackexchange.com/questions/1632

16-10-2019

Question

I am in the middle of a debate about whether it is better to make a PRIMARY KEY out of an identity column, or out of a UDF that explicitly generates a unique id.

I am arguing for the Identity Column.
My partner is arguing for generating the values manually. He proposes either:
- keeping the id on another table, with a UDF that would
  - lock the resource
  - increment an ID table with one field called ID_Value by 1
  - use this as a global unique identifier
- or having the table do an id+1 when inserting

He also claims that it's simpler to move data between servers and/or environments without the identity constraint, for example moving from one DB where there is data to another similar DB with, let's say, staging or dummy data. For testing in non-production we may want to pull all records from yesterday down to staging.

Which implementation makes more sense?

Solution

Your colleague is an idiot.

The solution won't scale, and the UDF isn't safe under concurrency (same reason as this). And how do you deal with multi-row inserts? They would require a UDF call per row.
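To make the concurrency problem concrete, here is a minimal sketch of the "id+1" approach going wrong. It is a simulation under stated assumptions, not SQL Server: an in-memory SQLite database stands in for the real one, the `orders` table and the `next_id` helper are made up for illustration, and the two "sessions" are interleaved by hand rather than run on separate connections.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, note TEXT)")
conn.execute("INSERT INTO orders (id, note) VALUES (1, 'existing row')")


def next_id(connection):
    """The 'id + 1' approach: read the current maximum and add one."""
    (current_max,) = connection.execute("SELECT MAX(id) FROM orders").fetchone()
    return current_max + 1


# Both sessions compute their next id before either one inserts.
id_for_session_a = next_id(conn)
id_for_session_b = next_id(conn)

conn.execute(
    "INSERT INTO orders (id, note) VALUES (?, 'session A')", (id_for_session_a,)
)
try:
    conn.execute(
        "INSERT INTO orders (id, note) VALUES (?, 'session B')", (id_for_session_b,)
    )
    collided = False
except sqlite3.IntegrityError:
    # Session B was handed the same id that session A already used.
    collided = True

print(id_for_session_a, id_for_session_b, collided)
```

Avoiding the collision means serializing the read and the insert behind a lock, which is exactly the scalability cost described above.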

And migrating to another RDBMS doesn't happen often in real life... you may as well not use SQL Server now and use sequences on Oracle and hope you don't migrate away.

Edit:

Your update states that moving data is for refreshing non-production databases.

In that case, you ignore the identity columns when refreshing. You don't compromise your implementation to make non-prod loading easier. Or use temp tables to track the identity value changes.
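One way to read "track the identity value changes" is to keep an old-id to new-id mapping while loading, so child rows can be re-pointed at the newly generated parent ids. The sketch below is only an illustration under assumptions: SQLite stands in for SQL Server, a Python dict plays the role of the temp table, and the table names and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_customers (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- stands in for IDENTITY
        name TEXT
    );
    CREATE TABLE staging_orders (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER REFERENCES staging_customers(id),
        item TEXT
    );
    -- Staging already holds dummy data, so production ids cannot be reused.
    INSERT INTO staging_customers (name) VALUES ('dummy 1'), ('dummy 2');
""")

# Hypothetical rows pulled from production: (id, name) and (id, customer_id, item).
prod_customers = [(1, "Alice"), (2, "Bob")]
prod_orders = [(10, 1, "book"), (11, 2, "pen"), (12, 1, "lamp")]

# Load parents, letting staging generate new ids, and remember old -> new.
id_map = {}
for old_id, name in prod_customers:
    cursor = conn.execute("INSERT INTO staging_customers (name) VALUES (?)", (name,))
    id_map[old_id] = cursor.lastrowid

# Load children, translating the foreign key through the mapping.
for _old_order_id, old_customer_id, item in prod_orders:
    conn.execute(
        "INSERT INTO staging_orders (customer_id, item) VALUES (?, ?)",
        (id_map[old_customer_id], item),
    )
conn.commit()

print(id_map)
```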

Or use processes: we refresh our test system from production every night, which avoids the issue entirely (and ensures our prod backup can be restored too).

OTHER TIPS

Use an identity value. Generating your own sequence table and sequence values adds a lot of overhead and causes a lot of locking and blocking while numbers are being generated.

Identity exists for a reason, use it.

SQL Server "Denali" (SQL Server 2012) will support sequences, which will be more efficient than identity, but you can't create something more efficient yourself.

As for moving records from one environment to another, either turn IDENTITY_INSERT on for the target table (SET IDENTITY_INSERT ... ON) when doing the insert, or check the corresponding box in SSIS.

The identity column sounds fine to me. I'm not sure I follow the logic about why it's difficult to move data between servers.

If you do want each row to have a globally unique identity you can use a UUID but I wouldn't do it unless you are sure that the global uniqueness is necessary - usually it's not. Using UUIDs as ids will decrease performance, increase disk space requirements and make debugging harder - because of the length it is difficult to remember a UUID, tell it to someone over the phone, or write it down on paper without error.

For simple numeric IDs, just go with identity and forget all the problems of manually generating them.
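As a rough illustration of the disk-space point, the sketch below compares key sizes. The 4- and 8-byte figures are the storage sizes of SQL Server's int and bigint; the row count is an arbitrary assumption, and the arithmetic covers only one copy of the key column, ignoring indexes and foreign keys that repeat it.

```python
import uuid

INT_BYTES = 4      # storage size of a SQL Server int
BIGINT_BYTES = 8   # storage size of a SQL Server bigint

example = uuid.uuid4()
uuid_bytes = len(example.bytes)        # raw size of a UUID
uuid_text_length = len(str(example))   # characters to read out or write down

rows = 10_000_000  # arbitrary table size for the comparison
extra_bytes_vs_int = (uuid_bytes - INT_BYTES) * rows
extra_bytes_vs_bigint = (uuid_bytes - BIGINT_BYTES) * rows

print(uuid_bytes, uuid_text_length, extra_bytes_vs_int, extra_bytes_vs_bigint)
```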

You can always create a "super table" that uses an identity as the PK and has a type column, plus any other info. When you need a new ID (assuming you mean unique IDs across different tables), just insert into this table, grab the SCOPE_IDENTITY(), and then insert into the actual table you need.

Basically you create a table, MasterIDs, with an identity PK. When you need to insert a row into your Table1, INSERT INTO MasterIDs, get the identity generated by that row using SCOPE_IDENTITY(), and then insert into Table1 using that value as the PK.

Table1 will have a non-identity int PK. You would do the same process to insert into Table2, etc. Let SQL Server manage the identity values in the MasterIDs table, which you can then use in your other tables. MasterIDs could contain other columns, like type (so you could know which table, Table1 or Table2, etc., uses that identity value).
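The MasterIDs pattern described above can be sketched as follows. This is an approximation under assumptions, not SQL Server code: SQLite's AUTOINCREMENT stands in for an identity column, `cursor.lastrowid` plays the role of SCOPE_IDENTITY(), and the helper `insert_with_master_id` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MasterIDs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- stands in for an IDENTITY column
        type TEXT NOT NULL                     -- which table uses this id
    );
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Table2 (id INTEGER PRIMARY KEY, name TEXT);
""")


def insert_with_master_id(connection, table, name):
    """Reserve an id in MasterIDs, then use it as the PK in the target table."""
    if table not in ("Table1", "Table2"):
        raise ValueError(f"unknown table: {table}")
    with connection:  # both inserts commit or roll back together
        cursor = connection.execute(
            "INSERT INTO MasterIDs (type) VALUES (?)", (table,)
        )
        new_id = cursor.lastrowid  # rough analogue of SCOPE_IDENTITY()
        connection.execute(
            f"INSERT INTO {table} (id, name) VALUES (?, ?)", (new_id, name)
        )
    return new_id


first = insert_with_master_id(conn, "Table1", "a")
second = insert_with_master_id(conn, "Table2", "b")
third = insert_with_master_id(conn, "Table1", "c")
print(first, second, third)
```

The ids are unique across Table1 and Table2 because they all come from the one generator, and the type column records which table owns each value.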

As long as you are using foreign key constraints properly (cascading, updating, etc.) then you will be fine with using an identity field. I really don't see an advantage to the other solution in this case.

Identity was made to fit your scenario. You have tools like replication for server/environments data exchange that keep it all together.

I have just finished a piece of work where I have replaced a SQL Server identity column with a normal int field and controlled Id allocation myself.

I have seen quite impressive performance gains. Unlike the OP, I haven't got a UDF to generate the id, but the principle is pretty much the same: part of the software maintains a pool of ids. When the pool runs out, it gets another batch by querying the database for the next Low value and incrementing it to the next High.

This allows us to generate ids and relate all entities outside of a transaction in our ORM before we submit the batches to the database and also submit larger batches without extra roundtrips to get the identity just inserted (required by identity columns).

The id table has more than one row, which allows us to use specific ranges if we wish, e.g. for reusing deleted blocks and for negative ids.
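The pooled allocation described in this answer resembles what is often called a Hi/Lo scheme. The sketch below is a guess at the general shape, not the poster's actual implementation: SQLite stands in for the database, and the `id_ranges` table, its columns, and the `IdPool` class are hypothetical names.

```python
import sqlite3


class IdPool:
    """Hands out ids from a block reserved in the database (Hi/Lo-style sketch)."""

    def __init__(self, connection, range_name, block_size):
        self.connection = connection
        self.range_name = range_name
        self.block_size = block_size
        self.next_value = 0
        self.limit = 0  # exclusive upper bound of the current block

    def _reserve_block(self):
        # One round trip reserves block_size ids. The UPDATE runs first so the
        # row is written inside the transaction before the new value is read.
        with self.connection:
            self.connection.execute(
                "UPDATE id_ranges SET next_low = next_low + ? WHERE range_name = ?",
                (self.block_size, self.range_name),
            )
            (high,) = self.connection.execute(
                "SELECT next_low FROM id_ranges WHERE range_name = ?",
                (self.range_name,),
            ).fetchone()
        self.next_value, self.limit = high - self.block_size, high

    def next_id(self):
        if self.next_value >= self.limit:
            self._reserve_block()
        value = self.next_value
        self.next_value += 1
        return value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id_ranges (range_name TEXT PRIMARY KEY, next_low INTEGER)")
# More than one row, so different ranges can be handed out independently.
conn.execute("INSERT INTO id_ranges VALUES ('orders', 1), ('archive', 1000)")
conn.commit()

orders_pool = IdPool(conn, "orders", block_size=3)
order_ids = [orders_pool.next_id() for _ in range(5)]  # needs two block reservations
archive_id = IdPool(conn, "archive", block_size=100).next_id()
print(order_ids, archive_id)
```

Five ids cost two database round trips here instead of five, and every id is known before the rows are inserted, which is what lets an ORM relate entities ahead of the batch.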

I've been using identity for years and am seriously considering replacing identity numbers with UNIQUEIDENTIFIER. It's a nightmare when you need to change the data type because someone designed the db to be compact, and a nightmare if you need to add identity to an existing column. You also can't update an identity column. Imagine you used an int and your database grows beyond 2 billion records (an int tops out at 2,147,483,647): again, a nightmare to change (consider FKs)! Changing anything with identity is a nightmare, and it is not scale friendly unless you use bigint. UNIQUEIDENTIFIER vs identity = convenience and robustness vs a maybe noticeable performance improvement (I didn't do the benchmark).

Update: after seeing the comparison linked here, I definitely lean toward UNIQUEIDENTIFIER. It shows no real benefit for bigint identity and a bunch of benefits for UNIQUEIDENTIFIER, although different versions of SQL Server could give a different result. There is just beauty in having a unique id across all databases and systems (robustness)! Move, copy, transform data as you please! https://www.mssqltips.com/sqlservertip/5105/sql-server-performance-comparison-int-versus-guid/

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange