Creating a Persisted Computed Column with a function

https://dba.stackexchange.com/questions/204545

30-12-2020

Question

I am working with the programmers on a database solution. They want to add a computed column that mimics the old keys for the older queries, procedures, and systems, and index it. The new keys will be GUIDs.

To do this, they want to create a function that computes the value for the computed column, and then persist that column. SQL Server will not let them persist it. I don't have any warm fuzzies about the idea, and I also cannot find any info on the web about the technique (is it a technique?).

I am thinking they need to add a trigger instead. Does anyone have any ideas?

The function will run this query:

(SELECT [INT Identity field] FROM TABLE WHERE [GUID COLUMN] = @GUIDKEY)

It returns an INT Identity field based on the GUID.

This will run on every insert into a related table. So if table One holds the primary key, the related table Two will use the GUID passed in to look up the key in table One and store it in table Two.

Solution

Still don't understand why this needs to be a column in a table, never mind a persisted one.

Why not just create a table-valued function that you cross apply when (and only when) the query actually needs it? Since the old key will never change it doesn't need to be computed or persisted anyway.

If you really want the old key to live in multiple places (and it sounds like people who shouldn't be making this kind of decision have already made this kind of decision), just do the lookup in a trigger and populate it at write time. Then it's just a static column in a table.

I still highly recommend a table-valued function to facilitate this, so that you can write the trigger in such a way that it handles multi-row operations... without having to write a loop, or call a scalar-valued function over and over again for every row.

Just to show how similar these things really are (and question your lead developer who "doesn't like it"):

-- bad, slow, painful row-by-row
CREATE FUNCTION dbo.GetIDByGUID
(
  @GuidKey uniqueidentifier
)
RETURNS int
AS
BEGIN
  RETURN (SELECT $IDENTITY FROM dbo.tablename WHERE guid_column = @GuidKey);
END
GO

-- in the trigger:
UPDATE r SET oldkey = dbo.GetIDByGUID(i.guid_column)
  FROM dbo.related AS r
  INNER JOIN inserted AS i
  ON r.guid_column = i.guid_column;

Now, if you have a table-valued function, the code is quite similar, but you'll find the performance is much better in multi-row operations, and close to identical for single-row operations.

-- ah, much better
CREATE FUNCTION dbo.GetIDByGUID_TVF
(
  @GuidKey uniqueidentifier
)
RETURNS TABLE
AS
  RETURN (SELECT id = $IDENTITY FROM dbo.tablename WHERE guid_column = @GuidKey);
GO

-- in the trigger:
UPDATE r SET oldkey = f.id
  FROM dbo.related AS r
  INNER JOIN inserted AS i
  ON r.guid_column = i.guid_column
  CROSS APPLY dbo.GetIDByGUID_TVF(i.guid_column) AS f;
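For reference, here is one way the pieces above might fit together as a complete script. It is a sketch for a scratch database, not something tested against your schema: the table definitions, the related_id key, and the trigger name dbo.related_SetOldKey are hypothetical, while dbo.tablename, dbo.related, guid_column, and oldkey are the placeholder names used above.

```sql
-- Hypothetical parent table: the legacy INT identity plus the new GUID key.
CREATE TABLE dbo.tablename
(
  id          int IDENTITY(1,1) NOT NULL PRIMARY KEY,
  guid_column uniqueidentifier  NOT NULL DEFAULT NEWSEQUENTIALID() UNIQUE
);

-- Hypothetical child table: callers supply only the GUID; the trigger fills in oldkey.
CREATE TABLE dbo.related
(
  related_id  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
  guid_column uniqueidentifier  NOT NULL,
  oldkey      int               NULL
);
GO

-- Inline TVF: the optimizer expands it into the calling query, much like a view.
CREATE FUNCTION dbo.GetIDByGUID_TVF (@GuidKey uniqueidentifier)
RETURNS TABLE
AS
  RETURN (SELECT id = $IDENTITY FROM dbo.tablename WHERE guid_column = @GuidKey);
GO

CREATE TRIGGER dbo.related_SetOldKey
ON dbo.related
AFTER INSERT
AS
BEGIN
  SET NOCOUNT ON;

  -- One set-based UPDATE covers every row in the inserted pseudo-table,
  -- whether the INSERT added one row or thousands.
  UPDATE r
     SET oldkey = f.id
    FROM dbo.related AS r
    INNER JOIN inserted AS i
      ON r.related_id = i.related_id   -- join on the key so only the new rows are touched
   CROSS APPLY dbo.GetIDByGUID_TVF(i.guid_column) AS f;
END
GO

-- Usage: two parent rows, then a single multi-row insert into the child table.
INSERT dbo.tablename DEFAULT VALUES;
INSERT dbo.tablename DEFAULT VALUES;

INSERT dbo.related (guid_column)
SELECT guid_column FROM dbo.tablename;

-- Query-time alternative: apply the TVF only when a query actually needs the old key.
SELECT r.related_id, r.oldkey, f.id AS looked_up_id
FROM dbo.related AS r
CROSS APPLY dbo.GetIDByGUID_TVF(r.guid_column) AS f;
```

If a GUID has no match in dbo.tablename, CROSS APPLY returns nothing for that row and oldkey stays NULL. The trigger joins inserted back to the child table on its primary key rather than on guid_column, so rows inserted earlier are not rewritten.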

Other answers

I'm not sure why you think you need a function or a computed column to do this. You can just add a new column to the table with a default value and index it however you want.

CREATE TABLE dbo.whatever ( Id INT );

ALTER TABLE dbo.whatever
ADD YourMom UNIQUEIDENTIFIER
        DEFAULT NEWSEQUENTIALID();

CREATE INDEX ix_whatever ON dbo.whatever (YourMom);

Since you updated your question, let's address what a truly awful idea this is. I'm going to simplify the example a little bit.

CREATE TABLE dbo.whatever ( Id INT PRIMARY KEY);

CREATE TABLE dbo.ennui ( Id INT PRIMARY KEY, meh INT );
GO 

CREATE FUNCTION dbo.BadIdea ( @notguido INT )
RETURNS INT
WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT
AS
    BEGIN
        DECLARE @out INT;
        SELECT @out = ( SELECT e.Id FROM dbo.ennui AS e WHERE e.meh = @notguido );
        RETURN @out;
    END;
GO 

ALTER TABLE dbo.whatever ADD ishygddt AS dbo.BadIdea(Id);

/*Will fail*/
ALTER TABLE dbo.whatever ALTER COLUMN ishygddt ADD PERSISTED;

/*Will fail*/
CREATE INDEX ix_whatever ON dbo.whatever (ishygddt);

Trying to persist a computed column based on a scalar function (even a deterministic one with SCHEMABINDING) will fail if the function performs data access:

Msg 4934, Level 16, State 3, Line 23 Computed column 'ishygddt' in table 'whatever' cannot be persisted because the column does user or system data access.

Nor can you index it:

Msg 2709, Level 16, State 1, Line 25 Column 'ishygddt' in table 'dbo.whatever' cannot be used in an index or statistics or as a partition key because it does user or system data access.
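If you'd rather find this out before running the ALTER, the metadata functions OBJECTPROPERTY and COLUMNPROPERTY expose the flags behind these errors. A sketch that recreates the example above in a scratch database and checks them (the result column aliases are just illustrative labels):

```sql
CREATE TABLE dbo.whatever ( Id INT PRIMARY KEY );
CREATE TABLE dbo.ennui ( Id INT PRIMARY KEY, meh INT );
GO

CREATE FUNCTION dbo.BadIdea ( @notguido INT )
RETURNS INT
WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT
AS
BEGIN
    -- Reading dbo.ennui is the "user data access" that blocks PERSISTED and indexing.
    RETURN ( SELECT e.Id FROM dbo.ennui AS e WHERE e.meh = @notguido );
END;
GO

-- Adding the column without PERSISTED is allowed.
ALTER TABLE dbo.whatever ADD ishygddt AS dbo.BadIdea(Id);
GO

-- UserDataAccess = 1 means the function (and so the column) reads user tables;
-- IsIndexable reports whether the computed column could be indexed.
SELECT fn_user_data_access  = OBJECTPROPERTY(OBJECT_ID(N'dbo.BadIdea'), 'UserDataAccess'),
       fn_is_deterministic  = OBJECTPROPERTY(OBJECT_ID(N'dbo.BadIdea'), 'IsDeterministic'),
       col_user_data_access = COLUMNPROPERTY(OBJECT_ID(N'dbo.whatever'), 'ishygddt', 'UserDataAccess'),
       col_is_indexable     = COLUMNPROPERTY(OBJECT_ID(N'dbo.whatever'), 'ishygddt', 'IsIndexable');
```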

You'll also run into a lot of problems because the function will run row-by-row to retrieve data, and will force all queries against the table to run serially.

If you're modifying data in the table that the function references, and selecting data from the table that calls the function in the computed column, you can end up with some really confusing blocking scenarios where the function is blocked from returning data to a query on a seemingly unrelated table.

This is a bad idea all around. Aaron gave what I think is the best advice in his comment:

Why not just create a table-valued function that you cross apply when (and only when) the query actually needs it? Since the old key will never change it doesn't need to be computed or persisted anyway. If you really want the old key to live in multiple places just do the lookup in a trigger.

You can read about persisted computed columns in Books Online (BOL), along with the related topic, Indexes on Computed Columns.

There are restrictions on the expression you can use in a persisted computed column: the expression "must be deterministic when PERSISTED is specified."
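For contrast, a computed column whose expression only reads values from its own row, using deterministic built-in functions, can be persisted and indexed. A minimal sketch with a hypothetical dbo.legacy_demo table; the SET statements reflect the session settings SQL Server requires for indexes on computed columns (some clients, such as the classic sqlcmd, default QUOTED_IDENTIFIER to OFF):

```sql
-- Session settings required for indexes on computed columns.
SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT,
    QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO

CREATE TABLE dbo.legacy_demo
(
  Id     int IDENTITY(1,1) NOT NULL PRIMARY KEY,
  -- Deterministic and no data access: derived only from this row's Id.
  IdText AS CONVERT(varchar(11), Id) PERSISTED
);
GO

-- Allowed, unlike the data-accessing function above.
CREATE INDEX ix_legacy_demo_IdText ON dbo.legacy_demo (IdText);
GO

INSERT dbo.legacy_demo DEFAULT VALUES;
INSERT dbo.legacy_demo DEFAULT VALUES;

SELECT Id, IdText FROM dbo.legacy_demo;
```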

If I understand correctly, you have:

A table, let's call it t.
Table t has a guid column, let's call it gc.
An additional table is a lookup mapping the values in t.gc to a different key value of a different type, that is used in legacy code. Let's call the table lt and the legacy key column lk.
You want lt.lk to show up in t as t.lk so that legacy code can continue using it.

I would investigate using a view.

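Rename t to something like t_base.
Create a view named t that joins t_base to lt and returns the columns of t_base plus lt.lk.
If it needs to be persisted, investigate indexed views. Indexed views can be created in any edition, but outside Enterprise (and Developer) Edition the optimizer only uses the view's index when a query specifies the NOEXPAND hint. They also have many restrictions, but are probably appropriate for this join.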
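A sketch of those steps, assuming the names from the list above (t, t_base, lt, gc, lk) plus a hypothetical name column, sample GUID, and index names. With an INNER JOIN, rows in t_base that have no mapping in lt disappear from the view; an outer join would keep them, but indexed views don't allow outer joins.

```sql
-- Session settings required for indexed views.
SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT,
    QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO

-- Hypothetical current table t (GUID key gc) and legacy lookup lt (gc -> lk).
CREATE TABLE dbo.t
(
  gc   uniqueidentifier NOT NULL PRIMARY KEY,
  name nvarchar(50)     NOT NULL
);
CREATE TABLE dbo.lt
(
  gc uniqueidentifier NOT NULL PRIMARY KEY,
  lk int              NOT NULL UNIQUE
);
GO

-- Step 1: move the table out of the way.
EXEC sp_rename N'dbo.t', N't_base';
GO

-- Step 2: a view with the old name that adds the legacy key.
CREATE VIEW dbo.t
WITH SCHEMABINDING
AS
SELECT b.gc, b.name, l.lk
FROM dbo.t_base AS b
INNER JOIN dbo.lt AS l
  ON l.gc = b.gc;
GO

-- Step 3 (optional): materialize the view; the first index must be unique and clustered.
CREATE UNIQUE CLUSTERED INDEX cix_t ON dbo.t (gc);
CREATE INDEX ix_t_lk ON dbo.t (lk);
GO

INSERT dbo.t_base (gc, name) VALUES ('6F9619FF-8B86-D011-B42D-00C04FC964FF', N'example row');
INSERT dbo.lt (gc, lk)       VALUES ('6F9619FF-8B86-D011-B42D-00C04FC964FF', 42);

-- Legacy code keeps reading t.lk; NOEXPAND forces use of the view's index on non-Enterprise editions.
SELECT gc, name, lk FROM dbo.t WITH (NOEXPAND);
```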

The actual problem

The problem the developers are trying to solve is a fairly standard migration problem, from one data type to a new one. They have a new problem that was not anticipated at the time the database was originally designed; namely, they now need to synchronize data across databases. This tends to be a costly endeavor, particularly if the data types are relied upon in a lot of code. Looking for cost saving measures is not wholly unreasonable.

Their solution

Math says no

First of all, it's important to realize that the problem is probably mathematically intractable. A GUID or UUID is just a 128-bit (16-byte) number. The size of an IDENTITY column depends on the data type used, but it is typically INT (32 bits, 4 bytes); sometimes BIGINT (64 bits, 8 bytes) is used. There is no mathematical way to map the full range of GUIDs one-to-one onto the range of INTs or even BIGINTs. Even DECIMAL(38,0), the widest exact numeric type in SQL Server, only holds values from -(10^38 - 1) to 10^38 - 1, just under 2 × 10^38 distinct values, which is fewer than the 2^128 (about 3.4 × 10^38) possible GUID values. It comes close, but it is also highly uncommon as a key type.
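Counting distinct values makes the pigeonhole argument concrete. The DECIMAL(38,0) line counts positive values, negative values, and zero; by this count even the widest exact numeric type falls slightly short of the number of possible 16-byte values:

$$
\begin{aligned}
\text{uniqueidentifier (16 bytes)} &: 2^{128} \approx 3.40 \times 10^{38} \\
\text{INT (4 bytes)} &: 2^{32} \approx 4.29 \times 10^{9} \\
\text{BIGINT (8 bytes)} &: 2^{64} \approx 1.84 \times 10^{19} \\
\text{DECIMAL}(38,0) &: 2\,(10^{38}-1)+1 \approx 2.00 \times 10^{38} < 2^{128}
\end{aligned}
$$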

Practicality

Even if it were possible to map GUIDs to a DECIMAL column, that doesn't mean it's practical. Virtually no one does this, so you're going to have to spend time (=money) ensuring that the mapping works correctly. Their solution introduces a not insignificant risk of creating strange and hard to diagnose bugs.

Additionally, you're not going to preserve the existing IDs of the data with their solution. This may well break any bookmarks end users have that include the IDs.

Finally, their solution is likely to become entrenched. It's not a good practice for all of the above reasons, but if they adopt it, chances are they won't work on moving away from it any time soon, because it's too "easy" to just keep using the integer keys in all new code.

A more standard approach

A relatively standard approach to introducing synchronization to an existing system is to add a new unique ID. This new ID is placed in the data in addition to the old, existing keys. The integer surrogate keys themselves are then not synchronized; each database keeps its own.

This has some major benefits:

Solves the problem of enabling synchronization across databases.
Existing code that relies on the old keys doesn't have to change. (Definitely not in the short term, and possibly not ever.)
No mathematically impossible mappings.
Existing key values are preserved.
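A minimal sketch of that approach, with a hypothetical dbo.Customer table and SyncId column: the existing INT identity stays the primary key that legacy code uses, and a new GUID column with a unique constraint becomes the cross-database identifier. Adding the column as NOT NULL with a default should fill in existing rows as part of the ALTER.

```sql
-- Hypothetical existing table with an INT surrogate key used by legacy code.
CREATE TABLE dbo.Customer
(
  CustomerId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
  Name       nvarchar(100)     NOT NULL
);
INSERT dbo.Customer (Name) VALUES (N'Existing row 1'), (N'Existing row 2');
GO

-- Add the synchronization ID next to the old key; existing rows receive values from the default.
ALTER TABLE dbo.Customer
  ADD SyncId uniqueidentifier NOT NULL
      CONSTRAINT DF_Customer_SyncId DEFAULT NEWSEQUENTIALID();
GO

ALTER TABLE dbo.Customer
  ADD CONSTRAINT UQ_Customer_SyncId UNIQUE (SyncId);
GO

-- New rows get both keys automatically; legacy code keeps using CustomerId.
INSERT dbo.Customer (Name) VALUES (N'New row');

SELECT CustomerId, SyncId, Name FROM dbo.Customer;
```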

There are two minor annoyances with this approach:

Since surrogate keys are not synchronized, if surrogate keys appear in code and particularly in the application, then these IDs will be different across different databases. For your specific situation (synchronizing to test and development copies of the database, rather than some kind of replication across multiple production databases), this is only a minor annoyance. However, it's also one that can be solved: as the need becomes more apparent through whatever inefficiencies this creates, the developers can adjust specific, targeted pieces of code to use the new synchronization ID as needed, without rewriting the entire application. They could even potentially support both IDs for a time. (For example, a web endpoint might accept an integer key and then redirect to the GUID key to ensure bookmarks aren't broken.) This also becomes the motive for slowly migrating away from using the integer keys in code.
The synchronization code might have to map integer surrogate IDs when synchronizing data with a foreign key. This is far from an intractable problem, though. You just look up the related synchronization ID and use it to find the destination database's integer surrogate key. However, it sounds like your development team is already prepared to switch foreign keys from the existing surrogate keys to the new GUID keys, anyway, so this may not be an issue at all.
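The lookup described above is a plain join on the synchronization ID. In this sketch, the hypothetical schemas src and dst stand in for the source and destination databases (in practice they would be separate databases referenced with three-part names), and the Parent/Child tables with SyncId columns are illustrative. Each side has its own INT identity values; the child row's foreign key is translated by matching the parent's SyncId.

```sql
CREATE SCHEMA src;
GO
CREATE SCHEMA dst;
GO

-- Same shapes on both sides: INT keys are local to each side, SyncId is shared.
CREATE TABLE src.Parent (ParentId int IDENTITY(1,1) PRIMARY KEY, SyncId uniqueidentifier NOT NULL UNIQUE);
CREATE TABLE src.Child  (ChildId  int IDENTITY(1,1) PRIMARY KEY, SyncId uniqueidentifier NOT NULL UNIQUE,
                         ParentId int NOT NULL REFERENCES src.Parent (ParentId));
CREATE TABLE dst.Parent (ParentId int IDENTITY(1,1) PRIMARY KEY, SyncId uniqueidentifier NOT NULL UNIQUE);
CREATE TABLE dst.Child  (ChildId  int IDENTITY(1,1) PRIMARY KEY, SyncId uniqueidentifier NOT NULL UNIQUE,
                         ParentId int NOT NULL REFERENCES dst.Parent (ParentId));
GO

-- Sample data: dst already has an unrelated parent, so the INT keys diverge.
INSERT dst.Parent (SyncId) VALUES (NEWID());                -- dst ParentId 1
INSERT src.Parent (SyncId) VALUES (NEWID());                -- src ParentId 1
INSERT dst.Parent (SyncId) SELECT SyncId FROM src.Parent;   -- synced parent becomes dst ParentId 2
INSERT src.Child (SyncId, ParentId) VALUES (NEWID(), 1);
GO

-- Sync child rows: src.ParentId -> parent SyncId -> dst.ParentId.
INSERT dst.Child (SyncId, ParentId)
SELECT sc.SyncId, dp.ParentId
FROM src.Child AS sc
INNER JOIN src.Parent AS sp ON sp.ParentId = sc.ParentId
INNER JOIN dst.Parent AS dp ON dp.SyncId   = sp.SyncId
WHERE NOT EXISTS (SELECT 1 FROM dst.Child AS dc WHERE dc.SyncId = sc.SyncId);
```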

Both of these problems are manageable, though, and they're reasonable trade-offs to make if switching everything to GUIDs right now is too expensive.

It's also worth noting that this solution enables exactly the query they're asking to be able to do.

(This may be too late to help you now, but I think it's good info going forward.)

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange