Question

I'm not sure how to better phrase this question so it's possible I missed a previously asked question. Feel free to close this and point me to the correct one if it exists.

I have a table with two important columns (that is to say it has much more but only two is pertinent to this question). First column is a GUID (an id) and the second is a nvarchar (storing an URL). The combination of the ID and the URL has to be unique (so a same guid can be repeated but each row has a different URL and vice-versa but there cannot be more than one row of the same guid and URL).

Currently, before every INSERT, I do a SELECT to see if there exists a row with the same id and URL. However it looks like lookups on the nvarchar is slow. Therefore I think I will update the table to store an extra column which is filled in with the hash (SHA1) of the URL upon insertion. Now we only do a lookup on the smaller hash (varbinary?) which I assume will be significantly faster than before.

Is there a way to get SQL Server 2008 to automatically store the hash and do a lookup against that hash value instead of the actual text? I'm assuming that the indecies are b-trees, so what I'm asking for is for SQL Server to create the b-tree with the hash values of the text in the nvarchar field and when a select is run, it should calculate the hash and do a lookup in the tree with the hash value. Is this possible?

Was it helpful?

Solution

If you do lookups on your (id, url) fields - do you have an index on those two columns?? If not - add one and see if that speeds up your lookups enough.

If not: yes, you can definitely get this functionality automagically - the magic word is: computed column.

In SQL Server, you can have columns that compute their values automatically, based on a formula you provide. This can be either just a simple arithmetic formula, or you can call a stored function to compute the value.

In order to make this fast for your checks, you would have to make sure you can make that computed column persisted - then you can index it, too. This excludes larger scale computations - the formula has to be clear, concise, and deterministic.

So, do this:

ALTER TABLE dbo.YourTable
  ADD HashValue AS CAST(HASHBYTES('SHA1', CAST(ID AS VARCHAR(36)) + Url) AS VARBINARY(20)) PERSISTED

Now your table has a new HashValue column (call it whatever you like), and you can select that value and inspect it.

Next put an index on that new column

CREATE NONCLUSTERED INDEX IX_Hash_YourTable
  ON dbo.YourTable(HashValue)

Now your lookup should be flying!

OTHER TIPS

could you just put a unique constraint on the table for those two columns and perform the insert inside of a try / catch block?

It would save you from the extra work of calculating the hash, and the extra space of storing it

You can have a trigger that calculates the hash on insert and update and puts it in if required.

In terms of stopping the insert just add a unique index on them

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top