Question

We are still on SQL Server 2005.

We are trying to create a unique index on our patient table. It will be a multi-column index (Status, PatientNumber, First, Last, DOB, Gender). However, the data already contains multiple duplicates.

We can have 2 records (Active, 0001, John, Doe, 1/1/1960, M) and (Active, 0001, John, Doe, 1/1/1960, M) that really are dups, so one must be inactivated. We can also have 2 records (Active, 0001, John, Doe, 1/1/1960, M) and (Active, 0001, John, Doe, 1/1/1960, M) that are NOT dups, so the practice must find a new PatientNumber for one of them to indicate 2 distinct patients who just happen to have the same name, DOB and gender.

Since there are dups, users have been inactivating them to keep one living record. So we can have 3 records (Inactive, 0001, John, Doe, 1/1/1960, M), (Inactive, 0001, John, Doe, 1/1/1960, M) and (Active, 0001, John, Doe, 1/1/1960, M). One Inactive dup has to be removed before the unique index can be created.

The business does not embrace fixing existing duplicate data.

We don't want to use a function that enforces patient uniqueness by checking active rows only.

My plan was to clean data as per this consideration: A unique index, UNIQUE constraint, or PRIMARY KEY constraint cannot be created if duplicate key values exist in the data.

BUT our DBA said there is an option when creating a unique index that lets you create the index without it complaining about existing dups??? Needless to say, the business is overjoyed to hear about this option.

For example: I plan to create the unique index on 4/1/2014. Can I use some "option" when creating that index to tell SQL Server not to bother with duplicates that exist before 4/1/2014? After the index is created (i.e. after 4/1/2014), all new dups would be violations.

I am having difficulty finding that option. Can anyone advise or comment?

Thanks!


Solution

The SQL Server 2005 manual actually says:

A unique index or constraint cannot be created if there are existing duplicate values in the key columns.
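To see that restriction in action, here is a minimal sketch on a scratch table. The table and index names (`dup_demo`, `uq_dup_demo_value`) are made up for illustration. My guess is that the "option" the DBA had in mind is `IGNORE_DUP_KEY`, but that is only an assumption; that option governs inserts made after the index exists and does not let `CREATE INDEX` skip over duplicates already in the table.

```sql
-- Hypothetical scratch table, not from the original post.
CREATE TABLE dup_demo ( id INT, value INT );
INSERT INTO dup_demo (id, value)
SELECT 1, 1 UNION ALL SELECT 2, 1;   -- value = 1 appears twice

-- Expected to be rejected: duplicate key values already exist.
CREATE UNIQUE INDEX uq_dup_demo_value ON dup_demo(value);

-- Expected to be rejected as well: IGNORE_DUP_KEY only changes how
-- later INSERTs are handled, not how existing rows are checked.
CREATE UNIQUE INDEX uq_dup_demo_value ON dup_demo(value)
    WITH (IGNORE_DUP_KEY = ON);
```

Run the two `CREATE UNIQUE INDEX` statements one at a time; each should stop with a duplicate-key error and leave the table without the index.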

That said, you can work around it. The approach below is one way; whether it's acceptable in your case is up to you. One caveat: I only have SQL Server 2008 to test on :)

What you can do is add a _dedupe column and include it in the index. For each group of existing duplicates, set unique non-NULL values in that column, leaving exactly one row per group with NULL. New inserts don't set _dedupe, so it stays NULL; because SQL Server treats NULLs as equal in a unique index, a new row that duplicates existing data collides with the NULL row and the insert fails.

As an example:

> CREATE TABLE test ( id INT, value INT );
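-- Multi-row VALUES needs SQL Server 2008+; UNION ALL also works on 2005.
> INSERT INTO test (id, value)
  SELECT 1, 1 UNION ALL SELECT 2, 1 UNION ALL SELECT 3, 3;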

id   value
----------
1    1
2    1
3    3 

> ALTER TABLE test ADD _dedupe INT;

-- Update, partition by the value combination that is not unique now but 
-- should be later, in this case "value".

> WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY value ORDER BY id)-1 rn FROM test
  )
  UPDATE cte SET _dedupe = CASE WHEN rn=0 THEN NULL ELSE rn END;

id   value   _dedupe
--------------------
1    1       NULL
2    1       1
3    3       NULL

> CREATE UNIQUE INDEX uq_value ON test(value, _dedupe);

> INSERT INTO test (id, value) VALUES (4,1);    -- fails: key (1, NULL) already exists in uq_value

An SQLfiddle to test with.
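Applied to the patient table from the question, the same steps might look roughly like the sketch below. The table name `dbo.Patient`, the ordering column `PatientId` and the index name `uq_Patient_identity` are assumptions, not taken from the question; substitute whatever the real schema uses. Like the example above, it has not been run on SQL Server 2005.

```sql
-- Assumptions (not from the original post): the table is dbo.Patient and it
-- has a unique PatientId column used to order rows inside a duplicate group.

-- 1. List the duplicate groups so you know what will be numbered.
SELECT [Status], PatientNumber, [First], [Last], DOB, Gender,
       COUNT(*) AS copies
FROM dbo.Patient
GROUP BY [Status], PatientNumber, [First], [Last], DOB, Gender
HAVING COUNT(*) > 1;

-- 2. Add the helper column; existing rows start out as NULL.
ALTER TABLE dbo.Patient ADD _dedupe INT NULL;
GO  -- batch separator (SSMS / sqlcmd) so the next batch can see _dedupe

-- 3. Within each group, keep the first row at NULL and number the rest 1, 2, ...
WITH cte AS (
    SELECT _dedupe,
           ROW_NUMBER() OVER (
               PARTITION BY [Status], PatientNumber, [First], [Last], DOB, Gender
               ORDER BY PatientId
           ) - 1 AS rn
    FROM dbo.Patient
)
UPDATE cte SET _dedupe = NULLIF(rn, 0);   -- rn = 0 stays NULL
GO

-- 4. The key is now unique across existing rows, so the index can be created.
CREATE UNIQUE INDEX uq_Patient_identity
    ON dbo.Patient ([Status], PatientNumber, [First], [Last], DOB, Gender, _dedupe);
```

The `ORDER BY` inside `ROW_NUMBER()` decides which row of each group keeps the NULL; any stable column would do.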

One downside of this approach is that the only row that prevents a duplicate insert is the NULL row. If you delete it, a new insert with the same values will succeed and duplicate an existing numbered row. That may or may not be a problem for your system.
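A short sketch of that failure mode, continuing from the `test` table above (rows `(1,1,NULL)`, `(2,1,1)`, `(3,3,NULL)` with the `uq_value` index in place):

```sql
-- Remove the only row holding the key (value = 1, _dedupe = NULL).
DELETE FROM test WHERE id = 1;

-- No (1, NULL) key exists any more, so this insert should now succeed...
INSERT INTO test (id, value) VALUES (4, 1);

-- ...leaving two rows with value = 1: id 2 (_dedupe = 1) and id 4 (_dedupe = NULL).
SELECT id, value, _dedupe FROM test WHERE value = 1;
```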

Licensed under: CC-BY-SA with attribution