Question

I am setting up a SaaS application that multiple clients will use to enter data. However, there are certain fields that Client A may want to force to be unique while Client B may want to allow duplicates. Obviously, if any one client is allowed to have dupes, the table cannot have a unique constraint on that column. The downside is that if I want to enforce a unique constraint for some clients, I will have to go about it some other way.

Has anyone tackled a problem like this, and if so, what are the common solutions and/or potential pitfalls to look out for?

I am thinking a trigger that checks for any possible unique flags may be the only way to enforce this correctly. If I rely on the business layer, there is no guarantee that the app will do a uniqueness check before every insert.

SOLUTION:
First I considered a unique (filtered) index, but ruled it out: the index predicate can only compare columns against literal values; it cannot join to or look up against another table. And I didn't want to modify the index every time a client was added or a client's uniqueness preference changed.

Then I looked into CHECK constraints and, after some fooling around, built one function that handles both hypothetical columns a client would be able to mark as unique or not.

Here are the test tables, data, and function I used to verify that a check constraint could do everything I wanted.

-- Clients Table
CREATE TABLE [dbo].[Clients](
    [ID] [int]  NOT NULL,
    [Name] [varchar](50) NOT NULL,
    [UniqueSSN] [bit] NOT NULL,
    [UniqueVIN] [bit] NOT NULL
) ON [PRIMARY]

-- Test Client Data
INSERT INTO Clients(ID, Name, UniqueSSN, UniqueVIN) VALUES(1,'A Corp',0,0)
INSERT INTO Clients(ID, Name, UniqueSSN, UniqueVIN) VALUES(2,'B Corp',1,0)
INSERT INTO Clients(ID, Name, UniqueSSN, UniqueVIN) VALUES(3,'C Corp',0,1)
INSERT INTO Clients(ID, Name, UniqueSSN, UniqueVIN) VALUES(4,'D Corp',1,1)

-- Cases Table
CREATE TABLE [dbo].[Cases](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [ClientID] [int] NOT NULL,
    [ClaimantName] [varchar](50) NOT NULL,
    [SSN] [varchar](12) NULL,
    [VIN] [varchar](17) NULL
) ON [PRIMARY]

-- Check Uniques Function
CREATE FUNCTION CheckUniques(@ClientID int)
RETURNS int -- 0: OK to insert, >0: cannot insert
AS
BEGIN
    DECLARE @SSNCheck int
    DECLARE @VinCheck int
    SELECT @SSNCheck = 0
    SELECT @VinCheck = 0
    -- Count this client's rows whose SSN occurs more than once for the same
    -- client. COUNT(SSN) and the c2.SSN = cs.SSN comparison both skip NULLs,
    -- so NULL SSNs are never treated as duplicates.
    IF (SELECT UniqueSSN FROM Clients WHERE ID = @ClientID) = 1
    BEGIN
        SELECT @SSNCheck = COUNT(SSN)
        FROM Cases cs
        WHERE cs.ClientID = @ClientID
          AND (SELECT COUNT(SSN) FROM Cases c2
               WHERE c2.ClientID = cs.ClientID AND c2.SSN = cs.SSN) > 1
    END
    -- Same check for VIN. Note the inner count is restricted to the same
    -- client, so one client's data never blocks another client's inserts.
    IF (SELECT UniqueVIN FROM Clients WHERE ID = @ClientID) = 1
    BEGIN
        SELECT @VinCheck = COUNT(VIN)
        FROM Cases cs
        WHERE cs.ClientID = @ClientID
          AND (SELECT COUNT(VIN) FROM Cases c2
               WHERE c2.ClientID = cs.ClientID AND c2.VIN = cs.VIN) > 1
    END
    RETURN @SSNCheck + @VinCheck
END

-- Add Check Constraint to table
ALTER TABLE Cases
ADD CONSTRAINT chkClientUniques CHECK(dbo.CheckUniques(ClientID) = 0)

-- Now confirm constraint using test data

-- Client A: Confirm that both duplicate SSNs and VINs are allowed
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(1, 'Alice', '111-11-1111', 'A-1234')
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(1, 'Bob', '111-11-1111', 'A-1234')

-- Client B: Confirm that Unique SSN is enforced, but duplicate VIN allowed
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(2, 'Charlie', '222-22-2222', 'B-2345') -- Should work
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(2, 'Donna', '222-22-2222', 'B-2345') -- Should fail
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(2, 'Evan', '333-33-3333', 'B-2345') -- Should work

-- Client C: Confirm that Unique VIN is enforced, but duplicate SSN allowed
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(3, 'Evan', '444-44-4444', 'C-3456') -- Should work
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(3, 'Fred', '444-44-4444', 'C-3456') -- Should fail
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(3, 'Ginny', '444-44-4444', 'C-4567') -- Should work

-- Client D: Confirm that both Unique SSN and VIN are enforced
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(4, 'Henry', '555-55-5555', 'D-1234') -- Should work
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(4, 'Isaac', '666-66-6666', 'D-1234') -- Should fail
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(4, 'James', '555-55-5555', 'D-2345') -- Should fail
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(4, 'Kevin', '555-55-5555', 'D-1234') -- Should fail
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(4, 'Lisa', '777-77-7777', 'D-3456') -- Should work

EDIT:
Had to modify the function a few times to catch NULL values in the dupe check, but all appears to be working now.
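For reference, a quick way to confirm the NULL handling is to insert rows with NULL SSNs for a client that enforces SSN uniqueness. These are hypothetical test rows, not part of the original test set:

-- Client B enforces unique SSNs, but repeated NULL SSNs should still be
-- allowed: COUNT(SSN) skips NULLs and c2.SSN = cs.SSN never matches a NULL.
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(2, 'Mona', NULL, 'B-9999') -- Should work
INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN) VALUES(2, 'Ned', NULL, 'B-9999')  -- Should work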


Solution

One approach is to use a CHECK constraint instead of a unique constraint. This CHECK constraint will be backed by a scalar function that will

  1. take as input ClientID
  2. cross-ref ClientID against a lookup table to see if duplicates are allowed (client.dups)
  3. if not allowed, check for duplicates in the table

Something like

ALTER TABLE TBL ADD CONSTRAINT CK_TBL_UNIQ CHECK(dbo.IsThisOK(clientID)=1)
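A minimal sketch of what dbo.IsThisOK could look like, following the three steps above (the client.dups lookup comes from the answer; TBL and the SomeCol column name are illustrative, not from the original schema):

CREATE FUNCTION dbo.IsThisOK(@clientID int)
RETURNS int -- 1: OK, 0: the client's uniqueness rule would be violated
AS
BEGIN
    -- Step 2: cross-ref against the lookup table; if dups are allowed,
    -- there is nothing to check.
    IF EXISTS (SELECT 1 FROM client WHERE id = @clientID AND dups = 1)
        RETURN 1

    -- Step 3: dups not allowed, so fail if any non-NULL value appears
    -- more than once for this client.
    IF EXISTS (SELECT SomeCol FROM TBL
               WHERE clientID = @clientID AND SomeCol IS NOT NULL
               GROUP BY SomeCol
               HAVING COUNT(*) > 1)
        RETURN 0

    RETURN 1
END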

Other tips

If you can identify rows in the table for each client, depending on your DBMS you could do something like this:

CREATE UNIQUE INDEX uq_some_col 
   ON the_table(some_column, other_column, client_id)
   WHERE client_id IN (1,2,3);

(The above is valid for PostgreSQL; SQL Server supports filtered indexes like this starting with SQL Server 2008.)

The downside is that you will need to re-create that index each time a new client is added that requires those columns to be unique.
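For example, if client 4 later turns uniqueness on, the index has to be dropped and rebuilt with the extended list (SQL Server syntax shown; in PostgreSQL, DROP INDEX takes only the index name):

DROP INDEX uq_some_col ON the_table;

CREATE UNIQUE INDEX uq_some_col
   ON the_table(some_column, other_column, client_id)
   WHERE client_id IN (1,2,3,4);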

You will probably have some checks in the business layer as well, mostly to be able to show proper error messages.

That's perfectly possible now on SQL Server 2008 (tested on my SQL Server 2008 box here):

create table a
(
cx varchar(50)
);

create unique index ux_cx on a(cx) where cx <> 'US';

insert into a values('PHILIPPINES'),('JAPAN'),('US'),('CANADA');


-- still ok to insert duplicate US
insert into a values('US');

-- will fail here
insert into a values('JAPAN');

Related article: http://www.ienablemuch.com/2010/12/postgresql-said-sql-server2008-said-non.html
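Applied to the question's Cases schema, the same technique might look like this (illustrative only: the IN lists would need manual maintenance as client preferences change, and the IS NOT NULL term keeps multiple NULLs from colliding, since a SQL Server unique index treats NULLs as equal):

-- Per the test data, clients 2 and 4 enforce unique SSNs; 3 and 4 enforce unique VINs.
CREATE UNIQUE INDEX uq_Cases_SSN
    ON dbo.Cases(ClientID, SSN)
    WHERE ClientID IN (2,4) AND SSN IS NOT NULL;

CREATE UNIQUE INDEX uq_Cases_VIN
    ON dbo.Cases(ClientID, VIN)
    WHERE ClientID IN (3,4) AND VIN IS NOT NULL;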

There are a few things you can do with this; it just depends on when/how you want to handle it.

  1. You could use a CHECK constraint and modify it to do different lookups based on the client that is using it
  2. You could use the business tier to handle this, but it will not protect you from raw database updates.

I personally have found that #1 can get too hard to maintain, especially if you get a high number of clients. I've found that doing it at the business level is a lot easier, and you can control it at a centralized location.

There are other options, such as a table per client, that could work as well, but these two are at least the most common that I've seen.

You could add a helper column. The column would be equal to the primary key for the application that allows duplicates, and a constant value for the other application. Then you create a unique constraint on (UniqueHelper, Col1).

For the non-dupe client, the helper column holds a constant, so Col1 itself is forced to be unique.

For the dupe client, the helper column is equal to the primary key, so the unique constraint is satisfied by that column alone. That application can add any number of dupes.
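A hedged sketch of that idea on SQL Server, using a persisted computed column so the helper value maintains itself (the Items table and its column names are illustrative, not from the question's schema):

-- AllowDups marks rows whose Col1 may repeat. For those rows the helper
-- equals the row's own ID (always distinct), so the unique index never
-- fires; for all other rows the helper is the constant 0, so Col1 itself
-- must be unique among them.
CREATE TABLE dbo.Items(
    ID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    AllowDups bit NOT NULL,
    Col1 varchar(50) NOT NULL,
    UniqueHelper AS (CASE WHEN AllowDups = 1 THEN ID ELSE 0 END) PERSISTED
);

CREATE UNIQUE INDEX uq_Items_Col1 ON dbo.Items(UniqueHelper, Col1);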

One possibility might be to use BEFORE INSERT and BEFORE UPDATE triggers that can selectively enforce uniqueness (SQL Server has no BEFORE triggers, but an INSTEAD OF trigger can play the same role; a sketch follows).
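A rough sketch of such a trigger against the question's Cases/Clients schema (illustrative only; note it does not catch duplicates within a single multi-row INSERT):

CREATE TRIGGER trg_Cases_CheckUniques ON dbo.Cases
INSTEAD OF INSERT
AS
BEGIN
    -- Reject the insert if any incoming row collides with an existing row
    -- for a client that has the corresponding uniqueness flag turned on.
    IF EXISTS (
        SELECT 1
        FROM inserted i
        JOIN Clients cl ON cl.ID = i.ClientID
        WHERE (cl.UniqueSSN = 1 AND EXISTS (SELECT 1 FROM Cases c
                   WHERE c.ClientID = i.ClientID AND c.SSN = i.SSN))
           OR (cl.UniqueVIN = 1 AND EXISTS (SELECT 1 FROM Cases c
                   WHERE c.ClientID = i.ClientID AND c.VIN = i.VIN)))
    BEGIN
        RAISERROR('Duplicate SSN or VIN is not allowed for this client.', 16, 1)
        RETURN
    END

    INSERT INTO Cases (ClientID, ClaimantName, SSN, VIN)
    SELECT ClientID, ClaimantName, SSN, VIN FROM inserted
END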

And another possibility (kind of a kludge) would be to have an additional dummy field that is populated with unique values for one customer and a constant, repeated value for the other customer. Then build a unique index on the combination of the dummy field and the visible field.

@Neil. I had asked in my comment above what your reasons were for putting everything in the same table, and you simply ignored that aspect of my comment and said everything was "plain and simple". Do you really want to hear the downsides of the conditional-constraints approach in a SaaS context?

You don't say how many different sets of rules this table in your SaaS application may eventually need to incorporate. Will there be only two variations?

Performance is a consideration. Although each customer would have access to dedicated conditional indexes, fetching data from the base table could become slower and slower as data from additional customers is added and the table grows.

If I were developing a SaaS application, I'd go with dedicated transaction tables wherever appropriate. Customers could share standard tables like zipcodes and counties, and even share domain-specific tables like Products or Categories or WidgetTypes or whatever. I'd probably build dynamic SQL statements in stored procedures in which the correct table for the current customer was chosen and placed in the statement being constructed, e.g.

        sql = "select * from " + DYNAMIC_ORDERS_TABLE + " where ... ")

If performance was taking a hit because the dynamic statements had to be compiled all the time, I might consider writing a dedicated stored procedure generator: sp_ORDERS_SELECT_25_v1.0 (where "25" is the id assigned to a particular user of the SaaS app and there's a version suffix).

You're going to have to use some dynamic SQL because the customer id must be appended to the WHERE-clause of every one of your ad hoc queries in order to take advantage of your conditional indexes:

        sql = " select * from orders  where ... and customerid = " + CURRENT_CUSTOMERID

Your conditional indexes involve your customer/user column, and so that column must be made part of every query in order to ensure that only that customer's subset of rows is selected out of the table.

So, when all is said and done, you're really saving yourself the effort required to create a dedicated table and avoiding some dynamic SQL on your bread-and-butter queries. Writing dynamic SQL for bread-and-butter queries doesn't take much effort, and it's certainly less messy than having to manage multiple customer-specific indexes on the same shared table; and if you're writing dynamic SQL you could just as easily substitute the dedicated table name as append the customerid=25 clause to every query. The performance loss of dynamic SQL would be more than offset by the performance gain of dedicated tables.

P.S. Let's say your app has been running for a year or so and you have multiple customers and your table has grown large. You want to add another customer and their new set of customer-specific indexes to the large production table. Can you slipstream these new indexes and constraints during normal business hours or will you have to schedule the creation of these indexes for a time when usage is relatively light?

You don't make clear what benefit there is in having the data from separate universes mingled in the same table.

Uniqueness constraints are part of the entity definition and each entity needs its own table. I'd create separate tables.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow