T-SQL string unique ID (Northwind database)

Question 1

I believe you can do something like this to make sure you always get a unique ID:

begin

declare @chars char(26) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
declare @i int
declare @id varchar(5)

while (1=1)
begin
    -- start a fresh 5-character candidate on every attempt
    set @id = ''
    set @i = 0

    while (@i < 5)
    begin
        -- floor(rand() * 26) + 1 always falls in 1..26
        set @id = @id + substring(@chars, cast(floor(rand() * 26) as int) + 1, 1)
        set @i = @i + 1
    end

    -- stop as soon as the candidate is not already in use
    IF NOT EXISTS (SELECT * FROM Customers WHERE CustomerID = @id)
        BREAK
end

select @id as NewCustomerID

end

Set the while condition to be always true and break out of the loop only when both of your requirements are met, i.e. the new ID is 5 characters long and does not already exist in the Customers table.

Question 2

The main problem here is that the incremental cost of detecting a collision on the generated string, and trying again, increases as you generate more and more strings (since you have to check against all of the existing strings to make sure you didn't generate a duplicate). At the same time, the odds of hitting a duplicate go up, meaning the bigger the table gets, the slower this process will get.
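
To put rough numbers on that, here is a small sketch that assumes each candidate is drawn uniformly from the 26^5 possible 5-letter strings. With n IDs already taken, a single draw collides with probability n / 26^5, so the expected number of draws until a free ID turns up is 1 / (1 - n / 26^5). The variable and column names are purely illustrative.

```sql
-- Assumption: each candidate is drawn uniformly from all 26^5 strings.
DECLARE @space float = POWER(CAST(26 AS float), 5);  -- 11,881,376 possible IDs

SELECT
    v.used                         AS IdsAlreadyUsed,
    v.used / @space                AS CollisionProbabilityPerDraw,
    1.0 / (1.0 - v.used / @space)  AS ExpectedDrawsPerNewId
FROM (VALUES (100000), (1000000), (6000000), (11000000)) AS v(used);
```

Under this assumption the retry count climbs steeply as the space fills up (roughly 13 draws per new ID once 11 million are used), and every draw also costs a lookup against the Customers table.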

Why do you need to generate the unique string at runtime? Build them all in advance. This article and this post are about random numbers, but the basic concept is the same. You build up a set of unique strings and pull one off the stack when you need one. Your chance of collisions stays constant at 0% throughout the lifetime of the application (provided you build up a stack of enough unique values). Pay for the cost of collisions up front, in your own setup, instead of incrementally over time (and at the cost of a user waiting for those attempts to finally yield a unique number).

This will generate 100,000 unique 5-character strings, at the low, one-time cost of about 1 second (on my machine):

;WITH 
 a(a) AS 
 (
   SELECT TOP (26) number + 65 FROM master..spt_values 
   WHERE type = N'P' ORDER BY number
 ),
 b(a) AS 
 (
   SELECT TOP (10) a FROM a ORDER BY NEWID()
 )
SELECT DISTINCT CHAR(b.a) + CHAR(c.a) + CHAR(d.a) + CHAR(e.a) + CHAR(f.a)
FROM b, b AS c, b AS d, b AS e, b AS f;

That's not enough? You can generate 3.2 million unique values (20^5) by changing TOP (10) to TOP (20). This took 18 seconds. Still not enough? TOP (24) will give you just under 8 million in about 2 minutes. It will get much more expensive as you generate more strings, because that DISTINCT has to do the same duplicate checking you want to avoid doing every single time you add a customer.

So, create a table:

CREATE TABLE dbo.StringStack
(
  ID INT IDENTITY(1,1) PRIMARY KEY,
  String CHAR(5) NOT NULL UNIQUE
);

Insert that set:

;WITH 
 a(a) AS 
 (
   SELECT TOP (26) number + 65 FROM master..spt_values 
   WHERE type = N'P' ORDER BY number
 ),
 b(a) AS 
 (
   SELECT TOP (10) a FROM a ORDER BY NEWID()
 )
INSERT dbo.StringStack(String)
SELECT DISTINCT CHAR(b.a) + CHAR(c.a) + CHAR(d.a) + CHAR(e.a) + CHAR(f.a)
FROM b, b AS c, b AS d, b AS e, b AS f;

And then just create a procedure that pops one off the stack when you need it:

CREATE PROCEDURE dbo.AddCustomer
  @CustomerName VARCHAR(64) /* , other params */
AS
BEGIN
  SET NOCOUNT ON;
  
  DELETE TOP (1) dbo.StringStack
    OUTPUT deleted.String, @CustomerName /* , other params */
    INTO dbo.Customers(CustomerID, CustomerName /*, ...other columns... */);
END
GO
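
One caveat worth checking before relying on the procedure above: per the SQL Server documentation, the target table of OUTPUT ... INTO cannot have enabled triggers, cannot take part in a foreign key relationship on either side, and cannot have CHECK constraints or enabled rules. If dbo.Customers is referenced by other tables (as it is in Northwind, where Orders points at it), a variant that captures the popped string in a table variable first may be needed. The following is only a sketch; the procedure name dbo.AddCustomer_ViaTableVariable is hypothetical, and the CustomerName column is the same placeholder used above.

```sql
CREATE PROCEDURE dbo.AddCustomer_ViaTableVariable   -- hypothetical name
  @CustomerName VARCHAR(64) /* , other params */
AS
BEGIN
  SET NOCOUNT ON;
  SET XACT_ABORT ON;  -- roll back the whole transaction on a run-time error

  DECLARE @popped TABLE (String CHAR(5) NOT NULL);

  BEGIN TRANSACTION;

  -- Pop one string into a table variable, which has no triggers or keys
  -- that OUTPUT INTO would reject.
  DELETE TOP (1) dbo.StringStack
    OUTPUT deleted.String INTO @popped(String);

  IF NOT EXISTS (SELECT * FROM @popped)
  BEGIN
    ROLLBACK TRANSACTION;
    RAISERROR('dbo.StringStack is empty.', 16, 1);
    RETURN;
  END

  INSERT dbo.Customers(CustomerID, CustomerName /*, ...other columns... */)
  SELECT String, @CustomerName /* , other params */
  FROM @popped;

  COMMIT TRANSACTION;
END
GO
```

Because the delete and the insert share one transaction, a failed insert puts the string back on the stack instead of losing it.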

No silly looping, no need to check whether the CustomerID you just generated already exists, etc. The only additional thing you'll want to build is some type of check that notifies you when the stack is getting low.
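
A minimal sketch of such a check, assuming a threshold of 10,000 remaining strings (an arbitrary number picked for illustration) and that something like a SQL Server Agent job runs it on a schedule:

```sql
DECLARE @threshold INT = 10000;  -- assumed low-water mark; tune to your insert rate
DECLARE @remaining INT;

SELECT @remaining = COUNT(*) FROM dbo.StringStack;

IF @remaining < @threshold
BEGIN
  -- Raising an error lets a scheduled job step fail, which can drive its notification.
  RAISERROR('dbo.StringStack is running low: %d strings left.', 16, 1, @remaining);
END
```

How the alert is delivered (job failure notification, e-mail, a monitoring table) is left open here; the point is only that the remaining count is cheap to query.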

As an aside, these are terrible identifiers for a CustomerID. What is wrong with a sequential surrogate key, like an IDENTITY column? How is a 5-character random string, with all this effort involved, any better than a unique number the system can generate for you much more easily?

Question 3

Muhammed Ali's answer works, but it will prove rather resource-intensive, especially when few 5-letter combinations are left unused. The loop relies on the random generator, which has no real memory of its previous results, so it can take a while to find a combination that isn't taken. It might give you something like this (exaggerating a bit): BAGER the first time, then ANSWE the second time, then BAGER again the third time. You lose time whenever the generator hands back a value that is already in use, and that happens more often as the table fills up, even with nearly 12 million (26^5 = 11,881,376) possible combinations.

If you are looking for a fixed-length ID (since you use NCHAR(5), I guess that's a good assumption), I would rather build a table that contains all the possible combinations and pick one value from it every time you need one. You would delete the value once it has been used, or mark it as used (which I would prefer, for reusability reasons).
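
As a rough illustration of the mark-as-used variant, here is a sketch. The table dbo.StringPool and its IsUsed column are hypothetical names, not something from the question or the other answers, and the table is assumed to have been filled with the combinations beforehand.

```sql
-- Hypothetical pool table: every string stays in place and carries a flag.
CREATE TABLE dbo.StringPool
(
  String NCHAR(5) NOT NULL PRIMARY KEY,
  IsUsed BIT NOT NULL DEFAULT (0)
);
GO

-- Claim one unused string and capture it in a single statement.
DECLARE @claimed TABLE (String NCHAR(5) NOT NULL);

UPDATE TOP (1) dbo.StringPool
  SET IsUsed = 1
  OUTPUT inserted.String INTO @claimed(String)
  WHERE IsUsed = 0;

SELECT String FROM @claimed;  -- empty result means the pool is exhausted

-- Releasing an ID later (for example after a customer is removed) is one more UPDATE:
-- UPDATE dbo.StringPool SET IsUsed = 0 WHERE String = N'ABCDE';
```

Keeping the rows and flipping a flag is what makes reuse possible; the delete-based approach would need the string re-inserted instead.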

This leads to my final comment (which I cannot post as a comment because I don't have enough reputation): why not use the IDENTITY property provided by SQL Server? It handles primary key generation much better.
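
For comparison, a minimal sketch of a surrogate key that SQL Server generates itself. The table dbo.CustomersWithIdentity is a hypothetical stand-in, not the Northwind Customers table.

```sql
CREATE TABLE dbo.CustomersWithIdentity   -- hypothetical table
(
  CustomerID   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
  CustomerName VARCHAR(64) NOT NULL
);
GO

INSERT dbo.CustomersWithIdentity (CustomerName)
VALUES ('Example customer');

-- SCOPE_IDENTITY() returns the last identity value generated in this scope.
SELECT SCOPE_IDENTITY() AS NewCustomerID;
```

No lookup table, no retry loop, and no collision check are involved; the engine hands out the next value on insert.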