Create Random Number up to 50 Digits and Store in Varchar

https://dba.stackexchange.com/questions/207408

01-01-2021
Question

Does anyone have code for a random number generator function? I would just supply a length variable, e.g. @NumberLength = 50. It should create numbers of up to 50 digits and store them in a varchar (bigint cannot store values this large). I have tried combinations of FLOOR, RAND() and NEWID(), but still cannot get a result that exceeds 12 digits.

If anyone has a solution, that would be great. I need to randomize a column in a table.

Thank you,

Note: This needs to be in a function. Functions cannot use NEWID() or RAND() for some reason, so I sometimes utilize views. If a random number is used again, that is no issue; a little bit of repetition is acceptable.

Solution

Here's one approach (demo).

Create a view, as CRYPT_GEN_RANDOM is not allowed directly in a function:

CREATE VIEW dbo.LongRandom
AS
  SELECT CONVERT(VARCHAR(500), CRYPT_GEN_RANDOM(500), 2) AS random_hex_string

Then call it in a function and replace all of A-F with empty strings. The function below uses TRANSLATE to map all of B-F to A and a final REPLACE to remove all the As.

You could also simply use six nested REPLACE calls (on a version before SQL Server 2017, which has no TRANSLATE, you will have to anyway).

CREATE FUNCTION dbo.reallyBigRandInt(@Length TINYINT)
RETURNS VARCHAR(50)
AS
  BEGIN
      RETURN
        (SELECT LEFT(REPLACE(TRANSLATE(random_hex_string,'BCDEF','AAAAA'),'A',''), @Length)
         FROM   dbo.LongRandom)
  END

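The string returned from the view is 10 times longer than the maximum final string required, which makes it extremely unlikely that removing these characters will leave too few digits. On average 10 in every 16 hex characters are digits, so roughly 312 of the 500 characters are expected to survive.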

Then select from that

SELECT dbo.reallyBigRandInt(50);
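
For versions before SQL Server 2017, the nested REPLACE alternative mentioned above might look like the sketch below. The function name dbo.reallyBigRandIntNested is made up for illustration and is not part of the original answer. It reads from the same dbo.LongRandom view and is intended to behave like the TRANSLATE version.

```sql
-- Sketch only: dbo.reallyBigRandIntNested is a hypothetical name.
-- Strips the hex letters A-F one at a time instead of using TRANSLATE.
CREATE FUNCTION dbo.reallyBigRandIntNested(@Length TINYINT)
RETURNS VARCHAR(50)
AS
  BEGIN
      RETURN
        (SELECT LEFT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                       random_hex_string, 'A', ''), 'B', ''), 'C', ''),
                       'D', ''), 'E', ''), 'F', ''), @Length)
         FROM   dbo.LongRandom)
  END
```

It would be called the same way, for example SELECT dbo.reallyBigRandIntNested(50);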

NB: You could also forgo the string parsing and just use CRYPT_GEN_RANDOM to generate three bigint values that you glue together (without minus signs), but this is less random unless you are careful.

The maximum bigint is 9223372036854775807, so clearly there is a lower probability that the leftmost digit will be 9. Similarly, there is a greater probability that the second digit will be 1 or 0 than any other digit, since it is possible for the second digit to be 3 only if the leading digit is 0-8. Similar issues exist for the other positions but decline in importance as you move rightwards.
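
To put rough numbers on the leading-digit point, here is a small arithmetic sketch. It assumes (my assumption, not something stated in the answer) that the value is uniform over the non-negative bigint range 0 to 2^63 - 1 and is zero-padded to 19 digits.

```sql
-- Assumption: values are uniform over 0 .. 2^63 - 1 and zero-padded to 19 digits.
DECLARE @range FLOAT = POWER(CAST(2 AS FLOAT), 63);  -- count of non-negative bigint values

SELECT 1e18 / @range            AS p_leading_digit_0_to_8,  -- roughly 0.108 for each of 0-8
       (@range - 9e18) / @range AS p_leading_digit_9;       -- roughly 0.024
```

Under that assumption, a leading 9 turns up only about a quarter as often as any other leading digit.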

If this isn't a problem for you, you can concatenate the rightmost 17 digits of three bigints to get a 51-character numeric string and then cut that down to the desired length.

CREATE VIEW dbo.RandomBigInt
AS
  SELECT CONVERT(BIGINT, CRYPT_GEN_RANDOM(8)) AS number

And function

CREATE FUNCTION dbo.reallyBigRandInt2(@Length TINYINT)
RETURNS VARCHAR(50)
AS
  BEGIN
      RETURN
        (SELECT RIGHT(RIGHT(FORMAT(number, 'D19'),17) +
                      RIGHT(FORMAT(number, 'D19'),17) +
                      RIGHT(FORMAT(number, 'D19'),17), @Length)
         FROM   dbo.RandomBigInt)
  END
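
Since the question is about randomizing a column, applying the first function might look like the sketch below. The table dbo.MyTable and column MyColumn are hypothetical names. Whether the function is re-evaluated for every row may depend on the SQL Server version and the plan chosen, so the second query is a sanity check rather than a guarantee.

```sql
-- Hypothetical table and column names; substitute your own.
UPDATE dbo.MyTable
SET    MyColumn = dbo.reallyBigRandInt(50);

-- Sanity check: list any values that ended up on more than one row.
SELECT MyColumn, COUNT(*) AS occurrences
FROM   dbo.MyTable
GROUP  BY MyColumn
HAVING COUNT(*) > 1;
```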

OTHER TIPS

Using the examples in Generating Random Numbers in SQL Server Without Collisions as a starting place, here is something that may work for you.

--set up demo data
--using a recursive CTE, load a table with 1 million pre-allocated random number strings
--this took about 10 seconds on my laptop
drop table if exists #temp
go
Declare @start int, @end int
Select @start=1, @end=1000000

;With NumberSequence( Number, RandNum ) as
(
    Select @start as Number,
    right(
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(@start) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(@start) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(@start) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(@start) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(@start) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(@start) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(@start) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(@start) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(@start) * 1000000) % 1000000))
    ,50) as RandNum

        union all
    Select Number + 1,
    right(
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(Number + 1) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(Number + 2) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(Number + 3) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(Number + 4) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(Number + 5) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(Number + 6) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(Number + 7) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(Number + 8) * 1000000) % 1000000)) + 
    convert(varchar(10),1000000 + (CONVERT(INT, RAND(Number + 9) * 1000000) % 1000000)) 
    ,50) as RandNum

    from NumberSequence
    where Number < @end
)
select number, RandNum into #temp from NumberSequence
option (MaxRecursion 0)


--Declare a table that we intend to populate with the random string data
declare @Table table (id int, RandomColumn varchar(50))
insert into @Table(id, RandomColumn) values
    (1,' '),
    (2,' '),
    (3,' '),
    (4,' ')

--Assign a row number to each row that we can use to join against the random string table
;with TableToUpdate as
(
select *, ROW_NUMBER() over (order by id) as rn from @Table
)
UPDATE t
SET t.RandomColumn = tmp.RandNum
FROM TableToUpdate t
JOIN #temp tmp ON tmp.Number = t.rn

--check the results
SELECT * FROM @Table

| id | RandomColumn                                       |
|----|----------------------------------------------------|
| 1  | 11713591171359117135911713591171359117135911713591 |
| 2  | 91713647171366617136851713703171372217137411713759 |
| 3  | 71713666171368517137031713722171374117137591713778 |
| 4  | 61713685171370317137221713741171375917137781713796 |

If you need random numbers shorter than 50 characters, you could try SUBSTRING with the desired length and check how many duplicates that produces and whether they are acceptable. I ran a test against 1 million rows and found no duplicates with either 17 characters or 40 characters (per your comments).

select substring(RandNum,1,17) as RandNum, count(*)
from #temp
group by substring(RandNum,1,17) 
having count(*) > 1

select substring(RandNum,1,40) as RandNum, count(*)
from #temp
group by substring(RandNum,1,40) 
having count(*) > 1

The following code creates a view to provide access to the side-effecting function, CRYPT_GEN_RANDOM. The function calls the view numerous times, getting a single byte from CRYPT_GEN_RANDOM for each call.

DROP FUNCTION IF EXISTS dbo.gen_ran_tvf;
DROP VIEW IF EXISTS dbo.gen_ran_view;
GO
CREATE VIEW dbo.gen_ran_view
WITH SCHEMABINDING
AS
SELECT cgr = CONVERT(int, CRYPT_GEN_RANDOM(1));
GO
CREATE FUNCTION dbo.gen_ran_tvf
(
    @digits int
    , @randomizer int
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN (
    WITH nums AS (
        SELECT n = CONVERT(int, v.n)
        FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9))v(n)
    )
    , digits AS (
        SELECT TOP(@digits)
            v.cgr
        FROM dbo.gen_ran_view v
            CROSS JOIN nums n1
            CROSS JOIN nums n2
    )
    SELECT 
        t = LEFT(
            (
            SELECT '' + digits.cgr
            FROM digits
            FOR XML PATH('')
            )
            , @digits)
        , r = @randomizer
);
GO

The function itself is a schema-bound table-valued function that can be "inlined" by the SQL Server query optimizer, and as a result is about as fast as you can get for this type of operation.
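
A single call might look like the sketch below. The alias tvf and the literal arguments are only illustrative.

```sql
-- Ask for a 50-digit string; the second argument is simply echoed back in column r.
SELECT tvf.t, tvf.r
FROM   dbo.gen_ran_tvf(50, 1) AS tvf;
```

As far as I can tell from the definition, the two cross joins against the ten-row nums CTE supply at most 100 rows, and each row contributes between one and three digits. A request for more than 100 digits may therefore come back shorter than asked.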

I created a test table and populated it with 1,000,000 rows:

DROP TABLE IF EXISTS dbo.SampleData;
GO
CREATE TABLE dbo.SampleData
(
    SampleDataID int NOT NULL
        CONSTRAINT PK_SampleData
        PRIMARY KEY 
        CLUSTERED
        IDENTITY(1,1)
    , SomeVal varchar(50) NOT NULL
);
GO

TRUNCATE TABLE dbo.SampleData;
GO

INSERT INTO dbo.SampleData (SomeVal)
SELECT TOP(1000000) CONVERT(varchar(50), '')
FROM sys.syscolumns c1
    , sys.syscolumns c2;
GO

This is how you might expect to use the TVF to update the SampleData table:

UPDATE dbo.SampleData
SET SomeVal = tvf.t
FROM dbo.SampleData sd
    CROSS APPLY dbo.gen_ran_tvf(50, sd.SampleDataID) tvf

However, with a large number of rows, the resulting execution plan uses a performance table spool to generate a single row from the TVF. This results in every row having the same value, which is not the desired state.

For this particular setup, if you only insert 57 rows into the table, and run the above statement, you do see random values for each row since the table spool is no longer present, and the query optimizer chooses to execute the TVF once for each row in SampleData.

If you run on SQL Server 2016 or newer, you can use a new query hint, NO_PERFORMANCE_SPOOL, to ensure this "optimization" is never used. When you combine that hint with the join-order-enforcing hint, FORCE ORDER, the result is a random value for every row in the table, no matter how many rows are present.

So, the update statement becomes:

UPDATE dbo.SampleData
SET SomeVal = tvf.t
FROM dbo.SampleData sd
    CROSS APPLY dbo.gen_ran_tvf(50, sd.SampleDataID) tvf
OPTION (
    NO_PERFORMANCE_SPOOL
    , FORCE ORDER
    );
GO

Looking at the table:

SELECT *
FROM dbo.SampleData;

We see:

╔══════════════╦════════════════════════════════════════════════════╗
║ SampleDataID ║                      SomeVal                       ║
╠══════════════╬════════════════════════════════════════════════════╣
║            1 ║ 13112011187125181161243421302232222710240208203146 ║
║            2 ║ 46314918617612853143741876746110662154749221200135 ║
║            3 ║ 51061817319197421023124056174411660871198014321011 ║
║            4 ║ 42184243211535022623816320713413918322511811717948 ║
║            5 ║ 50931761021417838201791946726229222231676112631621 ║
║            6 ║ 13588442531751073017338155821851591207315016221382 ║
║            7 ║ 12418133401459429211173481131611316869160118221209 ║
║            8 ║ 82271435818112225210622167252113138163226124182352 ║
║            9 ║ 75220220124661172206422425299201988022810670231532 ║
║           10 ║ 21610776198239186174931291616122930332049222921229 ║
║           11 ║ 22811311396795182941996034109261472352503620625436 ║
║           12 ║ 74612472525863716112125157233126171220494114848272 ║
║           13 ║ 25119215323633306520710920720911421423524322717016 ║
║           14 ║ 57250150114123725014912523398921624261693927878515 ║
║           15 ║ 13116129240304813115918225022257130174017136111245 ║
...
║       660731 ║ 23916221915121421617721762187898720350232178132462 ║
║       660732 ║ 17016020013213211911920940141102196558714414847243 ║
║       660733 ║ 16108202204200984770211104216131122159591931201861 ║
║       660734 ║ 15524112614417119114811316419015619112023520711342 ║
╚══════════════╩════════════════════════════════════════════════════╝

On my old, slow workstation, updating 1,000,000 rows with this TVF takes around 40 seconds to complete.

Since we are "gluing" together the decimal representations of individual numbers in the range 0 to 255, the digits 1 and 2 will occur more often than the other digits; however, since you're using this to obfuscate data rather than encrypt it, I don't think that will be a deal-breaking problem.
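
To see how large that skew is, the sketch below counts how often each digit appears when every value from 0 to 255 is written out in decimal once. It assumes each byte value is equally likely; the CTE and column names are made up for illustration.

```sql
-- Count digit occurrences across the decimal representations of 0 .. 255.
WITH d AS (
    SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(n)
),
byte_values AS (
    -- Build 0 .. 255 from hundreds, tens and units digits.
    SELECT byte_value = h.n * 100 + t.n * 10 + u.n
    FROM   d AS h CROSS JOIN d AS t CROSS JOIN d AS u
    WHERE  h.n * 100 + t.n * 10 + u.n <= 255
),
digit_chars AS (
    -- One row per character position of each value's decimal string.
    SELECT digit = SUBSTRING(CONVERT(varchar(3), b.byte_value), p.n, 1)
    FROM   byte_values AS b
           JOIN d AS p
             ON p.n BETWEEN 1 AND LEN(CONVERT(varchar(3), b.byte_value))
)
SELECT   digit, occurrences = COUNT(*)
FROM     digit_chars
GROUP BY digit
ORDER BY digit;
```

By my count, out of 658 digits in total, 1 appears 156 times and 2 appears 112 times, while every other digit appears between 45 and 56 times.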

Got it. Well.. I've got one way. Might be kinda slow on a big set:

Since you can't use RAND() in a function, I stole a view from @Pரதீப் in this post:

https://stackoverflow.com/questions/31468836/use-rand-in-user-defined-function#31468878

and modified it a little bit to look like this:

    create VIEW random_val_view
    AS
    SELECT cast(floor(RAND()*10) as char(1)) as  random_value

then I wrote this little ditty, to concatenate the result of the view:

    create function dbo.reallyBigRandInt(@iterations int)
    returns varchar(255)
    as begin
    declare @len int,
    @string varchar(255)

    set @len = 0
    set @string = ''

    while @len <@iterations

    begin

    set @string = @string + (select top 1 * from  random_val_view) 

    set @len +=1

    end

    return @string
    end

Then you can call it like so:

    select dbo.reallyBigRandInt(50)

Changing the number you pass to the function changes the length of the string that comes out.

Obviously, it won't handle duplicates, as Aaron & Erik mentioned.

Right now, it can return a string of up to 255 characters, but I suppose you could return varchar(max). I'm not brave enough to use varchar(max) on SE, though...

Here is my suggestion.

ALTER FUNCTION dbo.GetRandomValue (
    @input UNIQUEIDENTIFIER
    ,@ReqLen INT
)
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @RandomValue VARCHAR(MAX)

    SELECT TOP (cast((10 / 50.0) * @ReqLen AS INT)) @RandomValue = COALESCE(@RandomValue + '', '') + cast(abs(CHECKSUM(@input)) AS VARCHAR(MAX))
    FROM dbo.TABLE1

    RETURN left(@RandomValue, @ReqLen)
END

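Here Table1 can be any of your tables; it is used only as a source of rows. If it does not contain more than 50-60 rows and has few columns, performance will be better. Note that TOP asks for @ReqLen / 5 rows, so a very small table can limit the length of the result.

That's why I didn't choose a system table.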

Usage

select dbo.GetRandomValue (newid(),33)

select dbo.GetRandomValue (newid(),200)

select dbo.GetRandomValue (newid(),5000)

and so on.

Alternate solution

This returns an alphanumeric (hexadecimal) string rather than digits only, but otherwise it works as you want.

declare @RandomValue varchar(max)

declare @RequireLength int=33

select @RandomValue=left(convert(varchar(max), CRYPT_GEN_RANDOM(@RequireLength) ,2),@RequireLength)

select @RandomValue,len(@RandomValue)

Licensed under: CC-BY-SA with attribution
