Question

Does anyone have SQL code to automatically generate randomized birth dates, where the date of birth is earlier than today? Please include parameters for the age range, e.g. from 18 to 70 years old.

Is there any inline SQL or function to do this?

We are trying to obfuscate our column [Date of Birth].

Thanks,


Solution

This will add a random number of days to 1st of January, 1900:

SELECT DATEADD(DAY, CONVERT(int, CRYPT_GEN_RANDOM(2)), '1900-01-01T00:00:00');

According to the Microsoft Docs, CRYPT_GEN_RANDOM "returns a cryptographic random number generated by the Crypto API (CAPI). The output is a hexadecimal number of the specified number of bytes."

So CRYPT_GEN_RANDOM(2) returns a two-byte value in the range 0x0000 to 0xFFFF which, when converted to a signed integer (0 to 65535) and added as days to 1900-01-01, results in dates in the range 1900-01-01 to 2079-06-06.
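The two ends of that range can be checked directly. The following is only a small sketch: it swaps the random bytes for the literal values 0x0000 and 0xFFFF, and the column aliases are made up for illustration.

```sql
-- Convert the smallest and largest two-byte values to int and
-- add them, as days, to the same base date used above.
SELECT CONVERT(int, 0x0000) AS MinOffsetDays,   -- 0
       CONVERT(int, 0xFFFF) AS MaxOffsetDays,   -- 65535
       DATEADD(DAY, CONVERT(int, 0x0000), CONVERT(date, '19000101')) AS EarliestDate,  -- 1900-01-01
       DATEADD(DAY, CONVERT(int, 0xFFFF), CONVERT(date, '19000101')) AS LatestDate;    -- 2079-06-06
```

Because the latest date lies in the future, this raw form is not suitable on its own when every birth date must be earlier than today.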

For a table named dbo.MyTable, with a column named [Date of Birth], this will update all column values to randomly generated dates:

UPDATE dbo.MyTable
SET [Date of Birth] = DATEADD(DAY, CONVERT(int, CRYPT_GEN_RANDOM(2)), '1900-01-01T00:00:00');

One could reverse the logic so that you have people of various ages, from 0 days old to approximately 59 years old (dividing the two-byte value by 3 caps the offset at 21,845 days), with this:

UPDATE dbo.MyTable
SET [Date of Birth] = DATEADD(DAY, -(CONVERT(int, CRYPT_GEN_RANDOM(2)) / 3), GETDATE());

The following example will randomly choose birth dates resulting in ages from 10 up to (but not including) 20 years old:

DECLARE @MinAge int;
DECLARE @MaxAge int;

SET @MinAge = 10;
SET @MaxAge = 20;
UPDATE dbo.MyTable
SET [Date of Birth] = DATEADD(DAY
    -- go back a random 0 .. ((@MaxAge - @MinAge) * 365 - 1) days
    , -(CONVERT(int, CRYPT_GEN_RANDOM(2)) % ((@MaxAge - @MinAge) * 365))
    -- starting from the birth date of someone who turns @MinAge today
    , CONVERT(date, DATEADD(YEAR, -@MinAge, GETDATE()))
    );
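For the range asked about in the question (18 to 70 years old), a variation on the same idea might look like the sketch below. It is not the original answer's code: the variables @Today, @Latest, @Earliest and @SpanDays are names chosen here for illustration, the span is measured with DATEDIFF instead of multiplying by 365, and four random bytes are used instead of two to keep the modulo bias small. It also assumes "70 years old" means anyone who has not yet turned 71.

```sql
DECLARE @MinAge int = 18;   -- youngest allowed age, in whole years
DECLARE @MaxAge int = 70;   -- oldest allowed age, in whole years

DECLARE @Today date = CONVERT(date, GETDATE());

-- Latest allowed birth date: someone who turns @MinAge today.
DECLARE @Latest date = DATEADD(YEAR, -@MinAge, @Today);

-- Earliest allowed birth date: someone who turns @MaxAge + 1 tomorrow
-- (assumption: such a person still counts as @MaxAge years old today).
DECLARE @Earliest date = DATEADD(DAY, 1, DATEADD(YEAR, -(@MaxAge + 1), @Today));

-- Number of candidate dates, counting both ends of the range.
DECLARE @SpanDays int = DATEDIFF(DAY, @Earliest, @Latest) + 1;

UPDATE dbo.MyTable
SET [Date of Birth] = DATEADD(DAY
    -- four random bytes read as a non-negative bigint, reduced to 0 .. @SpanDays - 1
    , CONVERT(int, CONVERT(bigint, CRYPT_GEN_RANDOM(4)) % @SpanDays)
    , @Earliest
    );
```

Every generated date falls between @Earliest and @Latest inclusive, so all of them are earlier than today whenever @MinAge is at least 1.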

OTHER TIPS

Method 1

select DATEADD(DAY, -(ABS(CHECKSUM(NEWID()) % 36500 )), getdate());

Sample outputs:

1980-11-10 02:19:37.643
1940-08-25 02:20:06.217
1967-10-10 02:20:15.030
2013-03-20 02:20:24.933
1951-11-19 02:20:38.973

In short, the following expression generates a random whole number from 0 to 36499. (36500 days is roughly 100 years; 36525 is a closer approximation once leap days are counted.)

ABS(CHECKSUM(NEWID()) % 36500 )

Subtracting that random number of days from the current date gives a random birth date for a person between 0 and roughly 100 years old.
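One way to convince yourself of the range is to generate a batch of offsets and look at the extremes. This is only a sanity-check sketch: the cross join of sys.all_objects is used here purely as a convenient source of rows, and the aliases are invented for the example.

```sql
-- Generate 100,000 sample offsets and report the smallest and largest ones.
SELECT MIN(s.DaysBack) AS SmallestOffset,
       MAX(s.DaysBack) AS LargestOffset
FROM (
    SELECT TOP (100000)
           ABS(CHECKSUM(NEWID()) % 36500) AS DaysBack
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
) AS s;
```

Both values should stay within 0 and 36499, since the remainder of a division by 36500 can never reach 36500 in either direction.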

Demo: http://sqlfiddle.com/#!18/9eecb/15528/0

  

Method 2

DECLARE @start DATE = '1980-01-01'
DECLARE @end DATE = '1980-01-05'

SELECT DATEADD(DAY, ABS(CHECKSUM(NEWID()) % DATEDIFF(DAY,@start,@end)), @start)

Sample outputs:

1980-01-01
1980-01-04
1980-01-03
1980-01-02

The DATEDIFF function returns the difference between two dates. In this case, DATEDIFF(DAY,@start,@end) returns the number of days between the start date and the end date. Taking the random number modulo that difference and adding the result to the start date generates random dates from the start date onwards.

However, this will never return the end date (1980-01-05) as a randomly generated date. To include it, add 1 to the difference. (Applying the modulo inside ABS also avoids an overflow error in the rare case where CHECKSUM returns -2147483648.)

SELECT DATEADD(DAY, ABS(CHECKSUM(NEWID()) % ( 1 + DATEDIFF(DAY,@start,@end))), @start)

Demo: http://sqlfiddle.com/#!18/9eecb/15542/0

  

Method 3

Example 1

SELECT DATEADD(DAY, RAND() * ((-36500) - 1), GETDATE())

Sample Output:

1956-02-25T23:44:17.62Z
2006-09-08T23:44:40.62Z
1987-06-13T23:44:53.717Z

Example 2

SELECT DATEADD(DAY, RAND() * ((-1) - 1), GETDATE())

Sample Output:

2018-05-03 06:32:56.753
2018-05-02 06:32:56.753

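Note: DATEADD truncates the fractional part of its number argument, so RAND() * (-2) yields either 0 or -1 days. If you remove the '- 1', the multiplier becomes -1, the product always truncates to 0, and 2018-05-02 06:32:56.753 will never be generated.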

Demo: http://sqlfiddle.com/#!18/9eecb/15554/0

  

Method 4

DECLARE @start DATE = '1980-01-01'
DECLARE @end DATE = '1980-01-05'

SELECT DATEADD(DAY, RAND() * DATEDIFF(DAY,@start,@end) ,@start)

Sample outputs:

1980-01-02
1980-01-01
1980-01-03
1980-01-04

Note: This will not return the end date (1980-01-05) as a randomly generated date either. To include it, add 1 to the difference, like this:

DATEDIFF(DAY,@start,@end) + 1

Demo: http://sqlfiddle.com/#!18/9eecb/15538/0

  

Method 5

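DECLARE @from INT = 18
DECLARE @to INT = 70

-- Latest allowed birth date (youngest person) and earliest allowed birth date (oldest person)
DECLARE @tfrom DATE = DATEADD(YEAR, -(@from), GETDATE())
DECLARE @tto DATE = DATEADD(YEAR, -(@to), GETDATE())

-- Number of days from @tto up to @tfrom (a positive value)
DECLARE @diff INT = DATEDIFF(DAY, @tto, @tfrom)

-- The + 1 makes @tfrom itself a possible result
SELECT DATEADD(DAY, RAND() * (@diff + 1), @tto)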

Sample outputs:

1967-11-03
1955-10-09
1967-06-03
1962-11-17
1970-07-04

Demo: http://sqlfiddle.com/#!18/9eecb/15555/0
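One caveat when moving the RAND()-based methods from a single SELECT to an UPDATE: an unseeded RAND() is evaluated once per statement, so every row would receive the same date. A common workaround is to seed it per row with CHECKSUM(NEWID()). The sketch below assumes the dbo.MyTable table and [Date of Birth] column from the accepted solution and reuses the variables of Method 5; the column aliases are invented for the example.

```sql
DECLARE @from INT = 18;
DECLARE @to INT = 70;

DECLARE @tfrom DATE = DATEADD(YEAR, -@from, GETDATE());  -- latest allowed birth date
DECLARE @tto DATE = DATEADD(YEAR, -@to, GETDATE());      -- earliest allowed birth date
DECLARE @diff INT = DATEDIFF(DAY, @tto, @tfrom);         -- positive number of days

-- Unseeded RAND() is computed once for the statement;
-- the seeded call is recomputed for every row.
SELECT TOP (5)
       RAND() AS SameOnEveryRow,
       RAND(CHECKSUM(NEWID())) AS DiffersPerRow
FROM sys.all_objects;

-- Per-row random birth dates between @tto and @tfrom.
UPDATE dbo.MyTable
SET [Date of Birth] = DATEADD(DAY, RAND(CHECKSUM(NEWID())) * (@diff + 1), @tto);
```

The NEWID()-based and CRYPT_GEN_RANDOM-based expressions shown earlier do not need this workaround.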

A solution that keeps the existing distribution of birth dates is the following: create a new birth date by combining the year, month and day from three different existing birth dates in the database. Generate three different random numbers i, j, k (each less than the total number of records), pick the year from row i, the month from row j and the day from row k, and combine them into a date. It is even better to create a second column, populate it while iterating over the original birth date column, and delete the original column afterwards. Otherwise, if we overwrite the birth date column while iterating over it, we risk picking data that was already altered by this strategy, and we could end up with the same year repeated all over the place.

This approach is weaker at data masking, because if you have a user born in 1902, that year will still appear (although with a different month and day), possibly allowing that user to be uniquely identified. However, as far as data distribution is concerned, this solution preserves the distribution of years as well as of months and days: every existing record has an equal chance of being picked, so if we have twice as many people born in 1990 as in 1970, the proportion will stay roughly the same within the set of generated birth dates.
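A rough T-SQL sketch of this idea follows. It is an interpretation rather than a definitive implementation: instead of drawing three random row numbers per record, it shuffles the years, months and days independently, so each existing value is reused exactly once. The [New Date of Birth] column and the CTE names are hypothetical, the table and source column are borrowed from the accepted solution, and DATEFROMPARTS and EOMONTH require SQL Server 2012 or later. Days that do not exist in the chosen month (for example 31 in a 30-day month) are clamped to the last day of that month.

```sql
-- Hypothetical second column that receives the scrambled values.
ALTER TABLE dbo.MyTable ADD [New Date of Birth] date NULL;
GO  -- batch separator (SSMS/sqlcmd) so the new column is visible below

WITH target AS (
    SELECT [New Date of Birth],
           ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn
    FROM dbo.MyTable
),
year_src AS (
    SELECT YEAR([Date of Birth]) AS y,
           ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn
    FROM dbo.MyTable
),
month_src AS (
    SELECT MONTH([Date of Birth]) AS m,
           ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn
    FROM dbo.MyTable
),
day_src AS (
    SELECT DAY([Date of Birth]) AS d,
           ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn
    FROM dbo.MyTable
)
UPDATE target
SET [New Date of Birth] = DATEADD(DAY,
        -- clamp the day to the length of the chosen month, then make it zero-based
        CASE WHEN ds.d > DAY(EOMONTH(DATEFROMPARTS(ys.y, ms.m, 1)))
             THEN DAY(EOMONTH(DATEFROMPARTS(ys.y, ms.m, 1)))
             ELSE ds.d
        END - 1,
        DATEFROMPARTS(ys.y, ms.m, 1))
FROM target
JOIN year_src  AS ys ON ys.rn = target.rn
JOIN month_src AS ms ON ms.rn = target.rn
JOIN day_src   AS ds ON ds.rn = target.rn;
```

Once the new column has been checked, the original column can be dropped and the new one renamed.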

A second approach: for row N, pick year from row N+1, month from row N+2, day from row N+3. This approach is even worse at data-masking, but even better at keeping distribution, because all we do is permute data, thus keeping the exact years, months, days, only rearranged on different rows.

A third approach:

new_month = ((current_month + next_record's_month - 1) % 12) + 1
new_day = ((current_day + next_record's_day - 1) % days_in_new_month) + 1
new_year = (current_year + next_record's_year) / 2

(The "- 1 ... + 1" keeps the results in the ranges 1..12 and 1..days_in_new_month; a plain % 12 would produce month 0 whenever the sum is a multiple of 12.)

This approach is the best so far at obfuscation: if we have only two users, one born in January and the other in April, the resulting user will be born in May (month 01 + 04 = 05). Addition modulo 12 behaves somewhat like a permutation in terms of keeping the distribution of the data. As for the years, computing an average will pull the distribution curve towards the center: if we have only one user born in 1900, the resulting birth year will be the average of 1900 and another year, which is later than 1900 but still an early year.
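A possible T-SQL rendering of the third approach is sketched below, under several assumptions: the table has a column that defines the record order (a hypothetical Id column here), the last record wraps around to the first one, and the month and day sums are shifted by one so that the results always land in valid ranges. It only selects the proposed values and does not update anything.

```sql
WITH pairs AS (
    SELECT Id,
           [Date of Birth] AS dob,
           -- birth date of the next record; the last record wraps around to the first
           COALESCE(LEAD([Date of Birth]) OVER (ORDER BY Id),
                    FIRST_VALUE([Date of Birth]) OVER (ORDER BY Id)) AS next_dob
    FROM dbo.MyTable
),
parts AS (
    SELECT Id,
           dob,
           (YEAR(dob) + YEAR(next_dob)) / 2 AS new_year,             -- integer average
           (MONTH(dob) + MONTH(next_dob) - 1) % 12 + 1 AS new_month, -- always 1..12
           DAY(dob) + DAY(next_dob) AS day_sum
    FROM pairs
)
SELECT Id,
       dob AS old_dob,
       -- (day_sum - 1) modulo the month length is a zero-based day offset,
       -- added to the first day of the new month
       DATEADD(DAY,
               (day_sum - 1) % DAY(EOMONTH(DATEFROMPARTS(new_year, new_month, 1))),
               DATEFROMPARTS(new_year, new_month, 1)) AS new_dob
FROM parts;
```

With a January and an April record, the month expression gives (1 + 4 - 1) % 12 + 1 = 5, matching the May example in the text.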

The general pattern is: make use of the existing data, instead of generating completely random values.

I did not provide any code because I am not familiar with SQL Server syntax, but I thought this idea was worth mentioning.

You can also do it using a date table, ordering it by newid().

I've used this technique to scramble lots and lots of data in the past. One advantage is that you can scramble any field by joining the table to itself on rank() over (order by newid()).

Note: in this example, if your person table has more rows than your date table, repeat the date table insert a few times until the date table is the larger of the two. Otherwise the extra person rows will have no matching rank and will not get a new date.

    --get your data into a safe space so you don't blow out the wrong table while you work
    drop table if exists person_space
    select top 5000 * into person_space from AbstractData

    select * from person_space

    --create your date table
    drop table if exists datetable
    create table datetable (day date)

    declare @date date
    set @date = '1950-01-01'

    while @date < '2025-01-01'
    begin
    insert into datetable(day)
    select @date
    set @date = dateadd(day,1, @date)
    end

    --select * from datetable

    --scramble both tables by ranking their rows on newid()
    ;with ScramDates as (select rank() over (order by newid()) as randomRank, day from datetable
    where day <= getdate())
    ,peeps as (select rank() over (order by newid()) as randomRank, BirthDateTime, AccountNumber from person_space)

    ,finalcountdown as (select p.AccountNumber, p.BirthDateTime as old_dob, s.day as new_dob from ScramDates s
    inner join peeps p on s.randomRank = p.randomRank)

    select * from finalcountdown

    --update the date_of_birth

    --update p
    --set p.BirthDateTime = new_dob
    --from finalcountdown f
    --inner join person_space p on p.AccountNumber = f.AccountNumber

    --select * from person_space

Here's how you can join a table to itself randomly and update a column. This method will also maintain the distribution of your data:

    ;with ScramDates as (select rank() over (order by newid()) as randomRank, day from datetable
    where day <= getdate())
    ,peeps1 as (select rank() over (order by newid()) as randomRank, * from person_space)
    ,peeps2 as (select rank() over (order by newid()) as randomRank, * from person_space)

    --select p1.BirthDateTime, p2.BirthDateTime, p1.Name, p2.Name, * from peeps1 p1
    --inner join peeps2 p2 on p1.randomRank = p2.randomRank

    update p1
    set p1.BirthDateTime = p2.BirthDateTime
    from peeps1 p1
    inner join peeps2 p2 on p1.randomRank = p2.randomRank

    select * from person_space
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange