Generate fake data for several tables with relationships
-
29-09-2020 - |
Question
I have 3 tables where I want to generate fake data in UsersCategoryLink
.
How can I insert in the table UserCategoryLink
the column UserID
with random users from the table User
and a random id from the table Categories
. In this SQL fiddle you can see the tables with some values.
UsersCategoryLink
must be filled with random users and categories.- Every user must have two categories.
UserID
andCategoryID
are primary keys, so every value must be unique.- I am using SQL Server Express.
La solution
This is my version of the cross-reference table where you'll store users and the categories they are members of. Notice there is a primary key clustered index on (UserID, CategoryID)
; this ensures each row is unique:
CREATE TABLE dbo.XREFUserCategories
(
UserID INT NOT NULL
CONSTRAINT FK_XREFUserCategories_UserID
FOREIGN KEY
REFERENCES dbo.Users(UserID)
, CategoryID INT NOT NULL
CONSTRAINT FK_XREFUserCategories_CategoryID
FOREIGN KEY
REFERENCES dbo.Categories(CategoryID)
, CONSTRAINT PK_XREFUserCategories
PRIMARY KEY CLUSTERED (UserID, CategoryID)
);
GO
Since you already have data in the Users
and Categories
tables, you can use a CTE with the ROW_NUMBER
windowing function partitioned by UserID
and ordered by an essentially random value, NEWID()
. This allows us to pick precisely two "categories" for each "user":
;WITH rs AS
(
SELECT u.UserID
, c.CategoryID
, rn = ROW_NUMBER() OVER (
PARTITION BY u.UserID
ORDER BY NEWID()
)
FROM dbo.Users u
, dbo.Categories c
)
INSERT INTO dbo.XREFUserCategories(UserID, CategoryID)
SELECT rs.UserID
, rs.CategoryID
FROM rs
WHERE rs.rn <= 2;
The following is a sample of rows I generated on my system: