Designing a SQL schema for a combination of many-to-many relationships (variations of products)

Question 1

Applying normalization to your problem gives the schema below. You can run it on SQL Fiddle.

CREATE TABLE products (
    product_id  int AUTO_INCREMENT PRIMARY KEY,
    name        varchar(20),
    description varchar(30)
);

INSERT INTO products
    (name, description)
VALUES
    ('Rug', 'A cool rug' ),
    ('Cup', 'A coffee cup');

-- ========================================

CREATE TABLE variants (
    variant_id int AUTO_INCREMENT PRIMARY KEY,
    variant    varchar(50)
);

INSERT INTO variants
    (variant)
VALUES
    ('color'),
    ('material'),
    ('size');

-- ========================================

CREATE TABLE variant_value (
    value_id   int AUTO_INCREMENT PRIMARY KEY,
    variant_id int NOT NULL,
    value      varchar(50),
    FOREIGN KEY (variant_id) REFERENCES variants (variant_id)
);

INSERT INTO variant_value
    (variant_id, value)
VALUES
    (1, 'red'),
    (1, 'blue'),
    (1, 'green'),
    (2, 'wool'),
    (2, 'polyester'),
    (3, 'small'),
    (3, 'medium'),
    (3, 'large');

-- ========================================

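CREATE TABLE product_variants (
    product_variants_id int AUTO_INCREMENT PRIMARY KEY,
    product_id          int NOT NULL,
    productvariantname  varchar(50),
    sku                 varchar(50) UNIQUE,
    price               decimal(10,2),  -- exact type for money; float rounds
    FOREIGN KEY (product_id) REFERENCES products (product_id)
);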

INSERT INTO product_variants
    (product_id, productvariantname, sku, price)
VALUES
    (1, 'red-wool', 'a121', 50),
    (1, 'red-polyester', 'a122', 50);

-- ========================================

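CREATE TABLE product_details (
    product_detail_id   int AUTO_INCREMENT PRIMARY KEY,
    product_variants_id int NOT NULL,
    value_id            int NOT NULL,
    UNIQUE (product_variants_id, value_id),
    FOREIGN KEY (product_variants_id) REFERENCES product_variants (product_variants_id),
    FOREIGN KEY (value_id) REFERENCES variant_value (value_id)
);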

INSERT INTO product_details
    (product_variants_id, value_id)
VALUES
    (1, 1),
    (1, 4),
    (2, 1),
    (2, 5);
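To see how these tables fit together, here is a minimal sketch that loads the same schema and data into an in-memory SQLite database through Python's standard sqlite3 module and reads the variants back. The SQLite port (AUTO_INCREMENT becomes INTEGER PRIMARY KEY) and the helper find_sku are illustrative assumptions, not part of the answer above; find_sku assumes the selected value ids are distinct and returns the SKU whose variant has exactly those values.

```python
import sqlite3

# SQLite port of the schema above, used only to demonstrate the queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE variants (variant_id INTEGER PRIMARY KEY, variant TEXT);
CREATE TABLE variant_value (value_id INTEGER PRIMARY KEY,
    variant_id INTEGER REFERENCES variants, value TEXT);
CREATE TABLE product_variants (product_variants_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES products, productvariantname TEXT, sku TEXT, price NUMERIC);
CREATE TABLE product_details (product_detail_id INTEGER PRIMARY KEY,
    product_variants_id INTEGER REFERENCES product_variants,
    value_id INTEGER REFERENCES variant_value);

INSERT INTO products (name, description) VALUES ('Rug', 'A cool rug'), ('Cup', 'A coffee cup');
INSERT INTO variants (variant) VALUES ('color'), ('material'), ('size');
INSERT INTO variant_value (variant_id, value) VALUES
    (1, 'red'), (1, 'blue'), (1, 'green'), (2, 'wool'), (2, 'polyester'),
    (3, 'small'), (3, 'medium'), (3, 'large');
INSERT INTO product_variants (product_id, productvariantname, sku, price) VALUES
    (1, 'red-wool', 'a121', 50), (1, 'red-polyester', 'a122', 50);
INSERT INTO product_details (product_variants_id, value_id) VALUES (1, 1), (1, 4), (2, 1), (2, 5);
""")

# One row per (SKU, attribute, value), e.g. to render a variant's description.
rows = conn.execute("""
    SELECT pv.sku, v.variant, vv.value
    FROM product_variants pv
    JOIN product_details pd ON pd.product_variants_id = pv.product_variants_id
    JOIN variant_value vv   ON vv.value_id = pd.value_id
    JOIN variants v         ON v.variant_id = vv.variant_id
    ORDER BY pv.sku, v.variant_id
""").fetchall()
for row in rows:
    print(row)


def find_sku(value_ids):
    """Return the SKU whose variant has exactly the given value ids, or None."""
    placeholders = ",".join("?" * len(value_ids))
    sql = f"""
        SELECT pv.sku
        FROM product_variants pv
        JOIN product_details pd ON pd.product_variants_id = pv.product_variants_id
        GROUP BY pv.product_variants_id
        -- same number of values, and every one of them is among the selected ids
        HAVING COUNT(*) = ? AND SUM(pd.value_id IN ({placeholders})) = ?
    """
    n = len(value_ids)
    row = conn.execute(sql, [n, *value_ids, n]).fetchone()
    return row[0] if row else None


print(find_sku([1, 5]))  # red + polyester
```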

Question 2

Part of your issue stems from confusing products with SKUs.

When you sell "XYZ pullover, size M, blue model", that item corresponds to an SKU. It is marketed as an XYZ pullover (the product), which has a set of attributes (size and color), each with its own set of potential values. Not every combination of those values yields a valid deliverable: you won't find absurdly thin and long jeans. So there are four distinct concepts: SKUs, products, attributes, and attribute values.

When a user wants a $10 blue pullover, he's actually looking for an SKU within a product category.

I hope the above clears up the confusion that your problem and question stem from.

In terms of schema, you want something like this (# marks the primary key):

products

#product_id
name
description

Optionally, also add:

price
in_stock

This is a marketing-related table, nothing else. If anything outside of marketing uses a product in your application, you'll end up in a world of pain down the road.

The price, if present, is a master price used to fill in the price field whenever it is null in SKUs. This makes price entry more user-friendly.

in_stock is a hopefully self-explanatory flag, ideally maintained by a trigger. It should be true if any SKU related to that product is in stock.
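One way to maintain in_stock with triggers is sketched below in SQLite, run through Python's sqlite3 module. It assumes a stripped-down version of the SKUs table described further down (named skus here, with an integer stock column); the trigger names and the 0/1 encoding of the flag are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    in_stock   INTEGER NOT NULL DEFAULT 0  -- 0/1 flag, maintained by the triggers below
);
CREATE TABLE skus (
    sku_id     INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products (product_id),
    stock      INTEGER NOT NULL DEFAULT 0
);

-- Each trigger recomputes the flag of the affected product(s) from scratch:
-- true as soon as any of its SKUs has a positive stock.
CREATE TRIGGER skus_after_insert AFTER INSERT ON skus BEGIN
    UPDATE products
    SET in_stock = EXISTS (SELECT 1 FROM skus s
                           WHERE s.product_id = products.product_id AND s.stock > 0)
    WHERE product_id = NEW.product_id;
END;

-- Covers stock changes as well as an SKU being moved to another product.
CREATE TRIGGER skus_after_update AFTER UPDATE ON skus BEGIN
    UPDATE products
    SET in_stock = EXISTS (SELECT 1 FROM skus s
                           WHERE s.product_id = products.product_id AND s.stock > 0)
    WHERE product_id IN (OLD.product_id, NEW.product_id);
END;

CREATE TRIGGER skus_after_delete AFTER DELETE ON skus BEGIN
    UPDATE products
    SET in_stock = EXISTS (SELECT 1 FROM skus s
                           WHERE s.product_id = products.product_id AND s.stock > 0)
    WHERE product_id = OLD.product_id;
END;
""")


def in_stock(product_id):
    return conn.execute(
        "SELECT in_stock FROM products WHERE product_id = ?", (product_id,)
    ).fetchone()[0]


conn.execute("INSERT INTO products (product_id, name) VALUES (1, 'XYZ pullover')")
conn.execute("INSERT INTO skus (sku_id, product_id, stock) VALUES (10, 1, 0), (11, 1, 0)")
before = in_stock(1)         # no SKU has stock yet
conn.execute("UPDATE skus SET stock = 3 WHERE sku_id = 11")
after_restock = in_stock(1)  # one SKU now has stock
conn.execute("DELETE FROM skus WHERE sku_id = 11")
after_delete = in_stock(1)   # the only stocked SKU is gone
```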

product_attributes

product_id
#attribute_id
name

product_attribute_values

attribute_id
#value_id
value

These tables just hold things like Color and Size, along with their values, like blue, red, S, M, L.

Note the product_id field: create a new set of attributes and values for each product. Sizes vary by product: sometimes it's S, M, L, etc.; other times, it'll be 38, 40, 42, and so on. Sometimes Size is enough; other times, you need Width and Length. Blue might be a valid color for one product; another might offer Navy, Royal Blue, Teal and so on. Do NOT assume that there is any relationship between one product's attributes and those of another; the similarities, when they exist, are entirely cosmetic and coincidental.

SKUs

product_id
#sku_id
price

Optionally, add:

name
barcode
stock

This corresponds to the deliverables that get shipped.

It's actually the most important table under the hood. The sku_id, rather than the product_id, is almost certainly what customer orders should reference. It's also what stock-keeping and the like should reference. (The only exception I've ever seen to these two points is when you sell something really generic. But even then, in my experience, the better way to deal with it is to add a many-to-many (n-m) relationship between interchangeable SKUs.)

The name field, if you add it, is primarily for convenience. If it is left null, use app-side code to derive it from the generic product's name, expanded if necessary with the relevant attribute names and values. Filling it in lets you replace that generic name ("Levi's 501, W: 32, L: 32, Color: Dark Blue") with something more natural ("Levi's 501, 32x32, Dark Blue").
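The app-side fallback could look like the following minimal Python sketch. The function name sku_display_name and its argument shape (attribute name/value pairs already in display order) are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def sku_display_name(
    product_name: str,
    attributes: Sequence[Tuple[str, str]],
    sku_name: Optional[str] = None,
) -> str:
    """Return the SKU's own name if set, else a generated default.

    `attributes` holds (attribute name, value) pairs, assumed to be in
    display order already (e.g. sorted by a position column)."""
    if sku_name:  # a hand-written name always wins
        return sku_name
    if not attributes:  # a product without variants keeps its plain name
        return product_name
    details = ", ".join(f"{name}: {value}" for name, value in attributes)
    return f"{product_name}, {details}"


levis = [("W", "32"), ("L", "32"), ("Color", "Dark Blue")]
print(sku_display_name("Levi's 501", levis))
print(sku_display_name("Levi's 501", levis, "Levi's 501, 32x32, Dark Blue"))
```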

In case it matters, stock is better maintained in the long run by a trigger, with a double-entry bookkeeping schema in the background. This lets you distinguish "in stock and available for shipment today" (the figure you actually want here) from "in stock but already sold", among the multitude of real-world scenarios you'll encounter. Oh, and the stock field occasionally needs to be a numeric rather than an integer, if you ever need to sell anything measured in kilos or liters. If so, be sure to add an extra is_int flag to avoid customers sending you orders for 0.1 laptops.
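A minimal sketch of the double-entry idea, using SQLite from Python: each row moves a quantity of one SKU from one location to another, so a location's stock is simply what flowed in minus what flowed out. The location names (supplier, warehouse, allocated, customer) and the stock_balance view are assumptions for illustration; the answer suggests keeping a trigger-maintained stock column on SKUs, whereas this sketch recomputes balances from a view for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Every movement takes qty of one SKU out of one location and puts it into
-- another, so quantities are never created or destroyed, only moved.
CREATE TABLE stock_moves (
    move_id       INTEGER PRIMARY KEY,
    sku_id        INTEGER NOT NULL,
    from_location TEXT    NOT NULL,
    to_location   TEXT    NOT NULL,
    qty           NUMERIC NOT NULL CHECK (qty > 0)  -- NUMERIC: some goods sell by the kilo
);

-- Balance per SKU and location = total moved in minus total moved out.
CREATE VIEW stock_balance AS
SELECT sku_id, location, SUM(qty) AS qty
FROM (SELECT sku_id, to_location AS location, qty FROM stock_moves
      UNION ALL
      SELECT sku_id, from_location, -qty FROM stock_moves)
GROUP BY sku_id, location;
""")

conn.executemany(
    "INSERT INTO stock_moves (sku_id, from_location, to_location, qty) VALUES (?, ?, ?, ?)",
    [
        (1, "supplier", "warehouse", 10),  # goods received
        (1, "warehouse", "allocated", 3),  # sold, not shipped yet
        (1, "allocated", "customer", 1),   # one of those shipped
    ],
)


def balance(sku_id, location):
    row = conn.execute(
        "SELECT qty FROM stock_balance WHERE sku_id = ? AND location = ?",
        (sku_id, location),
    ).fetchone()
    return row[0] if row else 0


available_today = balance(1, "warehouse")                # can ship right now
on_premises = available_today + balance(1, "allocated")  # includes goods already sold
```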

product_variants

product_id
#sku_id
#attribute_id
value_id

This links the deliverable's id with the corresponding attributes and values, for the sake of generating default names.

The primary key is on (sku_id, attribute_id).

You might find the product_id field an aberration. It is, unless you add foreign keys referencing:

SKUs (product_id, sku_id)
product_attributes (product_id, attribute_id)
product_attribute_values (attribute_id, value_id)

(Don't forget the extra unique indexes on the corresponding tuples if you decide to add these foreign keys.)
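Putting these tables together, here is a sketch in SQLite (via Python's sqlite3) of the composite foreign keys and the unique constraints they require. Column types are assumed and the optional fields are trimmed; the point is that the redundant product_id lets the database itself reject an SKU that uses another product's attribute, or a value that belongs to a different attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT
);
CREATE TABLE product_attributes (
    attribute_id INTEGER PRIMARY KEY,
    product_id   INTEGER NOT NULL REFERENCES products (product_id),
    name         TEXT NOT NULL,
    UNIQUE (product_id, attribute_id)   -- target of a composite FK below
);
CREATE TABLE product_attribute_values (
    value_id     INTEGER PRIMARY KEY,
    attribute_id INTEGER NOT NULL REFERENCES product_attributes (attribute_id),
    value        TEXT NOT NULL,
    UNIQUE (attribute_id, value_id)
);
CREATE TABLE skus (
    sku_id     INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products (product_id),
    price      NUMERIC,                 -- NULL: fall back to the product's price
    UNIQUE (product_id, sku_id)
);
CREATE TABLE product_variants (
    product_id   INTEGER NOT NULL,
    sku_id       INTEGER NOT NULL,
    attribute_id INTEGER NOT NULL,
    value_id     INTEGER NOT NULL,
    PRIMARY KEY (sku_id, attribute_id),
    FOREIGN KEY (product_id, sku_id)       REFERENCES skus (product_id, sku_id),
    FOREIGN KEY (product_id, attribute_id) REFERENCES product_attributes (product_id, attribute_id),
    FOREIGN KEY (attribute_id, value_id)   REFERENCES product_attribute_values (attribute_id, value_id)
);

INSERT INTO products VALUES (1, 'XYZ pullover', NULL), (2, 'Jeans', NULL);
INSERT INTO product_attributes VALUES (10, 1, 'Size'), (11, 1, 'Color'), (20, 2, 'Width');
INSERT INTO product_attribute_values VALUES (100, 10, 'M'), (101, 11, 'blue'), (200, 20, '32');
INSERT INTO skus VALUES (1000, 1, 10.00), (1001, 1, NULL);
INSERT INTO product_variants VALUES (1, 1000, 10, 100), (1, 1000, 11, 101);
""")


def rejected(sql):
    """True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# SKU 1001 belongs to product 1, but attribute 20 (Width) belongs to product 2.
cross_product = rejected("INSERT INTO product_variants VALUES (1, 1001, 20, 200)")
# Value 101 ('blue') belongs to Color (11), not to Size (10).
wrong_value = rejected("INSERT INTO product_variants VALUES (1, 1001, 10, 101)")
```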

Three additional remarks in conclusion.

Firstly, I'd like to stress once again that, in terms of flow, not all combinations of attributes and values yield a valid deliverable. Width might be 28-42 and length might be 28-42, but you probably won't see a seriously skinny pair of 28x42 jeans. You're best off NOT automatically creating every possible variation of every product: instead, add UI that lists the combinations with a checkbox to enable or disable each one (checked by default), alongside name, barcode and price fields. (Name and price will usually be left blank; but one day, you'll need to organize a sale on blue pullovers only, on the grounds that the color is discontinued, while you continue to sell the other options.)
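As a rough illustration of that UI flow, the following Python sketch builds every candidate combination with the checkbox ticked by default and leaves the name, barcode and price overrides blank; only the candidates the user keeps enabled would then be saved as SKUs. The function name candidate_variants and the dictionary shapes are assumptions.

```python
from itertools import product


def candidate_variants(attributes):
    """Yield one candidate per combination of attribute values.

    `attributes` maps attribute name -> list of values, in display order.
    Nothing is written to the database here; this only feeds the UI."""
    names = list(attributes)
    for combo in product(*(attributes[name] for name in names)):
        yield {
            "values": dict(zip(names, combo)),
            "enabled": True,  # checked by default; the user unticks invalid ones
            "name": None,     # optional overrides, usually left blank
            "barcode": None,
            "price": None,
        }


candidates = list(candidate_variants({"Width": ["28", "30"], "Length": ["30", "42"]}))

# The user unticks the implausible 28x42 pair of jeans.
for c in candidates:
    if c["values"] == {"Width": "28", "Length": "42"}:
        c["enabled"] = False

to_create = [c for c in candidates if c["enabled"]]
```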

Secondly, if you ever need to manage product options as well, keep in mind that many of them are actually product attributes in disguise, and that those that aren't yield new SKUs that must also be taken into account for stock-keeping. A bigger HD option for a laptop, for instance, is really a variant of the same product (Normal vs Large HD size) that is masquerading as an option for (very valid) UI reasons. In contrast, wrapping the laptop as a Christmas gift is a genuine option that references a completely separate SKU in bookkeeping terms (e.g. 0.8 m of gift wrap), plus, should you ever need to work out average marginal costs, a fraction of staff time.

Lastly, you'll need to come up with an ordering method for your attributes, their values, and the resulting variants. The easiest way is to add an extra position field to the attributes and values tables.
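A minimal SQLite sketch (via Python) of those position fields: ordering by position rather than by name keeps sizes in S, M, L order instead of the alphabetical L, M, S. The table layout is trimmed to what the query needs, and the sample data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_attributes (
    attribute_id INTEGER PRIMARY KEY,
    product_id   INTEGER NOT NULL,
    name         TEXT NOT NULL,
    position     INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE product_attribute_values (
    value_id     INTEGER PRIMARY KEY,
    attribute_id INTEGER NOT NULL,
    value        TEXT NOT NULL,
    position     INTEGER NOT NULL DEFAULT 0
);
-- Size is shown before Color, and sizes run S, M, L.
INSERT INTO product_attributes VALUES (1, 1, 'Color', 2), (2, 1, 'Size', 1);
INSERT INTO product_attribute_values VALUES
    (10, 1, 'blue', 1), (11, 1, 'red', 2),
    (20, 2, 'L', 3), (21, 2, 'S', 1), (22, 2, 'M', 2);
""")

rows = conn.execute("""
    SELECT a.name, v.value
    FROM product_attributes a
    JOIN product_attribute_values v ON v.attribute_id = a.attribute_id
    WHERE a.product_id = ?
    ORDER BY a.position, v.position
""", (1,)).fetchall()
```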

Question 3

I would use 4 tables:

generic_product: product_id, name, description

e.g. 1, 'rug', 'a coffee rug' / 2, 'mug', 'a coffee mug'

generic_product_property: product_id, property_id, property_name

e.g. 1, 10, 'color' / 1, 11, 'material'

sellable_product: sku, product_id, price

e.g. 'A121', 1, 50.00 / 'A122', 1, 45.00

sellable_product_property: sku, property_id, property_value

e.g. 'A121', 10, 'red' / 'A121', 11, 'wool' / 'A122', 10, 'green' / 'A122', 11, 'wool'

This lets your users define any property they want for your sellable products.

Your application's business logic has to ensure that sellable products are described completely (i.e. check that, for every property of the generic product, the sellable product defines a value).
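That completeness check can be expressed as a single anti-join. The sketch below builds the four tables in SQLite via Python, adds a deliberately incomplete SKU A123 (a hypothetical addition to the example data), and lists every property a sellable product is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE generic_product (product_id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE generic_product_property (
    product_id    INTEGER REFERENCES generic_product,
    property_id   INTEGER PRIMARY KEY,
    property_name TEXT
);
CREATE TABLE sellable_product (
    sku        TEXT PRIMARY KEY,
    product_id INTEGER REFERENCES generic_product,
    price      NUMERIC
);
CREATE TABLE sellable_product_property (
    sku            TEXT REFERENCES sellable_product,
    property_id    INTEGER REFERENCES generic_product_property,
    property_value TEXT,
    PRIMARY KEY (sku, property_id)
);

INSERT INTO generic_product VALUES (1, 'rug', 'a coffee rug'), (2, 'mug', 'a coffee mug');
INSERT INTO generic_product_property VALUES (1, 10, 'color'), (1, 11, 'material');
INSERT INTO sellable_product VALUES ('A121', 1, 50.00), ('A122', 1, 45.00), ('A123', 1, 45.00);
INSERT INTO sellable_product_property VALUES
    ('A121', 10, 'red'), ('A121', 11, 'wool'),
    ('A122', 10, 'green'), ('A122', 11, 'wool'),
    ('A123', 10, 'blue');  -- material deliberately missing
""")

# Every (sku, property) pair the generic product requires but the SKU lacks.
incomplete = conn.execute("""
    SELECT sp.sku, gpp.property_name
    FROM sellable_product sp
    JOIN generic_product_property gpp ON gpp.product_id = sp.product_id
    LEFT JOIN sellable_product_property spp
           ON spp.sku = sp.sku AND spp.property_id = gpp.property_id
    WHERE spp.sku IS NULL
    ORDER BY sp.sku, gpp.property_id
""").fetchall()
```

An empty result means every sellable product is fully described, so the application could run this before publishing a product.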

Question 4

This is similar to another question I saw a while back here on SO:

Designing a database : Which is the better approach?

If you take a look there, you'll see that you are basically asking the same narrow (attribute-based) vs. wide table question. I've used both depending on the scenario, but I'd be really careful with the way you have it implemented right now. The fact that there really isn't a good way to match those variants to the SKUs (at least none that I can think of) may force you to change your tables.

If you have that many different variants, you may also want to look into a key-value database or some other NoSQL solution.

Question 5

In general terms, you're looking for what's called a grouper or a junk dimension. Basically, it's just a row for every combination. @sahalMoidu's schema looks like it should give you what you are asking for.
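To make the idea concrete: a junk dimension can be populated with a cross join of the attribute value lists, giving one surrogate key per combination. The table names below (colors, materials, sizes, variant_dim) are hypothetical, and the sketch uses SQLite via Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE colors    (color    TEXT PRIMARY KEY);
CREATE TABLE materials (material TEXT PRIMARY KEY);
CREATE TABLE sizes     (size     TEXT PRIMARY KEY);
INSERT INTO colors    VALUES ('red'), ('blue'), ('green');
INSERT INTO materials VALUES ('wool'), ('polyester');
INSERT INTO sizes     VALUES ('small'), ('medium'), ('large');

-- The junk dimension: one surrogate key per combination of attribute values.
CREATE TABLE variant_dim (
    variant_key INTEGER PRIMARY KEY,
    color       TEXT NOT NULL,
    material    TEXT NOT NULL,
    size        TEXT NOT NULL,
    UNIQUE (color, material, size)
);
INSERT INTO variant_dim (color, material, size)
SELECT c.color, m.material, s.size
FROM colors c CROSS JOIN materials m CROSS JOIN sizes s;
""")

row_count = conn.execute("SELECT COUNT(*) FROM variant_dim").fetchone()[0]

# Fact rows (e.g. order lines) would then store just this key.
red_wool_small = conn.execute(
    "SELECT variant_key FROM variant_dim WHERE color = ? AND material = ? AND size = ?",
    ("red", "wool", "small"),
).fetchone()[0]
```

With 3 colors, 2 materials and 3 sizes this yields 3 × 2 × 3 = 18 rows, which shows why the approach suits low-cardinality attributes.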

But before getting too hung up on normalization, you need to know whether the db is there for storing data (transactional, etc.) or for getting data out (dimensional, reporting, etc.). Even if it is a transactional database, you have to ask yourself what you are trying to accomplish by normalizing.

Question 6

The SKU is your primary key. You can set up foreign key relationships from the variants table using sku. Forget about productid entirely.

CREATE TABLE x (sku varchar(50) PRIMARY KEY, price decimal(10,2), description varchar(255));
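A sketch of that layout in SQLite (via Python): the variants table below, keyed on (sku, attribute), is an assumed shape for illustration, and its foreign key stops variant rows from pointing at an SKU that does not exist in x.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
CREATE TABLE x (
    sku         TEXT PRIMARY KEY,
    price       NUMERIC,
    description TEXT
);
-- Hypothetical variants table: one row per (sku, attribute), keyed by the SKU only.
CREATE TABLE variants (
    sku       TEXT NOT NULL REFERENCES x (sku),
    attribute TEXT NOT NULL,
    value     TEXT NOT NULL,
    PRIMARY KEY (sku, attribute)
);
INSERT INTO x VALUES ('a121', 50, 'Rug, red wool');
INSERT INTO variants VALUES ('a121', 'color', 'red'), ('a121', 'material', 'wool');
""")

try:
    conn.execute("INSERT INTO variants VALUES ('zzz', 'color', 'blue')")  # no such SKU
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

attrs = dict(conn.execute(
    "SELECT attribute, value FROM variants WHERE sku = ?", ("a121",)
).fetchall())
```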

Question 7

I think we should store product_variant_ids (a combination of variant ids) instead of productVariantName, and remove the product_details table.

Also add an extra table, tbl_variant_group:

tbl_variant_group

variant_group_id, title -> (1, color), (2, size), (3, material)

tbl_variant

id, variant_group_id, title -> (1, 1, red), (2, 2, small), (3, 3, polyester), (4, 1, green), (5, 3, cotton)

Now we can store a combination of variant ids instead of productVariantName.

For example, combination [1, 2, 3] gives us product xyz with Color: red, Size: small and Material: polyester.

Combination [4, 2, 5] gives us product xyz with Color: green, Size: small and Material: cotton.

Now, whenever the user selects a combination, we can compare it with the combinations we stored.

Note: when saving a combination, we can store its ids in ascending order of variant_group title, so that we always compare in the same order. For example, color -> material -> size gives red -> polyester -> small -> [1, 3, 2].
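A minimal Python sketch of the matching step, using the ids from the tables above. The in-memory dictionaries stand in for database tables and the SKU labels are hypothetical; combination_key sorts the ids by variant_group title, as the note describes, so the order in which the user picks values does not matter.

```python
# Variant ids from the example: id -> (variant_group title, variant title).
VARIANTS = {
    1: ("color", "red"),
    2: ("size", "small"),
    3: ("material", "polyester"),
    4: ("color", "green"),
    5: ("material", "cotton"),
}


def combination_key(variant_ids):
    """Order ids by their group title (color -> material -> size) and join them."""
    ordered = sorted(variant_ids, key=lambda vid: VARIANTS[vid][0])
    return ",".join(str(vid) for vid in ordered)


# Stored combinations for product xyz (hypothetical SKU labels).
stored = {
    combination_key([1, 2, 3]): "xyz-red-small-polyester",
    combination_key([4, 2, 5]): "xyz-green-small-cotton",
}


def lookup(selected_ids):
    """Return the stored variant matching the user's selection, or None."""
    return stored.get(combination_key(selected_ids))


print(combination_key([1, 2, 3]))
print(lookup([2, 3, 1]))
```

Storing the key as a sorted string makes the comparison a single equality check, which a database can also index.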

This works for any number of variant ids.