What is best practice for this problem (different properties for different categories)?
-
03-07-2019 - |
Question
I have some products that belongs to the some category.
Each category can have different properties.
For example,
- category cars has properties color, power, ...
- category pets have properties weight, age, ...
Number of categories is about 10-15. Number of properties in each category is 3-15. Number of products is very big.
Main requirement for this app is very good search. We will select category, and enter criteria for each property in this category.
Have to design database for this scenario. (SQL Server 2005)
Solution
The classic design approach would be (the star denotes the primary key column):
Product
ProductId*
CategoryId: FK to Category.CategroyId
Name
Category
CategoryId*
Name
Property
PropertyId*
Name
Type
CategoryProperty
CategoryId*: FK to Category.CategoryId
PropertyId*: FK to Property.PropertyId
ProductProperty
ProductId*: FK to Product.ProductId
PropertyId*: FK to Property.PropertyId
ValueAsString
If you can live with the fact that every property value would go to the DB as a string and type conversion info is stored in the Property table, this layout would be enough.
The query would go something like this:
SELECT
Product.ProductId,
Product.Name AS ProductName,
Category.CategoryId,
Category.Name AS CategoryName,
Property.PropertyId,
Property.Name AS PropertyName,
Property.Type AS PropertyType,
ProductProperty.ValueAsString
FROM
Product
INNER JOIN Category ON Category.CategoryId = Product.CategoryId
INENR JOIN CategoryProperty ON CategoryProperty.CategoryId = Category.CategoryId
INNER JOIN Property ON Property.PropertyId = CategoryProperty.PropertyId
INNER JOIN ProductProperty ON ProductProperty.PropertyId = Property.PropertyId
AND ProductProperty.ProductId = Product.ProductId
WHERE
Product.ProductId = 1
The more WHERE conditions you supply (conjunctively, e.g. using AND), the faster the query will be. If you have properly indexed your tables, that is.
As it is, the solution is not ideal for a full text indexing situation. An additional table that stores all the text associated with a ProductId in a more denormalized way could help here. This table would need updating through triggers that listen for changes in the ProductProperty table.
OTHER TIPS
If the user of the application has to select a category before they can search, I would separate your products into different database tables by category. This solution is also indicated by the fact that the categories themselves have so little in common. Breaking it down by category will also make each search much faster, since time won't be wasted searching through cars when your user is looking for a pet.
Once you have the products split up in to categories, it should be easy to create the tables using the common properties of the products in each category. The user interface of your application should be dynamic (I'm thinking of a web form), in that the properties the user can choose from should change when the user selects a category.
Please note that if you have products that you want listed in multiple categories, this solution will result in duplicate data in your tables. There is a trade-off between speed and normalization when designing a database. If you don't have products that fit in multiple categories, then I think this will be the fastest solution (in terms of search speed).
Most people are advising to use variations of the Entity-Attribute-Value (EAV) design. This design is overkill for your situation, and it introduces a whole bunch of problems, for example:
- You can't define data type for an attribute; you can enter "banana" for an integer attribute
- You can't declare an attribute as mandatory (i.e. NOT NULL in a conventional table)
- You can't declare a foreign key constraint on an attribute
If you have a small number of categories, it's better to use solution A in Bogdan Maxim's answer. That is, define one table Products with attributes common to all categories, and one additional table for each category, to store the category-specific attributes.
Only if you have an infinite number of categories or if you must potentially support a different set of attributes per row in Products is EAV a good solution. But then you're not using a relational database at all, since EAV violates several rules of normalization.
If you really need that much flexibility, you'd be better off storing your data in XML. In fact, you might look into RDF and semantic web frameworks like Sesame.
You might want to consider an Entity-Attribute-Value type of arrangement, where you can "tag" each product with arbitrary name/value pairs of attributes.
You can try this. I'm not too sure of the actual details of your question, maybe someone can help you translate a little better.
5 tables. 3 for storing the data, 2 for storing the mappings between data.
tProduct
productID
<other product details>
tCategory
categoryID
<other category details>
tProperty
propertyID
<other property details>
tProductXCategory
productyID
categoryID
tCategoryXProperty
categoryID
propertyID
Your queries will need to join the data using the mapping tables, but this will allow you to have different many to many relationships between category, properties, and products.
Use stored procedures or parameterized queries to get better performance out of your searches.
If you want to be flexible on your categories and properties, you should create following tables:
- product: ProductID
- category: CategoryID, ProductID
- property: PropertyID, CategoryID
when you want to share a category over mroe than one product, you have to create a link table for the n:m join:
- productCategoryPointer: ProdCatID, ProductID, CategoryID.
You will have to to some joins in your queries, but with the right indexes, you shoulb be able to query your data fast.
You could try something more object oriented.
1. Define a base table for Products
Products(ProductID, CategoryID, <any other common properties>)
2. Define a table Categories
Categories(CategoryID, Name, Description, ..)
From here you have a lot of options and almost all of them will break the normalization of your database.
Solution A.
Will be a maintaince nightmare if you need to add new products
A1. Define a separate table for each of the categories
Cars(CarID, ProductID, ..)
Pets(PetID, ProductID, ..)
A2. Join the tables based on the relationships in order to use the data
SELECT <fields> FROM Cars INNER JOIN Products ON Cars.ProductID = Products.ProductID
Solution B.
Maintainance nightmare for different types of properties (i.e. int, varchar, etc)
B1. Define a table for Properties
CategoryProperty (CPID, Name, Type)
B2. Define a table to hold the associations between Categories and the Properties
PropertyAssociation (CPID, PropertyID)
B12. Define a table to hold the properties (Alternative for B1 and B2)
Properties(CategoryID, PropertyID, Name, Type)
B3. For each type of property (int, double, varchar, etc.) add a value table
PropertyValueInt(ProductID, CPID, PropertyID, Value)
- for int
PropertyValueString(ProductID, CPID, PropertyID, Value)
- for strings
PropertyValueMoney(ProductID, CPID, PropertyID, Value)
- for money
B4. Join all the tables to retreive the desired property.
By using this approach, you will not have to manage all the properties in separate table, but the value types of them. Basically all the tables involved will be lookup tables. The disadvantage, is that, in order to retreive each value, you have to "Case" for every value type.
Take in mind these articles (here and here) when choosing any of these approaches. This forum post is also interesting and somehow related to the subject, even though it is about localization.
You could also use Tomalak's answer and add strong typing if you feel the need.
I recently had to do this and I m using NHibernate where i have three entities
Product Category Option OptionCategory
A product has 1* Categories
A product has 1* Option
An Option has 1 OptionCategory
once this is set up you can use Nhibernate caching
Cheers