Question

I have been programming relational databases for many years, but have now come across an unusual and tricky problem:

I am building an application in which users need to define entities quickly and easily. Instances of these entities can then be created, updated, deleted, and so on.

There are two options I can think of.

Option 1 - Dynamically created tables

The first option is to write an engine that dynamically generates the tables and inserts the data into them. However, this would become very tricky, as every query would also need to be dynamic, or at least backed by dynamically created stored procedures.

Option 2 - Entity - Key - Value Pattern

This is the only realistic option I can think of, where I have a five-table structure:

EntityTypes

EntityTypeID int

EntityTypeName nvarchar(50)

Entities

EntityID int

EntityTypeID int

FieldTypes

FieldTypeID int

FieldTypeName nvarchar(50)

SQLtype int

FieldValues

EntityID int

FieldID int

Value nvarchar(MAX)

Fields

FieldID int

FieldName nvarchar(50)

FieldTypeID int

The "FieldValues" table would work a little like a data warehouse fact table, and all my inserts/updates would work by filling a "Key/Value" table-valued parameter and passing it to a stored procedure (to avoid multiple round trips for inserts/updates).

All the tables would be heavily indexed, and I would end up doing many self joins to obtain the data.
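As a rough illustration of option 2, here is a minimal sketch using Python's built-in sqlite3 module in place of SQL Server (an assumption made so the example runs anywhere). The FieldTypes table is left out for brevity, the sample entity type, field names and index name are hypothetical, and `executemany` with `INSERT OR REPLACE` stands in for the table-valued parameter passed to a stored procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EntityTypes (EntityTypeID INTEGER PRIMARY KEY, EntityTypeName TEXT);
CREATE TABLE Entities (
    EntityID     INTEGER PRIMARY KEY,
    EntityTypeID INTEGER REFERENCES EntityTypes(EntityTypeID)
);
CREATE TABLE Fields (FieldID INTEGER PRIMARY KEY, FieldName TEXT, FieldTypeID INTEGER);
CREATE TABLE FieldValues (
    EntityID INTEGER REFERENCES Entities(EntityID),
    FieldID  INTEGER REFERENCES Fields(FieldID),
    Value    TEXT,
    PRIMARY KEY (EntityID, FieldID)
);
-- Extra index so lookups by field and value do not scan the whole table.
CREATE INDEX IX_FieldValues_Field ON FieldValues (FieldID, Value);

INSERT INTO EntityTypes VALUES (1, 'Customer');
INSERT INTO Entities VALUES (1, 1), (2, 1);
INSERT INTO Fields VALUES (1, 'Name', 1), (2, 'City', 1);
""")


def save_values(conn, entity_id, values):
    """Upsert several {FieldID: value} pairs for one entity in a single batch."""
    rows = [(entity_id, field_id, str(value)) for field_id, value in values.items()]
    conn.executemany(
        "INSERT OR REPLACE INTO FieldValues (EntityID, FieldID, Value) VALUES (?, ?, ?)",
        rows,
    )


def load_customers(conn):
    """Pivot the key/value rows back into one row per entity via self joins."""
    return conn.execute("""
        SELECT e.EntityID, n.Value AS Name, c.Value AS City
        FROM Entities e
        LEFT JOIN FieldValues n ON n.EntityID = e.EntityID AND n.FieldID = 1
        LEFT JOIN FieldValues c ON c.EntityID = e.EntityID AND c.FieldID = 2
        WHERE e.EntityTypeID = 1
        ORDER BY e.EntityID
    """).fetchall()


save_values(conn, 1, {1: "Alice", 2: "Oslo"})
save_values(conn, 2, {1: "Bob"})  # no City stored for this entity
print(load_customers(conn))
```

Each field that has to appear as a column costs one more join against FieldValues, which is where the "many self joins" come from.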

I have read a lot about how bad Key/Value databases are, but for this problem it still seems to be the best.

Now my questions!

  • Can anyone suggest another approach or pattern other than these two options?
  • Would option 2 be feasible for medium-sized datasets (1 million rows max)?
  • Are there further optimizations for option 2 I could use?

Any direction and advice much appreciated!


Solution

Personally, I would just use a "NoSQL" database such as MongoDB (a document store, which fits user-defined entities naturally).

But if you need to use a relational database, option 2 is the way to go. A good example of that kind of model is the Alfresco Data Dictionary (Alfresco is an enterprise content management system). Its design is similar to what you describe, although it has multiple columns for field values (one for every simple type available in the database). If you add a good caching layer to that (for example Ehcache), it should work fine.
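The "one value column per simple type" idea could look roughly like the sketch below. It is not Alfresco's actual schema: the column names (IntValue, FloatValue, StringValue) and the `typed_row` helper are hypothetical, and Python's sqlite3 module stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE FieldValues (
    EntityID    INTEGER NOT NULL,
    FieldID     INTEGER NOT NULL,
    IntValue    INTEGER,
    FloatValue  REAL,
    StringValue TEXT,
    PRIMARY KEY (EntityID, FieldID)
)""")


def typed_row(entity_id, field_id, value):
    """Build a row that stores the value in the column matching its Python type."""
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int):
        return (entity_id, field_id, value, None, None)
    if isinstance(value, float):
        return (entity_id, field_id, None, value, None)
    return (entity_id, field_id, None, None, str(value))


conn.executemany(
    "INSERT INTO FieldValues VALUES (?, ?, ?, ?, ?)",
    [
        typed_row(1, 1, "Alice"), typed_row(1, 2, 42),  # field 1 = name, field 2 = age
        typed_row(2, 1, "Bob"), typed_row(2, 2, 17),
    ],
)

# Numeric filters compare real integers, with no casting from strings.
older = conn.execute(
    "SELECT EntityID FROM FieldValues WHERE FieldID = 2 AND IntValue > 30"
).fetchall()
print(older)
```

Compared with a single nvarchar(MAX) column, this keeps numeric comparisons and sorting correct, at the cost of mostly empty columns on every row.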

OTHER TIPS

It sounds like this might be a solution in search of a problem. Is there any chance your domain can be refactored? If not, there's still hope.

  • Your scalability for option 2 will depend a lot on the width of the custom objects. How many fields can be created dynamically? 1 million entities with 100 fields each means up to 100 million rows in the value table, which could be a drag. Efficient indexing could make performance bearable.

  • For another option, you could have one data table that has a few string fields, a few double fields, and a few integer fields. For example, a table with String1, String2, String3, Int1, Int2, Int3. A second table would have rows that define a user object and map each of its field names to a column, such as "CustomObjectName" => String1. A stored procedure reading INFORMATION_SCHEMA and some dynamic SQL would be able to read the schema table and return a strongly typed recordset.

  • Yet another option (for recent versions of SQL Server) would be to store a row with an id, a type name, and an XML field that contains an XML document holding the object data. In SQL Server this can be queried directly, and maybe even validated against a schema.
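The generic-column option from the second bullet can be sketched as follows. This is only an illustration under assumptions: the table names CustomData and CustomSchema, the sample "Customer" object, and the `select_object` helper are all hypothetical, and Python with sqlite3 plays the role that a stored procedure with dynamic SQL would play in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomData (
    RowID INTEGER PRIMARY KEY, ObjectName TEXT,
    String1 TEXT, String2 TEXT, String3 TEXT,
    Int1 INTEGER, Int2 INTEGER, Int3 INTEGER
);
-- Maps a user-defined field name to one of the generic columns.
CREATE TABLE CustomSchema (ObjectName TEXT, FieldName TEXT, ColumnName TEXT);

INSERT INTO CustomSchema VALUES ('Customer', 'Name', 'String1'), ('Customer', 'Age', 'Int1');
INSERT INTO CustomData (ObjectName, String1, Int1) VALUES ('Customer', 'Alice', 42);
""")

# Only these physical columns may ever be spliced into generated SQL.
ALLOWED_COLUMNS = {"String1", "String2", "String3", "Int1", "Int2", "Int3"}


def select_object(conn, object_name):
    """Build a SELECT that aliases generic columns to the user's field names."""
    mapping = conn.execute(
        "SELECT FieldName, ColumnName FROM CustomSchema WHERE ObjectName = ?",
        (object_name,),
    ).fetchall()
    if not mapping:
        raise ValueError(f"no schema defined for {object_name!r}")
    select_list = []
    for field_name, column_name in mapping:
        if column_name not in ALLOWED_COLUMNS:
            raise ValueError(f"unexpected column {column_name!r}")
        alias = field_name.replace('"', '""')  # escape quotes in the alias
        select_list.append(f'{column_name} AS "{alias}"')
    sql = f"SELECT {', '.join(select_list)} FROM CustomData WHERE ObjectName = ?"
    cursor = conn.execute(sql, (object_name,))
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor]


print(select_object(conn, "Customer"))
```

Because the values live in real INTEGER and TEXT columns, each row comes back typed without any joins. The trade-off is a hard cap on the number of fields of each type per object.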

Personally, I would take the time to define as many attributes as you can rather than use EAV for everything. Surely you know some of the attributes. Then you only need EAV for the things that are truly client-specific.
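A hybrid layout along those lines might look like the sketch below: known attributes become ordinary columns, and only the client-specific extras go into a key/value table. The Customers and CustomerExtras tables, their columns, and the sample data are hypothetical, and sqlite3 is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Attributes known up front are regular, typed columns.
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    Email      TEXT
);
-- Only client-specific attributes fall back to key/value rows.
CREATE TABLE CustomerExtras (
    CustomerID INTEGER REFERENCES Customers(CustomerID),
    FieldName  TEXT,
    Value      TEXT,
    PRIMARY KEY (CustomerID, FieldName)
);
INSERT INTO Customers VALUES (1, 'Alice', 'alice@example.com');
INSERT INTO CustomerExtras VALUES (1, 'LoyaltyTier', 'Gold');
""")


def load_customer(conn, customer_id):
    """Return the fixed columns plus a dict of any client-specific extras."""
    row = conn.execute(
        "SELECT CustomerID, Name, Email FROM Customers WHERE CustomerID = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        return None
    extras = dict(conn.execute(
        "SELECT FieldName, Value FROM CustomerExtras WHERE CustomerID = ?",
        (customer_id,),
    ).fetchall())
    return {"CustomerID": row[0], "Name": row[1], "Email": row[2], "extras": extras}


print(load_customer(conn, 1))
```

Queries on the common attributes stay plain SQL with normal indexes, and the key/value table stays small.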

But if everything must be EAV, then a NoSQL database is the way to go. Or you can use a relational database for some data and a NoSQL database for the rest.

Licensed under: CC-BY-SA with attribution