Question

I have a budding developer who is very enthusiastic about something he calls “the matrix”.

I am looking for peer insight.

In a nutshell this is what we have:
- 1 highly denormalized table with about 120 columns
- Data points cover account, customer, household, relationship, product, employee, etc.
- One index per column: about 120 non-clustered indexes
- About 90% of all index space in the database today is taken up by indexes on this table
- Today about 1.5 million rows with a lot of nulls
- Table loaded with a stored procedure whose core is dynamic SQL
- All field names are generic and do not describe the data
- A data dictionary type table is used with the dynamic SQL to load any data point to any field
- Field mapping is not static: today column dim_0001 is customer name, but tomorrow maybe something else
- No primary key
- No foreign keys
- No real constraints (for example, all fields are nullable)

The argument for the table:
- Makes writing queries simpler because it eliminates the need to write some joins

The intended use:
- An end-user layer that would be a core component of a Universe built in Business Objects
- Post-ETL process development

My recommendation will either kill the process where it is today (early development in a test environment) or move it to the next step in test.

Based on my research, education, and experience, I do not support it. I want the tables dropped as soon as the one or two processes that depend on them have been migrated to another solution.

Script below for your reference (I limited to one index example).

Any insight you can offer (even just a one-word opinion) is valuable.

-- The Matrix

CREATE TABLE [z005497].[tblMatrix](
    [as_of_dt] [datetime] NOT NULL,
    [dim_0001] [varchar](100) NULL,
    [dim_0002] [varchar](103) NULL,
    [dim_0003] [varchar](100) NULL,
    [dim_0004] [varchar](100) NULL,
    [dim_0005] [varchar](100) NULL,
    [dim_0006] [varchar](100) NULL,
    [dim_0007] [varchar](100) NULL,
    [dim_0008] [varchar](100) NULL,
    [dim_0009] [varchar](100) NULL,
    [dim_0010] [varchar](100) NULL,
    [dim_0011] [varchar](100) NULL,
    [dim_0012] [varchar](100) NULL,
    [dim_0013] [varchar](100) NULL,
    [dim_0014] [varchar](100) NULL,
    [dim_0015] [varchar](100) NULL,
    [dim_0016] [varchar](100) NULL,
    [dim_0017] [varchar](103) NULL,
    [dim_0018] [varchar](103) NULL,
    [dim_0019] [varchar](103) NULL,
    [dim_0020] [varchar](103) NULL,
    [dim_0021] [varchar](103) NULL,
    [dim_0022] [varchar](103) NULL,
    [dim_0023] [varchar](103) NULL,
    [dim_0024] [varchar](103) NULL,
    [dim_0025] [varchar](103) NULL,
    [dim_0026] [varchar](11) NULL,
    [dim_0027] [varchar](11) NULL,
    [dim_0028] [varchar](11) NULL,
    [dim_0029] [varchar](11) NULL,
    [dim_0030] [varchar](11) NULL,
    [dim_0031] [varchar](11) NULL,
    [dim_0032] [varchar](11) NULL,
    [dim_0033] [varchar](11) NULL,
    [dim_0034] [varchar](11) NULL,
    [dim_0035] [varchar](11) NULL,
    [dim_0036] [varchar](11) NULL,
    [dim_0037] [varchar](11) NULL,
    [dim_0038] [varchar](11) NULL,
    [dim_0039] [varchar](11) NULL,
    [dim_0040] [varchar](11) NULL,
    [dim_0041] [varchar](11) NULL,
    [dim_0042] [varchar](11) NULL,
    [dim_0043] [varchar](11) NULL,
    [dim_0044] [varchar](11) NULL,
    [dim_0045] [varchar](11) NULL,
    [dim_0046] [varchar](11) NULL,
    [dim_0047] [varchar](11) NULL,
    [dim_0048] [varchar](11) NULL,
    [dim_0049] [varchar](11) NULL,
    [dim_0050] [varchar](11) NULL,
    [dim_0051] [varchar](11) NULL,
    [dim_0052] [varchar](11) NULL,
    [dim_0053] [varchar](11) NULL,
    [dim_0054] [varchar](5) NULL,
    [dim_0055] [varchar](5) NULL,
    [dim_0056] [varchar](5) NULL,
    [dim_0057] [varchar](5) NULL,
    [dim_0058] [varchar](5) NULL,
    [dim_0059] [varchar](5) NULL,
    [dim_0060] [varchar](5) NULL,
    [dim_0061] [varchar](5) NULL,
    [dim_0062] [varchar](5) NULL,
    [dim_0063] [varchar](5) NULL,
    [dim_0064] [varchar](5) NULL,
    [dim_0065] [varchar](5) NULL,
    [dim_0066] [varchar](5) NULL,
    [dim_0067] [varchar](5) NULL,
    [dim_0068] [varchar](5) NULL,
    [dim_0069] [varchar](5) NULL,
    [dim_0070] [varchar](5) NULL,
    [dim_0071] [varchar](5) NULL,
    [dim_0072] [varchar](5) NULL,
    [dim_0073] [varchar](5) NULL,
    [dim_0074] [varchar](5) NULL,
    [dim_0075] [varchar](5) NULL,
    [dim_0076] [varchar](5) NULL,
    [dim_0077] [varchar](5) NULL,
    [dim_0078] [varchar](5) NULL,
    [dim_0079] [varchar](5) NULL,
    [dim_0080] [varchar](5) NULL,
    [dim_0081] [varchar](5) NULL,
    [dim_0082] [varchar](5) NULL,
    [dim_0083] [varchar](5) NULL,
    [dim_0084] [int] NULL,
    [dim_0085] [int] NULL,
    [dim_0086] [int] NULL,
    [dim_0087] [int] NULL,
    [dim_0088] [int] NULL,
    [dim_0089] [int] NULL,
    [dim_0090] [int] NULL,
    [dim_0091] [int] NULL,
    [dim_0092] [int] NULL,
    [dim_0093] [int] NULL,
    [dim_0094] [varchar](12) NULL,
    [dim_0095] [varchar](12) NULL,
    [dim_0096] [varchar](12) NULL,
    [dim_0097] [varchar](120) NULL,
    [dim_0098] [varchar](120) NULL,
    [dim_0099] [varchar](120) NULL,
    [dim_0100] [numeric](20, 0) NULL,
    [dim_0101] [varchar](20) NULL,
    [dim_0102] [varchar](20) NULL,
    [dim_0103] [varchar](20) NULL,
    [dim_0104] [varchar](20) NULL,
    [dim_0105] [varchar](20) NULL,
    [dim_0106] [varchar](20) NULL,
    [dim_0107] [varchar](20) NULL,
    [dim_0108] [varchar](20) NULL,
    [dim_0109] [varchar](20) NULL,
    [dim_0110] [varchar](20) NULL,
    [dim_0111] [varchar](20) NULL,
    [dim_0112] [varchar](20) NULL,
    [dim_0113] [varchar](20) NULL,
    [dim_0114] [varchar](20) NULL,
    [dim_0115] [varchar](20) NULL,
    [dim_0116] [varchar](20) NULL,
    [dim_0117] [varchar](20) NULL,
    [dim_0118] [varchar](20) NULL,
    [dim_0119] [varchar](20) NULL,
    [dim_0120] [varchar](20) NULL,
    [lastLoad] [datetime] NULL
) ON [PRIMARY]



-- Index example

CREATE NONCLUSTERED INDEX [idx_dim_0001 (not unique)] ON [z005497].[tblMatrix] 
(
    [dim_0001] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]


-- The configuration table from which developers would find out what is in the Matrix

CREATE TABLE [z005497].[tblMatrixCfg](
    [dimId] [int] IDENTITY(100000,1) NOT NULL,
    [colName] [varchar](25) NOT NULL,
    [dataType] [varchar](25) NOT NULL,
    [dimName] [varchar](25) NOT NULL,
    [dimDesc] [varchar](500) NOT NULL,
    [dimpath] [varchar](5000) NOT NULL,
    [loadDate] [datetime] NOT NULL,
    [modUser] [varchar](100) NOT NULL,
    [modDate] [datetime] NOT NULL,
 CONSTRAINT [PK_tblMatrixCfg_1] PRIMARY KEY CLUSTERED 
(
    [dimId] ASC,
    [colName] ASC,
    [dimName] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Solution

Kill it if you can.

Also, that developer needs a lot more experience. And he/she should get it at another company.

It violates so many basic design principles that I don't know where to start.

Even if you end up fighting a highly normalized model that slavishly follows someone's best practices, that won't compare to the disaster this design is going to create.

OTHER TIPS

Just to give one example of what Cade meant by "I don't know where to start":

"today column dim_0001 is customer name, but tomorrow maybe something else"

This typically also means that in the user acceptance system dim_0001 can be customer name (so the system seems to work and gets accepted), and then you move to production, where dim_0001 turns out to be the name of the president's wife or so. Hours of meetings then get spent trying to figure out (a) where the problem is, and (b) how to get it fixed in as little time as possible.

((b) usually amounts to patching the code with stuff like "if col_name = dim_0001 then don't treat it as what the matrix says it is, but treat it as what is hardcoded here instead".)

"What use is there for the Matrix?"

Well, I certainly don't get it.

I have never seen anything like this before. I don't understand how it is meant to be used, how the indexes are meant to speed anything up, or how it is possible to query this table without at least using self joins.
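
To make that concrete, here is a rough sketch of what reading a single attribute seems to require. It assumes tblMatrixCfg.dimName holds a logical name such as 'customer_name'; that value and the as-of date are made up for illustration. Because the mapping can change, the column has to be looked up at run time and the query built as dynamic SQL:

```sql
-- Sketch only: 'customer_name' and the date below are assumed values,
-- not taken from the real configuration.
DECLARE @col sysname;
DECLARE @sql nvarchar(max);

-- Step 1: ask the config table which generic column holds the attribute today.
SELECT TOP (1) @col = colName
FROM [z005497].[tblMatrixCfg]
WHERE dimName = 'customer_name'
ORDER BY modDate DESC;

IF @col IS NULL
BEGIN
    RAISERROR('No column is currently mapped to customer_name.', 16, 1);
    RETURN;
END;

-- Step 2: build the statement around that column; QUOTENAME guards the identifier.
SET @sql = N'SELECT ' + QUOTENAME(@col) + N' AS customer_name '
         + N'FROM [z005497].[tblMatrix] '
         + N'WHERE [as_of_dt] = @as_of_dt;';

-- Step 3: run it, passing the date as a real parameter.
EXEC sys.sp_executesql @sql, N'@as_of_dt datetime', @as_of_dt = '20240101';
```

Every report or Universe object that touches the table would need some variant of this, which is hard to square with the claim that queries get simpler.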

Call me inexperienced if you like, but this is a first for me. I would think that if this were the way to do things, the db vendors would not put so much effort into allowing us developers to define tables with columns that have different data types and with relationships.
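
For comparison, a minimal sketch of the conventional route: named, typed columns with keys, plus a view that hides the join from report writers. All table, column, and view names here are hypothetical and only meant to show the shape of it:

```sql
-- Hypothetical tables; names and columns are illustrative only.
CREATE TABLE dbo.Customer (
    customer_id   int          NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,
    customer_name varchar(100) NOT NULL
);

CREATE TABLE dbo.Account (
    account_id  int      NOT NULL CONSTRAINT PK_Account PRIMARY KEY,
    customer_id int      NOT NULL
        CONSTRAINT FK_Account_Customer REFERENCES dbo.Customer (customer_id),
    opened_dt   datetime NOT NULL
);
GO

-- The view gives end users a flat, join-free shape
-- while the base tables keep their keys and constraints.
CREATE VIEW dbo.vw_CustomerAccount
AS
SELECT c.customer_id,
       c.customer_name,
       a.account_id,
       a.opened_dt
FROM dbo.Customer AS c
JOIN dbo.Account  AS a
    ON a.customer_id = c.customer_id;
GO

-- Report writers then query the view without writing any join.
SELECT customer_name, account_id
FROM dbo.vw_CustomerAccount
WHERE opened_dt >= '20240101';
```

If the only argument for the matrix is avoiding joins, a handful of views like this would seem to cover it, with column names that always mean the same thing.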

This is the result of trying to stuff an object oriented paradigm into a relational system. Document databases allow for this sort of programming:

Documents inside a document-oriented database are similar, in some ways, to records or rows, in relational databases, but they are less rigid. They are not required to adhere to a standard schema nor will they have all the same sections, slots, parts, keys, or the like. For example here's a document:

FirstName="Bob", Address="5 Oak St.", Hobby="sailing".

Another document could be:

FirstName="Jonathan", Address="15 Wanamassa Point Road", Children=[{Name:"Michael",Age:10}, {Name:"Jennifer", Age:8}, {Name:"Samantha", Age:5}, {Name:"Elena", Age:2}].

Both documents have some similar information and some different. Unlike a relational database where each record would have the same set of fields and unused fields might be kept empty, there are no empty 'fields' in either document (record) in this case. This system allows new information to be added and it doesn't require explicitly stating if other pieces of information are left out.

Trying to use this paradigm in a relational database is a "square peg, round hole" problem. A document database might be excellent for a highly transactional system, but analysis would be better served by loading the transactional data into various fact tables in a data warehouse.

Licensed under: CC-BY-SA with attribution