Is my understanding of Fact table granularity correct?

https://dba.stackexchange.com/questions/77402

09-12-2020
|

Question

Myself and another DBA at our company are tasked with reviewing a database design that a vendor has developed for us. The vendor has said they use Kimball as the basis for their design. (NOTE: I am not looking for arguments of Kimball vs Inmon, etc.) They have designed a mart with multiple facts and dimensions.

Now in all fairness, our company has never designed a single mart. We have always had the consultants do it. And we have never been sent to classes or anything. So our knowledge of warehousing/marts/dimensional modelling, etc. is based on what little experience we have, what we can find on the Internet, and self reading (we have Inmon's and Kimball's books and are trying to make our way through them).

Now that the stage is set for my level of knowledge, we come to the design challenge.

There is a Fact table called "Claim Loss Statistics" (this is for insurance). And they are trying to capture both the payments for claims (rolled up to a monthly level), and then money in the reserves (kind of like a bank account for claims). They wish to see the monthly amounts for payments (no biggie). But they wish to see the account current balance of the reserves.

I'll give a pictorial example.

Say we set up $1000 USD in reserves for a claim. This gets set aside (so in some respects it functions kind of like a bank account).

In October 2014, we don't pay out anything yet. So the business wants to see the payments and the reserve balance at the end of October.

-----------------------------------------------
-  MONTH_YEAR  -  PAYMENTS -  RESERVE_BALANCE -
-----------------------------------------------
-      102014  -      0.00 -          1000.00 -
-----------------------------------------------

Then November comes along. We make out payments of $100, $150, and $75 dollars. They wish to see those amounts aggregated and the reserve at the balance as follows:

-----------------------------------------------
-  MONTH_YEAR  -  PAYMENTS -  RESERVE_BALANCE -
-----------------------------------------------
-      102014  -      0.00 -          1000.00 -
-----------------------------------------------
-      112014  -    325.00 -           675.00 -
-----------------------------------------------

And then say we have zero payments in December and then $200 more in January next year.

-----------------------------------------------
-  MONTH_YEAR  -  PAYMENTS -  RESERVE_BALANCE -
-----------------------------------------------
-      102014  -      0.00 -          1000.00 -
-----------------------------------------------
-      112014  -    325.00 -           675.00 -
-----------------------------------------------
-      122014  -      0.00 -           675.00 -
-----------------------------------------------
-       12015  -    200.00 -           475.00 -
-----------------------------------------------

Here is where I struggle. My understanding is that the payments part is correct. They are all rolled up at a monthly level within each record. So you can rollup further if you want for the year, quarter, etc.

But the reserves amount is different. It is a balance. And the business wants to see how much is in the balance at each month. But you can't aggregate on this field. If you did, you would get some wonky results.

Somehow this strikes me as wrong. But I can't truthfully say I've modelled enough or know enough. All I can say is what I know. And from what I know, all values in a Fact should be at the same granularity.

Both numbers are at the same granularity of a "month", but they are not from the standpoint of what they stand for. One is aggregate dollars within a month. The other is just the balance.

Is this correct? I have been pushing back on this design. Am I wrong to do so? Is it ok to do this in a Fact? Or is my sense of "code smell" of a bad design accurate?

Any help would be appreciated. NOTE: please don't just say "It should be way X", please explain why it should be that way so I can learn from this.

EDIT: Well, I learned that my initial understanding of the Fact is wrong. The granularity is NOT monthly. The granularity is transaction level. So that means within the MONTH_YEAR (ie really it is the financial reporting period) there will be multiple payment and recovery transactions. Those will be posted by date or transaction date. But because of a prior report the business sees, and also because of how the data is stored in the legacy system that this comes from they wanted to put both the transactional data (one row per) and the reserve monthly balance (one row per month).

Once I learned that, I realize the problem was not so much additive vs non-additive, or even semi-additive as it was grain, which is what I was suspecting from the start. Our DBA team discussed this with the project team and reported that they are attempting to put two different grains in the same fact, and this was not correct. That they should either role up the transactions to a monthly level, allowing them to then have the payments, recoveries, and the monthly reserve balance (ie a semi-additive fact) because everything would be at a monthly grain. Or they need to find a way to break down the reserve balance into transactions to preserve the transaction level grain. Or they need to break the fact into two facts. One can be the monthly level for the reserve balance. The other can be at the transaction level for the payments and recoveries. (There is no reason why they also couldn't put the payments and recoveries at a monthly level in the monthly level fact too. Just depends on the business needs.)

Given what I have learned, I will be marking Thomas' answer as the correct one. However, I feel the discussion I started with the original question is still a good one for others to learn from, so I will leave the original portion of my question intact. I also intend to award a bounty to nikadam's answer as that taught me a lot about additive, non-additive, and semi-additive facts, and corrected a lot of misunderstandings I had about dimensional modelling.

Solution

Your intuition of code smell is well honed.

What you are dealing with on reserves is what Kimball calls a "semi-additive fact". It does not roll up nicely to quarter or year.

The typical solution to this is to have two fact tables, one for the additive fact (payments in your case) and one for the non-additive fact. The non-additive fact does not actually need to have a grain at the month level, you could store them all the way to the day and things still just work.

The non-additive fact, reserve, is queried differently than the other fact. There is a business decision you need to make: What does reserve at the year level mean? Is it the last month of the year, or maybe the average of the months in the year? Whichever choice you make, you can find the solution to modeling this it in the Kimball books under the chapters on non-additive facts.

Please note that if you use a cube product like Analysis Services, it is possible to have the aggregates "just work" even if you store it all in one table. However, I prefer to keep things separate so relational queries are easier to write (and the facts are easier to load too).

OTHER TIPS

You are correct: "different grains must not be mixed in the same fact table".

But reserve balance at the end of the month and sum of payments at the end of the month are at the same grain. It just one of facts is semi-additive. Type of fact (additive or not) does not define table's grain.

From what you describing, I see your grain as "monthly claim snapshot", which makes your fact table "Periodic Snapshot Fact table".

In this article Kimball have an example of additive and semi-additive facts in the same fact table.

Here is example of periodic snapshot with semi-additive facts from The Data Warehouse Toolkit (page 116):

Kimball's The Data Warehouse Toolkit, page 116

Best practice is to have transactional fact table that will reflect every change in reserve (payments and adjustments) on the lowest atomic level. When you deal with claims, often the atomic level is not claim but sub-claim (your insurance company may have its own term for it). Generally each sub-claim will represent different party to the claim and payments / reserves for each party. For example, there may be no payments to the insured, but payments to not-insured by your company injured person and payments to the hospital and attorney.

Depending on performance of your BI tool you may use transactional fact table directly to get monthly payments and balances. Or you may update periodic snapshot fact table from transactional daily or at the end of the month.

Ability of handling semi-additive facts will depend on what BI layer you are using. Some tools able to deal easily with semi-additive facts and some not.

Kimball's main book (The Data Warehouse Toolkit) has full chapter (16) on insurance.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange