Question

I'm designing a contact history fact table for a data warehouse. My current schema looks something like this:

[FK] DateKey INT
[FK] TimeKey INT
[NK] CustomerNK INT
[NK] CustomerPhoneNK INT
[FK] ContactTypeKey INT
[FK] ContactResultKey INT
[BK] ContactRefBK INT
     ContactTS DATETIME
     Counter INT (=1)

One of my application requirements is to find the most recent ContactResult for a selection list on the ContactType dimension. The ContactType dimension has a ContactClass attribute that will be used to identify the range of values to filter by.

The above structure lets me get all of the contact information for the ContactType selections by ContactClass, and I can process that list to get the most recent values.

The question is, can anyone suggest a modification to the above that would make it simpler to get the most recent contact event of a particular ContactClass? Currently this is a Transactional fact table, but I would be happy to change that if it will improve the usability.

This operation will be run fairly frequently against a wide selection of customers (200K+), so performance is important. The operation will be done in C# code on a web interface, so BI Tool-specific solutions are not useful to me in this instance.

So far the only idea I've come up with is an accumulating fact table that records only the latest record for each ContactClass. Any improvements on this option would be greatly appreciated.


Solution

If performance is important and batch processing is an option, you can precompute a 'Latest Contact' attribute and store it either in the fact table or in the ContactType dimension.

Either approach requires updating historical fact records to mark them as 'no longer the latest contact', but queries will perform much better with the attribute precomputed.

I would be inclined to add this attribute to the dimension, and update the historical surrogate keys in the fact table to point to a dimension member that is not 'Latest Contact'.

With some thought, there is probably a smart way to do this update.
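As a rough illustration of the batch update, here is a minimal sketch of the simpler of the two variants: a flag stored directly on the fact table. It runs against an in-memory SQLite database from Python purely so that it is self-contained; the real warehouse engine, the IsLatestContact column, the dimContactType and factContact table names, and the sample rows are all assumptions. "Latest" is taken to mean the highest ContactTS per customer and ContactClass, with ContactRefBK as a tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimContactType (
    ContactTypeKey INTEGER PRIMARY KEY,
    ContactClass   TEXT NOT NULL
);
CREATE TABLE factContact (
    ContactRefBK     INTEGER PRIMARY KEY,
    CustomerNK       INTEGER NOT NULL,
    ContactTypeKey   INTEGER NOT NULL,
    ContactResultKey INTEGER NOT NULL,
    ContactTS        TEXT NOT NULL,
    IsLatestContact  INTEGER NOT NULL DEFAULT 0  -- hypothetical flag column
);
-- Made-up sample data: two 'Phone' contact types and one 'Email' type.
INSERT INTO dimContactType VALUES (1, 'Phone'), (2, 'Phone'), (3, 'Email');
INSERT INTO factContact
    (ContactRefBK, CustomerNK, ContactTypeKey, ContactResultKey, ContactTS)
VALUES
    (100, 1, 1, 10, '2024-01-01 09:00:00'),
    (101, 1, 2, 11, '2024-01-05 10:00:00'),
    (102, 1, 3, 12, '2024-01-03 08:00:00'),
    (103, 2, 1, 13, '2024-01-02 12:00:00');
""")


def refresh_latest_flags(conn):
    """Batch step: flag a row only if no newer row exists for the same
    customer and ContactClass (ties broken by the higher ContactRefBK)."""
    conn.execute("""
        UPDATE factContact
        SET IsLatestContact = NOT EXISTS (
            SELECT 1
            FROM factContact AS newer
            JOIN dimContactType AS dn
              ON dn.ContactTypeKey = newer.ContactTypeKey
            WHERE newer.CustomerNK = factContact.CustomerNK
              AND dn.ContactClass = (
                    SELECT d.ContactClass
                    FROM dimContactType AS d
                    WHERE d.ContactTypeKey = factContact.ContactTypeKey)
              AND (newer.ContactTS > factContact.ContactTS
                   OR (newer.ContactTS = factContact.ContactTS
                       AND newer.ContactRefBK > factContact.ContactRefBK))
        )
    """)
    conn.commit()


def latest_results(conn, contact_class):
    """Read path: one filtered scan, no aggregation at query time."""
    return conn.execute("""
        SELECT f.CustomerNK, f.ContactResultKey
        FROM factContact AS f
        JOIN dimContactType AS d ON d.ContactTypeKey = f.ContactTypeKey
        WHERE f.IsLatestContact = 1 AND d.ContactClass = ?
        ORDER BY f.CustomerNK
    """, (contact_class,)).fetchall()


refresh_latest_flags(conn)
phone_latest = latest_results(conn, "Phone")
flagged = [r[0] for r in conn.execute(
    "SELECT ContactRefBK FROM factContact "
    "WHERE IsLatestContact = 1 ORDER BY ContactRefBK")]
```

This version recomputes the flag for every row, which is the easiest thing to reason about; on a large table it would probably make sense to restrict the update to the customers touched by the current batch.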

OTHER TIPS

If performance is key, then I think an additional fact table for the latest contact is just fine. After all, that's what the data mart is for: pre-aggregated data for fast performance. It's not quite an accumulating snapshot, which typically has several foreign keys to the time dimension in order to measure the time span between events.
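One possible shape for such a table is sketched below, again using in-memory SQLite from Python only to keep the example runnable. The factLatestContact table, its one-row-per-customer-and-ContactClass grain, the apply_contact helper, and the sample values are all hypothetical; the source only says that an additional fact table would be acceptable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimContactType (
    ContactTypeKey INTEGER PRIMARY KEY,
    ContactClass   TEXT NOT NULL
);
-- Hypothetical table: one row per customer and ContactClass.
CREATE TABLE factLatestContact (
    CustomerNK       INTEGER NOT NULL,
    ContactClass     TEXT NOT NULL,
    ContactResultKey INTEGER NOT NULL,
    ContactRefBK     INTEGER NOT NULL,
    ContactTS        TEXT NOT NULL,
    PRIMARY KEY (CustomerNK, ContactClass)
);
INSERT INTO dimContactType VALUES (1, 'Phone'), (2, 'Phone'), (3, 'Email');
""")


def apply_contact(conn, customer_nk, contact_type_key,
                  contact_result_key, contact_ref_bk, contact_ts):
    """Fold one contact event into factLatestContact during the load."""
    (contact_class,) = conn.execute(
        "SELECT ContactClass FROM dimContactType WHERE ContactTypeKey = ?",
        (contact_type_key,)).fetchone()
    # Create the row if this customer/class pair has not been seen yet.
    conn.execute(
        "INSERT OR IGNORE INTO factLatestContact VALUES (?, ?, ?, ?, ?)",
        (customer_nk, contact_class, contact_result_key,
         contact_ref_bk, contact_ts))
    # Overwrite only when the incoming event is strictly newer, so that
    # late-arriving older events do not clobber the stored row.
    conn.execute(
        """UPDATE factLatestContact
           SET ContactResultKey = ?, ContactRefBK = ?, ContactTS = ?
           WHERE CustomerNK = ? AND ContactClass = ? AND ContactTS < ?""",
        (contact_result_key, contact_ref_bk, contact_ts,
         customer_nk, contact_class, contact_ts))
    conn.commit()


def latest_results(conn, contact_class):
    """Primary-key friendly lookup: no grouping or self-join needed."""
    return dict(conn.execute(
        "SELECT CustomerNK, ContactResultKey FROM factLatestContact "
        "WHERE ContactClass = ?", (contact_class,)).fetchall())


# Made-up events; the third one arrives out of order.
apply_contact(conn, 1, 1, 10, 100, "2024-01-01 09:00:00")
apply_contact(conn, 1, 2, 11, 101, "2024-01-05 10:00:00")
apply_contact(conn, 1, 1, 14, 104, "2024-01-02 11:00:00")
apply_contact(conn, 2, 1, 13, 103, "2024-01-02 12:00:00")
phone_latest = latest_results(conn, "Phone")
```

Keying the table on ContactClass rather than ContactTypeKey is a design guess based on the requirement to filter by ContactClass; if selections are sometimes made on individual contact types, the grain would need to change accordingly.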

It seems like 200k+ is not a terribly large number, so you may be able to achieve the same thing more simply with a view. I may have the columns wrong, but something like this should be very fast with suitable indexes in place:

SELECT ContactTypeKey, MAX(ContactTS) AS LatestContactTS
FROM factContact
GROUP BY ContactTypeKey

That view can then be joined back to the fact table on ContactTypeKey and ContactTS to return the ContactResult. This assumes that your table is named factContact and that ContactTS determines which contact is most recent. In practice, you may need to join to the date dimension to determine the most recent contact, and you may need to group by more columns (CustomerNK, for example), or join to the ContactType dimension and group by ContactClass. I've used this strategy on occasion, but it's hard to say how well it would apply here.
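To make the join-back concrete, here is a minimal sketch of the view variant that groups by customer and ContactClass. It uses in-memory SQLite from Python only so that it can be run as-is; the view name vLatestContactTS, the dimContactType table name, the index, and the sample rows are assumptions, and the view syntax may need adjusting for the actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimContactType (
    ContactTypeKey INTEGER PRIMARY KEY,
    ContactClass   TEXT NOT NULL
);
CREATE TABLE factContact (
    ContactRefBK     INTEGER PRIMARY KEY,
    CustomerNK       INTEGER NOT NULL,
    ContactTypeKey   INTEGER NOT NULL,
    ContactResultKey INTEGER NOT NULL,
    ContactTS        TEXT NOT NULL
);
-- Assumed supporting index for the GROUP BY and the join back.
CREATE INDEX ixContactLatest
    ON factContact (CustomerNK, ContactTypeKey, ContactTS);

-- Hypothetical view: latest timestamp per customer and ContactClass.
CREATE VIEW vLatestContactTS AS
SELECT f.CustomerNK, d.ContactClass, MAX(f.ContactTS) AS LatestContactTS
FROM factContact AS f
JOIN dimContactType AS d ON d.ContactTypeKey = f.ContactTypeKey
GROUP BY f.CustomerNK, d.ContactClass;

-- Made-up sample data.
INSERT INTO dimContactType VALUES (1, 'Phone'), (2, 'Phone'), (3, 'Email');
INSERT INTO factContact VALUES
    (100, 1, 1, 10, '2024-01-01 09:00:00'),
    (101, 1, 2, 11, '2024-01-05 10:00:00'),
    (102, 1, 3, 12, '2024-01-03 08:00:00'),
    (103, 2, 1, 13, '2024-01-02 12:00:00');
""")


def latest_results(conn, contact_class):
    """Join the view back to the fact table to pick up ContactResultKey."""
    return conn.execute("""
        SELECT f.CustomerNK, f.ContactResultKey
        FROM factContact AS f
        JOIN dimContactType AS d
          ON d.ContactTypeKey = f.ContactTypeKey
        JOIN vLatestContactTS AS v
          ON v.CustomerNK = f.CustomerNK
         AND v.ContactClass = d.ContactClass
         AND v.LatestContactTS = f.ContactTS
        WHERE d.ContactClass = ?
        ORDER BY f.CustomerNK
    """, (contact_class,)).fetchall()


phone_latest = latest_results(conn, "Phone")
```

Note that if two contacts in the same class share an identical ContactTS for one customer, this join returns both rows, so a tie-breaker such as ContactRefBK may be needed. From C#, the same parameterized SELECT could be issued through whatever data access layer the web application already uses.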

Licensed under: CC-BY-SA with attribution