Question

What is the preferable way to load data into a data warehouse via ETL: Create/Update date columns on the OLTP application, or Change Data Capture (CDC)?

Assuming both are allowed on the system and resourcing is not an issue, I would think CDC is preferable, since I have sometimes seen applications or developers make mistakes so that the Create/Update columns are not maintained correctly. I just want someone to validate my thinking.

I am also curious about what Kimball says; he seems to prefer CDC here?

https://www.kimballgroup.com/2007/10/subsystems-of-etl-revisited/

https://www.kimballgroup.com/2009/10/six-key-decisions-for-etl-architectures/

Solution

You've answered your own question!

It's ALWAYS better to go with your vendor's solution rather than "rolling your own"... How many people will test your "home-cooked solution"? How many edge cases will they encounter?

I'm willing to go out on a limb here and state that maybe, just maybe, it might conceivably be within the bounds of possibility that your customer base isn't quite as large as that for Microsoft SQL Server... Microsoft have literally had MILLIONS of people test their solution - and it's likely any bugs will be patched PDQ!

This is a quote from an answer I gave previously (the original quote is from Jonathan Lewis, who wrote these books!). It is not related at first glance, but the relevance will hopefully become obvious!

Chapter 10: Design Disasters, by Jonathan Lewis

More war stories, for fans of Chapter 8! "Now prepare yourself to read all about 'The World's Worst Oracle Project.'" - Jonathan Lewis.

This chapter describes some of the most common mistakes in developing Oracle database applications. You'll certainly recognise some of them, because so many people stubbornly cling to certain beliefs. I know I like to bring up several of his points when I get into common arguments like these:
1. We want our application to be "Database Independent."
2. We will check data integrity at the application level instead of taking advantage of Oracle's constraint checking abilities.
3. We want to use sequences for our primary keys.

Just substitute "implement Change Data Capture" for "check data integrity" in item 2 and you get the picture.
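To make the failure mode from the question concrete, here is a minimal sketch of a hand-rolled, timestamp-based incremental extract. It uses Python with an in-memory SQLite database purely for illustration; the `orders` table, the integer `updated_at` column, the watermark value and the `extract_since` helper are all hypothetical and not taken from any real system. It shows two kinds of change that such an extract can silently miss: an update from a code path that forgets to bump `updated_at`, and a hard delete.

```python
import sqlite3


def extract_since(conn, watermark):
    """Hand-rolled incremental extract: rows with updated_at newer than the watermark."""
    return conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY id",
        (watermark,),
    ).fetchall()


# Hypothetical OLTP source table; integer timestamps keep the sketch simple.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "new", 100), (2, "new", 100), (3, "new", 100)],
)

# State of the warehouse copy after the last ETL run, taken at watermark 100.
warehouse = {1: "new", 2: "new", 3: "new"}
watermark = 100

# Well-behaved code path: maintains updated_at.
conn.execute("UPDATE orders SET status = 'shipped', updated_at = 101 WHERE id = 1")
# Buggy code path: changes the row but forgets updated_at.
conn.execute("UPDATE orders SET status = 'cancelled' WHERE id = 2")
# Hard delete: no row is left to carry a timestamp at all.
conn.execute("DELETE FROM orders WHERE id = 3")

# The next ETL run only sees the row whose timestamp was bumped.
changed = extract_since(conn, watermark)
for row_id, status, _ in changed:
    warehouse[row_id] = status

# Compare the warehouse copy with the real source to find rows that drifted.
source = dict(conn.execute("SELECT id, status FROM orders").fetchall())
drift = sorted(
    key
    for key in warehouse.keys() | source.keys()
    if warehouse.get(key) != source.get(key)
)

print("picked up by extract:", changed)
print("ids out of sync:", drift)
```

In this sketch only order 1 is picked up, while orders 2 and 3 are left out of sync in the warehouse copy. A log-based CDC feature is intended to avoid this class of problem, since it does not depend on every code path remembering to maintain a column.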

I implore you to think of yourself, your fellow employees (who'll be clearing up the mess long after you've retired!), your company (if they're a decent employer) and your customers (if you care anything about them).

Step away from the keyboard and have a long hard think about this and then come up with the answer towards which I'm gently guiding you! :-)

[EDIT]: In response to points raised in the OP's comment.

No, the industry is not "switching viewpoints"! NoSQL is a series of different types of compromise, solving some problems while creating a whole set of new ones! NoSQL is still niche and will forever remain so. See this article. Many NoSQL vendors are scrambling to a) put SQL interfaces over their products (cf. Google's Spanner/F1 system) and b) implement ACID semantics - who wants an incorrect bank balance (BASE)?

You are being deafened by the sound of the wheels of a big bandwagon rolling through town and the clamouring of those who've hopped on board. Check out the writings of Michael Stonebraker on the topic - a real database pioneer and still going strong with his latest NewSQL gig, VoltDB!

Take a look at where the action is going to be in the future - it will be with systems like Spanner. I have used two of these, CockroachDB and TiDB, for college projects; also check out YugaByte and ActorDB. Although still a bit "raw", they have architectures which are ACID compliant, while simultaneously answering many of the concerns of the NoSQL crowd!

As for the remarks about Azure Data Warehouse and (some) AWS systems not having FOREIGN KEYs - I'll betcha a dollar to a doughnut that this data came from systems which do have FOREIGN KEYs. Those who use them are deliberately choosing to (potentially) compromise on data integrity for the speed tradeoff that they will get in these DWs!

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange