Question

I am designing a loan origination system that would allow its users to create loans and generate a repayment schedule for each loan based on the loan product's parameters. I should also be able to add penalties, fees, etc. Rescheduling a loan should also be possible, and I need the loan schedule to support fast reporting.

I have a loans table, a loan products table, a payment schedule table, a loan history table, and so on. I cannot work out how to design this schema ahead of time so that it does not have to change too much.

I am doing this in Ruby, Rails 3 and DataMapper.


Solution

Except in the most tightly specified applications, I'm not sure you can design a schema that won't change much. What you can do is design schemas that are not brittle, schemas that allow change to happen. For the most part, that means:

  1. Include only the data you know you need to meet today's requirements.
  2. Normalize.
  3. Write automated tests.

The first rule is akin to "do the simplest thing possible," or "you ain't gonna need it," the rule programmers use to avoid code bloat. Smaller schemas, like smaller code bases, need less effort to change. The second rule (normalize) is analogous to the Don't Repeat Yourself (DRY) principle, also known as "once and only once," another rule used to make code cheaper to change. The third rule (tests) is how programmers make refactoring possible without worrying about breaking everything. By tests, I mean testing the code that uses the schema, but also testing the schema itself: triggers, rules, cascading deletes, etc. can all be tested, and once tested they are easier to change in response to changing requirements.
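To make rules 1 and 2 concrete, here is one minimal, normalized starting point in DataMapper. It is only a sketch: the model names, properties, and types are assumptions about your domain, not a prescribed schema.

    require 'data_mapper' # meta-gem: dm-core, dm-validations, dm-migrations, ...

    DataMapper.setup(:default, 'sqlite::memory:')

    # Sketch only: names and properties are illustrative assumptions. Each fact
    # lives in exactly one place (rule 2), and nothing is included that today's
    # requirements don't call for (rule 1).
    class LoanProduct
      include DataMapper::Resource

      property :id,            Serial
      property :name,          String,  :required => true
      property :interest_rate, Decimal, :precision => 8, :scale => 4

      has n, :loans
    end

    class Loan
      include DataMapper::Resource

      property :id,           Serial
      property :principal,    Decimal, :precision => 12, :scale => 2
      property :disbursed_on, Date

      belongs_to :loan_product
      has n, :scheduled_payments
    end

    class ScheduledPayment
      include DataMapper::Resource

      property :id,     Serial
      property :due_on, Date,    :required => true
      property :amount, Decimal, :precision => 12, :scale => 2
      property :paid,   Boolean, :default => false

      belongs_to :loan
    end

    DataMapper.finalize
    DataMapper.auto_migrate!

In a Rails 3 app the dm-rails integration would normally handle the setup and finalize calls for you; they are spelled out here only to keep the sketch self-contained.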

There are excuses, in the database world, for breaking these rules. The reason to break rule 1 (do the simplest thing/YAGNI) is that some data is easier to collect from the beginning, and difficult or perhaps even impossible to collect if you decide you need it later. Still, think twice before giving in to that excuse. You can almost always deal, without too much fuss, with gaps in the data caused by adding columns or tables later, but if you include today data you might only need tomorrow, you will pay for it every time you change the schema. Each bit of data you include but end up not needing is pure cost with no benefit. Perhaps more significantly, extra data can have a terrible effect on performance, since it reduces the number of records that fit in memory. Even though databases take great pains to perform well when reading from disk, their best performance comes from having enough memory (or little enough data) that all or most of the working set fits in RAM.

The excuse for breaking rule 2 (normalization) is performance: "data warehouse" applications sometimes require denormalization where many-table joins make a database slow and cranky. I'd want to be certain it was needed before denormalizing, since it isn't free: data that exists in more than one place makes the schema harder to change, and trades faster queries for more work on every insert and update.
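If you do eventually hit that point, for instance for the fast reporting you mention, a denormalized structure might look something like the hypothetical summary model below. Every duplicated column is something you now have to keep in sync whenever the underlying loan data changes.

    # Hypothetical reporting row, denormalized on purpose: it copies values that
    # already live on Loan, LoanProduct and ScheduledPayment so a report can be
    # produced without joins. The price is keeping these copies in sync.
    class LoanReportRow
      include DataMapper::Resource

      property :id,                Serial
      property :loan_id,           Integer, :required => true
      property :product_name,      String   # copied from LoanProduct#name
      property :principal,         Decimal, :precision => 12, :scale => 2
      property :total_outstanding, Decimal, :precision => 12, :scale => 2
      property :next_due_on,       Date     # copied from the earliest unpaid ScheduledPayment
    end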

I don't know of an excuse for breaking rule 3 (testing), or at least not a good one, but that doesn't mean there isn't one.
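For rule 3, the tests don't need to be elaborate. Something as small as the sketch below, written against the illustrative models above with Test::Unit, is enough to pin down the behaviour you rely on so you can refactor the schema later with some confidence.

    require 'date'
    require 'test/unit'

    # Assumes the models sketched earlier, with DataMapper pointed at a throwaway
    # database (e.g. 'sqlite::memory:') and auto_migrate! run before the tests.
    class LoanScheduleSchemaTest < Test::Unit::TestCase
      def test_scheduled_payment_requires_a_due_date
        payment = ScheduledPayment.new(:amount => 100.00)
        assert !payment.valid?, 'a payment without a due date should be invalid'
      end

      def test_destroying_a_loan_with_its_schedule_leaves_no_orphan_payments
        product = LoanProduct.create(:name => 'Standard')
        loan    = Loan.create(:loan_product => product,
                              :principal    => 1000.00,
                              :disbursed_on => Date.today)
        loan.scheduled_payments.create(:due_on => Date.today >> 1, :amount => 100.00)

        loan.scheduled_payments.destroy
        loan.destroy

        assert_equal 0, ScheduledPayment.all(:loan_id => loan.id).size
      end
    end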

Martin Fowler has written about "Evolutionary Database Design". Scott Ambler and Pramod Sadalage have a book, Refactoring Databases; see also the summary/cheat sheet of the book's refactorings.

Licensed under: CC-BY-SA with attribution