What should be stored in the database (RDBMS)?

https://stackoverflow.com/questions/323287

database-design

11-07-2019

Question

Are there any guidelines/best practices for deciding what type of data should be stored in the database?

For example, is it OK to use a database to store:

Application logs
Configuration details (like server IP addresses etc.)
System information (e.g., names of shell scripts, scheduling information for batch jobs, batch jobs status etc.)

I have seen applications that use a database for storing these. Is this acceptable? What are the pros and cons of such a design?

Solution

To answer this question, we have to understand what database storage provides that isn't available in, say, flat-file storage.

security - you can control who may view, update and delete the data
audit - you can keep track of who made changes and when
distributed servers - if you have multiple application servers accessing a single database, you avoid storing the same data in multiple places

If you want these properties for your data, it's a good idea to store it in the database.

OTHER TIPS

Application logs

Although it is often a good idea to limit the data in the database to a specific time range (e.g. dump, archive or condense to stats everything that's older than 3 months), having the logs in the database allows very fast and easy analysis of the data. Need to see what a specific user has done? "SELECT * FROM logs WHERE User = 'bla'". Need to find out why the system crashed at a specific time? "SELECT * FROM logs WHERE Timestamp BETWEEN failure - 1 hour AND failure + 5 minutes" (pseudo-SQL; the exact date arithmetic depends on your database).
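To make the two queries and the "nothing older than 3 months" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (ts, username, message), the function names and the choice of ISO 8601 text timestamps are all assumptions for illustration, not something the answer prescribes; a real RDBMS would use its native timestamp type and interval arithmetic.

```python
import sqlite3
from datetime import datetime, timedelta


def open_log_db():
    """Create an in-memory database with a hypothetical `logs` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs ("
        "ts TEXT NOT NULL, username TEXT NOT NULL, message TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX logs_ts ON logs (ts)")
    return conn


def _iso(moment):
    # ISO 8601 strings in one fixed format sort the same way as the datetimes.
    return moment.isoformat(timespec="seconds")


def add_entry(conn, moment, username, message):
    conn.execute(
        "INSERT INTO logs (ts, username, message) VALUES (?, ?, ?)",
        (_iso(moment), username, message),
    )


def entries_for_user(conn, username):
    """What has a specific user done?"""
    rows = conn.execute(
        "SELECT ts, message FROM logs WHERE username = ? ORDER BY ts",
        (username,),
    )
    return rows.fetchall()


def entries_around(conn, failure,
                   before=timedelta(hours=1), after=timedelta(minutes=5)):
    """What happened from `before` the failure until `after` it?"""
    rows = conn.execute(
        "SELECT ts, username, message FROM logs "
        "WHERE ts BETWEEN ? AND ? ORDER BY ts",
        (_iso(failure - before), _iso(failure + after)),
    )
    return rows.fetchall()


def purge_older_than(conn, cutoff):
    """Delete entries older than `cutoff`; returns how many rows were removed."""
    cursor = conn.execute("DELETE FROM logs WHERE ts < ?", (_iso(cutoff),))
    return cursor.rowcount
```

In practice you would probably archive or summarise the old rows before calling something like purge_older_than, rather than deleting them outright.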

Configuration details (like server IP addresses etc.)

That depends on the configuration details. Some yes, some no. Everything that applies to applications running on more than one client (e.g. websites) and that is likely to change quite often (e.g. user settings) should go in the database. For more or less static options, I prefer to use a config file.

System information (e.g., names of shell scripts, scheduling information for batch jobs, batch jobs status etc.)

I guess that's almost the same as config details. If it changes: database. If it's static: config file. Shell scripts will usually be static. Scheduling information and status will change over time.

We have stored everything in the database on the last few projects and it really helps when moving from development to production as there is very little to configure in the application itself.

Logging to the database can be useful (with Log4j, for example) as it allows widespread access to the logs for the testers and analysts.

I guess it depends on your situation. Everything that is stored in the database adds a level of complexity to the system. It is easier to read a file than to access a database to get the same information from code. A reasonable rule of thumb would be that the larger the system, the more of it should be stored in the database.

A small point: 99% of the time it's a terrible idea to store configuration in the DB. Config is too important to lose to a DB connection gone south: it needs to be 100% bullet proof.

RE: Config data

It might be a good idea to keep config data in the database to make it easier to edit it and keep track of the changes, but then write it out to a config file for the actual program to read.

Why should apache have to know anything about your database information to be able to get to its configuration?
Why should your FTP server stop working when the database is down?
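A rough sketch of that "edit in the database, read from a file" idea, in Python with the standard sqlite3 and configparser modules. The config table (section, name, value), the INI output format and both function names are hypothetical choices made for this example. The point is that only the export step touches the database; the program that consumes the config just reads a file.

```python
import configparser
import os
import sqlite3
import tempfile


def export_config(conn, path):
    """Write every row of the hypothetical `config` table to an INI file."""
    parser = configparser.ConfigParser()
    rows = conn.execute(
        "SELECT section, name, value FROM config ORDER BY section, name"
    )
    for section, name, value in rows:
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, name, value)

    # Write to a temporary file in the same directory, then swap it in with
    # os.replace, so a reader never sees a half-written config file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        parser.write(handle)
    os.replace(tmp_path, path)


def load_config(path):
    """What the application does at startup: no database connection needed."""
    parser = configparser.ConfigParser()
    with open(path) as handle:
        parser.read_file(handle)
    return parser
```

If the database is down, the last exported file is still there, so the application keeps starting with the most recent known configuration.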

RE: Application logs

As stated earlier, a database can make log analysis a lot easier, but I urge you to consider the log-to-file-and-batch-import-later pattern.

Performance issues

Databases are great for getting random bits of data out and putting random bits of data in. Log data is mostly not written randomly but in a continuous stream that is perfect for putting in a file one line after another. You can't beat the performance of a flat file when it comes to writing the data. There aren't a lot of things that can break with a flat file either. This also lets the database concentrate on doing the actual business work.

Then later on you can collect all the logged data from the file, parse it, do any required post-processing (like looking up host names from IP addresses) and put it into a database table. You do this as often as you find necessary. For my website I really don't need to be able to view the visitor stats change from one minute to the next, so I run the log batch at night. If you need up-to-date info you can just as well run the batch import every 60 seconds, and this will still be better than doing one extra INSERT statement for every actual business transaction (depending on how much you log, of course).
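Here is one way the log-to-file-and-batch-import-later pattern could look, sketched in Python with sqlite3. The tab-separated line format, the logs table layout and the ".importing" rename trick are assumptions made for this example; the post-processing step is left out, and a production version would need to think about writers that keep the file open and about platforms where renaming an open file fails.

```python
import os
import sqlite3


def ensure_logs_table(conn):
    """Create the hypothetical target table if it is not there yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs "
        "(ts TEXT, username TEXT, message TEXT)"
    )


def append_log(path, timestamp, username, message):
    """The hot path: one cheap append per event, no database involved."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{timestamp}\t{username}\t{message}\n")


def import_log(conn, path):
    """The batch job: load the whole file in one transaction, then drop it.

    Returns the number of imported lines.
    """
    if not os.path.exists(path):
        return 0

    # Move the file aside first so new events go to a fresh file
    # while this batch is being processed.
    batch_path = path + ".importing"
    os.replace(path, batch_path)

    rows = []
    with open(batch_path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip("\n").split("\t", 2)
            if len(parts) == 3:  # skip malformed lines
                rows.append(tuple(parts))

    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO logs (ts, username, message) VALUES (?, ?, ?)",
            rows,
        )
    os.remove(batch_path)
    return len(rows)
```

Run import_log from a nightly job or every 60 seconds, whichever freshness you actually need; either way the application itself only ever appends to a file.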

Security

How do you log a failed database connection if the database is your log engine?
How do you investigate why a system crashed if the database went down early during the events involved in the crash?

So I think you should consider when you need the log data in the database and why you need it in there.

One thing that hasn't been mentioned yet is that if you shove things like app configuration in the database, you can't put it under version control as easily.

For example, some CMSs like to shove HTML templates into the database instead of storing them as files. I personally think this is poor design. You can't version any of the changes you make to the templates and, worse, all you ever do is copy & paste from a real text editor into the wimpy text editor in the browser.

Bottom line? Ask yourself if this is something you want versioned. If yes, keep it out of the database. If no, sure, put it in the database.

Focus on ease of use and maintenance. The only logs I store in a database are put there by triggers that error out because that's easiest. But for everything else, searching and parsing text logs is faster and easier. If your app crashes, looking at a text config file is easier than looking in the db, especially for new maintainers. It's much, much easier for a new person to come along and see an app.properties file in the config/ directory than to know to look in a table in the database.

In addition, you can more easily store config files in source control if they're text files than if they're in the database. And this is massively important, believe me. You do not want to debug an app where you've lost the config file settings that caused the error. If you have a database crash or corruption, you could lose the logs and config settings which might make finding the problem impossible.

If you're developing a small, static website, then I would agree with most of the points already made. However, if you have a website that allows for users to add content via the production site I would argue that putting configuration in the database complicates the deployment pipeline to the point that keeping it out of the database is preferable.

If you're trying to push an update from dev to production, clients are pushing content to production, and your config and content are both in the same database, then you need to target only the tables with configuration data to be overwritten. This 'can' be a trivial amount of extra work on your part, but it depends on the scale of the application and whether or not you're making use of someone else's code. Consider Drupal sites. If users are adding content, then for deployment you need to target specific database tables to be overwritten. Since Drupal has several tables (none of which have config in their names) you'll need to do some research to figure out what can be overwritten and what can't. Now, what happens if something changes in Drupal's database layout? The deployment pipeline could break, and that's more work for you. What happens when you add a new plugin? More config tables, so changes in your deployment scripts are needed. More work for you. Should you eventually move on from this project, you would be expected to leave information with the new developer to explain what you've done with regard to these deployment issues. More work for you.

Consider what would happen if the config wasn't in the database but in your application's directory structure instead. Save the config changes to git/svn/etc, push the changes to your server box and overwrite older files. DONE. The database will be touched less when you roll out changes, your config can be put under version control, and your application is now directly coupled with the configuration it makes use of (which makes sense). This is more valuable for moderate/large scale applications, or applications which make use of pre-built components/frameworks that you have no control over, than for small scale applications. However, it works at all scales, whereas storing config in databases becomes more troublesome as your applications grow and your deployment pipeline becomes complicated.

Licensed under: CC-BY-SA with attribution