SELECTs in Database Change Log and Source Control

https://dba.stackexchange.com/questions/83760

12-12-2020

Question

We are overhauling the way that we store our database in source control and keep a change log of it. I was reading the following article: http://thedailywtf.com/articles/Database-Changes-Done-Right, and its short section "The Taxonomy of Database Scripts" describes three types of scripts (QUERY, OBJECT, and CHANGE). I like the idea of generalizing scripts into these three categories, but I'm wondering about the QUERY type. Questions:

- Why would someone want to put a SELECT statement into source control outside of an object?
- The database will change afterwards and make the QUERY script unusable; what then?
- The data may change and return a different result set; this would defeat the purpose of source control, so what then?
- Would the original result set have to be saved to solve the second and third issues?
- What is an example of a SELECT statement that might be put into source control?
- Wouldn't an INSERT statement work better, storing the results in a table as with baselines?

I just can't see the purpose of storing SELECT statements in source control.

If there is a purpose, could someone please answer the above questions and perhaps state the pros and cons of storing a SELECT statement in source control?

Solution

Well, in the article you reference, he specifically states "The first category of scripts fall out of the realm of database changes." That said, I've worked with plenty of legacy codebases that had select statements in source control; they were just in the application code. (So, subject to all the problems you mention and hidden from the DBA.)

Before answering your question about selects specifically, it seems important to mention that the whole point of having a source control system is to be able to trace back where things went wrong. That's why people might choose one over, say, an archive of daily backups. In your second bullet point, you mention that your select statement might become invalid, but with a version control system you would be able to check out a previous version where the select statement still matched the schema. Once you figured out what it was intended to do, you could check in a corrected version of that statement as a later revision.

I guess I've now answered your first through third points. It's unlikely (though entirely possible) that a select statement sitting in its own SQL file would be used by some other piece of software, but consider that a select statement sitting inside a stored procedure is equally able to fall out of sync with the schema. Expanding on point 3: it's precisely because we recognize this problem that we have source control. It does not defeat the purpose of source control; dealing with that problem is the purpose.
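One way to act on this, assuming the QUERY scripts live as plain .sql files next to the schema scripts, is a build step that compiles each saved SELECT against a freshly built copy of the schema, so a query that has drifted out of sync is flagged when the schema changes rather than when someone next runs it. This is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the real server; the schema, the file names, and the `find_stale_queries` helper are all hypothetical.

```python
import sqlite3

# Hypothetical schema at the current revision (in practice, built by running
# the OBJECT/CHANGE scripts from source control).
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
"""

# Hypothetical QUERY scripts as they might sit in the repository, keyed by file name.
QUERY_SCRIPTS = {
    "active_by_region.sql": "SELECT region, COUNT(*) FROM customers GROUP BY region;",
    "legacy_email_list.sql": "SELECT email FROM customers;",  # 'email' column no longer exists
}


def find_stale_queries(schema_sql, scripts):
    """Return {file name: error message} for scripts that no longer compile against schema_sql."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        stale = {}
        for name, sql in scripts.items():
            try:
                # EXPLAIN makes SQLite compile the statement and list its bytecode
                # instead of running the query, so this validates the query
                # against the schema, not against any data.
                conn.execute("EXPLAIN " + sql)
            except sqlite3.Error as exc:
                stale[name] = str(exc)
        return stale
    finally:
        conn.close()


if __name__ == "__main__":
    for name, err in find_stale_queries(SCHEMA, QUERY_SCRIPTS).items():
        print(f"{name}: {err}")
```

Because only the schema is built, the check says nothing about whether the results are still "right"; it only tells you which queries need to be looked at, which fits the point that source control is about code and schema rather than data.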

I can't really think of a case where storing the results would be helpful, but if you are worried you'd be unable to interpret a query in the future, perhaps a description in a comment would be useful?

If you have select statements that are consumed by application code, PowerShell scripts, ETL, reports, or possibly even by the ops team for routine tasks, these all seem like good candidates for being considered part of your codebase. Remember, the purpose of source control is not to serve as a backup strategy or as logging; it is to be able to bring up an internally consistent version of your code at a particular point in time. That is why the answer to your last point is no: this is about the code and the schema, not so much about your data. The article that prompted this does a good job of explaining why it's challenging to put anything but purely reference data in your source control system.
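As a minimal sketch of that idea, assume a report or ops script that loads a checked-in .sql file instead of embedding the SQL in its own code, with a header comment recording the query's purpose (the kind of description suggested above for interpreting it later). It uses Python's sqlite3 module with an in-memory table; the file name, table, and the `describe` and `run_query_file` helpers are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical QUERY script as checked into the repository. The header comment
# records what the query is for, so it can be understood without saved results.
QUERY_TEXT = """\
-- Purpose: count of customers per region, used by the ops report.
-- Consumer: hypothetical reports/monthly_regions.py
SELECT region, COUNT(*) AS n
FROM customers
GROUP BY region
ORDER BY region;
"""


def describe(sql_text):
    """Collect the leading '--' comment lines as a human-readable description."""
    lines = []
    for line in sql_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("--"):
            break
        lines.append(stripped[2:].strip())
    return "\n".join(lines)


def run_query_file(conn, path):
    """Load a versioned .sql file and run it; the caller holds only the path, not the SQL."""
    sql_text = Path(path).read_text(encoding="utf-8")
    return conn.execute(sql_text).fetchall()


if __name__ == "__main__":
    print(describe(QUERY_TEXT))
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
    conn.executemany(
        "INSERT INTO customers (name, region) VALUES (?, ?)",
        [("Ann", "east"), ("Bo", "west"), ("Cy", "east")],
    )
    with tempfile.TemporaryDirectory() as repo:
        path = Path(repo) / "monthly_regions.sql"  # stands in for the checked-out file
        path.write_text(QUERY_TEXT, encoding="utf-8")
        print(run_query_file(conn, path))
    conn.close()
```

Keeping the SQL in its own file means it is versioned alongside the schema it targets, so checking out an older revision brings back a query that matches that revision's schema.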

Licensed under: CC-BY-SA with attribution
