Data warehouse data security?

https://stackoverflow.com/questions/21088724

27-09-2022

Question

I started at a company as a junior SQL developer on a data warehouse. Since then I have been going through the code and learning the dimensional models, etc. I struggle to see any security measures beyond the rights that each developer has on the environment.

But if someone were to write code that affects the data in the warehouse in a significant way (updating to the wrong values, inserting false data, deleting records that should be there) and then commits it, wouldn't there be a massive impact on the business intelligence side of the warehouse? If people pull data to create statistics and the data is bad, the statistics will be bad too.

We have about 7 billion records, and changes made in this way would be really hard to pick up, if they can be seen at all.

Maybe this is a simple question, but I can't really find an answer. In the data warehouse you don't have the rigorous relational constraints to check data validity, especially when you move big data around and the database administrators drop the triggers and indexes as well. The transactional side we get the source data from also doesn't keep history (that's our job).

Any views and suggestions on this subject will be highly appreciated, thank you.

Solution

When working with databases, or writing code in general, mistakes happen. That is why you ALWAYS separate your development environment from your production environment. Most of us also have an intermediate test environment, where new code is tested and data is validated before the code is deployed to production.
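
As a rough illustration of validating data in a test environment, here is a minimal Python sketch. It assumes an in-memory SQLite database as a stand-in for the test copy of the warehouse; the table `fact_sales`, its columns, and the helper `run_checks` are hypothetical names chosen for the example, not anything from the question.

```python
import sqlite3


def run_checks(conn, checks):
    """Return the names of the checks whose query result differs from the expected value."""
    failures = []
    for name, sql, expected in checks:
        # Each check query is expected to return a single value in its first row.
        actual = conn.execute(sql).fetchall()[0][0]
        if actual != expected:
            failures.append(name)
    return failures


# Hypothetical test environment: an in-memory database with a tiny fact table.
test_db = sqlite3.connect(":memory:")
test_db.execute("CREATE TABLE fact_sales (sale_id INTEGER, amount REAL)")
test_db.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0)],
)
test_db.commit()

# Expectations that must still hold after the new code has run.
checks = [
    ("row count unchanged", "SELECT COUNT(*) FROM fact_sales", 3),
    ("no negative amounts", "SELECT COUNT(*) FROM fact_sales WHERE amount < 0", 0),
]

# The code under test: a faulty update that flips the sign of one amount.
test_db.execute("UPDATE fact_sales SET amount = -amount WHERE sale_id = 2")

failed_checks = run_checks(test_db, checks)
if failed_checks:
    # The change never reaches production; undo it in the test copy as well.
    test_db.rollback()
    print("Validation failed:", failed_checks)
```

The same idea scales to a real warehouse by replacing the check queries with aggregates that matter to the business (row counts per load date, totals per dimension key) and comparing them before and after the new code runs.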

Furthermore, a full backup is taken before any deployment. That way, if an error is discovered after deployment, the backup can be restored.
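
The backup-then-restore cycle could look roughly like the following Python sketch. It uses SQLite and the standard library's `Connection.backup` only as a small, runnable stand-in; a real warehouse would use its own database server's backup and restore tooling. The table `dim_customer` and the helper names are assumptions made for the example.

```python
import sqlite3


def backup_database(source, target):
    """Copy the whole source database into target, overwriting target's contents."""
    source.backup(target)


def count_rows(conn, table):
    """Return the number of rows in a table (the table name must come from trusted code)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchall()[0][0]


# Hypothetical production database with one dimension table.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE dim_customer (customer_id INTEGER, name TEXT)")
prod.execute("INSERT INTO dim_customer VALUES (1, 'Alice')")
prod.commit()

# Step 1: take a full backup before touching production.
pre_deploy_backup = sqlite3.connect(":memory:")
backup_database(prod, pre_deploy_backup)

# Step 2: a faulty deployment deletes rows that should be there and commits.
prod.execute("DELETE FROM dim_customer")
prod.commit()
rows_after_bad_deploy = count_rows(prod, "dim_customer")

# Step 3: the error is discovered, so the backup is restored over production.
backup_database(pre_deploy_backup, prod)
rows_after_restore = count_rows(prod, "dim_customer")
```

Note that a restore like this also discards any legitimate changes made after the backup was taken, which is one reason to run validation immediately after each deployment.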

Preferably, your development and production environments run on separate but identical servers. If that is not possible, at least keep the data in separate databases, and use your database server's security to ensure that no one can change the production database unless a deployment is happening.

Now for the deployment itself: keep a checklist to go over every time you deploy. The first step on the checklist should be to back up the existing production environment. Write scripts to automate parts of the deployment whenever possible. Use tools such as SQL Schema Compare to identify differences between the development and production databases. Ideally, deployment should be a matter of pressing one button, after which everything deploys automagically and you can go back to developing without worrying.
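
To give an idea of what a schema comparison step does, here is a small Python sketch. It is not SQL Schema Compare itself, just a simplified illustration that compares the `CREATE` statements stored in SQLite's `sqlite_master` catalog for two in-memory databases. The databases, table names, and helper functions are all hypothetical.

```python
import sqlite3


def schema_of(conn):
    """Map each schema object name (table, index, ...) to its CREATE statement."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE sql IS NOT NULL"
    ).fetchall()
    return dict(rows)


def schema_differences(dev, prod):
    """Report objects that exist on only one side or whose definitions differ."""
    dev_schema, prod_schema = schema_of(dev), schema_of(prod)
    return {
        "only_in_dev": sorted(dev_schema.keys() - prod_schema.keys()),
        "only_in_prod": sorted(prod_schema.keys() - dev_schema.keys()),
        "changed": sorted(
            name
            for name in dev_schema.keys() & prod_schema.keys()
            if dev_schema[name] != prod_schema[name]
        ),
    }


# Hypothetical development database: one changed table and one new table.
dev_db = sqlite3.connect(":memory:")
dev_db.execute("CREATE TABLE dim_customer (customer_id INTEGER, name TEXT, email TEXT)")
dev_db.execute("CREATE TABLE fact_sales (sale_id INTEGER, amount REAL)")

# Hypothetical production database: still on the old definition.
prod_db = sqlite3.connect(":memory:")
prod_db.execute("CREATE TABLE dim_customer (customer_id INTEGER, name TEXT)")

differences = schema_differences(dev_db, prod_db)
print(differences)
```

A report like this can serve as one item on the deployment checklist: anything listed under `changed` or `only_in_dev` should be covered by the deployment script, and anything under `only_in_prod` deserves a closer look before proceeding.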

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow