Scrubbing sensitive data

https://dba.stackexchange.com/questions/78693

10-12-2020

Question

I am looking for an automated way to scrub sensitive data when copying my production environment to my DEV and DEVINT environments, so that I don't have to write lots of code to get this done. Does anyone know whether Data Quality Services and a data cleansing step in SSIS can do the trick for me? Or does anyone have other suggestions for scrubbing my data without having to write T-SQL to do it?

Answer

Scrubbing sensitive data is a vast topic. You first need to define, in line with your company's data protection policy, what counts as PII (Personally Identifiable Information) and what else is sensitive enough that you don't want other people to see it.

SQL Server 2012* does not have any native tools to mask or scrub sensitive data.

In my company, we have developed in-house tools that mask sensitive client data such as email addresses, phone numbers, names, and more.

If you have to do this on a recurring basis, I would suggest taking the schema from PROD and then using a third-party tool, such as the data generator from RedGate, to generate test data.

Once you have identified what is considered sensitive, there are several methods that are useful for masking it.

Brent has written a good blog post on the subject: "How Do You Mask Data for Secure Testing?"

* In SQL Server 2016, Microsoft introduced Dynamic Data Masking.

It’s a data protection feature that hides the sensitive data in the result set of a query over designated database fields, while the data in the database is not changed. Dynamic data masking is easy to use with existing applications, since masking rules are applied in the query results.
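The read-time behaviour described above can be illustrated with a small Python sketch. This is not SQL Server code and not how the feature is implemented; it only mimics the idea that masking is applied to query results while stored values stay untouched. The names `mask_email`, `mask_partial`, `MASKS`, and `read_rows` are hypothetical, and the output shapes are loosely modelled on the email and partial masks.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def mask_email(value: str) -> str:
    # Keep the first character and hide the rest behind a fixed pattern.
    return (value[:1] + "XXX@XXXX.com") if value else value


def mask_partial(prefix: int, padding: str, suffix: int) -> Callable[[str], str]:
    # Build a mask that keeps `prefix` leading and `suffix` trailing characters.
    def apply(value: str) -> str:
        if len(value) <= prefix + suffix:
            return padding  # too short to expose anything safely
        tail = value[-suffix:] if suffix else ""
        return value[:prefix] + padding + tail
    return apply


# Hypothetical masking rules, keyed by column name.
MASKS: Dict[str, Callable[[str], str]] = {
    "email": mask_email,
    "phone": mask_partial(0, "XXX-XXX-", 4),
}


def read_rows(table: List[Row], can_unmask: bool) -> List[Row]:
    # Masking happens on the way out; `table` itself is never modified.
    if can_unmask:
        return [dict(row) for row in table]
    return [
        {col: MASKS.get(col, lambda v: v)(val) for col, val in row.items()}
        for row in table
    ]


customers = [{"name": "Alice", "email": "alice@example.org", "phone": "555-123-4567"}]
masked = read_rows(customers, can_unmask=False)
print(masked)     # masked view for an unprivileged reader
print(customers)  # stored values are unchanged
```

Because the stored values are untouched, a layer like this hides data from readers rather than removing it from a copied database.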

Other tips

You probably need to be more specific about what you mean by "data scrubbing". I'm assuming that you mean taking production data and randomising the sensitive information (anything that identifies people or organisations, such as names, codes, addresses, and so forth).

It is highly unlikely that you will find much by way of automated solutions for that, at least none that can deal with anything but the simplest of databases. Without specific application knowledge it is difficult to determine exactly what needs to be changed, what must be kept as-is because otherwise the application will not make sense of it, and what must be changed but kept in sync with other data (this last point is less of an issue if all your data is in a well-structured form, as duplication will be minimal or zero).
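One common way to handle values that must change but stay in sync across tables is a keyed, deterministic pseudonym: the same input always maps to the same replacement. The sketch below is only an illustration under assumed names; the tables, the `customer_code` column, `SECRET_KEY`, and the `pseudonym` helper are all hypothetical, and deciding which columns need this treatment still requires application knowledge.

```python
import hashlib
import hmac

# Hypothetical secret; it should not be available in the DEV environment,
# otherwise pseudonyms for guessed inputs could be recomputed.
SECRET_KEY = b"replace-with-a-secret-kept-out-of-dev"


def pseudonym(value: str, prefix: str = "cust") -> str:
    # Same input and key always give the same output, so references still match.
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


# Hypothetical production rows: orders refer to customers by customer_code.
customers = [{"customer_code": "ACME-001", "name": "Acme Ltd"}]
orders = [
    {"order_id": 1, "customer_code": "ACME-001"},
    {"order_id": 2, "customer_code": "ACME-001"},
]

scrubbed_customers = [
    {**c,
     "customer_code": pseudonym(c["customer_code"]),
     "name": pseudonym(c["name"], prefix="name")}
    for c in customers
]
scrubbed_orders = [
    {**o, "customer_code": pseudonym(o["customer_code"])} for o in orders
]

print(scrubbed_customers)
print(scrubbed_orders)  # both orders still point at the same scrubbed customer
```

Truncating the digest to 12 hex characters keeps the values short for readability; a real script would need to consider collisions and column lengths.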

Even if it were practical to write something generally useful enough to be worthwhile as a general tool covering simple fields, free-text fields pose a significant problem: they may contain sensitive information and would require quite some AI to parse (and just blanking or randomising them might not be suitable).

I can envisage a tool that would help you take some of the legwork out of creating data scrubbing scripts, but I'm not aware of any that exists.

It is usually easier to generate random or arbitrary test data, perhaps still referring to production databases but only for simple things, such as making sure your test data has the same balance of data in each table. It is also particularly easier to convince an auditor this way, if your clients require you to be visited by one to check your handling of their data. You know there is no sensitive information in there because there can't be: you never started with any and you know you didn't put any in. There are several tools out there to help with the process from this direction; I believe RedGate offer one in their chest of tools.
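As a rough sketch of that approach, the Python below generates rows from scratch and uses production only for aggregate counts, so the test data keeps a similar balance without containing any real values. Everything here is an assumption for illustration: the `region` column, the counts in `prod_region_counts` (imagined as the result of a GROUP BY count on production), and the `generate_rows` helper.

```python
import random
from collections import Counter
from typing import Dict, List


def generate_rows(category_counts: Dict[str, int], total: int, seed: int = 0) -> List[dict]:
    # Draw categories in proportion to the aggregate counts; every other
    # value is invented, so no production value can leak into the output.
    rng = random.Random(seed)
    categories = list(category_counts)
    weights = [category_counts[c] for c in categories]
    rows = []
    for i in range(total):
        rows.append({
            "customer_id": i + 1,
            "name": f"Test Customer {i + 1}",
            "region": rng.choices(categories, weights=weights, k=1)[0],
        })
    return rows


# Hypothetical aggregate taken from production: counts per region only.
prod_region_counts = {"north": 700, "south": 250, "west": 50}

test_rows = generate_rows(prod_region_counts, total=1000)
region_mix = Counter(row["region"] for row in test_rows)
print(region_mix)  # roughly follows the production proportions
```

A fixed seed makes the generated set repeatable, which can help when a test failure needs to be reproduced.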

License: CC-BY-SA with attribution

Not affiliated with dba.stackexchange