Question

I know shrink is the devil: It reverses page order and is responsible for skin cancer, data fragmentation, and global warming. The list goes on... That being said, say I have a 100 GB database and I delete 50 GB of data -- not on one table, but a general pruning of old data on a database wide level, covering 90% of the tables -- does this constitute an appropriate use case for shrinking the database?

If not, what are the appropriate steps to take to clean house after removing such a high percentage of data from a database? I can think of two: Rebuild Indexes and Update Stats. What else?

Solution

A reorganise-and-shrink is never really recommended.

If you can take the apps the database is serving offline, you can speed up the process and reduce the resulting index fragmentation by removing all indexes and primary/foreign key constraints before the shrink (this means there is less data to move around, as only the data pages will be shuffled, not the now non-existent index pages, which speeds the process up), then recreating all the indexes and keys afterwards.

Recreating the indexes after the shrink means they should not be significantly fragmented, and having them gone during the shrink means rebuilding them won't leave many small "holes" in the page allocation within the files that may invite fragmentation later.
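
As a rough illustration of that drop-shrink-recreate sequence, a minimal T-SQL sketch might look like the following. The database, table, index, constraint names and the target size are all placeholders; in practice you would script out every index and constraint first so you can recreate them exactly.

    USE MyDB;

    -- 1) Drop foreign keys and nonclustered indexes (shown for a single placeholder table)
    ALTER TABLE dbo.Orders DROP CONSTRAINT FK_Orders_Customers;
    DROP INDEX IX_Orders_OrderDate ON dbo.Orders;

    -- 2) Shrink the data file; 60000 is a target size in MB
    DBCC SHRINKFILE (MyDB_data, 60000);

    -- 3) Recreate the keys and indexes from the scripts captured before step 1
    CREATE INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate);
    ALTER TABLE dbo.Orders ADD CONSTRAINT FK_Orders_Customers
        FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID);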

Another option, if you can offline the applications, is to migrate all the data to a fresh database of the same structure. If your build process is solid you should be able to build that blank DB quickly; if not, create one from the current DB (restore a backup of the current one, truncate/delete all the contents of the tables, and perform a full shrink).
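
If there is no solid build process, one way to get that blank copy is to restore a backup under a new name and empty it out, something along these lines (database name, backup path, and logical file names are assumptions, and sp_MSforeachtable is an undocumented helper):

    -- Restore the current database under a new name
    RESTORE DATABASE MyDB_New
        FROM DISK = N'D:\Backups\MyDB.bak'
        WITH MOVE N'MyDB_data' TO N'E:\Data\MyDB_New.mdf',
             MOVE N'MyDB_log'  TO N'F:\Log\MyDB_New.ldf';

    -- Empty every table, then shrink the empty shell right down
    USE MyDB_New;
    EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';
    EXEC sp_MSforeachtable 'DELETE FROM ?';   -- TRUNCATE is faster where no foreign keys get in the way
    DBCC SHRINKDATABASE (MyDB_New);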

You might still want to drop all the indexes in the destination and recreate them afterwards, as this can be a lot more efficient when changing a lot of the indexed data (100% of it in this case). To speed up the copy, have the data file(s) of the destination database on different physical drives from the source (unless you are using SSDs, in which case you don't need to care about reducing head movements); you can move them to the source location when you are done.

Also, if creating the destination as new (rather than by blanking a copy of the source), create it with an initial size that will hold all the current data plus some months' worth of growth - that will make the data copy a little faster again, as it won't be allocating new space every now and again throughout the process.
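
For instance, a pre-sized destination could be created along these lines (names, paths, and sizes are placeholders, assuming roughly 50 GB of remaining data plus headroom):

    CREATE DATABASE MyDB_New
    ON PRIMARY
        (NAME = MyDB_New_data, FILENAME = N'E:\Data\MyDB_New.mdf',
         SIZE = 60GB, FILEGROWTH = 1GB)
    LOG ON
        (NAME = MyDB_New_log,  FILENAME = N'F:\Log\MyDB_New.ldf',
         SIZE = 8GB, FILEGROWTH = 1GB);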

This might be better than using shrink because migrating the data to a fresh database replicates the intended action of the shrink operation, but potentially with far less fragmentation (which is the unintended consequence of a reorganise-and-shrink). A shrink simply takes blocks from near the end of the file and puts them in the first space nearer the beginning, making no effort to keep related data together.

I suspect the result will be more efficient space-wise too, as there are likely to be fewer part-used pages afterwards. A shrink will just move part-used pages around; moving the data is more likely to result in full pages, especially if you insert into the destination in the order of a table's clustered key/index (where a table has one) and create the other indexes after the data has all migrated.
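
A per-table copy honouring the clustered key might look like this sketch (database, table, columns, and key are all hypothetical):

    -- TABLOCK can enable minimal logging under the SIMPLE or BULK_LOGGED recovery models
    INSERT INTO MyDB_New.dbo.Orders WITH (TABLOCK)
        (OrderID, CustomerID, OrderDate, Amount)
    SELECT OrderID, CustomerID, OrderDate, Amount
    FROM MyDB.dbo.Orders
    ORDER BY OrderID;   -- the destination table's clustered key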

Of course, if you can't take the apps offline at all, just performing a shrink is your only option, so if you really need to reclaim the space, go with that. Depending on your data, access patterns, common working set size, how much RAM the server has, and so forth, the extra internal fragmentation may not be all that significant in the end.
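
In that case the online approach is roughly the following (logical file name, target size, and table name are placeholders; REORGANIZE would be repeated per table):

    USE MyDB;
    DBCC SHRINKFILE (MyDB_data, 60000);        -- shrink the data file to ~60 GB
    ALTER INDEX ALL ON dbo.Orders REORGANIZE;  -- reorganise is an online operation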

For the copy operation, either SSIS or base T-SQL would work just as well (the SSIS option might be less efficient, but potentially easier to maintain later). If you create the FK relationships at the end, along with the indexes, you can do a simple "for each table, copy" in either case. Of course, for a one-off, a shrink+reorganise is probably fine too, but I just like to scare people into never considering regular shrinks! (I've known people to schedule them daily.)
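
A bare-bones "for each table, copy" in T-SQL could be generated dynamically, something like the sketch below. MyDB and MyDB_New are placeholders, and tables with IDENTITY columns or computed columns would need extra handling.

    DECLARE @sql nvarchar(max) = N'';

    -- Build one INSERT ... SELECT per user table in the source database
    SELECT @sql += N'INSERT INTO MyDB_New.' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
                 + N' WITH (TABLOCK) SELECT * FROM MyDB.' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
                 + N';' + CHAR(10)
    FROM MyDB.sys.tables AS t
    JOIN MyDB.sys.schemas AS s ON s.schema_id = t.schema_id;

    EXEC (@sql);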

Other tips

Is the database going to grow again? If so, then the effort you're going to put into the shrink operations is just going to be a waste, because when you've got the file size down and then you add more data, the file will just have to grow again, and transactions will have to wait for that growth to happen. If you have sub-optimal auto-growth settings and/or a slow drive, this growth activity is going to be quite painful.

If you do shrink the database, what are you going to use the freed up disk space for? Again if you are just going to keep that space free in case this database grows again, then you're just spinning your wheels.

What you might consider doing, now that you've got all this free space in the file, is rebuilding your indexes so that they are better optimized (and it will be much less painful to do this when you have free space to do so - think about trying to change a sweater in a tiny closet vs. a big bedroom).
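
That rebuild (plus the statistics refresh the question mentions) can be as simple as the following, run per table or wrapped in your usual maintenance script (dbo.Orders is a placeholder table):

    ALTER INDEX ALL ON dbo.Orders REBUILD;          -- rebuild every index on the table
    UPDATE STATISTICS dbo.Orders WITH FULLSCAN;     -- refresh that table's statistics
    EXEC sp_updatestats;                            -- or update stale statistics database-wide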

So unless this was a major cleanup operation and you really won't be ramping up to the same level of data again, I'd just leave it as is and focus on other areas of optimization.

If you are running out of space and your data isn't supposed to get that big, then shrink, but rebuild your indices afterwards with appropriate fill factors that allow for typical growth.
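
For example, a rebuild that leaves headroom for growth might look like this (the table name and the 90 percent fill factor are assumptions to tune for your workload):

    ALTER INDEX ALL ON dbo.Orders
    REBUILD WITH (FILLFACTOR = 90, SORT_IN_TEMPDB = ON);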

If your end goal is actually to reduce backup size, make sure you implement a comprehensive backup strategy to clear out the transaction log, and when you back up the DB, use the compression option.
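
Roughly, that means regular compressed full backups plus, under the FULL recovery model, log backups, along these lines (paths are placeholders):

    BACKUP DATABASE MyDB
        TO DISK = N'D:\Backups\MyDB_full.bak'
        WITH COMPRESSION, INIT;

    BACKUP LOG MyDB
        TO DISK = N'D:\Backups\MyDB_log.trn'
        WITH COMPRESSION;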

I wouldn't recommend auto-growth of 5 GB unless you typically expect to grow by 5 GB frequently; you could have intermittent performance problems otherwise. Your data size should first be set to what you think is required for, say, a year, and auto-growth should be set to a size that you've tested doesn't affect operating performance. See Don't Touch that Shrink Database Button In SQL Server! by Mike Walsh.
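
Setting that up is a one-liner per file; the logical file name and the sizes below are placeholders (SIZE can only be set larger than the file's current size):

    ALTER DATABASE MyDB
        MODIFY FILE (NAME = MyDB_data, SIZE = 80GB, FILEGROWTH = 512MB);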

Rebuilding indexes before shrinking leaves the indexes badly laid out: shrinking mangles the indexes to recover space, so rebuilding beforehand and then shrinking is pointless. See When to use Auto Shrink by Thomas LaRock.

I don't know if this would work better than reindexing after the shrink, but another option would be to create a new data file that's appropriately sized and move all the data to that. In that case I would do a reindex first, so you know what the actual data size is. One catch is that if this is the first file in the primary filegroup, I don't think you can empty it; you should be able to shrink it and then move the data back afterwards, and that would avoid the page reversal. However, if you're looking at moving to solid state, that shouldn't make a big difference anyway.
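
Sketched out, that approach looks roughly like this (file names, paths, and sizes are assumptions):

    -- Add a new, right-sized file to the same filegroup
    ALTER DATABASE MyDB
        ADD FILE (NAME = MyDB_data2, FILENAME = N'E:\Data\MyDB_data2.ndf', SIZE = 60GB)
        TO FILEGROUP [PRIMARY];

    -- Push the data out of the old file; the primary .mdf keeps its system pages,
    -- so it can only be shrunk afterwards, not removed
    USE MyDB;
    DBCC SHRINKFILE (MyDB_data, EMPTYFILE);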

Coming back to this WAY late. Still, we've been pondering and testing the use of shrink in our testing environments for a long time as well. As per the topic, there are times when shrink is a viable option, but knowing when and how to apply it is vital to proper execution in both the long and the short term.

In our scenario, we've recently made numerous changes to our large DB, including compression, partitioning, archiving and plain old deletion of redundant data. As a result, the used portion of our primary data file has dropped to less than half of what it used to be. But what's the point of carrying around all that luggage? Especially since, contrary to some articles around the web, the size of your data files DIRECTLY CORRELATES WITH BACKUP / RESTORE DURATION. That's because, unlike what many articles assume, real-life scenarios have loads more data on any given page than just the stuff you may have removed.

More to the point, this opens up a great scenario for shrinking:

  1. Create a script that will find all objects and their filegroups in your database (plenty of examples online); use this to create the drop clauses as well as the create definitions for every one of your indices and constraints.
  2. Create a new file & filegroup, and make that the default.
  3. Drop all nonclustered indices (note that some indices can be constraints).
  4. Create your clustered indices on the new filegroup with DROP_EXISTING = ON (which, btw, is an immensely fast, minimally logged operation to begin with compared to many alternatives).
  5. Recreate your nonclustered indices.
  6. Finally, SHRINK your old data file (usually PRIMARY).

This way the only data left in there would be your DB's system objects, statistics, procedures and whatnot. The shrink should be much, MUCH faster, and there's no need for any further index maintenance on your main data objects, which will have been created neatly in order, with minimal risk of future fragmentation.
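
A condensed T-SQL sketch of steps 2 through 6, assuming the scripting from step 1 is already done and using placeholder names throughout (CIX_Orders stands in for an existing clustered index):

    USE MyDB;

    -- Step 2: new file and filegroup, made the default
    ALTER DATABASE MyDB ADD FILEGROUP DATA2;
    ALTER DATABASE MyDB
        ADD FILE (NAME = MyDB_data2, FILENAME = N'E:\Data\MyDB_data2.ndf', SIZE = 40GB)
        TO FILEGROUP DATA2;
    ALTER DATABASE MyDB MODIFY FILEGROUP DATA2 DEFAULT;

    -- Step 3: drop nonclustered indexes (constraint-backed ones need ALTER TABLE ... DROP CONSTRAINT)
    DROP INDEX IX_Orders_OrderDate ON dbo.Orders;

    -- Step 4: rebuild the clustered index onto the new filegroup, moving the table with it
    CREATE UNIQUE CLUSTERED INDEX CIX_Orders ON dbo.Orders (OrderID)
        WITH (DROP_EXISTING = ON)
        ON DATA2;

    -- Step 5: recreate the nonclustered indexes on the new filegroup
    CREATE INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate) ON DATA2;

    -- Step 6: shrink the old file, now holding little more than system objects
    DBCC SHRINKFILE (MyDB_data, 1024);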

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange