Question

I have a large database (90GB data, 70GB indexes) that's been slowly growing for the past year, and the growth/changes has caused a large amount of internal fragmentation not only of the indexes, but of the tables themselves.

It's easy to resolve the (large number of) very fragmented indexes - a REORGANIZE or REBUILD will take care of that, depending on how fragmented they are - but the only advice I can find on cleaning up actual table fragmentation is to add a clustered index to the table. I'd immediately drop it afterwards, as I don't want a clustered index on the table going forward, but is there another method of doing this without the clustered index? A "DBCC" command that will do this?

Thanks for your help.

Was it helpful?

Solution

Problem

Let's get some clarity, because this is a common problem, a serious issue for every company using SQL Server.

This problem, and the need for CREATE CLUSTERED INDEX, is misunderstood.

Agreed that having a permanent Clustered Index is better than not having one. But that is not the point, and it will lead into a long discussion anyway, so let's set that aside and focus on the posted question.

The point is, you have substantial fragmentation on the Heap. You keep calling it a "table", but there is no such thing at the physical data storage or DataStructure level. A table is a logical concept, not a physical one. It is a collection of physical DataStructures. The collection is one of two possibilities:

  • Heap
    plus all Non-clustered Indices
    plus Text/Image chains

  • or a Clustered Index
    (eliminates the Heap and one Non-clustered Index)
    plus all Non-clustered Indices
    plus Text/Image chains.

Heaps get badly fragmented; the more interspersed (random)Insert/Deletes/Updates there are, the more fragmentation.

There is no way to clean up the Heap, as is. MS does not provide a facility (other vendors do).

Solution

However, we know that Create Clustered Index rewrites and re-orders the Heap, completely. The method (not a trick), therefore, is to Create Clustered Index only for the purpose of de-fragmenting the Heap, and drop it afterward. You need free space in the db of table_size x 1.25.

While you are at it, by all means, use FILLFACTOR, to reduce future fragmentation. The Heap will then take more allocated space, allowing for future Inserts, Deletes and row expansions due to Updates.

Note

  1. Note that there are three Levels of Fragmentation; this deals with Level III only, fragmentation within the Heap, which is caused by Lack of a Clustered Index

  2. As a separate task, at some other time, you may wish to contemplate the implementation of a permanent Clustered Index, which eliminates fragmentation altogether ... but that is separate to the posted problem.

Response to Comment

SqlRyan:
While this doesn't give me a magic solution to my problem, it makes pretty clear that my problem is a result of a SQL Server limitation and adding a clustered index is the only way to "defragment" the heap.

Not quite. I wouldn't call it a "limitation".

  1. The method I have given to eliminate the Fragmentation in the Heap is to create a Clustered Index, and then drop it. Ie. temporarily, the only purpose of which is correct the Fragmentation.

  2. Implementing a Clustered Index on the table (permanently) is a much better solution, because it reduces overall Fragmentation (the DataStructure can still get Fragmented, refer detailed info in links below), which is far less than the Fragmentation that occurs in a Heap.

    • Every table in a Relational database (except "pipe" or "queue" tables) should have a Clustered Index, in order to take advantage of its various benefits.

    • The Clustered Index should be on columns that distribute the data (avoiding INSERT conflicts), never be indexed on a monotonically increasing column, such as Record ID 1, which guarantees an INSERT Hot Spot in the last Page.

1. Record IDs on every File renders your "database" a non-relational Record Filing System, using SQL merely for convenience. Such Files have none of the Integrity, Power, or Speed of Relational databases.

Andrew Hill:
would you be able to comment further on "Note that there are three Levels of Fragmentation; this deals with Level III only" -- what are the other two levels of fragmentation?

In MS SQL and Sybase ASE, there are three Levels of Fragmentation, and within each Level, several different Types. Keep in mind that when dealing with Fragmentation, we must focus on DataStructures, not on tables (a table is a collection of DataStructures, as explained above). The Levels are:

  • Level I • Extra-DataStructure
    Outside the DataStructure concerned, across or within the database.

  • Level II • DataStructure
    Within the DataStructure concerned, above Pages (across all Pages)
    This is the Level most frequently addressed by DBAs.

  • Level III • Page
    Within the DataStructure concerned, within the Pages

These links provide full detail re Fragmentation. They are specific to Sybase ASE, however, at the structural level, the information applies to MS SQL.

Note that the method I have given is Level II, it corrects the Level II and III Fragmentation.

OTHER TIPS

You state that you add a clustered index to alleviate the table fragmentation, to then drop it immediately.

The clustered index removes fragmentation by sorting on the cluster key, but you say that this key would not be possible for future use. This begs the question: why defragment using this key at all?

It would make sense to create this clustered key and keep it, as you obviously want/need the data sorted that way. You say that data changes would incur data movement penalties that can't be borne; have you thought about creating the index with a lower FILLFACTOR than the default value? Depending upon data change patterns, you could benefit from something as low as 80%. You then have 20% 'unused' space per page, but the benefit of lower page splits when the clustered key values are changed.

Could that help you?

You can maybe compact the heap by running DBCC SHRINKFILE with NOTRUNCATE.

Based on comments, I see you haven't tested with a permenent clustered index.

To put this in perspective, we have database with 10 million new rows per day with clustered indexes on all tables. Deleted "gaps" will be removed via scheduled ALTER INDEX (and also forward pointers/page splits).

Your 12GB table may be 2GB after indexing: it merely has 12GB allocated but is massively fragmented too.

I understand your pain in being constrained by the design of a legacy design.

Have you the oppertunity to restore a backup of the table in question on another server and create a clustered index? It is very possible the clustered index if created on a set of narrow unique columns or an identity column will reduce the total table (data and index) size.

In one of my legacy apps all the data was accessed via views. I was able to modify the schema of the underlying table adding an identity column and a clustered index without effecting the application.

Another drawback of having the heap is the extra IO associated with any fowarded rows.

I found the article below effective when I was asked if there was any PROOF that we needed a clusted index permanently on the table

This article is by Microsoft

The problem that no one is talking about is FRAGMENTATION OF THE DATA OR LOG DEVICE FILES ON THE HARD DRIVE(s) ITSELF!! Everyone talks about fragmentation of the indexes and how to avoid/limit that fragmentation.

FYI: When you create a database, you specify the INITIAL size of the .MDF along with how much it will grow by when it needs to grow. You do the same with the .LDF file. THERE IS NO GUARANTEE THAT WHEN THESE TWO FILES GROW THAT THE DISK SPACE ALLOCATED FOR THE EXTRA DISK SPACE NEEDED WILL BE PHYSICALLY CONTIGUOUS WITH THE EXISTING DISK SPACE ALLOCATED!!

Every time one of these two device files needs to expand, there is the possibility of fragmentation of the hard drive disk space. That means the heads on the hard drive need to work harder (and take more time) to move from one section of the hard drive to another section to access the necessary data in the database. It is analogous to buying a small plot of land and building a house that just fits on that land. When you need to expand the house, you have no more land available unless you buy the empty lot next door - except - what if someone else, in the meantime, has already bought that land and built a house on it? Then you CANNOT expand your house. The only possibility is to buy another plot of land in the "neighborhood" and build another house on it. The problem becomes - you and two of your children would live in House A and your wife and third child would live in House B. That would be a pain (as long as you were still married).

The solution to remedy this situation is to "buy a much larger plot of land, pick up the existing house (i.e. database), move it to the larger plot of land and then expand the house there". Well - how do you do that with a database? Do a full backup, drop the database (unless you have plenty of free disk space to keep both the old fragmented database - just in case - as well as the new database), create a brand new database with plenty of initial disk space allocated (no guarantee that the operating system will insure that the space that you request will be contiguous) and then restore the database into the new database space just created. Yes - it is a pain to do but I do not know of any "automatic disk defragmenter" software that will work on SQL database files.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top