Question

While estimating row and table sizes is fairly simple math, we find it challenging to guess how much space each index will occupy for a given table size. What do we need to learn in order to calculate a better estimate of index size and growth rate?


Solution

Each index leaf row has a preamble identifying the data page (7 bytes, plus some directory information for variable-length columns, if any) plus a copy of the key value(s), which take up the same space as they do in the table data for those columns. There is one leaf row for each row in the table. The upper levels of the index are much smaller, usually less than 1% of the size of the leaf level unless you are indexing a very wide key.

The fill factor leaves some space free so that updates and inserts do not generate excessive leaf splitting traffic.
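The arithmetic described above can be sketched roughly as follows. This is only a back-of-the-envelope estimate, not the engine's exact formula: the 7-byte preamble and the 1% allowance for upper levels come from the answer, while the 96-byte page header, the 2-byte slot entry per row, and the function and parameter names are assumptions made for illustration.

```python
import math

PAGE_BYTES = 8192         # 8 KB pages
PAGE_HEADER_BYTES = 96    # assumption: space reserved for the page header
SLOT_BYTES = 2            # assumption: per-row entry in the page's slot array
ROW_PREAMBLE_BYTES = 7    # per-row preamble quoted in the answer


def estimate_index_kb(row_count, key_bytes, fill_factor=100,
                      var_dir_bytes=0, upper_level_share=0.01):
    """Rough index size in KB for a table of row_count rows.

    key_bytes         -- average size of the indexed key value(s)
    fill_factor       -- percentage of each leaf page filled at build time
    var_dir_bytes     -- directory overhead for variable-length columns
    upper_level_share -- non-leaf levels as a fraction of the leaf level
    """
    row_bytes = ROW_PREAMBLE_BYTES + var_dir_bytes + key_bytes + SLOT_BYTES
    usable_bytes = (PAGE_BYTES - PAGE_HEADER_BYTES) * fill_factor / 100
    rows_per_page = max(1, int(usable_bytes // row_bytes))
    leaf_pages = math.ceil(row_count / rows_per_page)
    total_pages = math.ceil(leaf_pages * (1 + upper_level_share))
    return total_pages * PAGE_BYTES // 1024


# One million rows with an 8-byte key, at two fill factors.
full = estimate_index_kb(1_000_000, key_bytes=8)
padded = estimate_index_kb(1_000_000, key_bytes=8, fill_factor=80)
print(full, padded)
```

Lowering the fill factor increases the estimate because fewer rows fit on each leaf page; for varchar keys, pass the average stored length as key_bytes.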

EDIT: This MSDN link describes the page-level structures, although it's a bit light on the format of the individual index rows. This presentation goes into the physical format of disk log entries and data pages to some extent. This one goes into more detail and includes the index data structures. Numeric and fixed-length columns have the size it says on the box; you would have to estimate the average size of varchar columns.

For reference, some documents on Oracle's block format can be found Here and Here.

OTHER TIPS

When possible, I generally take 1000 records from the original table and insert them into my own table; the script below then gives me a sample to play with.

OK, it is not accurate, but it gives me a starting point.

--Find out the disk size of an index:
--USE [DB NAME HERE]
go
SELECT
OBJECT_NAME(I.OBJECT_ID) AS TableName,
I.name AS IndexName,   
8 * SUM(AU.used_pages) AS 'Index size (KB)',
CAST(8 * SUM(AU.used_pages) / 1024.0 AS DECIMAL(18,2)) AS 'Index size (MB)'
FROM
sys.indexes I
JOIN sys.partitions P ON P.OBJECT_ID = I.OBJECT_ID AND P.index_id = I.index_id
JOIN sys.allocation_units AU ON AU.container_id = P.partition_id
--WHERE 
--    OBJECT_NAME(I.OBJECT_ID) = '<TableName>'    
GROUP BY
I.OBJECT_ID,    
I.name
ORDER BY
TableName

--========================================================================================

--http://msdn.microsoft.com/en-us/library/fooec9de780-68fd-4551-b70b-2d3ab3709b3e.aspx

--I believe that keeping the GROUP BY 
--is the best option in this case
--because sys.allocation_units
--can hold 4 types of allocation unit,
--as below:

--type tinyint
--Type of allocation unit.
--0 = Dropped
--1 = In-row data (all data types, except LOB data types)
--2 = Large object (LOB) data (text, ntext, image, xml, large value types, and CLR user-defined types)
--3 = Row-overflow data

--marcelo miorelli 8-NOV-2013
--========================================================================================
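
Once the script above has reported the index size for the sample table, the figure can be scaled up to the full row count. The sketch below assumes the index grows linearly with the number of rows, which is only a starting point; the function name and the example numbers are hypothetical.

```python
import math


def extrapolate_index_kb(sample_rows, sample_index_kb, target_rows, page_kb=8):
    """Scale a measured sample index size linearly to target_rows.

    The result is rounded up to whole pages of page_kb kilobytes.
    """
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    pages = math.ceil(sample_index_kb * target_rows / (sample_rows * page_kb))
    return pages * page_kb


# Hypothetical measurement: a 1000-row sample whose index uses 24 KB.
print(extrapolate_index_kb(1000, 24, 5_000_000))
```

A 1000-row sample occupies only a handful of pages, so partially filled pages weigh heavily in the per-row figure and the extrapolation tends to run high; a larger sample narrows that error.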
Licensed under: CC-BY-SA with attribution