Finding integer database columns that may have their data type changed to reduce size
-
06-02-2021 - |
Question
I have a database schema that I've inherited where all the integer columns have been defined as int
but because of business rules many could be replaced by tinyiny
and other data types that would require less storage. While I don't expect much of a performance improvement by using the smaller types my reasons for wanting to do so are:
The application uses SQL Server Express so I'd like to make the most of the 10GB database size before needing to purge data.
The application is shipped with a full initial copy of the database and updates are synced using the Microsoft Sync Framework. Some client PCs are on relatively slow Internet connections so the smaller the database size the better.
I was wondering if there was some way to quickly identify columns that may be 'oversized'? I realize that the results will need to be carefully reviewed to make sure that the column will never exceed the limits of the new data type.
Solution
You might be able to save yourself a lot of work if you can upgrade to SQL Server 2016 SP1 or later. Data compression is available even in Express edition starting with that service pack. You could simply apply row compression to all tables and indexes to save even more space than what you can get from changing data types. That's because the data types that you mention only use bytes needed to fit the actual values with row compression. Screenshot from the documentation:
OTHER TIPS
The following stored procedure will help identify those columns. It starts by creating a temporary table that stores the minimum and maximum values that each integer type can hold and then runs a dynamic query to find the minimum and maximum value for every table / column that defines one of those integer types. Once the min/max values have been determined it looks up the smallest type that can hold the value current values and gives a result such as the following:
table_name column_name current_type min_val max_val proposed_type proposed_min proposed_max space_saved
RideLegVetting RideNumber int 1 21 tinyint 0 255 2025654
RideLegVetting LegNumber int 0 99 tinyint 0 255 2025654
For each column it shows the curent type and min/max values found in the table along with a new proposed type and the min/max value it can hold. Based on the cardinality it also indicates an estimate of the amount of space that will be saved. The stored procecedure is as follows:
CREATE PROCEDURE OptimizeIntSizes AS
BEGIN
SET NOCOUNT ON
CREATE TABLE #DataTypeRanges
(
name nvarchar(128),
size tinyint,
low bigint,
high bigint
)
INSERT INTO #DataTypeRanges VALUES (N'tinyint', 1, 0, 255)
INSERT INTO #DataTypeRanges VALUES (N'smallint', 2, -32768, 32767)
INSERT INTO #DataTypeRanges VALUES (N'int', 4, -2147483648, 2147483647)
INSERT INTO #DataTypeRanges VALUES (N'bigint', 8, -9223372036854775808, 9223372036854775807)
CREATE TABLE #Results
(
table_name nvarchar(128),
column_name nvarchar(128),
current_type nvarchar(128),
min_val bigint,
max_val bigint,
proposed_type nvarchar(128),
proposed_min bigint,
proposed_max bigint,
space_saved bigint
)
DECLARE @table_name nvarchar(128)
DECLARE @column_name nvarchar(128)
DECLARE @current_type nvarchar(128)
DECLARE @proposed_type nvarchar(128)
DECLARE @low bigint
DECLARE @high bigint
DECLARE @size tinyint
DECLARE @cardinality bigint
DECLARE @min_val bigint
DECLARE @max_val bigint
DECLARE @proposed_min bigint
DECLARE @proposed_max bigint
DECLARE @proposed_size tinyint
DECLARE @space_saved bigint
DECLARE @sql nvarchar(max)
DECLARE @params nvarchar(max)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
DECLARE FindCursor CURSOR STATIC FOR
SELECT tables.name AS table_name, cols.name AS column_name, types.name, dtr.size, dtr.low, dtr.high
FROM sys.columns cols
JOIN sys.tables tables ON tables.object_id = cols.object_id
JOIN sys.types types ON types.system_type_id = cols.system_type_id
JOIN #DataTypeRanges dtr ON dtr.name COLLATE DATABASE_DEFAULT = types.name COLLATE DATABASE_DEFAULT
OPEN FindCursor
FETCH FindCursor INTO @table_name, @column_name, @current_type, @size, @low, @high
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = N'SELECT @cardinality_out = COUNT(1), @min_val_out = MIN([' + @column_name + ']), @max_val_out = MAX([' + @column_name + ']) FROM [' + @table_name + ']'
SET @params = N'@cardinality_out bigint OUTPUT, @min_val_out bigint OUTPUT, @max_val_out bigint OUTPUT'
EXECUTE sp_executesql @sql, @params, @cardinality_out = @cardinality OUTPUT, @min_val_out = @min_val OUTPUT, @max_val_out = @max_val OUTPUT
SELECT TOP 1 @proposed_type = name, @proposed_size = size, @proposed_min = low, @proposed_max = high
FROM #DataTypeRanges
WHERE @min_val >= low AND @max_val <= high
ORDER BY size
IF @proposed_type <> @current_type AND @max_val IS NOT NULL
BEGIN
SET @space_saved = (@size - @proposed_size) * @cardinality
INSERT INTO #Results VALUES (@table_name, @column_name, @current_type, @min_val, @max_val, @proposed_type, @proposed_min, @proposed_max, @space_saved)
END
FETCH FindCursor INTO @table_name, @column_name, @current_type, @size, @low, @high
END
CLOSE FindCursor
DEALLOCATE FindCursor
SELECT * FROM #Results ORDER BY space_saved DESC
END