Question

One of the requirements for the database I am building is to predict its size so we can prepare hardware for the production environment. The application has 2 main tables, which are partitioned. The tables are spread across 8 filegroups (16 files, 32 partitions). Both tables store data for the last month (after one month the data is deleted). We have to be prepared for around 12 million rows daily in one table and around 36 million rows daily in the second table. I ran a workload test against these tables and:

  1. For 1 million rows, the data size of the filegroups containing the data was around 13 GB
  2. For 12 million rows, the data size of the filegroups containing the data was around 48.5 GB

The log size increased by only 40 MB.

Based on this data I have three questions:

  1. My idea was to measure a daily workload and multiply it by 30. But according to the data above it doesn't work like that: 13 * 12 != 48.5.
  2. Why has the log grown by only 40 MB?
  3. Is there any difference in size if we store the data in an AlwaysOn solution?

To measure file size I used the query below:

-- size and FILEPROPERTY(..., 'SpaceUsed') are reported in 8 KB pages,
-- so dividing by 128.0 converts them to MB.
SELECT [sizing].[DbName],
       [sizing].[FileName],
       [sizing].[type_desc],
       [sizing].[CurrentSizeMB],
       [sizing].[FreeSpaceMB],
       [CurrentSizeMB] - [FreeSpaceMB] AS [SizeStored]
FROM
(
    SELECT DB_NAME() AS [DbName],
           [name] AS [FileName],
           [type_desc],
           [size] / 128.0 AS [CurrentSizeMB],
           [size] / 128.0 - CAST(FILEPROPERTY([name], 'SpaceUsed') AS int) / 128.0 AS [FreeSpaceMB]
    FROM [sys].[database_files]
    WHERE [type] IN ( 0, 1 )   -- 0 = data (rows) files, 1 = log file
) [sizing];

Solution

  1. It sounds like you have highly varying data, for example a few VARCHAR(MAX) columns that are sometimes filled in heavily and other times not at all. That is why 12,000,000 rows doesn't necessarily result in 12 times the size of 1,000,000 rows. You need to use a larger sample size to more accurately determine what your data growth will be. For example, if you want to know what the smaller table will look like after a month, then you need to measure against at least a month's worth of data (360,000,000 rows). Honestly, you should probably measure against a few months' worth for a more accurate estimate if possible (though I'm assuming it's not, since you're trying to do initial provisioning). A per-partition size query that can help with this measurement is sketched after this list.

  2. Without more information it's hard to tell why your log file growth is so relatively small. What Recovery Model is your database set to? Is it possible someone ran a SHRINK operation against it? The second query after this list shows one way to check both.

  3. The AlwaysOn solution won't materially affect the size on your Primary Replica. Keep in mind, though, that AlwaysOn results in a literal copy of your database on a Secondary Replica on a separate server, so you'll be storing two copies of the same data.
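
Not part of the original answer, just a sketch of how the measurement in #1 could be taken per table and per partition (rather than per file), using sys.dm_db_partition_stats. The table names dbo.SmallTable and dbo.BigTable are placeholders for your two partitioned tables:

-- Sketch only: per-table / per-partition rows and space, useful for
-- sampling growth over a longer period. Replace the placeholder names.
SELECT OBJECT_NAME(ps.object_id)                                          AS TableName,
       ps.partition_number,
       SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END)  AS [Rows],
       SUM(ps.used_page_count) * 8 / 1024.0                               AS UsedMB,
       SUM(ps.reserved_page_count) * 8 / 1024.0                           AS ReservedMB
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id IN ( OBJECT_ID(N'dbo.SmallTable'), OBJECT_ID(N'dbo.BigTable') )
GROUP BY ps.object_id, ps.partition_number
ORDER BY TableName, ps.partition_number;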
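
As a sketch of the checks suggested in #2 (again, not code from the original answer): the recovery model can be read from sys.databases, and log file usage from DBCC SQLPERF:

-- Recovery model of the current database:
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = DB_NAME();

-- Log file size and percentage used for every database on the instance:
DBCC SQLPERF(LOGSPACE);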

To answer your root question, the advice I gave in #1 about using as large a data sample as possible for your calculations and adding a worst-case buffer is your best bet. E.g. if you calculate a month's worth of data to be 1 TB, provision 1.25 or 1.5 TB to be safe for the first month and adjust as needed over time. In the beginning this will be a continuous task: revisit it periodically, re-calculate, re-evaluate, and re-provision until you become more intimate with your data. It's better to over-provision in the beginning than to under-provision. A rough projection using your test numbers is sketched below.
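
For illustration only (not from the original answer), here is a back-of-the-envelope projection that assumes the measured 48.5 GB for 12 million rows approximates one day of the first table and that growth stays roughly linear at that scale; both assumptions should be re-checked against a larger sample:

-- Illustration only: monthly estimate plus a worst-case provisioning buffer.
DECLARE @DailyGB  decimal(10, 2) = 48.5;   -- measured size for one day's volume (assumption)
DECLARE @DaysKept int            = 30;     -- data retained for one month
DECLARE @Buffer   decimal(4, 2)  = 1.5;    -- worst-case provisioning buffer

SELECT @DailyGB * @DaysKept            AS EstimatedMonthGB,   -- ~1455 GB
       @DailyGB * @DaysKept * @Buffer  AS ProvisionedGB;      -- ~2182.5 GB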

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange