Calculating disk usage with INFORMATION_SCHEMA.TABLES versus file system?
29-01-2021
Question
The size of a MySQL database can be calculated with:
SELECT table_schema AS db_name, SUM(data_length + index_length) AS size
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema != 'INFORMATION_SCHEMA'
GROUP BY db_name
But you can also calculate the database size by looking at the disk usage of the data directory. Should these two numbers ever deviate? Is there ever a reason to use one of these methods over the other? By extension, how are INFORMATION_SCHEMA.TABLES's data_length and index_length calculated?
Solution
Well, it seems there are two ways the two numbers can deviate:
Statistics: when you query with innodb_stats_on_metadata enabled (it was the default before MySQL 5.6.6 and has been OFF by default since), you force the statistics to be updated (if they are not cached). You can turn this off, but doing so requires a GLOBAL change. From the docs:

"When innodb_stats_on_metadata is enabled, InnoDB updates non-persistent statistics when metadata statements such as SHOW TABLE STATUS or when accessing the INFORMATION_SCHEMA.TABLES or INFORMATION_SCHEMA.STATISTICS tables. (These updates are similar to what happens for ANALYZE TABLE.) When disabled, InnoDB does not update statistics during these operations. Leaving the setting disabled can improve access speed for schemas that have a large number of tables or indexes. It can also improve the stability of execution plans for queries that involve InnoDB tables.

To change the setting, issue the statement SET GLOBAL innodb_stats_on_metadata=mode, where mode is either ON or OFF (or 1 or 0). Changing the setting requires privileges sufficient to set global system variables (see Section 5.1.9.1, “System Variable Privileges”) and immediately affects the operation of all connections."

Cache: otherwise, it seems that INFORMATION_SCHEMA reads from a cache in Table_statistics::get_stat. That seems to be documented under information_schema_stats_expiry. It also seems you can refresh the cache with ANALYZE TABLE (which both the docs and the test suite mention). So the cached values can lag behind the real ones by up to information_schema_stats_expiry seconds.
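To make the two points above concrete, here is a minimal sketch of how one might check the relevant settings and force fresh numbers before running the size query. It assumes MySQL 8.0 (where information_schema_stats_expiry exists), and mydb.mytable is a hypothetical table name used only for illustration. Setting the expiry to 0 for the session is documented as making INFORMATION_SCHEMA queries fetch statistics directly from the storage engine, but treat this as a sketch rather than a definitive procedure.

```sql
-- Does querying metadata trigger an update of non-persistent InnoDB statistics?
SHOW GLOBAL VARIABLES LIKE 'innodb_stats_on_metadata';

-- How long (in seconds) cached INFORMATION_SCHEMA statistics are considered valid
SHOW VARIABLES LIKE 'information_schema_stats_expiry';

-- Option A: refresh the cached statistics for one table
-- (mydb.mytable is a hypothetical name)
ANALYZE TABLE mydb.mytable;

-- Option B: bypass the cache for the current session only
SET SESSION information_schema_stats_expiry = 0;

-- Then run the size query from the question
SELECT table_schema AS db_name, SUM(data_length + index_length) AS size
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema != 'INFORMATION_SCHEMA'
GROUP BY db_name;
```

Option B avoids the GLOBAL change mentioned above, since the expiry variable can be set per session.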
I don't see anything else documented or in the code that would lead me to believe there is any other difference.