Question

On my MySQL DB I was thinking of using TINYINT (Unsigned)?

Would you use Byte on SQL Server?


Solution

This answer only covers SQL Server. The answer depends on how you define efficient: is it space or CPU? There can be a tradeoff between the two.

Let's start by checking the documentation for data types that store integer data:

╔═══════════╦══════════════════════════════════════════════════════════════════════════╦═════════╗
║ Data type ║                                  Range                                   ║ Storage ║
╠═══════════╬══════════════════════════════════════════════════════════════════════════╬═════════╣
║ bigint    ║ -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807) ║ 8 Bytes ║
║ int       ║ -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)                         ║ 4 Bytes ║
║ smallint  ║ -2^15 (-32,768) to 2^15-1 (32,767)                                       ║ 2 Bytes ║
║ tinyint   ║ 0 to 255                                                                 ║ 1 Byte  ║
╚═══════════╩══════════════════════════════════════════════════════════════════════════╩═════════╝

For your data you could use TINYINT because all of your data fits within 0 to 255, and that would use the least amount of space. Let's do a quick test by inserting 10 million rows into tables with values evenly distributed between 0 and 100. Note that we use ten columns for all example tables because a rowstore table has a minimum row size of 9 bytes. If we created a table with just one column we would get misleading results. I am testing against SQL Server 2016 SP1:

DROP TABLE IF EXISTS dbo.X_TINYINT;

CREATE TABLE dbo.X_TINYINT (
    NUM1 TINYINT NOT NULL,
    NUM2 TINYINT NOT NULL,
    NUM3 TINYINT NOT NULL,
    NUM4 TINYINT NOT NULL,
    NUM5 TINYINT NOT NULL,
    NUM6 TINYINT NOT NULL,
    NUM7 TINYINT NOT NULL,
    NUM8 TINYINT NOT NULL,
    NUM9 TINYINT NOT NULL,
    NUM10 TINYINT NOT NULL
);

INSERT INTO dbo.X_TINYINT WITH (TABLOCK)
SELECT TOP (10000000) 
  n.n, n.n, n.n, n.n, n.n
, n.n, n.n, n.n, n.n, n.n
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
CROSS APPLY
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 101
) n (n);

EXEC sp_spaceused 'dbo.X_TINYINT'; -- data size is 198216 KB

If I run the same code against a table with 10 SMALLINT columns I get 297416 KB for the data size, an increase of 99200 KB. Based on the documented storage sizes I would expect a difference of about 10000000 * 10 * (2 - 1) / 1024 = 97656 KB, so the measured increase is close to the expected one.
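For reference, the SMALLINT variant of the test might look like the sketch below. The table name dbo.X_SMALLINT is an assumption (the original does not name it); everything else mirrors the TINYINT script above, with the last statement reproducing the expected-difference arithmetic.

```sql
-- Hypothetical table name; same shape as dbo.X_TINYINT but with 2-byte columns.
DROP TABLE IF EXISTS dbo.X_SMALLINT;

CREATE TABLE dbo.X_SMALLINT (
    NUM1 SMALLINT NOT NULL,
    NUM2 SMALLINT NOT NULL,
    NUM3 SMALLINT NOT NULL,
    NUM4 SMALLINT NOT NULL,
    NUM5 SMALLINT NOT NULL,
    NUM6 SMALLINT NOT NULL,
    NUM7 SMALLINT NOT NULL,
    NUM8 SMALLINT NOT NULL,
    NUM9 SMALLINT NOT NULL,
    NUM10 SMALLINT NOT NULL
);

-- Same 10 million rows, values 0 through 100.
INSERT INTO dbo.X_SMALLINT WITH (TABLOCK)
SELECT TOP (10000000)
  n.n, n.n, n.n, n.n, n.n
, n.n, n.n, n.n, n.n, n.n
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
CROSS APPLY
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 101
) n (n);

EXEC sp_spaceused 'dbo.X_SMALLINT';

-- Expected extra space in KB: rows * columns * extra bytes per value / 1024.
SELECT 10000000 * 10 * (2 - 1) / 1024 AS expected_extra_kb; -- 97656 (integer division)
```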

Depending on your version and edition of SQL Server you may be able to reduce space usage further. Row compression was first available in SQL Server 2008 Enterprise and is available in all editions of SQL Server as of 2016 SP1. Based on the description of the algorithm we probably won't get a lot of savings with row compression for TINYINT columns because they already use the minimum of one byte. We should still get around a 1% space reduction for the data because 0 is optimized to take no bytes and about 1 in 101 of the test values is 0. There also may be some reduction in the metadata overhead for the data types.

After applying DATA_COMPRESSION = ROW to the test table I get a data size of 187808 KB, a reduction of about 5%.
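One way to apply row compression to an existing table is to rebuild it. This is only a sketch: the original does not say whether compression was set when the table was created or applied afterwards, so the in-place rebuild is an assumption.

```sql
-- Rebuild the existing heap with row compression (assumed approach).
ALTER TABLE dbo.X_TINYINT REBUILD WITH (DATA_COMPRESSION = ROW);

-- Confirm which compression setting is now in effect.
SELECT p.partition_number, p.data_compression_desc
FROM sys.partitions p
WHERE p.object_id = OBJECT_ID('dbo.X_TINYINT');

-- Re-measure the data size.
EXEC sp_spaceused 'dbo.X_TINYINT';
```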

Page compression is available in the same versions and editions as row compression. The page compression algorithm compresses the data in a few additional ways on top of row compression. With a lot of repeated values on a page we might see significant storage gains.

After applying DATA_COMPRESSION = PAGE to the test table I get a data size of 109024 KB, a reduction of about 45% from the uncompressed size.
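If you would rather preview the savings before rebuilding a large table, sp_estimate_data_compression_savings samples the data and reports an estimated size. The sketch below assumes the test table from earlier; the estimate will not match the measured size exactly.

```sql
-- Estimate the size under page compression without changing the table.
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'X_TINYINT',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';

-- Apply page compression (assumed to be done by rebuilding in place).
ALTER TABLE dbo.X_TINYINT REBUILD WITH (DATA_COMPRESSION = PAGE);

EXEC sp_spaceused 'dbo.X_TINYINT';
```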

Just for fun we can check the space usage when the data is in columnstore format. Columnstore indexes were introduced in SQL Server 2012 and further improved in 2014 and 2016. These should not be used just for the purposes of saving space. You will want to research and test carefully before using them. There are also some restrictions around using them based on your SQL Server version and edition.

DROP TABLE IF EXISTS dbo.X_TINYINT_CCI;

CREATE TABLE dbo.X_TINYINT_CCI (
    NUM1 TINYINT NOT NULL,
    NUM2 TINYINT NOT NULL,
    NUM3 TINYINT NOT NULL,
    NUM4 TINYINT NOT NULL,
    NUM5 TINYINT NOT NULL,
    NUM6 TINYINT NOT NULL,
    NUM7 TINYINT NOT NULL,
    NUM8 TINYINT NOT NULL,
    NUM9 TINYINT NOT NULL,
    NUM10 TINYINT NOT NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_X_TINYINT_CCI ON dbo.X_TINYINT_CCI;

INSERT INTO dbo.X_TINYINT_CCI WITH (TABLOCK)
SELECT TOP (10000000) 
  n.n, n.n, n.n, n.n, n.n
, n.n, n.n, n.n, n.n, n.n
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
CROSS APPLY
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 101
) n (n)
OPTION (MAXDOP 1);

EXEC sp_spaceused 'dbo.X_TINYINT_CCI'; -- data size is 160 KB

I can also compress the CCI using COLUMNSTORE_ARCHIVE, which I think is designed for historical data that won't change or be read often. With that compression option applied to the whole table, the data size is further reduced to 88 KB.
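A minimal sketch of switching the columnstore table to archive compression, assuming a rebuild of all partitions (the original does not show the exact statement used):

```sql
-- Apply archive compression to every partition of the columnstore table.
ALTER TABLE dbo.X_TINYINT_CCI
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

EXEC sp_spaceused 'dbo.X_TINYINT_CCI';

-- To go back to standard columnstore compression:
ALTER TABLE dbo.X_TINYINT_CCI
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = COLUMNSTORE);
```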

The rowstore compression options add CPU overhead to queries. The amount of overhead will depend on your workload and data but we can use a simple query to illustrate the basic concept:

SELECT MAX(NUM1), MIN(NUM1)
FROM dbo.X_TINYINT
OPTION (MAXDOP 1);

After one test I got CPU time measurements of 1469 ms for uncompressed data, 1687 ms for data with row compression, and 2000 ms for data with page compression. I didn't do tests for the columnstore data just because they work so differently. MIN and MAX queries can be locally aggregated or even satisfied by metadata operations in some cases.
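The original does not say how CPU time was captured. One common way, shown here as an assumption, is SET STATISTICS TIME, which prints CPU and elapsed time for each statement to the messages output:

```sql
SET STATISTICS TIME ON;

-- Same test query; run it once per compression setting and compare CPU time.
SELECT MAX(NUM1), MIN(NUM1)
FROM dbo.X_TINYINT
OPTION (MAXDOP 1);

SET STATISTICS TIME OFF;
```

A single run is noisy, so repeating the query a few times per setting gives a more trustworthy comparison.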

Here is a summary of the results for the test table and query:

╔═══════════════════╦══════════════════╦══════════════════════╗
║ Table Compression ║ Data Space in KB ║ Query CPU Time in ms ║
╠═══════════════════╬══════════════════╬══════════════════════╣
║ NONE              ║           198216 ║ 1469                 ║
║ ROW               ║           187808 ║ 1687                 ║
║ PAGE              ║           109024 ║ 2000                 ║
║ CCI               ║              160 ║ N/A                  ║
║ CCI ARCHIVE       ║               88 ║ N/A                  ║
╚═══════════════════╩══════════════════╩══════════════════════╝

The exact results that you will see depend on your table structure, data, and workload.
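To put the sizes in relative terms, the following sketch computes the space saved against the uncompressed table. The numbers are copied from the summary table; the column aliases are illustrative.

```sql
-- Percentage of data space saved relative to the uncompressed 198216 KB.
SELECT
    v.table_compression,
    v.data_kb,
    CAST(100.0 * (198216 - v.data_kb) / 198216 AS DECIMAL(5, 1)) AS pct_saved_vs_none
FROM (VALUES
    ('NONE', 198216),
    ('ROW', 187808),
    ('PAGE', 109024),
    ('CCI', 160),
    ('CCI ARCHIVE', 88)
) AS v (table_compression, data_kb);
```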

OTHER TIPS

Use tinyint. Byte isn't a data type in SQL Server; it is a unit of storage. A tinyint is stored as a single byte.

https://technet.microsoft.com/en-us/library/ms172424(v=sql.110).aspx
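A quick way to see both the one-byte storage and the 0 to 255 range for yourself is the sketch below; the variable name is illustrative.

```sql
DECLARE @t TINYINT = 255;

-- DATALENGTH reports the number of bytes used by the value.
SELECT DATALENGTH(@t) AS bytes_used; -- 1

-- Assigning a value outside 0 to 255 raises an arithmetic overflow error.
BEGIN TRY
    SET @t = 256;
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
END CATCH;
```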

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange