In what cases are BLOB and TEXT stored in-line on InnoDB?

https://dba.stackexchange.com/questions/210408

04-01-2021
|

Question

I've googled this and can't find it, you can see Rick James mention it here.

... BLOB and TEXT are not always stored separately

The answer on What is the difference between MySQL VARCHAR and TEXT data types? says,

InnoDB is similar for VARCHAR, but stores the complete TEXT field outside of the record.

That's not the only massively upvote answer with this stance. From StackOverflow you can find an answer with 754 upvotes that says it quite clearly,

TEXT and BLOB is stored off the table with the table just having a pointer to the location of the actual storage.

VARCHAR is stored inline with the table. VARCHAR is faster when the size is reasonable, the tradeoff of which would be faster depends upon your data and your hardware, you'd want to benchmark a realworld scenario with your data.

The comment here links to a post on a mailing list that says

MyISAM puts TEXT and BLOB 'inline'. If you are searching a table (range scan / table scan), you are 'stepping over those cow paddies' -- costly for disk I/O. That is, the existence of the inline blob hurts performance in this case.

InnoDB puts only 767 bytes of a TEXT or BLOB inline, the rest goes into some other block. This is a compromise that sometimes helps, sometimes hurts performance.

I've seen other posts that talk about storing text and blobs in-line if they're under some limit? What is the limit? When, if ever, does InnoDB store blobs and texts in line?

Solution

For InnoDB the rule goes like this:

A record can not be fragmented.
At least two records must fit in a page.
If a record size is less than ~7k, all field values are stored in-page.
If the record size is more than ~7k the first 768 (COMPACT format) or 20 bytes (DYNAMIC format) are stored in-page. The remaining part is stored off-page.

How to verify that.

CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1


mysql> insert into t1(name) select repeat('a', 10);
Query OK, 1 row affected (0.02 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into t1(name) select repeat('a', 8000);
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

Then parse the t1.ibd with stream_parser.

stream_parser -f t1.ibd

Check content of the PRIMARY index page:

# hexdump -C pages-t1.ibd/FIL_PAGE_INDEX/0000000000000046.page
00000000  97 e5 5e 36 00 00 00 03  ff ff ff ff ff ff ff ff  |..^6............|
00000010  00 00 00 00 00 27 b9 44  45 bf 00 00 00 00 00 00  |.....'.DE.......|
00000020  00 00 00 00 00 18 00 02  1f f3 80 04 00 00 00 00  |................|
00000030  00 a2 00 02 00 01 00 02  00 00 00 00 00 00 00 00  |................|
00000040  00 00 00 00 00 00 00 00  00 2e 00 00 00 18 00 00  |................|
00000050  00 02 00 f2 00 00 00 18  00 00 00 02 00 32 01 00  |.............2..|
00000060  02 00 1c 69 6e 66 69 6d  75 6d 00 03 00 0b 00 00  |...infimum......|
00000070  73 75 70 72 65 6d 75 6d  0a 00 00 00 10 00 23 80  |supremum......#.|
00000080  00 00 01 00 00 00 00 05  10 ae 00 00 01 22 01 10  |............."..|
00000090  61 61 61 61 61 61 61 61  61 61 40 9f 00 00 00 18  |aaaaaaaaaa@.....| <- this is the first record
000000a0  ff ce 80 00 00 02 00 00  00 00 05 11 af 00 00 01  |................|
000000b0  23 01 10 61 61 61 61 61  61 61 61 61 61 61 61 61  |#..aaaaaaaaaaaaa| <- this is the beginning of the second record
000000c0  61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61  |aaaaaaaaaaaaaaaa|
* <- The star means identical strings are omitted, -v will show full output.
00001ff0  61 61 61 00 00 00 00 00  00 00 00 00 00 00 00 00  |aaa.............|
00002000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00003ff0  00 00 00 00 00 70 00 63  97 e5 5e 36 00 27 b9 44  |.....p.c..^6.'.D|
00004000  97 e5 5e 36 00 00 00 03  ff ff ff ff ff ff ff ff  |..^6............|
00004010  00 00 00 00 00 27 b9 44  45 bf 00 00 00 00 00 00  |.....'.DE.......|
00004020  00 00 00 00 00 18 00 02  1f f3 80 04 00 00 00 00  |................|
00004030  00 a2 00 02 00 01 00 02  00 00 00 00 00 00 00 00  |................|
00004040  00 00 00 00 00 00 00 00  00 2e 00 00 00 18 00 00  |................|
00004050  00 02 00 f2 00 00 00 18  00 00 00 02 00 32 01 00  |.............2..|
00004060  02 00 1c 69 6e 66 69 6d  75 6d 00 03 00 0b 00 00  |...infimum......|
00004070  73 75 70 72 65 6d 75 6d  0a 00 00 00 10 00 23 80  |supremum......#.|
00004080  00 00 01 00 00 00 00 05  10 ae 00 00 01 22 01 10  |............."..|
00004090  61 61 61 61 61 61 61 61  61 61 40 9f 00 00 00 18  |aaaaaaaaaa@.....|
000040a0  ff ce 80 00 00 02 00 00  00 00 05 11 af 00 00 01  |................|
000040b0  23 01 10 61 61 61 61 61  61 61 61 61 61 61 61 61  |#..aaaaaaaaaaaaa|
000040c0  61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61  |aaaaaaaaaaaaaaaa|
*
00005ff0  61 61 61 00 00 00 00 00  00 00 00 00 00 00 00 00  |aaa.............|
00006000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00007ff0  00 00 00 00 00 70 00 63  97 e5 5e 36 00 27 b9 44  |.....p.c..^6.'.D|
00008000

This is for COMPACT format. For DYNAMIC (except COMPRESSED) the procedure is same.

OTHER TIPS

Keep in mind that there are 4 "row formats". A main difference between then has to do with how wide columns are handled.

The reference points to an Answer written early in 2010, a few months before DYNAMIC and COMPRESSED were introduced in the "InnoDB plugin".

So, I claim that that other Q&A is out of date! That is the 754 Upvotes are no longer valid.

Here are some 'current' manual references:

(emphasis mine)

From https://dev.mysql.com/doc/refman/5.6/en/innodb-row-format-overview.html :

Columns such as BLOB and VARCHAR that are too long to fit on a B-tree page are stored on separately allocated disk pages called overflow pages.

From https://dev.mysql.com/doc/refman/5.6/en/innodb-row-format-dynamic.html :

When a table is created with ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED, InnoDB can store long variable-length column values (for VARCHAR, VARBINARY, and BLOB and TEXT types) fully off-page, with the clustered index record containing only a 20-byte pointer to the overflow page.

[DYNAMIC and COMPRESSED] When the row is too long, InnoDB chooses the longest columns for off-page storage until the clustered index record fits on the B-tree page. TEXT and BLOB columns that are less than or equal to 40 bytes are always stored in-line.

The DYNAMIC format is based on the idea that if a portion of a long data value is stored off-page, it is usually most efficient to store all of the value off-page. With DYNAMIC format, shorter columns are likely to remain in the B-tree node, minimizing the number of overflow pages needed for any given row.

The COMPRESSED row format uses similar internal details for off-page storage as the DYNAMIC row format

From https://dev.mysql.com/doc/refman/5.6/en/innodb-physical-record.html (even a big CHAR may be treated like TEXT):

InnoDB [all formats?] encodes fixed-length fields greater than or equal to 768 bytes in length as variable-length fields, which can be stored off-page. For example, a CHAR(255) column can exceed 768 bytes if the maximum byte length of the character set is greater than 3, as it is with utf8mb4.

ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPRESSED handle CHAR storage in the same way as ROW_FORMAT=COMPACT.

(The links are for 5.6, but I think the text applies at least to newer versions.)

I see nothing specific that says that VARCHAR(...) and *TEXT are handled differently.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange