Does MySQL update the index on all inserts? Can I make it update after every x inserts?

StackOverflow https://stackoverflow.com/questions/23435361

  •  14-07-2023

Question

I have a few related questions about MySQL indexes:

  1. Does MySQL update the index every time something is inserted?
  2. When MySQL updates the index due to an insert, does it rebuild the entire index?
  3. Is there a way to make MySQL update the index after every x inserts?

I have a lot of inserts in my application and I'm afraid MySQL is rebuilding the index after every insert. The data does not have to be real-time, so I can update the index after a specific number of inserts (if it's possible).


Solution

MySQL is probably already doing what you describe, as much as it can.

In the case of InnoDB (which should be your default storage engine with MySQL), inserts, updates, and deletes change primary key or unique key indexes immediately. But they never rebuild the whole index; they add new values into (or take values out of) these indexes.

For non-unique indexes, InnoDB performs change buffering. That is, it queues the changes, which will be merged into the index later in the background. It will even consolidate changes so that the physical updates to the index are done more efficiently.

You don't have to do anything to enable this feature, because it's enabled by default. MySQL 5.1 does change buffering only for INSERT. MySQL 5.5 and later additionally does change buffering for UPDATE and DELETE.

You can disable this feature if you want. For instance, on an SSD, avoiding random I/O is less important, and you might want to ensure that queued changes don't accumulate. Normally, you should keep the feature enabled.
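
As a rough sketch of how this setting might be inspected or changed, assuming a MySQL version where the innodb_change_buffering system variable is still available and an account allowed to set global variables:

```sql
-- Show the current change buffering mode.
SHOW VARIABLES LIKE 'innodb_change_buffering';

-- Turn change buffering off at runtime. This does not survive a restart
-- unless the same value is also placed in the server configuration file.
SET GLOBAL innodb_change_buffering = 'none';

-- Buffer inserts only, roughly the MySQL 5.1 behavior described above.
SET GLOBAL innodb_change_buffering = 'inserts';

-- Re-enable buffering for all supported operation types.
SET GLOBAL innodb_change_buffering = 'all';
```

Only one of the SET statements would be used in practice; they are listed together to show the kinds of values the variable accepts.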

OTHER TIPS

MySQL doesn't "rebuild" the index after every insert. MySQL inserts one or more rows into an existing index.

MySQL has lots of unusual options, and I don't know all of them. I would be surprised if there were an option that said: "Oh, let the index on the table be out of sync with the data in the table." That doesn't sound reasonable.

If you have lots of inserts, the best strategy is to do the inserts in one statement. Instead of:

insert into t(...)
    select . . .
    from t2
    where id = id1;

Do:

insert into t(...)
    select . . .
    from t2
    where id in (id1, id2, . . .);

An extension on this is to insert into a temporary table. Then just load the temporary table into the big table all at once:

insert into t(...)
    select ...
    from temptable;
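
A minimal sketch of that staging pattern, spelled out end to end. The table and column names (t, staging, id, name) are hypothetical and only stand in for your own schema:

```sql
-- Hypothetical staging table; a temporary table is visible only to the
-- session that created it, so one connection must do all of these steps.
CREATE TEMPORARY TABLE staging (
    id   INT NOT NULL,
    name VARCHAR(100) NOT NULL
);

-- Collect incoming rows, several per statement.
INSERT INTO staging (id, name) VALUES
    (1, 'alpha'),
    (2, 'beta'),
    (3, 'gamma');

-- Move everything into the big table in a single statement.
INSERT INTO t (id, name)
    SELECT id, name
    FROM staging;

DROP TEMPORARY TABLE staging;
```

If the rows arrive over several connections, a regular (non-temporary) staging table would be needed instead.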

Finally, it is sometimes faster to drop indexes, do a big insert (in one or more steps), and then re-create the indexes.

One caution: if you drop unique indexes, you are also dropping the unique constraint. This matters if you are using ON DUPLICATE KEY UPDATE, because it relies on a unique index (or the primary key) to detect the duplicate key.
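
A sketch of the drop-and-re-create approach, using a non-unique index so that no constraint is lost. The names t, idx_name, and name are hypothetical:

```sql
-- Drop the non-unique secondary index before the bulk load.
ALTER TABLE t DROP INDEX idx_name;

-- ... run the large INSERT statements here ...

-- Build the index once, after the data is loaded.
ALTER TABLE t ADD INDEX idx_name (name);
```

Whether this is actually faster depends on how large the load is relative to the existing table, so it is worth timing both ways.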

When MySQL updates the index due to an insert, does it rebuild the entire index?

No, MySQL doesn't "rebuild" the index on each insert.

InnoDB's default page size is 16KB. It allocates these pages in 1MB increments (called extents).

When a table is first created (or its indexes are rebuilt), the pages are filled 15/16 full, leaving room for some random inserts. If your index entries are 500 bytes each (primary key size + row data for a clustered index), that leaves room for 2 new rows to be inserted before having to split the page.
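
The arithmetic behind the "2 new rows" figure can be checked directly in MySQL. This is a simplified calculation that assumes the default 16KB page and ignores page headers and per-record overhead:

```sql
-- Page size in bytes for this server (16384 with the default setting).
SELECT @@innodb_page_size AS page_size_bytes;

-- Free space when a 16384-byte page is 15/16 full, and how many
-- 500-byte entries fit into that space (DIV is integer division).
SELECT 16384 DIV 16           AS free_bytes,   -- 1024
       (16384 DIV 16) DIV 500 AS extra_rows;   -- 2
```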

When MySQL needs to insert a row on a full page, the page must be split. MySQL will add a new page, and move half of the page data to the new page.

Within a page, records may not actually be in physical order. They'll be in the order they were inserted. They're linked in order via a form of linked list. So, even a random insert doesn't cause data to be physically reordered. Outside the need to split a page, the data isn't moved around.

After many random inserts, your pages will be anywhere from half full to full.

All of this work does affect your insert performance, as the index must be updated with each insert. In addition, an index with many half full pages will negatively affect read performance.

Now, if you're inserting rows in index order, then MySQL simply keeps adding to the end of the pages, filling them 15/16 full and adding an extent of pages at a time. The performance penalty is much lower: no pages are split, so no data is moved, and reads benefit from nearly full pages.
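
One common way to get inserts in index order for the clustered index is an AUTO_INCREMENT primary key. The table below is a hypothetical example, not something from the question:

```sql
-- Each new row gets a larger id than the previous one, so rows are
-- appended at the end of the clustered (primary key) index.
CREATE TABLE events (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    payload    VARCHAR(255) NOT NULL,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- The id column is omitted, so MySQL assigns ascending values.
INSERT INTO events (payload, created_at) VALUES
    ('first',  NOW()),
    ('second', NOW());
```

This only helps the primary key. Any secondary indexes on the table still receive entries in whatever order their column values happen to arrive.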

So, while there is some maintenance involved in updating an index for inserts, MySQL isn't "rebuilding" the index on each insert. Also, see Bill Karwin's note about change buffering, which may affect you.

Note that MariaDB, which you are probably using if you apt install mysql on a modern system, removed the buffering mentioned in the accepted answer.

The change buffer has been disabled by default [in updates to] MariaDB 10.5, 10.6, 10.7, and 10.8, and the feature is deprecated and ignored from MariaDB 10.9.0.
Benchmarks show that the change buffer sometimes reduces performance, and in the best case seem to bring a few per cent improvement to throughput. However, [if untimely crashes occur] then the InnoDB system tablespace can grow out of control.

Paraphrased from https://mariadb.com/kb/en/innodb-change-buffering/, emphasis mine.

Oracle's database server still has this feature in their documentation for the latest version: https://dev.mysql.com/doc/refman/8.0/en/innodb-change-buffer.html

Licensed under: CC-BY-SA with attribution