Question

Is there any benefit to creating an index on a temporary table containing just a primary key from a materialized query?

I want to delete some data from a particular table, as well as other related tables with foreign key references. In order to improve performance, I'm materializing the initial select into a temp table and then joining against it for subsequent deletes.

The temp table contains only one column -- the primary key from the subquery. Is there any performance benefit to creating an index on the temp table's id column? In my testing I saw an improvement of about 2% (more than offset by the overhead of the index creation), but perhaps the dataset I had available for testing was not large enough.

CREATE TEMPORARY TABLE ids AS (SELECT id FROM tableA WHERE xxx);
DELETE tableB FROM tableB INNER JOIN ids ON tableB.a_id = ids.id;
DELETE tableC FROM tableC INNER JOIN ids ON tableC.a_id = ids.id;
...
DELETE tableA FROM tableA INNER JOIN ids ON tableA.id = ids.id;

Since all rows from the ids temporary table will be used to delete rows in tableB (a_id is indexed), is there any performance benefit to creating a primary key / index on the ids temporary table? Is there a better, completely different way to approach this?


Solution

It depends entirely on the type of queries you run. If your queries always read, or return, the entire table or a significant subset of it, then adding an index will only decrease write performance (which an index always does). If you often execute queries that can use the index to reduce the number of disk page I/Os (because you are looking for only one row, or a very small percentage of the rows in the table), then adding an index will markedly increase the performance of those queries.
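
One way to see which case applies to the deletes in the question is to compare the execution plan before and after indexing the temporary table. The following is a minimal sketch, assuming MySQL 5.6.3 or later (where EXPLAIN accepts DELETE statements) and that the ids temporary table from the question already exists in the current session:

```sql
-- Plan with no index on the temporary table.
EXPLAIN DELETE tableB FROM tableB INNER JOIN ids ON tableB.a_id = ids.id;

-- Add a primary key to the temporary table, then look at the plan again.
ALTER TABLE ids ADD PRIMARY KEY (id);
EXPLAIN DELETE tableB FROM tableB INNER JOIN ids ON tableB.a_id = ids.id;
```

If both plans read ids first and look up tableB through its a_id index, the new index is probably not being used for these deletes and only adds creation cost.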

OTHER TIPS

Actually, this is one case where a primary key index could be dangerous for performance.

Your queries essentially have two logical execution paths. One is to read the "other" table and look up values in ids. The second is to read the ids table and look up values in the "other" table. The latter execution plan is the best one, assuming that ids is much smaller than the other table.

The problem with the primary key index is that it might confuse the optimizer, by really making the first option seem reasonable. If you trust the optimizer, then having the index is no problem. But it does raise the possibility of confusion.
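
If you would rather not rely on the optimizer here, the join order can be pinned. This is only a sketch, assuming MySQL, where STRAIGHT_JOIN makes the left table the one that is read first; it uses the table names from the question, and it is worth confirming the resulting plan with EXPLAIN on your own version:

```sql
-- Read ids first, then look up matching rows in tableB via its a_id index.
DELETE tableB
FROM ids
STRAIGHT_JOIN tableB ON tableB.a_id = ids.id;
```

With the order fixed this way, a primary key on ids can no longer tempt the optimizer into scanning tableB and probing ids.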

Now to confuse matters further, there are cases where having the index would be very beneficial. This occurs when the ids table is large relative to the other tables -- and these are also quite big. In this case, you want to do the deletes in "primary key" order for the "other" table. So, reading that table in order and looking up the id makes sense. This would only be the case when most pages have at least two records on them that are to be deleted.
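
For that large-table case, the primary key can be declared when the temporary table is created instead of being added afterwards. A sketch, assuming MySQL's CREATE TABLE ... SELECT syntax; the `WHERE xxx` condition is the placeholder from the question and is left unfilled:

```sql
-- Build the temporary table and its primary key in one statement.
CREATE TEMPORARY TABLE ids (PRIMARY KEY (id))
AS SELECT id FROM tableA WHERE xxx;
```

Whether this is cheaper than a separate ALTER TABLE depends on the storage engine and data volume, so it is worth timing both on realistic data.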

Licensed under: CC-BY-SA with attribution