Optimizing JOINs: comparison with indexed tables

https://stackoverflow.com/questions/12105249

28-06-2021

Question

Let's say we have a time-consuming query, described below:

(SELECT ...
FROM ...) AS FOO
LEFT JOIN (
    SELECT ...
    FROM ...) AS BAR
ON FOO.BarID = BAR.ID

Let's suppose that

(SELECT ...
FROM ...) AS FOO

returns many rows (let's say 10 million). Every single row has to be joined with data in BAR.

Now let's say we insert the result of

    (SELECT ...
    FROM ...) AS BAR

into a table, and add the ad hoc index(es) to it.

My question:

How would the performance of the JOIN against the live query differ from the performance of the JOIN against a table that contains the result of that same query and has ad hoc indexes added to it?

Another way to put it:

If a JOIN is slow, would there be any gain in actually storing and indexing the table we JOIN to?

Solution

The answer is 'Maybe'.

It depends on the statistics of the data in question. The only way to find out for sure is to actually load the result of the BAR subquery into a temp table, put a relevant index on it, then run the join against that table.
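As a minimal sketch of that experiment, assuming SQLite through Python's sqlite3 module and hypothetical tables foo and bar_src (neither name comes from the question), the following runs the same LEFT JOIN once against a live subquery and once against an indexed temp table, then times both. The numbers depend entirely on your data and DBMS, so treat them as illustrative only.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema standing in for the elided FOO and BAR sources.
cur.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, bar_id INTEGER)")
cur.execute("CREATE TABLE bar_src (id INTEGER, amount INTEGER)")
cur.executemany("INSERT INTO foo (bar_id) VALUES (?)",
                [(i % 1000,) for i in range(20000)])
cur.executemany("INSERT INTO bar_src VALUES (?, ?)",
                [(i % 1000, i) for i in range(5000)])

# Variant 1: join against the live subquery.
LIVE = """
SELECT foo.id, bar.total
FROM foo
LEFT JOIN (SELECT id, SUM(amount) AS total
           FROM bar_src GROUP BY id) AS bar
       ON foo.bar_id = bar.id
ORDER BY foo.id
"""

# Variant 2: materialize the subquery once, index the join column,
# then join against the stored result.
cur.execute("""CREATE TEMP TABLE bar_mat AS
               SELECT id, SUM(amount) AS total
               FROM bar_src GROUP BY id""")
cur.execute("CREATE INDEX idx_bar_mat_id ON bar_mat (id)")

MATERIALIZED = """
SELECT foo.id, bar_mat.total
FROM foo
LEFT JOIN bar_mat ON foo.bar_id = bar_mat.id
ORDER BY foo.id
"""

def timed(sql):
    """Run a query, fetch every row, and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = cur.execute(sql).fetchall()
    return rows, time.perf_counter() - start

live_rows, live_s = timed(LIVE)
mat_rows, mat_s = timed(MATERIALIZED)
print(f"live subquery: {live_s:.4f}s, indexed temp table: {mat_s:.4f}s")
```

Both variants should return identical rows; only the timing differs, and which one wins can change with row counts, the optimizer, and the cost of building the temp table in the first place.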

If speed is what you want, and it's possible for you to load the results of that query permanently into a table, then your query is going to be quicker, because the subquery no longer has to be recomputed on every run.

If you want it to be even faster, depending on which DBMS you are using, you could consider creating an index which crosses both tables. In SQL Server these are called 'Indexed Views'; on other systems, look for the equivalent feature, usually called 'materialized views'.

Finally, if you want the ultimate in speed, consider denormalising your data and eliminating the join that is occurring on the fly - basically you move the pre-processing (the join) offline at the cost of storage space and data consistency (your live table will be a little behind depending on how frequently you run your updates).
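A minimal sketch of that trade-off, again assuming SQLite through Python's sqlite3 module and hypothetical tables orders and customers (illustrative names only): the join is run once offline into a denormalised table, reads no longer need a join, and the copy stays stale until the next refresh.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical live, normalised tables.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Ann"), (2, "Bob")])
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(10, 1), (11, 2), (12, 3)])

def refresh_denormalised():
    """Offline step: rebuild the pre-joined table from the live tables."""
    cur.execute("DROP TABLE IF EXISTS orders_denorm")
    cur.execute("""CREATE TABLE orders_denorm AS
                   SELECT o.id AS order_id, c.name AS customer_name
                   FROM orders AS o
                   LEFT JOIN customers AS c ON o.customer_id = c.id""")

def read_denormalised():
    """Online step: a plain read with no join at query time."""
    return cur.execute(
        "SELECT order_id, customer_name FROM orders_denorm ORDER BY order_id"
    ).fetchall()

refresh_denormalised()
before = read_denormalised()

# A change to the live data is not visible until the next refresh.
cur.execute("UPDATE customers SET name = 'Robert' WHERE id = 2")
stale = read_denormalised()

refresh_denormalised()
fresh = read_denormalised()
```

Here stale still shows the old customer name, which is the consistency cost described above; how often refresh_denormalised runs decides how far behind the copy can get.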

I hope this helps.

Licensed under: CC-BY-SA with attribution
