Question

Could someone explain or point me to a resource that explains how a relational database (like Postgres or MySQL, I use Postgres more though) implements joins?

For instance, I can roughly tell you that indexes might be made of a B tree where nodes are conditions that might be present in a where clause. This index is built up whenever a change involves the index (like an insert). And from this information, I can assume that an unindexed column will be faster at insert and that an index might not help with certain searches, like regex or like pattern matching.

I'm looking to have a similar or better understanding of what happens when you do one or more joins.

Was it helpful?

Solution

In Postgresql, the planner/optimizer computes which of the 3 following methods to use (from PostgreSQL documentation):

nested loop join: The right relation is scanned once for every row found in the left relation. This strategy is easy to implement but can be very time consuming. (However, if the right relation can be scanned with an index scan, this can be a good strategy. It is possible to use values from the current row of the left relation as keys for the index scan of the right.)

merge join: Each relation is sorted on the join attributes before the join starts. Then the two relations are scanned in parallel, and matching rows are combined to form join rows. This kind of join is more attractive because each relation has to be scanned only once. The required sorting might be achieved either by an explicit sort step, or by scanning the relation in the proper order using an index on the join key.

hash join: the right relation is first scanned and loaded into a hash table, using its join attributes as hash keys. Next the left relation is scanned and the appropriate values of every row found are used as hash keys to locate the matching rows in the table.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top