Question

I have been reading about hash join and how it works at the physical level. However, there is something I don't understand (because of my lack of knowledge).

Here is the hash join algorithm I found:

for each row R1 in the build table
    begin
        calculate hash value on R1 join key(s)
        insert R1 into the appropriate hash bucket
    end
for each row R2 in the probe table
    begin
        calculate hash value on R2 join key(s)
        for each row R1 in the corresponding hash bucket
            if R1 joins with R2
                return (R1, R2)
    end
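
For reference, the pseudocode above translates fairly directly into the following minimal Python sketch. It assumes in-memory tables represented as lists of dicts and a single join column; the names hash_join and num_buckets are hypothetical, and Python's built-in hash() stands in for whatever hash function a real engine uses. "return (R1, R2)" in the pseudocode means "emit this pair and keep scanning", which is why the sketch uses yield.

```python
from typing import Any, Iterator

Row = dict[str, Any]


def hash_join(build: list[Row], probe: list[Row], key: str,
              num_buckets: int = 8) -> Iterator[tuple[Row, Row]]:
    """Minimal in-memory hash join on one equijoin column (illustrative only)."""
    # Build phase: hash each build row's join key and put the row in a bucket.
    buckets: list[list[Row]] = [[] for _ in range(num_buckets)]
    for r1 in build:
        buckets[hash(r1[key]) % num_buckets].append(r1)

    # Probe phase: hash each probe row's key and scan only the matching bucket.
    for r2 in probe:
        for r1 in buckets[hash(r2[key]) % num_buckets]:
            # Different keys can land in the same bucket, so compare real values.
            if r1[key] == r2[key]:
                yield (r1, r2)  # emit the pair, then keep scanning the bucket


t1 = [{"name": "Ann", "age": 30}, {"name": "Bob", "age": 25}]
t2 = [{"name": "Ann", "age": 20}, {"name": "Cid", "age": 40}]
print(list(hash_join(t1, t2, "name")))
# prints: [({'name': 'Ann', 'age': 30}, {'name': 'Ann', 'age': 20})]
```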

This seems adequate for join conditions like "TABLE1.NAME == TABLE2.NAME". However, how are the hashes produced and compared when the join condition is "TABLE1.NAME == TABLE2.NAME" && "TABLE1.AGE > TABLE2.AGE"?

I just couldn't find or understand how the hashes are produced and compared when the join requires both an equality operator and other relational operators, such as greater-than.

Solution

From the article you were reading (by Craig Freedman):

Hash join shares many characteristics with merge join. Like merge join, it requires at least one equijoin predicate, supports residual predicates, and supports all outer and semi-joins.

The hash keys are built from the (required) equijoin predicates. All other predicates are evaluated during the bucket scan, hence the name "residual predicate".

In your example, Name would be used to form the hash key. The Age predicate would be evaluated on every row in the matching hash bucket, after the initial hash probe. Because different Name values can hash to the same bucket (a hash collision), the actual values of the hashed columns in the bucket are also compared, naturally.
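
To make this concrete, here is a minimal Python sketch of the same idea for your example. It assumes in-memory tables as lists of dicts, with TABLE1 as the build input and TABLE2 as the probe input. Only the equijoin column (name) is hashed; the age comparison is passed in as a residual predicate and checked for each candidate row during the bucket scan. The function name hash_join_with_residual and the num_buckets parameter are hypothetical, and a real engine such as SQL Server also deals with memory spills, outer joins and other details this sketch ignores.

```python
from typing import Any, Callable, Iterator

Row = dict[str, Any]


def hash_join_with_residual(
    build: list[Row],
    probe: list[Row],
    key: str,
    residual: Callable[[Row, Row], bool],
    num_buckets: int = 8,
) -> Iterator[tuple[Row, Row]]:
    """Hash only the equijoin column; check everything else per candidate pair."""
    # Build phase: only the equijoin column (key) feeds the hash function.
    buckets: list[list[Row]] = [[] for _ in range(num_buckets)]
    for r1 in build:
        buckets[hash(r1[key]) % num_buckets].append(r1)

    # Probe phase: locate the bucket by hash, then scan it row by row.
    for r2 in probe:
        for r1 in buckets[hash(r2[key]) % num_buckets]:
            # 1) Re-check the hashed column itself: a bucket can hold other keys.
            if r1[key] != r2[key]:
                continue
            # 2) Evaluate the residual (non-equality) predicate on the pair.
            if residual(r1, r2):
                yield (r1, r2)


table1 = [{"name": "Ann", "age": 30}, {"name": "Ann", "age": 18},
          {"name": "Bob", "age": 25}]
table2 = [{"name": "Ann", "age": 20}, {"name": "Bob", "age": 40}]

# TABLE1.NAME == TABLE2.NAME && TABLE1.AGE > TABLE2.AGE
for row1, row2 in hash_join_with_residual(
        table1, table2, "name", lambda b, p: b["age"] > p["age"]):
    print(row1, row2)
# prints: {'name': 'Ann', 'age': 30} {'name': 'Ann', 'age': 20}
```

Setting num_buckets=1 forces every row into a single bucket, which shows why the explicit comparison of the hashed column matters: without it, the probe row for Ann (age 20) would also be paired with Bob (age 25), since 25 > 20.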

The execution plan properties (in SQL Server anyway) for the hash join will show you the build hash keys and any residual.

Licensed under: CC-BY-SA with attribution