Execution Plan Basics — Hash Match Confusion

https://dba.stackexchange.com/questions/1876

16-10-2019

Question

I am starting to learn execution plans and am confused about how exactly a hash match works and why it would be used in a simple join:

SELECT Posts.Title, Users.DisplayName
FROM Posts
JOIN Users ON Posts.OwnerUserId = Users.Id
OPTION (MAXDOP 1)

[Image: graphical execution plan for the query above]

As I understand it, the results of the top index scan become the hash table, and each row from the bottom clustered index scan is looked up in it. I understand how hash tables work to at least some degree, but I am confused about which values exactly get hashed in an example like this.

What would make sense to me is that the common field between them, the Id, is hashed. But if this is the case, why hash a number?

Solution

As SQLRockstar's answer quotes, a hash join is:

best for large, unsorted inputs.

Now,

From the Users.DisplayName index scan (assumed nonclustered) you get Users.Id (assumed to be the clustered key). That index is ordered by DisplayName, so the Id values arrive unsorted.
You are also scanning Posts for OwnerUserId, which is unsorted as well.

That gives two unordered inputs.

I'd consider an index on the Posts table on OwnerUserId, including Title. This will add some order on one side of the input to the JOIN, and it will also be a covering index for this query.

CREATE INDEX IX_OwnerUserId ON Posts (OwnerUserId) INCLUDE (Title)

You may then find that the Users.DisplayName index won't be used and it will scan the PK instead.

Other answers

From http://sqlinthewild.co.za/index.php/2007/12/30/execution-plan-operations-joins/

"The hash join is one of the more expensive join operations, as it requires the creation of a hash table to do the join. That said, it’s the join that’s best for large, unsorted inputs. It is the most memory-intensive of any of the joins

The hash join first reads one of the inputs and hashes the join column and puts the resulting hash and the column values into a hash table built up in memory. Then it reads all the rows in the second input, hashes those and checks the rows in the resulting hash bucket for the joining rows."

which links to this post:

http://blogs.msdn.com/b/craigfr/archive/2006/08/10/687630.aspx
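
The build and probe phases described in the quote above can be sketched in a few lines of Python. This is only a simplified in-memory illustration, not how SQL Server implements the operator: the function name `hash_join`, the sample rows, and the use of a Python dict as the hash table are all assumptions made for the example, and it ignores memory limits entirely.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Join two lists of dict rows where build_key equals probe_key."""
    # Build phase: read the first input once and group its rows by the
    # value of the join column. The dict hashes that value internally.
    hash_table = defaultdict(list)
    for row in build_rows:
        hash_table[row[build_key]].append(row)

    # Probe phase: read the second input once. Each row's join value is
    # hashed to find its bucket, and only that bucket is checked.
    joined = []
    for row in probe_rows:
        for match in hash_table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined

# Hypothetical sample data; neither input is sorted on the join column.
users = [
    {"Id": 7, "DisplayName": "Carol"},
    {"Id": 2, "DisplayName": "Alice"},
    {"Id": 5, "DisplayName": "Bob"},
]
posts = [
    {"OwnerUserId": 5, "Title": "Hash match basics"},
    {"OwnerUserId": 2, "Title": "Reading a plan"},
    {"OwnerUserId": 5, "Title": "Index scans"},
    {"OwnerUserId": 9, "Title": "Orphaned post"},
]

result = hash_join(users, posts, "Id", "OwnerUserId")
for r in result:
    print(r["Title"], "-", r["DisplayName"])
```

Note that it is the join column (Users.Id on one side, Posts.OwnerUserId on the other) that gets hashed, and that each input is read exactly once regardless of its order. The post with OwnerUserId 9 finds an empty bucket and produces no output row.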

HTH

The advantage of hashing a numeric field is that you map values from a large range onto a much smaller, fixed number of buckets, so that they fit into a hash table.

Here's how Grant Fritchey describes it:

"A hash table, on the other hand, is a data structure that divides all of the elements into equal-sized categories, or buckets, to allow quick access to the elements. The hashing function determines which bucket an element goes into. For example, you can take a row from a table, hash it into a hash value, then store the hash value into a hash table."

You can also get a free copy of his ebook "Dissecting SQL Server Execution Plans" via a link in the following article:

Source: http://www.simple-talk.com/sql/performance/graphical-execution-plans-for-simple-sql-queries/
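
To make the bucket idea concrete for an integer key, here is a minimal sketch. The bucket count of 8, the modulo "hash function", and the sample ids are all hypothetical choices for illustration; a real engine uses its own hash function and sizing. The point is that hashing a number is not about shrinking it, but about deciding which bucket it belongs to so that a lookup only has to search that one bucket.

```python
NUM_BUCKETS = 8  # hypothetical table size, chosen only for illustration

def bucket_for(user_id, num_buckets=NUM_BUCKETS):
    """Toy hash function: map any integer id to an index in [0, num_buckets)."""
    return user_id % num_buckets

# Distribute some unsorted ids of very different magnitudes into buckets.
buckets = [[] for _ in range(NUM_BUCKETS)]
for user_id in [3, 11, 1_000_003, 42, 19, 8]:
    buckets[bucket_for(user_id)].append(user_id)

def contains(user_id):
    # Only one bucket is searched, not the whole set of ids.
    return user_id in buckets[bucket_for(user_id)]

print(buckets)
print(contains(42), contains(43))
```

Several ids land in the same bucket here, which is why the probe still has to compare the actual column values inside the bucket rather than trusting the hash alone.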

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange