Default implementation of Object.GetHashCode()

https://stackoverflow.com/questions/720177

23-08-2019

Question

How does the default implementation of GetHashCode() work? And does it handle structs, classes, arrays, etc. efficiently and well enough?

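I am trying to decide in what cases I should write my own implementation and in what cases I can safely rely on the default implementation to do well. I would rather not reinvent the wheel, if possible.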

Solution

namespace System {
    public class Object {
        [MethodImpl(MethodImplOptions.InternalCall)]
        internal static extern int InternalGetHashCode(object obj);

        public virtual int GetHashCode() {
            return InternalGetHashCode(this);
        }
    }
}

InternalGetHashCode is mapped to the ObjectNative::GetHashCode function in the CLR, which looks like this:

FCIMPL1(INT32, ObjectNative::GetHashCode, Object* obj) {  
    CONTRACTL  
    {  
        THROWS;  
        DISABLED(GC_NOTRIGGER);  
        INJECT_FAULT(FCThrow(kOutOfMemoryException););  
        MODE_COOPERATIVE;  
        SO_TOLERANT;  
    }  
    CONTRACTL_END;  

    VALIDATEOBJECTREF(obj);  

    DWORD idx = 0;  

    if (obj == 0)  
        return 0;  

    OBJECTREF objRef(obj);  

    HELPER_METHOD_FRAME_BEGIN_RET_1(objRef);        // Set up a frame  

    idx = GetHashCodeEx(OBJECTREFToObject(objRef));  

    HELPER_METHOD_FRAME_END();  

    return idx;  
}  
FCIMPLEND

The full implementation of GetHashCodeEx is fairly large, so it is easier to just link to the C++ source code.

Other answers

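For a class, the defaults are essentially reference equality, and that is usually fine. If you are writing a struct, it is more common to override equality (not least to avoid boxing), but it is very rare that you write a struct anyway.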
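To make the class default concrete, here is a minimal sketch. `Node` and `ReferenceHashDemo` are hypothetical names introduced only for illustration; `RuntimeHelpers.GetHashCode` is the framework method that returns the identity-based hash regardless of any override.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical class for illustration; it overrides neither Equals nor GetHashCode.
public sealed class Node
{
    public int Value;
    public Node(int value) { Value = value; }
}

public static class ReferenceHashDemo
{
    // True when two distinct instances with identical contents are NOT equal by default.
    public static bool DistinctInstancesAreUnequal()
    {
        var a = new Node(1);
        var b = new Node(1);
        return !a.Equals(b);
    }

    // True when the default hash ignores field values (it survives a mutation)
    // and matches the identity hash reported by RuntimeHelpers.
    public static bool HashIsIdentityBased()
    {
        var a = new Node(1);
        int before = a.GetHashCode();
        a.Value = 2;
        return before == a.GetHashCode()
            && before == RuntimeHelpers.GetHashCode(a);
    }
}
```

Both helper methods should return true: the default hash follows the object's identity, not its contents.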

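When overriding equality, you should always have matching Equals() and GetHashCode() implementations (i.e. for two values, if Equals() returns true they must return the same hash code, but the converse is not required). It is also common to provide the ==/!= operators, and often to implement IEquatable<T> too.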

For generating the hash code, it is common to use a factored sum, as this avoids collisions on paired values. For example, for a basic 2-field hash:

unchecked // disable overflow, for the unlikely possibility that you
{         // are compiling with overflow-checking enabled
    int hash = 27;
    hash = (13 * hash) + field1.GetHashCode();
    hash = (13 * hash) + field2.GetHashCode();
    return hash;
}

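This has the advantage that:

the hash of {1,2} is not the same as the hash of {2,1}
the hash of {1,1} is not the same as the hash of {2,2}

etc. Such collisions can be common if you just use an unweighted sum, or xor (^), etc.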
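The difference can be checked with a small sketch. `HashCombineDemo` is a hypothetical helper; `Weighted` reuses the constants from the snippet above, and the concrete numbers below assume that `int.GetHashCode()` returns the value itself, which is how current .NET runtimes behave.

```csharp
public static class HashCombineDemo
{
    // Weighted (factored) sum, using the same constants as the 2-field example.
    public static int Weighted(int field1, int field2)
    {
        unchecked
        {
            int hash = 27;
            hash = (13 * hash) + field1.GetHashCode();
            hash = (13 * hash) + field2.GetHashCode();
            return hash;
        }
    }

    // Plain xor: insensitive to field order, and equal fields cancel to 0.
    public static int Xor(int field1, int field2)
    {
        return field1.GetHashCode() ^ field2.GetHashCode();
    }
}
```

Under that assumption, `Weighted(1, 2)` is 4578 and `Weighted(2, 1)` is 4590, whereas `Xor(1, 2)` and `Xor(2, 1)` are both 3, and `Xor(1, 1)` and `Xor(2, 2)` are both 0.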

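The documentation for the GetHashCode method of Object says "the default implementation of this method must not be used as a unique object identifier for hashing purposes." and the one for ValueType says "If you call the derived type's GetHashCode method, the return value is not likely to be suitable for use as a key in a hash table.".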

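The basic data types like byte, short, int, long, char and string implement a good GetHashCode method. Some other classes and structs, like Point for example, implement a GetHashCode method that may or may not be suitable for your specific needs. You have to try it out to see whether it is good enough.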

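The documentation for each class or struct tells you whether it overrides the default implementation. If it does not, you should use your own implementation. For any class or struct that you create yourself and for which you need the GetHashCode method, you should write your own implementation that uses the appropriate members to calculate the hash code.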

Since I couldn't find an answer that explains why we should override GetHashCode and Equals for custom structs and why the default implementation "is not likely to be suitable for use as a key in a hash table", I'll leave a link to this blog post, which explains why with a real-case example of a problem that happened.

I recommend reading the whole post, but here is a summary (emphasis and clarifications added).

Reason the default hash for structs is slow and not very good:

The way the CLR is designed, every call to a member defined in System.ValueType or System.Enum types [may] cause a boxing allocation [...]

An implementer of a hash function faces a dilemma: make a good distribution of the hash function or to make it fast. In some cases, it's possible to achieve them both, but it is hard to do this generically in ValueType.GetHashCode.

The canonical hash function of a struct "combines" hash codes of all the fields. But the only way to get a hash code of a field in a ValueType method is to use reflection. So, the CLR authors decided to trade speed over the distribution and the default GetHashCode version just returns a hash code of a first non-null field and "munges" it with a type id [...] This is a reasonable behavior unless it's not. For instance, if you're unlucky enough and the first field of your struct has the same value for most instances, then a hash function will provide the same result all the time. And, as you may imagine, this will cause a drastic performance impact if these instances are stored in a hash set or a hash table.

[...] Reflection-based implementation is slow. Very slow.

[...] Both ValueType.Equals and ValueType.GetHashCode have a special optimization. If a type does not have "pointers" and is properly packed [...] then more optimal versions are used: GetHashCode iterates over an instance and XORs blocks of 4 bytes and Equals method compares two instances using memcmp. [...] But the optimization is very tricky. First, it is hard to know when the optimization is enabled [...] Second, a memory comparison will not necessarily give you the right results. Here is a simple example: [...] -0.0 and +0.0 are equal but have different binary representations.

Real-world issue described in the post:

private readonly HashSet<(ErrorLocation, int)> _locationsWithHitCount;
readonly struct ErrorLocation
{
    // Empty almost all the time
    public string OptionalDescription { get; }
    public string Path { get; }
    public int Position { get; }
}

We used a tuple that contained a custom struct with default equality implementation. And unfortunately, the struct had an optional first field that was almost always equals to [empty string]. The performance was OK until the number of elements in the set increased significantly causing a real performance issue, taking minutes to initialize a collection with tens of thousands of items.

So, to answer the question "in what cases I should pack my own and in what cases I can safely rely on the default implementation", at least in the case of structs, you should override Equals and GetHashCode whenever your custom struct might be used as a key in a hash table or Dictionary.
I would also recommend implementing IEquatable<T> in this case, to avoid boxing.
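As a sketch of that advice applied to the struct from the post: the constructor below is an assumption added so that instances can be created, and `System.HashCode.Combine` assumes a runtime that ships `System.HashCode` (.NET Core 2.1 or later, or a compatibility package on older frameworks). The blog post's actual fix may differ.

```csharp
using System;

// The struct from the post, with explicit equality instead of the ValueType defaults.
public readonly struct ErrorLocation : IEquatable<ErrorLocation>
{
    // Empty almost all the time
    public string OptionalDescription { get; }
    public string Path { get; }
    public int Position { get; }

    // Assumed constructor, not shown in the original snippet.
    public ErrorLocation(string optionalDescription, string path, int position)
    {
        OptionalDescription = optionalDescription;
        Path = path;
        Position = position;
    }

    // Strongly typed comparison: no boxing and no reflection.
    public bool Equals(ErrorLocation other) =>
        string.Equals(OptionalDescription, other.OptionalDescription, StringComparison.Ordinal)
        && string.Equals(Path, other.Path, StringComparison.Ordinal)
        && Position == other.Position;

    public override bool Equals(object obj) =>
        obj is ErrorLocation other && Equals(other);

    // Every field contributes, so an almost-always-empty first field
    // no longer decides the hash on its own.
    public override int GetHashCode() =>
        HashCode.Combine(OptionalDescription, Path, Position);
}
```

With this in place, a `HashSet<(ErrorLocation, int)>` hashes each location from all three fields, and lookups go through `IEquatable<ErrorLocation>.Equals` rather than the reflection-based default.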

As the other answers said, if you're writing a class, the default hash using reference equality is usually fine, so I wouldn't bother in this case, unless you need to override Equals (then you would have to override GetHashCode accordingly).

Generally speaking, if you're overriding Equals, you want to override GetHashCode as well. The reason is that both take part in equality checks on your class/struct: hash-based collections use GetHashCode to narrow down candidates and then Equals to confirm a match.

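Equals is used when comparing two instances such as Foo A, B; (note that == only calls Equals if the type overloads the operator to do so):

if (A.Equals(B))

Since the two references are unlikely to point to the same object, we compare the internal members instead.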

public override bool Equals(object o)
{
    // "as" yields null when o is null or is not a MyType
    MyType other = o as MyType;
    if (other == null) return false;
    if (other.Prop1 != this.Prop1) return false;

    return other.Prop2 == this.Prop2;
}

GetHashCode is generally used by hash tables. The hash code generated by your class should always be the same for a given state of the instance.

I typically do:

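public override int GetHashCode()
{
    // Start from the type so that different types with equal members differ
    int hashCode = this.GetType().ToString().GetHashCode();
    hashCode ^= this.Prop1.GetHashCode();
    // ... repeat for the remaining members used by Equals

    return hashCode;
}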

Some will say that the hashcode should only be calculated once per object lifetime, but I don't agree with that (and I'm probably wrong).

With the default implementation provided by object, two instances of your class are equal only if they are the same reference. By overriding Equals and GetHashCode, you can report equality based on internal values rather than on the object's reference.
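A minimal sketch of what that buys you in a dictionary lookup. The shape of `MyType` (two read-only properties and a constructor) and the `LookupDemo` helper are assumptions made for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical class: equality is defined by Prop1 and Prop2, not by reference.
public sealed class MyType
{
    public string Prop1 { get; }
    public int Prop2 { get; }

    public MyType(string prop1, int prop2) { Prop1 = prop1; Prop2 = prop2; }

    public override bool Equals(object o)
    {
        MyType other = o as MyType;
        if (other == null) return false;
        return Prop1 == other.Prop1 && Prop2 == other.Prop2;
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 27;
            hash = (13 * hash) + (Prop1 == null ? 0 : Prop1.GetHashCode());
            hash = (13 * hash) + Prop2.GetHashCode();
            return hash;
        }
    }
}

public static class LookupDemo
{
    // Looks up an entry with a key that is a different instance in the same state.
    public static bool FindsEqualKey()
    {
        var map = new Dictionary<MyType, string> { { new MyType("a", 1), "found" } };
        return map.TryGetValue(new MyType("a", 1), out string value) && value == "found";
    }
}
```

`FindsEqualKey()` should return true here; with the default object implementation the second `new MyType("a", 1)` would not match the stored key. The properties are read-only on purpose, since a key whose hash changes while it sits in a dictionary can no longer be found.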

If you're just dealing with POCOs you can use this utility to simplify your life somewhat:

var hash = HashCodeUtil.GetHashCode(
           poco.Field1,
           poco.Field2,
           ...,
           poco.FieldN);

...

public static class HashCodeUtil
{
    public static int GetHashCode(params object[] objects)
    {
        int hash = 13;

        foreach (var obj in objects)
        {
            // unchecked: overflow must wrap rather than throw if overflow checking is enabled
            hash = unchecked((hash * 7) + (!ReferenceEquals(null, obj) ? obj.GetHashCode() : 0));
        }

        return hash;
    }
}

License: CC-BY-SA with attribution

Not affiliated with StackOverflow