Question

I am grouping log records by a RegEx pattern. After grouping them I'd like to get a Distinct count of the records for each group. For this example, Distinct is defined as the same visit key and the same year, month, day, hour, and minute.

It's just a way of getting a more accurate count of something getting logged all the way up the stack by different consumers.

Alright, so I'm grouping them like this:

var knownMessages = logRecords
    .Where(record => !string.IsNullOrEmpty(record.InclusionPattern))
    .GroupBy(record => new
    {
        MessagePattern = record.InclusionPattern
    })
    .Select(g => new KnownMessage
    {
        MessagePattern = g.Key.MessagePattern,
        Count = g.Distinct().Count(), // <-- the count in question
        Records = g.ToList()
    })
    .OrderByDescending(o => o.Count);

And GetHashCode for the type is implemented like this:

public override int GetHashCode()
{
    var visitKeyHash = this.VisitKey == null ?
        251 : this.VisitKey.GetHashCode();
    var timeHash = this.Time.Year + this.Time.Month + this.Time.Day +
        this.Time.Hour + this.Time.Minute;

    return ((visitKeyHash * 251) + timeHash) * 251;
}

But, for example, in the list I have three records that return the same hash code 1439926797; I still get a count of 3. I know it's leveraging GetHashCode (as I expected) to do the comparison because I have a breakpoint there to see what the hash code is.

What did I miss?


Solution

First let me repeat what I said in my comment.

The logic is: if a.GetHashCode() != b.GetHashCode(), then a != b. If a.GetHashCode() == b.GetHashCode() && a.Equals(b), then a == b. All GetHashCode() does for you is let you skip the Equals() check when two values have different hashes. That is why you need to implement both. If you only implement Equals(), the a.GetHashCode() == b.GetHashCode() step fails and the Equals() you implemented is never tried. If you only implement GetHashCode(), as in your case, the hashes match but the default Equals() (reference equality for classes) still says the records are different.
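To make the two-step check concrete, here is a minimal sketch. The types HashOnly and HashAndEquals and the Key field are made up for illustration; they are not the classes from the question. The first type overrides only GetHashCode(), the second overrides both.

```csharp
using System;
using System.Linq;

// Hypothetical type: overrides GetHashCode() only, like the class in the question.
class HashOnly
{
    public string Key;
    public override int GetHashCode() => Key == null ? 251 : Key.GetHashCode();
}

// Hypothetical type: Equals() is overridden to agree with GetHashCode().
class HashAndEquals
{
    public string Key;
    public override int GetHashCode() => Key == null ? 251 : Key.GetHashCode();
    public override bool Equals(object obj) =>
        obj is HashAndEquals other && Key == other.Key;
}

static class DistinctDemo
{
    // Hashes match, but the inherited Equals() compares references,
    // so all three instances stay distinct.
    public static int CountHashOnly() =>
        new[]
        {
            new HashOnly { Key = "a" },
            new HashOnly { Key = "a" },
            new HashOnly { Key = "a" }
        }.Distinct().Count();

    // Hashes match and Equals() confirms equality, so the three collapse into one.
    public static int CountHashAndEquals() =>
        new[]
        {
            new HashAndEquals { Key = "a" },
            new HashAndEquals { Key = "a" },
            new HashAndEquals { Key = "a" }
        }.Distinct().Count();
}
```

CountHashOnly() gives 3 and CountHashAndEquals() gives 1, which is the same symptom as the count of 3 in the question.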

GetHashCode() should be fast, and its value should not change while the object sits in a collection that depends on it. So don't modify VisitKey or Time while the object is stored in a Dictionary, HashSet, or similar.
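A small sketch of what goes wrong when the hash changes inside a collection. The Visit type and its int Key field are assumptions for the demo (an int makes the hash values predictable); your VisitKey may well be a different type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type whose hash code depends on a mutable field.
class Visit
{
    public int Key;
    public override int GetHashCode() => Key;
    public override bool Equals(object obj) => obj is Visit other && Key == other.Key;
}

static class MutationDemo
{
    public static bool FoundAfterMutation()
    {
        var visit = new Visit { Key = 1 };
        var set = new HashSet<Visit> { visit };

        // The set filed the object under the hash for Key == 1.
        // Changing Key changes the hash, so the lookup goes to the wrong place.
        visit.Key = 2;

        return set.Contains(visit);
    }
}
```

FoundAfterMutation() returns false even though the very same object is still in the set, because the set looks it up under the new hash code.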

So all you need to do is:

public override int GetHashCode()
{
    var visitKeyHash = this.VisitKey == null ?
        251 : this.VisitKey.GetHashCode();
    var timeHash = this.Time.Year + this.Time.Month + this.Time.Day +
        this.Time.Hour + this.Time.Minute;

    return ((visitKeyHash * 251) + timeHash);
}

public override bool Equals(object obj)
{
    // Two quick tests before we start doing all the comparisons.
    if (Object.ReferenceEquals(this, obj))
        return true;

    // Cast to the type that declares VisitKey and Time
    // (written as KnownMessage in this answer).
    KnownMessage message = obj as KnownMessage;
    if (Object.ReferenceEquals(message, null))
        return false;

    // Object.Equals(a, b) is null-safe, matching the null check in GetHashCode().
    return Object.Equals(this.VisitKey, message.VisitKey) &&
           this.Time.Year.Equals(message.Time.Year) &&
           this.Time.Month.Equals(message.Time.Month) &&
           this.Time.Day.Equals(message.Time.Day) &&
           this.Time.Hour.Equals(message.Time.Hour) &&
           this.Time.Minute.Equals(message.Time.Minute);
}
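As a variation, you could truncate Time to the minute once and use that value in both methods, which keeps them in sync more or less automatically. This is only a sketch: the LogRecord class name, the string type of VisitKey, and the Minute helper are assumptions, since the question only shows the VisitKey and Time members.

```csharp
using System;
using System.Linq;

// Hypothetical record type; only VisitKey and Time come from the question.
class LogRecord
{
    public string VisitKey { get; set; }
    public DateTime Time { get; set; }

    // Drops seconds and below so two times within the same minute compare equal.
    private DateTime Minute =>
        new DateTime(Time.Year, Time.Month, Time.Day, Time.Hour, Time.Minute, 0, Time.Kind);

    public override bool Equals(object obj) =>
        obj is LogRecord other &&
        string.Equals(VisitKey, other.VisitKey) &&
        Minute == other.Minute;

    public override int GetHashCode()
    {
        unchecked
        {
            var visitKeyHash = VisitKey == null ? 251 : VisitKey.GetHashCode();
            return (visitKeyHash * 251) + Minute.GetHashCode();
        }
    }
}

static class DistinctCountDemo
{
    public static int Run()
    {
        var records = new[]
        {
            new LogRecord { VisitKey = "v1", Time = new DateTime(2012, 5, 1, 10, 30, 5) },
            new LogRecord { VisitKey = "v1", Time = new DateTime(2012, 5, 1, 10, 30, 40) },
            new LogRecord { VisitKey = "v1", Time = new DateTime(2012, 5, 1, 10, 31, 0) },
        };

        // The first two share a visit key and a minute, so they collapse into one.
        return records.Distinct().Count();
    }
}
```

Run() returns 2 here. Hashing the truncated DateTime also avoids a weakness of summing the parts, where for example 10:31 on the 1st and 10:30 on the 2nd add up to the same timeHash. That collision is still legal, because Equals() tells the two apart, but it means more Equals() calls.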

Other tips

It seems you have not overridden the Equals method to use the same definition of equality as your hash code generation algorithm. Since that is used to resolve hash collisions, it is important that the two always be in sync.

You don't give your Equals override. As with other hash-based collections like Dictionary and HashSet, the internal structure used by Distinct() uses GetHashCode() to select a hash to store by, but Equals to determine actual equality.

The problem could be a bug in either your Equals or your GetHashCode, but in the latter case the bug is that it doesn't correctly match your Equals (GetHashCode must return the same hash for two objects for which Equals returns true, though it can of course also return the same hash for two different objects), which makes it a bug in the pair of methods. So either way, the problem is directly or indirectly in your override of Equals.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow