Question

I wanted to implement an algorithm with Dictionary<Dictionary<char,int>, List<string>> to find the anagram words in a dictionary.

As i need to implement my custom EqualityComparer for this Dictionary, is the access time still O(1) i.e big O (1) ?

Second question, As part of the EqualityComparer I also need to implement the GetHashCode(). What is the efficient way of determining GetHashCode() for Dictionary<Dictionary<char,int>, List<string>>?

i just came up with this method, is there any better alternative?

public int GetHashCode(Dictionary<char, int> obj)
    {
        unchecked
        {
            int hashCode = 17;
            foreach (var item in obj)
            {
                hashCode += 23 * item.Key.GetHashCode();
            }
            return hashCode;
        }
    }

Any word of advice is appreciated. Thanks!

Was it helpful?

Solution

How about converting the word "need" into the string "d1e2n1" instead of using a Dictionary as key? In order to build this string, you could use a binary tree. A char would be used as key and the character count as value. The binary tree is automatically sorted by the key, which is not the case for a dictionary.

You can calculate a combined hash value from single hash values by combining their binary representation with the XOR operation. With C#, you would do something like this:

public override int GetHashCode()
{
    // Combine hashcode of a and b
    return a.GetHashCode() ^ b.GetHashCode();
}

Finding an entry in an unsorted list is an O(n) operation. Finding an entry in a sorted list is an O(log(n)) operation, if a binary search is used.

Finding a word in a list in a dictionary is an O(1 + n) operation, which is the same as an O(n) operation, or an O(1 + log(n)) operation, which is the same as an O(log(n)) operation.


EDIT:

Here is a possible implementation:

var anagrams = new Dictionary<string, List<string>>();
foreach (string word in words) {
    string key = GetFrequency(word);
    List<string> list;
    if (anagrams.TryGetValue(key, out list)) {
        list.Add(word);
    } else {
        list = new List<string> { word };
        anagrams.Add(key, list);
    }
}

It uses this method to get the key:

private string GetFrequency(string word)
{
    var dict = new SortedDictionary<char, int>(); // Implemented as binary tree
    foreach (char c in word.ToLower()) {
        int count;
        if (dict.TryGetValue(c, out count)) {
            dict[c] += 1;
        } else {
            dict[c] = 1;
        }
    }
    return dict.Aggregate(new StringBuilder(), (sb, item) => sb.Append(item.Key).Append(item.Value), sb => sb.ToString());
}

Using this definition for the words ...

var words = new List<string> { "need", "eden", "team", "meat", "meta", "Nat", "tan" };

This test ...

foreach (var item in anagrams.OrderBy(x => x.Key)) {
    Console.WriteLine();
    Console.WriteLine(item.Key + ":");
    foreach (string word in item.Value.OrderBy(w => w)) {
        Console.WriteLine("    " + word);
    }
}

... produces this output

a1e1m1t1:
    meat
    meta
    team

a1n1t1:
    Nat
    tan

d1e2n1:
    eden
    need

EDIT #2:

Here is the frequency calculation as suggest by Ben Voigt

private string GetFrequencyByBenVoigt(string word)
{
    char[] chars = word.ToLower().ToCharArray();
    Array.Sort(chars);
    return new string(chars);
}

The test result would be

aemt:
    meat
    meta
    team

ant:
    Nat
    tan

deen:
    eden
    need

OTHER TIPS

The access time of a Dictionary<TKey, TValue> approaches O(1) but is not exactly so. In ideal scenarios (good distribution / few collisions) you can think of it as being O(1) though. In situations where there are lots of collisions due to a low variance in the GetHashCode values the access time degrades and can approach O(N).

A hash code based on container content will be O(n) in the count of items in the container. You could wrap the dictionary in another type and cache the hash code so it only needs to be computed once... but I can think of several more efficient ways to store that data than a dictionary.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top