Question

I am getting a hard-to-reproduce error in the following program, in which a number of threads update a concurrent dictionary in parallel and the main thread displays the state of the dictionary in sorted order at fixed time intervals until all updating threads complete.

public void Function(IEnumerable<ICharacterReader> characterReaders, IOutputter outputter)
{
    ConcurrentDictionary<string, int> wordFrequencies = new ConcurrentDictionary<string, int>();
    Thread t = new Thread(() => UpdateWordFrequencies(characterReaders, wordFrequencies));
    bool completed = false;
    var q = from pair in wordFrequencies orderby pair.Value descending, pair.Key select new Tuple<string, int>(pair.Key, pair.Value);
    t.Start();
    Thread.Sleep(0);

    while (!completed)
    {
        completed = t.Join(1);
        outputter.WriteBatch(q);
    }            
}

The function is given a list of character streams and an outputter. It maintains a concurrent dictionary of frequencies for the words read from each of the character streams (in parallel). The words are read in by a new thread, and the main thread outputs the current state of the dictionary (in sorted order) every 1 millisecond until all the input streams have been read (in practice the output interval will be something like 10 seconds, but the error only seems to appear for very small values). The WriteBatch function just writes to the console:

public void WriteBatch(IEnumerable<Tuple<string, int>> batch)
{
    foreach (var tuple in batch)
    {
        Console.WriteLine("{0} - {1}", tuple.Item1, tuple.Item2);
    }
    Console.WriteLine();
}

Most executions are fine, but sometimes I get the following error at the foreach statement in the WriteBatch function:

"Unhandled Exception: System.ArgumentException: The index is equal to or greater than the length of the array, or the number of elements in the dictionary is greater than the available space from index to the end of the destination array."

The error does seem to go away if the main thread sleeps for a short while after starting the updating threads and before starting the display loop. It also seems to go away if the orderby clause is removed and the dictionary is not sorted in the linq query. Any explanations?

The foreach (var tuple in batch) statement in the WriteBatch function gives the error. The stack trace is as follows:

Unhandled Exception: System.ArgumentException: The index is equal to or greater than the length of the array, or the number of elements in the dictionary is greater than the available space from index to the end of the destination array.
   at System.Collections.Concurrent.ConcurrentDictionary2.System.Collections.Generic.ICollection>.CopyTo(KeyValuePair2[] array, Int32 index)
   at System.Linq.Buffer1..ctor(IEnumerable1 source)
   at System.Linq.OrderedEnumerable1.d__0.MoveNext()
   at System.Linq.Enumerable.WhereSelectEnumerableIterator2.MoveNext()
   at MyProject.ConsoleOutputter.WriteBatch(IEnumerable1 batch) in C:\MyProject\ConsoleOutputter.cs:line 10
   at MyProject.Function(IEnumerable1 characterReaders, IOutputter outputter)

Solution

As others have said, there is a race in the constructor of the internal class System.Linq.Buffer<T>, which is called by OrderBy.

Here is the offending code snippet:

TElement[] array = null;
int num = 0;
if (collection != null)
{
    num = collection.Count;
    if (num > 0)
    {
        array = new TElement[num];
        collection.CopyTo(array, 0);
    }
}

The exception is thrown when item(s) are added to the collection after the call to collection.Count but before the call to collection.CopyTo.
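The interleaving described above can be reproduced deterministically on a single thread by performing the "late" add by hand between the two calls. This is only a sketch: the class and method names are hypothetical, and the extra add stands in for what the updater thread would do at an unlucky moment.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class CountCopyToRace
{
    // Returns true if CopyTo throws ArgumentException because the dictionary
    // grew after Count was read but before the copy was made.
    public static bool CopyAfterGrowthThrows()
    {
        var dict = new ConcurrentDictionary<string, int>();
        dict["a"] = 1;

        // Buffer<T> sees the dictionary through this interface.
        ICollection<KeyValuePair<string, int>> collection = dict;

        int num = collection.Count;                      // size observed: 1
        var array = new KeyValuePair<string, int>[num];  // array sized for 1 item

        dict["b"] = 2;  // stands in for the updater thread adding a word here

        try
        {
            collection.CopyTo(array, 0);                 // now 2 items, 1 slot
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

Each call on its own is thread-safe; the problem is that the pair of calls is not atomic.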


As a workaround, you can make a "snapshot" copy of the dictionary before you sort it.

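You can do this by calling ConcurrentDictionary.ToArray.
Because this is implemented in the ConcurrentDictionary class itself, it can hold the dictionary's internal locks while it copies, so the array it returns is a consistent snapshot.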

Using this approach means you don't have to protect the collection with a lock which, as you say, defeats the purpose of using a concurrent collection in the first place.

while (!completed)
{
    completed = t.Join(1);

    var q =
      from pair in wordFrequencies.ToArray() // <-- add ToArray here
      orderby pair.Value descending, pair.Key
      select new Tuple<string, int>(pair.Key, pair.Value);

    outputter.WriteBatch(q);
}            
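The same idea can be wrapped in a small helper that returns an already-materialized list, so that nothing is evaluated lazily against the live dictionary later on. This is a sketch under assumptions: WordFrequencySnapshot and SortedSnapshot are hypothetical names, not part of the original program. Note that the variable must be typed as ConcurrentDictionary for ToArray() to bind to the dictionary's own method; if it is typed as an interface such as IEnumerable, the call binds to the LINQ extension method instead, which does not carry the same guarantee.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class WordFrequencySnapshot
{
    // Copies the dictionary once, then sorts the private copy:
    // highest frequency first, ties broken by word.
    public static List<Tuple<string, int>> SortedSnapshot(
        ConcurrentDictionary<string, int> wordFrequencies)
    {
        // Instance method ConcurrentDictionary.ToArray, not Enumerable.ToArray.
        KeyValuePair<string, int>[] snapshot = wordFrequencies.ToArray();

        return snapshot
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key)
            .Select(pair => Tuple.Create(pair.Key, pair.Value))
            .ToList(); // materialize now; later updates cannot affect the result
    }
}
```

With this helper the loop body would reduce to outputter.WriteBatch(WordFrequencySnapshot.SortedSnapshot(wordFrequencies)).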

OTHER TIPS

After a discussion with ChrisShain in the comments, the conclusion is that you should get mutually exclusive access to the dictionary before printing it out, using either a mutex or a lock statement.

Doing it with a lock:

public void WriteBatch(IEnumerable<Tuple<string, int>> batch)
{
    lock (myLock) 
    {
        foreach (var tuple in batch)
        {
            Console.WriteLine("{0} - {1}", tuple.Item1, tuple.Item2);
        }
        Console.WriteLine();
    }
}

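This assumes you allocated a myLock object at the class level. It only helps if the updating thread takes the same lock around its writes to the dictionary; a lock held by the reader alone excludes nothing.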

Doing it with a mutex:

public void WriteBatch(IEnumerable<Tuple<string, int>> batch)
{
    mut.WaitOne();
    try
    {
        foreach (var tuple in batch)
        {
            Console.WriteLine("{0} - {1}", tuple.Item1, tuple.Item2);
        }
        Console.WriteLine();
    }
    finally
    {
        // Release even if enumeration throws, so other threads are not blocked forever.
        mut.ReleaseMutex();
    }
}

Again, this assumes you allocated a Mutex object (mut) at the class level and that the updating thread acquires the same mutex around its writes.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow