Question

I'm putting together a custom SynchronizedCollection<T> class so that I can have a synchronized Observable collection for my WPF application. The synchronization is provided via a ReaderWriterLockSlim, which, for the most part, has been easy to apply. The case I'm having trouble with is how to provide thread-safe enumeration of the collection. I've created a custom IEnumerator<T> nested class that looks like this:

    private class SynchronizedEnumerator : IEnumerator<T>
    {
        private SynchronizedCollection<T> _collection;
        private int _currentIndex;

        internal SynchronizedEnumerator(SynchronizedCollection<T> collection)
        {
            _collection = collection;
            _collection._lock.EnterReadLock();
            _currentIndex = -1;
        }

        #region IEnumerator<T> Members

        public T Current { get; private set;}

        #endregion

        #region IDisposable Members

        public void Dispose()
        {
            var collection = _collection;
            if (collection != null)
                collection._lock.ExitReadLock();

            _collection = null;
        }

        #endregion

        #region IEnumerator Members

        object System.Collections.IEnumerator.Current
        {
            get { return Current; }
        }

        public bool MoveNext()
        {
            var collection = _collection;
            if (collection == null)
                throw new ObjectDisposedException("SynchronizedEnumerator");

            _currentIndex++;
            if (_currentIndex >= collection.Count)
            {
                Current = default(T);
                return false;
            }

            Current = collection[_currentIndex];
            return true;
        }

        public void Reset()
        {
            if (_collection == null)
                throw new ObjectDisposedException("SynchronizedEnumerator");

            _currentIndex = -1;
            Current = default(T);
        }

        #endregion
    }
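For context, the collection hands this enumerator out from its `GetEnumerator()` method. A minimal sketch of that wiring (the `_lock` field name is taken from the enumerator above; everything else is an assumption):

```csharp
// Sketch only: the enumerator takes the read lock in its constructor
// and releases it in Dispose(), so the lock is held for the entire
// lifetime of the enumerator instance.
public IEnumerator<T> GetEnumerator()
{
    return new SynchronizedEnumerator(this);
}
```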

My concern, however, is that if the enumerator is not disposed, the lock will never be released. In most use cases this is not a problem, as foreach properly calls Dispose. It could be a problem, however, if a consumer retrieves an explicit enumerator instance. Is my only option to document the class with a caveat reminding the consumer to call Dispose when using the enumerator explicitly, or is there a way to safely release the lock during finalization? I'm thinking not, since the finalizer doesn't even run on the same thread, but I was curious whether there are other ways to improve this.


EDIT

After thinking about this a bit and reading the responses (particular thanks to Hans), I've decided this is definitely a bad idea. The biggest issue actually isn't forgetting to Dispose, but rather a slow consumer causing deadlock while enumerating. I now hold the read lock only long enough to make a copy, and return an enumerator over the copy.
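A minimal sketch of that snapshot approach, assuming a `List<T>` backing field named `_items` (both names are assumptions):

```csharp
public IEnumerator<T> GetEnumerator()
{
    T[] snapshot;

    _lock.EnterReadLock();
    try
    {
        // Copy under the read lock; the lock is released before
        // the caller ever sees the enumerator.
        snapshot = _items.ToArray();
    }
    finally
    {
        _lock.ExitReadLock();
    }

    // Enumerate the private copy: no lock is held while the
    // consumer iterates, so a slow consumer cannot cause deadlock.
    return ((IEnumerable<T>)snapshot).GetEnumerator();
}
```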


Solution

You are right, that's a problem. The finalizer is useless; it will run far too late to be of any use, and the code should have deadlocked heavily long before that anyway. Unfortunately, there's no way for you to tell the difference between a foreach calling your MoveNext/Current members and client code using them explicitly.

No fix, don't do this. Microsoft didn't do it either, even though they had plenty of reason to back in .NET 1.x. The only real thread-safe iterator you can make is one that creates a copy of the collection in the GetEnumerator() method. The iterator getting out of sync with the collection is no joy either, though.

OTHER TIPS

This seems far too error-prone to me. It encourages situations in which a lock is implicitly/silently taken out, in a way that is not clear to the reader of the code, and makes it likely that a crucial fact about the interface will be misunderstood.

Usually it's a good idea to duplicate common patterns - represent an enumerable collection with IEnumerable<T> that is disposed when you're done with it - but the added ingredient of taking out a lock makes a big difference, unfortunately.

I'd suggest the ideal approach would be to not offer enumeration on collections shared between threads at all. Try to design the whole system so it isn't needed. Obviously this is going to be a crazy pipe-dream sometimes.

So the next best thing would be to define a context within which an IEnumerable<T> is temporarily available, while the lock exists:

public class SomeCollection<T>
{
    // ...

    public void EnumerateInLock(Action<IEnumerable<T>> action) ...

    // ...
}

That is, when the user of this collection wants to enumerate it, they do this:

someCollection.EnumerateInLock(e =>
    {
        foreach (var item in e)
        {
            // blah
        }
    });

This makes the lifetime of the lock explicit, delimited by a scope (the lambda body, working much like a lock statement), and impossible to extend accidentally by forgetting to dispose. It's very difficult to abuse this interface.

The implementation of the EnumerateInLock method would be like this:

public void EnumerateInLock(Action<IEnumerable<T>> action)
{
    // Acquire the lock before entering the try block, so the
    // finally never tries to exit a lock that was not taken.
    _lock.EnterReadLock();

    var e = new EnumeratorImpl(this);
    try
    {
        action(e);
    }
    finally
    {
        e.Dispose();
        _lock.ExitReadLock();
    }
}

Notice how the EnumeratorImpl (which needs no particular locking code of its own) is always disposed before the lock is exited. After disposal, it throws ObjectDisposedException in response to any method call (other than Dispose, which is ignored.)
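The answer doesn't show EnumeratorImpl itself, but a sketch of what it might look like (the `_items` field name is an assumption) would be:

```csharp
private class EnumeratorImpl : IEnumerator<T>
{
    private SomeCollection<T> _collection; // set to null on disposal
    private int _index = -1;

    internal EnumeratorImpl(SomeCollection<T> collection)
    {
        _collection = collection;
    }

    public T Current { get; private set; }

    object System.Collections.IEnumerator.Current
    {
        get { return Current; }
    }

    public bool MoveNext()
    {
        ThrowIfDisposed();
        if (++_index >= _collection._items.Count)
        {
            Current = default(T);
            return false;
        }
        Current = _collection._items[_index];
        return true;
    }

    public void Reset()
    {
        ThrowIfDisposed();
        _index = -1;
        Current = default(T);
    }

    // Dispose is idempotent and never throws; every other member
    // throws ObjectDisposedException once _collection is null.
    public void Dispose() { _collection = null; }

    private void ThrowIfDisposed()
    {
        if (_collection == null)
            throw new ObjectDisposedException("EnumeratorImpl");
    }
}
```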

This means that even if there is an attempt to abuse the interface:

IEnumerable<C> keepForLater = null;
someCollection.EnumerateInLock(e => keepForLater = e);

foreach (var item in keepForLater)
{
    // aha!
}

This will always throw, rather than failing mysteriously sometimes based on the timing.

Using a method that accepts a delegate like this is a general technique for managing resource lifetimes commonly used in Lisp and other dynamic languages, and while it is less flexible than implementing IDisposable, that reduced flexibility is often a blessing: it removes the concern over clients "forgetting to dispose".

Update

From your comment, I see that you need to be able to hand a reference to the collection to an existing UI framework, which will therefore expect to be able to use the normal interface to a collection, i.e. directly get an IEnumerable<T> from it and be trusted to clean it up quickly. In which case, why worry? Trust the UI framework to update the UI and dispose the collection rapidly.

Your only other realistic option is simply to make a copy of the collection when the enumerator is requested. That way, the lock only needs to be held when the copy is being made. As soon as it's ready, the lock is released. This may be more efficient if the collections are usually small, so the overhead of the copy is less than the performance saving due to shorter locks.

It's tempting (for about a nanosecond) to suggest that you use a simple rule: if the collection is smaller than some threshold, make the copy, otherwise do it in your original way; choose the implementation dynamically. That way you get the optimum performance - set the threshold (by experiment) such that the copy is cheaper than holding the lock. However, I'd always think twice (or a billion times) about such "clever" ideas in threaded code, because what if there is an abuse of the enumerator somewhere? If you forget to dispose it, you won't see a problem unless it's a large collection... A recipe for hidden bugs. Don't go there!

Another potential drawback with the "expose a copy" approach is that clients will undoubtedly fall under the assumption that if an item is in the collection it is exposed to the world, but as soon as it is removed from the collection it is safely hidden. This will now be wrong! The UI thread will obtain an enumerator, then my background thread will remove the last item from it, and then begin mutating it in the mistaken belief that, because it was removed, no one else can see it.

So the copying approach requires every item on the collection to effectively have its own synchronization, where most coders will assume that they can shortcut this by using the collection's synchronization instead.

I had to do this recently. The way I did it was to abstract things so that there is an inner object (reference) that contains both the actual list/array and the count (and a GetEnumerator() implementation); then I can do lock-free, thread-safe enumeration by having:

public IEnumerator<T> GetEnumerator() { return inner.GetEnumerator();}

Add etc. need to be synchronized, but they swap in a new inner reference (and since reference updates are atomic, you don't need to synchronize GetEnumerator()). This means that any enumerator will return exactly as many items as were there when the enumerator was created.

Of course, it helps that my scenario was simple, and my list was Add only... if you need to support mutate / remove then it is much trickier.
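A sketch of that immutable-inner-reference idea for the add-only case described above (the class and field names are assumptions, not the answerer's actual code):

```csharp
public class SnapshotList<T> : IEnumerable<T>
{
    // The inner array is never mutated after publication; Add
    // replaces the reference with a new, longer array.
    private T[] _inner = new T[0];
    private readonly object _sync = new object();

    public void Add(T item)
    {
        lock (_sync) // writers are synchronized with each other
        {
            var next = new T[_inner.Length + 1];
            _inner.CopyTo(next, 0);
            next[_inner.Length] = item;
            _inner = next; // atomic reference swap publishes the new array
        }
    }

    // Lock-free read: grab whatever array reference is current and
    // enumerate it; later Adds cannot affect this snapshot.
    public IEnumerator<T> GetEnumerator()
    {
        return ((IEnumerable<T>)_inner).GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```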

In the commonly used IDisposable pattern, you create a protected Dispose(bool disposing) method which always disposes the unmanaged resources you use. By calling the protected Dispose(false) from your finalizer, those unmanaged resources are released as required. The lock is managed, so you'll only dispose it when Dispose(true) is called, where true means that managed objects also need to be disposed. When the public Dispose() is called explicitly, it calls the protected Dispose(true) and also GC.SuppressFinalize(this) to prevent the finalizer from running (because there is nothing left to dispose of).

Because you never know when the user is done with the enumerator, you have no option other than documenting that the user has to dispose the object. You might want to suggest that the user wrap the enumerator in a using (...) { ... } statement, which automatically disposes the object when done.
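For reference, the standard pattern described above looks roughly like this (a sketch only; note that, as the accepted answer points out, a finalizer runs on a different thread far too late to usefully release a ReaderWriterLockSlim, so this illustrates the pattern rather than a fix):

```csharp
public void Dispose()
{
    Dispose(true);
    GC.SuppressFinalize(this); // nothing left for the finalizer to do
}

protected virtual void Dispose(bool disposing)
{
    if (disposing)
    {
        // Called from the public Dispose(): safe to touch managed
        // objects, so the (managed) lock would be released here.
    }
    // Unmanaged resources, if any, are released here on both the
    // Dispose() path and the finalizer path.
}

~SynchronizedEnumerator()
{
    Dispose(false);
}
```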

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow