Question

I've read the MSDN documents and this blog and I need the following logic:

For a ConcurrentDictionary<string,bool>

  1. If the string doesn't exist, add it, and make sure I set the bool to True while adding.
  2. If the string does exist, only change the bool to True if it's false. Otherwise, cancel the update.

My use case

I have several DNS domains to scan for malware. There is a good likelihood that there will be duplicates in the list that I retrieve in real time. I receive the list of DNS domains in batches of 100 or fewer, and there will be over 10,000 domains to scan.

I only want to scan a DNS host once per iteration of 10,000 domains. A bool == true means it's currently being scanned, and I should cancel the task before I go any further. A bool == false or no entry means I should immediately set the entry to true, or create a new entry, as soon as possible.

Keep in mind...

AddOrUpdate will be called independently from many threads in .NET 4's TPL. Each thread needs to decide whether it should work on the value for a given key in the dictionary or proceed to the next one. Only one thread should ever do work on a given key.

I need to signal to the calling thread whether the update succeeded or failed. In addition, according to this answer, it seems that AddOrUpdate's functions may be called many times. I think this may mean my calling threads will be confused as to whether to cancel work on a key or continue it (remember, only one thread can be actively working on a key).

Example of concurrent updates that may confuse the calling thread

ConcurrentDictionary<int, string> numbers = new ConcurrentDictionary<int, string>();
Parallel.For(0, 10, x =>
{
    numbers.AddOrUpdate(1,
        i =>
        {
            Console.WriteLine("addValueFactory has been called");
            return i.ToString();
        },
        (i, s) =>
        {
            Console.WriteLine("updateValueFactory has been called");
            return i.ToString();
        });
});

Output

addValueFactory has been called
addValueFactory has been called
addValueFactory has been called
addValueFactory has been called
updateValueFactory has been called
updateValueFactory has been called
updateValueFactory has been called
updateValueFactory has been called
updateValueFactory has been called
updateValueFactory has been called
updateValueFactory has been called
updateValueFactory has been called
updateValueFactory has been called

Question

How should I add this "cancel update" functionality to AddOrUpdate?


Solution

If I understand what you're trying to achieve, I don't think you can use a ConcurrentDictionary<string, bool> for this.

One possible solution would be to have a class that encapsulates the scanning of a given host:

public class Scanner
{
    private readonly object _syncRoot = new object(); // one lock per Scanner instance

    public Scanner(string host)
    {
        Host = host;
        StartScanning();
    }

    public string Host {get; private set; }

    public bool IsScanning {get; private set; }

    public void StartScanning()
    {
        lock(_syncRoot)
        {
            if (!IsScanning)
            {
                IsScanning = true;
                // Start scanning Host asynchronously
                // ... (scan code omitted); call EndScanning() when it completes
            }
        }
    }

    private void EndScanning()
    {
        // Called when asynchronous scanning has completed
        lock (_syncRoot)
        {
            IsScanning = false;
        }
    }
}

Then use a ConcurrentDictionary<string, Lazy<Scanner>>.

You would use it as follows:

Scanner s = dictionary.GetOrAdd(host, new Lazy<Scanner>(() => new Scanner(host))).Value;
s.StartScanning();

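The Lazy<Scanner> instance will use the default LazyThreadSafetyMode.ExecutionAndPublication mode, which means that only one thread will ever call the factory delegate to instantiate a Scanner for a given host. Several threads may each construct a Lazy<Scanner> wrapper, but GetOrAdd stores only one of them, and only the stored wrapper is ever evaluated.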
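To see why the Lazy wrapper matters, here is a minimal sketch written as C# top-level statements. A string stands in for the Scanner class and the host name is hypothetical. Every thread builds its own Lazy<string> and passes it to GetOrAdd, but only the wrapper that ends up stored is ever evaluated, so the factory should run exactly once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

const string host = "example.com";    // hypothetical host
var dictionary = new ConcurrentDictionary<string, Lazy<string>>();
int wrappersCreated = 0;              // Lazy wrappers built by callers
int factoryCalls = 0;                 // times the expensive factory actually ran

var results = new string[16];
Parallel.For(0, results.Length, i =>
{
    Interlocked.Increment(ref wrappersCreated);
    var lazy = new Lazy<string>(() =>
    {
        Interlocked.Increment(ref factoryCalls);
        return "scanner for " + host;  // stands in for new Scanner(host)
    });
    // GetOrAdd may discard this wrapper; only the stored one is ever evaluated.
    results[i] = dictionary.GetOrAdd(host, lazy).Value;
});

Console.WriteLine($"Wrappers created: {wrappersCreated}, factory calls: {factoryCalls}");
```

The GetOrAdd(key, value) overload makes every caller construct a wrapper up front; that is cheap here because constructing a Lazy does not run its factory.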

From my understanding of your question, it looks to me like this is what you are trying to achieve, i.e. don't scan the same host more than once.

OTHER TIPS

Use the AddOrUpdate method described in that blog post. In your add delegate, set the bool to true. In your update delegate, have it check the bool value that's passed in as a parameter to the delegate and always return true. I say that because you're saying:

  • If it's false, set it to true
  • If it's true, cancel the update (i.e. leave it as true). So you might as well set it to true

If there's some other condition missing please elaborate.
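A minimal sketch of this tip as C# top-level statements, assuming two hypothetical domains, one of which was previously stored as false. Both end up true. Note that AddOrUpdate returns the value it stored, which is true for every caller here, so on its own it does not tell a thread whether it was the one that claimed the domain.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var dic = new ConcurrentDictionary<string, bool>();
dic["b.example"] = false;  // hypothetical domain already known but not being scanned

var returned = new bool[8];
Parallel.For(0, returned.Length, i =>
{
    string domain = i % 2 == 0 ? "a.example" : "b.example";  // hypothetical domains
    // Add delegate: new entries start as true.
    // Update delegate: false becomes true and true stays true, so always return true.
    returned[i] = dic.AddOrUpdate(domain, key => true, (key, oldValue) => true);
});

bool a = dic["a.example"];
bool b = dic["b.example"];
Console.WriteLine($"a.example: {a}, b.example: {b}");
// AddOrUpdate returns the value it stored, which is true for every caller here.
Console.WriteLine($"Every call returned true: {returned.All(r => r)}");
```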

You could do something along the lines of:

if (dic.TryAdd(domain, true) || dic.TryUpdate(domain, true, false)) {
   // this thread just added a new 'true' entry,
   // or changed an existing 'false' entry to 'true'
}

It will cause twice as many key lookups whenever the key already exists, of course. But I don't see a way to do the whole thing inside of ConcurrentDictionary.
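Here is that check wrapped in a small helper, as a sketch using C# top-level statements. TryClaim and Release are hypothetical names, and resetting the value to false when a scan finishes is an assumption based on the question's description of false as "not currently being scanned". With 50 threads racing on one domain, only one TryAdd can succeed, and TryUpdate(domain, true, false) only succeeds while the stored value is false, so exactly one thread should win.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

const string domain = "example.com";  // hypothetical domain
var dic = new ConcurrentDictionary<string, bool>();

// Hypothetical helper: returns true only for the one thread that moves the key
// into the "being scanned" (true) state, either by adding it or by flipping false -> true.
bool TryClaim(string d) => dic.TryAdd(d, true) || dic.TryUpdate(d, true, false);

// Hypothetical helper: marks the scan as finished so the key can be claimed again.
void Release(string d) => dic[d] = false;

int winners = 0;
Parallel.For(0, 50, i =>
{
    if (TryClaim(domain))
    {
        Interlocked.Increment(ref winners);
        // ... this thread owns the domain; scan it here ...
    }
});
Console.WriteLine($"Threads that claimed the domain: {winners}");

Release(domain);
bool reclaimed = TryClaim(domain);
Console.WriteLine($"Claimed again after release: {reclaimed}");
```

Unlike AddOrUpdate, the boolean result tells each calling thread directly whether it should do the work or move on.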

Try using a ConcurrentDictionary<string, Lazy<T>>, keyed by domain, where T is the type your scan returns.

When you create the Lazy, pass in a delegate that runs the scan on a site. The first time your Lazy.Value property is accessed, the scan will be run. Any subsequent callers will be blocked until the first scan finishes. Once the scan finishes, anyone who accesses Lazy.Value will be given the Value, but a second scan will never be run.
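A minimal sketch of that idea as C# top-level statements, assuming a hypothetical ScanDomain method that returns a string result. GetOrAdd with a value factory may build more than one Lazy under contention, but only the stored one is evaluated, so the scan should run once and all ten callers receive the same result.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Key: domain. Value: a Lazy that runs the scan the first time .Value is read.
var scans = new ConcurrentDictionary<string, Lazy<string>>();
int scanCount = 0;

// Hypothetical stand-in for the real malware scan.
string ScanDomain(string domain)
{
    Interlocked.Increment(ref scanCount);
    Thread.Sleep(50); // simulate work so other callers have to wait on .Value
    return "clean: " + domain;
}

string GetScanResult(string domain) =>
    scans.GetOrAdd(domain, d => new Lazy<string>(() => ScanDomain(d))).Value;

var results = new string[10];
Parallel.For(0, results.Length, i => results[i] = GetScanResult("example.com"));

Console.WriteLine($"Scans run: {scanCount}");
Console.WriteLine(results[0]);
```

If each iteration of 10,000 domains should be scanned afresh, one option (an assumption, not part of the answer) is to start each iteration with a new dictionary or call Clear() between iterations.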

The concurrency of the ConcurrentDictionary is what keeps the AddOrUpdate approach from working.

The only opportunity you really have to act on the value already in the dictionary is in the updateValueFactory, but that work takes place before the update actually happens and the value is set to true. During that window, another thread may also call AddOrUpdate, in which case it will still see the old value of false and start the update logic again.

Here's a sample program to demonstrate this:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

namespace ConcurrentDictionaryCancelTest {
    class Program {
        static void Main( string[] args ) {
            var example = new ConcurrentDictionary<string, bool>();

            for( var i = 0; i < 3; i++ ) {
                example.AddOrUpdate( i.ToString(), false, ( key, oldValue ) => false );
            }

            Parallel.For( 0, 8, x => {
                example.AddOrUpdate(
                    ( x % 3 ).ToString(),
                    ( key ) => {
                        Console.WriteLine( "addValueFactory called for " + key );
                        return true;
                    },
                    ( key, oldValue ) => {
                        Console.WriteLine( "updateValueFactory called for " + key );
                        if( !oldValue ) {
                            var guid = Guid.NewGuid();
                            Console.WriteLine( 
                                key + " is calling UpdateLogic: " + guid.ToString() 
                            );
                            UpdateLogic( key, guid );
                        }
                        return true;
                    }
                );
            } );
        }

        public static void UpdateLogic( string key, Guid guid ) {
            Console.WriteLine( 
                "UpdateLogic has been called for " + key + ": " + guid.ToString()
            );
        }
    }
}

And some sample output:

updateValueFactory called for 0
updateValueFactory called for 1
updateValueFactory called for 2
updateValueFactory called for 0
updateValueFactory called for 1
0 is calling UpdateLogic: cdd1b1dd-9d96-417d-aee7-4c4aec7fafbf
1 is calling UpdateLogic: 161c5f35-a2d7-44bf-b881-e56ac713b340
UpdateLogic has been called for 0: cdd1b1dd-9d96-417d-aee7-4c4aec7fafbf
updateValueFactory called for 1
1 is calling UpdateLogic: 6a032c22-e8d4-4016-a212-b09e41bf4d68
UpdateLogic has been called for 1: 6a032c22-e8d4-4016-a212-b09e41bf4d68
updateValueFactory called for 0
updateValueFactory called for 2
2 is calling UpdateLogic: 76c13581-cd55-4c88-961c-12c6d277ff00
UpdateLogic has been called for 2: 76c13581-cd55-4c88-961c-12c6d277ff00
1 is calling UpdateLogic: d71494b6-265f-4ec8-b077-af5670c02390
UpdateLogic has been called for 1: d71494b6-265f-4ec8-b077-af5670c02390
UpdateLogic has been called for 1: 161c5f35-a2d7-44bf-b881-e56ac713b340
updateValueFactory called for 1
updateValueFactory called for 1
0 is calling UpdateLogic: f6aa3460-444b-41eb-afc6-3d6afa2f6512
UpdateLogic has been called for 0: f6aa3460-444b-41eb-afc6-3d6afa2f6512
2 is calling UpdateLogic: d911dbd1-7150-4823-937a-26abb446c669
UpdateLogic has been called for 2: d911dbd1-7150-4823-937a-26abb446c669
updateValueFactory called for 0
updateValueFactory called for 2

Note the gap between the first time updateValueFactory is called for 0, the point where it announces that it is calling UpdateLogic, and the point where UpdateLogic actually executes. During that window, i.e. before the value has been updated to true, updateValueFactory is called for 0 again, and as a result UpdateLogic runs for 0 again as well.

You need some kind of lock to make sure that reading the value, calling the update logic, and setting the new value are all one atomic operation.
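A minimal sketch of such a lock as C# top-level statements, assuming a plain Dictionary<string, bool> guarded by a single object and a hypothetical UpdateLogic stand-in that only counts calls. The read and the write happen inside the lock, so no other thread can see the old false value in between. This sketch calls UpdateLogic right after the claim rather than inside the lock, which keeps a slow update from blocking every other key; since only the claiming thread reaches that call, it still runs once per key. With the same 8 iterations over keys 0, 1 and 2 as the sample program, it should run exactly three times.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var states = new Dictionary<string, bool>();  // plain dictionary, only touched under 'gate'
var gate = new object();
int updateLogicRuns = 0;

// Hypothetical stand-in for the sample program's UpdateLogic: just counts calls.
void UpdateLogic(string key) => Interlocked.Increment(ref updateLogicRuns);

// Reads the current value and writes the new one under a single lock, so no other
// thread can observe the old 'false' between the check and the update.
bool TryStart(string key)
{
    lock (gate)
    {
        if (states.TryGetValue(key, out bool busy) && busy)
            return false;       // already true: cancel
        states[key] = true;     // missing or false: this thread claims the key
    }
    UpdateLogic(key);           // reached only by the thread that claimed the key
    return true;
}

foreach (var k in new[] { "0", "1", "2" }) states[k] = false;

Parallel.For(0, 8, x => TryStart((x % 3).ToString()));
Console.WriteLine($"UpdateLogic runs: {updateLogicRuns}"); // prints "UpdateLogic runs: 3"
```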

Licensed under: CC-BY-SA with attribution