Question

I'm playing around with async and await in C# in a simple little console application. My goal is simple: to process a list of files in an asynchronous manner, so that the processing of one file does not block the processing of others. None of the files depend on one another, and there are (let's say) thousands of files to go through.

Here is the code I have currently.

public class MyClass
{
    public void Go()
    {
        string[] fileSystemEntries = Directory.GetFileSystemEntries(@"Path\To\Files");

        Console.WriteLine("Starting to read from files!");
        foreach (var filePath in fileSystemEntries.OrderBy(s => s))
        {
            Task task = new Task(() => DoStuff(filePath));
            task.Start();
            task.Wait();
        }
    }

    private async void DoStuff(string filePath)
    {
        await Task.Run(() =>
        {
            Thread.Sleep(1000);
            string fileName = Path.GetFileName(filePath);
            string firstLineOfFile = File.ReadLines(filePath).First();
            Console.WriteLine("{0}: {1}", fileName, firstLineOfFile);
        });
    }
}

And my Main() method simply invokes this class:

public static class Program
{
    public static void Main()
    {
        var myClass = new MyClass();
        myClass.Go();
    }
}

There's some piece of this asynchronous programming pattern that I seem to be missing, though: whenever I run the program, the number of files actually processed seems random, anywhere from none of them to all six of them (in my example file set).

Basically, the main thread isn't waiting for all of the files to be processed, which I suppose is part of the point of asynchronously-running things, but I don't quite want that. All I want is: Process as many of these files in as many threads as you can, but still wait for them all to complete processing before finishing up.

Solution 2

I combined the comments above to reach my solution. Indeed, I didn't need the async or await keywords at all: I merely had to create a list of tasks, start them all, then call WaitAll. Here is the resulting code:

public class MyClass
{
    private int filesRead = 0;

    public void Go()
    {
        string[] fileSystemEntries = Directory.GetFileSystemEntries(@"Path\To\Files");

        Console.WriteLine("Starting to read from files! Count: {0}", fileSystemEntries.Length);
        List<Task> tasks = new List<Task>();
        foreach (var filePath in fileSystemEntries.OrderBy(s => s))
        {
            Task task = Task.Run(() => DoStuff(filePath));
            tasks.Add(task);
        }
        Task.WaitAll(tasks.ToArray());
        Console.WriteLine("Finish! Read {0} file(s).", filesRead);
    }

    private void DoStuff(string filePath)
    {
        string fileName = Path.GetFileName(filePath);
        string firstLineOfFile = File.ReadLines(filePath).First();
        Console.WriteLine("[{0}] {1}: {2}", Thread.CurrentThread.ManagedThreadId, fileName, firstLineOfFile);
        // DoStuff runs on several threads at once, so increment atomically.
        Interlocked.Increment(ref filesRead);
    }
}

When testing, I added Thread.Sleep calls as well as busy loops to peg the CPUs on my machine. In Task Manager, I observed all of the cores being pegged during the busy loops, and on every run the files were processed in a different order (a good thing, since it suggests that the only bottleneck is the number of available threads).

Every time I ran the program, fileSystemEntries.Length matched filesRead.
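
A count that matches on every test run is reassuring, but when several tasks update one shared field, the increment itself has to be atomic to be reliable. Here is a minimal sketch of that idea, assuming a hypothetical CounterDemo class (not part of the code above) whose CountOne method stands in for the counting part of DoStuff:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    private static int safeCount;

    // Hypothetical stand-in for DoStuff: only the counting part is shown.
    private static void CountOne()
    {
        // Atomic read-modify-write; a plain safeCount++ could lose updates
        // when two threads increment at the same moment.
        Interlocked.Increment(ref safeCount);
    }

    public static int Run(int taskCount)
    {
        safeCount = 0;
        Task[] tasks = Enumerable.Range(0, taskCount)
            .Select(_ => Task.Run(() => CountOne()))
            .ToArray();
        Task.WaitAll(tasks); // block until every task has finished
        return safeCount;
    }
}
```

Calling CounterDemo.Run(1000) should return 1000 regardless of how the tasks are scheduled, which a bare ++ on a shared int does not guarantee.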

EDIT: Based on the comment discussion above, I've found a cleaner (and, based on the linked question in the comments, more efficient) solution is to use Parallel.ForEach:

public class MyClass
{
    private int filesRead;

    public void Go()
    {
        string[] fileSystemEntries = Directory.GetFileSystemEntries(@"Path\To\Files");

        Console.WriteLine("Starting to read from files! Count: {0}", fileSystemEntries.Length);
        Parallel.ForEach(fileSystemEntries, DoStuff);
        Console.WriteLine("Finish! Read {0} file(s).", filesRead);
    }

    private void DoStuff(string filePath)
    {
        string fileName = Path.GetFileName(filePath);
        string firstLineOfFile = File.ReadLines(filePath).First();
        Console.WriteLine("[{0}] {1}: {2}", Thread.CurrentThread.ManagedThreadId, fileName, firstLineOfFile);
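        // Parallel.ForEach calls DoStuff from several threads, so increment atomically.
        Interlocked.Increment(ref filesRead);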
    }
}

It seems there are many ways to approach asynchronous programming in C# now. Between Parallel, Task, and async/await, there's a lot to choose from. Based on this thread, the best solution for me looks like Parallel: it is the cleanest, it is more efficient than manually creating Task objects myself, and it does not clutter the code with async and await keywords while achieving similar results.
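
If the per-item results are needed afterwards rather than just printed, one possible variation on the Parallel approach is to collect them in a thread-safe collection. This is only a sketch under stated assumptions: ParallelDemo and Process are hypothetical names, and Process works on in-memory strings instead of reading files so that the example is self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelDemo
{
    // Hypothetical per-item work; stands in for reading a file's first line.
    private static string Process(string item)
    {
        return item.ToUpperInvariant();
    }

    public static List<string> ProcessAll(IEnumerable<string> items)
    {
        var results = new ConcurrentBag<string>(); // safe to Add from many threads

        // Parallel.ForEach returns only after every iteration has completed.
        Parallel.ForEach(items, item => results.Add(Process(item)));

        // Completion order is not deterministic, so sort before returning.
        return results.OrderBy(s => s, StringComparer.Ordinal).ToList();
    }
}
```

Because Parallel.ForEach blocks until all iterations finish, no separate WaitAll call is needed before reading the results.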

Other tips

One of the major design goals behind async/await was to facilitate the use of naturally asynchronous I/O APIs. In this light, your code might be rewritten like this (untested):

public class MyClass
{
    private int filesRead = 0;

    public void Go()
    {
        GoAsync().Wait();
    }

    private async Task GoAsync()
    {
        string[] fileSystemEntries = Directory.GetFileSystemEntries(@"Path\To\Files");

        Console.WriteLine("Starting to read from files! Count: {0}", fileSystemEntries.Length);

        var tasks = fileSystemEntries.OrderBy(s => s).Select(
            fileName => DoStuffAsync(fileName));
        await Task.WhenAll(tasks.ToArray());

        Console.WriteLine("Finish! Read {0} file(s).", filesRead);
    }

    private async Task DoStuffAsync(string filePath)
    {
        string fileName = Path.GetFileName(filePath);
        using (var reader = new StreamReader(filePath))
        {
            string firstLineOfFile = 
                await reader.ReadLineAsync().ConfigureAwait(false);
            Console.WriteLine("[{0}] {1}: {2}", Thread.CurrentThread.ManagedThreadId, fileName, firstLineOfFile);
            Interlocked.Increment(ref filesRead);
        }
    }
}

Note that it doesn't spawn any new threads explicitly, but that may happen behind the scenes with await reader.ReadLineAsync().ConfigureAwait(false).
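
As for why the code in the question finished early: an async void method returns to its caller at its first incomplete await, so a task that merely calls it completes long before the work does. An async Task method gives the caller something to wait on. Here is a minimal sketch of the difference, assuming a hypothetical AsyncVoidDemo class and using a TaskCompletionSource as a gate so that the outcome does not depend on timing:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncVoidDemo
{
    private static int voidFinished;
    private static int taskFinished;

    // Same shape as the question's DoStuff: async void hands the caller nothing to wait on.
    private static async void FireAndForget(Task gate)
    {
        await gate;
        Interlocked.Increment(ref voidFinished);
    }

    // Returning Task lets the caller observe completion.
    private static async Task AwaitableAsync(Task gate)
    {
        await gate.ConfigureAwait(false);
        Interlocked.Increment(ref taskFinished);
    }

    // True if the wrapping task completed while the async void body was still pending.
    public static bool WrapperFinishesEarly()
    {
        voidFinished = 0;
        var gate = new TaskCompletionSource<bool>();
        Task wrapper = Task.Run(() => FireAndForget(gate.Task));
        wrapper.Wait(); // returns once FireAndForget reaches its first await
        bool early = Volatile.Read(ref voidFinished) == 0;
        gate.SetResult(true); // let the pending body finish
        return early;
    }

    // Number of bodies that had finished by the time WaitAll returned.
    public static int AwaitableCount(int n)
    {
        taskFinished = 0;
        var gate = new TaskCompletionSource<bool>();
        Task[] tasks = Enumerable.Range(0, n)
            .Select(_ => AwaitableAsync(gate.Task))
            .ToArray();
        gate.SetResult(true);
        Task.WaitAll(tasks); // waits for the bodies, not just their start
        return Volatile.Read(ref taskFinished);
    }
}
```

WrapperFinishesEarly mirrors the question's new Task(() => DoStuff(filePath)) followed by Wait(), while AwaitableCount mirrors the Task.WhenAll approach above.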

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow