Question

I was working on some code for a personal project when I needed to generate checksums for a large number of files. First off, let me say I have already solved this problem using System.Threading.Tasks.Parallel (.NET, C#), which behaves as I would expect. What I expected was several checksums running simultaneously using Tasks, given a list of tasks, without necessarily being processed in order. In other words, if I put a small file (10 MB, perhaps) last and a 5 GB file first, the last one should finish first, because it takes significantly less time to process.

Here is a very simple example:

static async Task MainAsync()
{
    await GetChecksum(1,@"E:\Files\ISO\5gbfile.iso");
    await GetChecksum(2,@"E:\Files\ISO\4gbfile.iso");
    await GetChecksum(3,@"E:\Files\ISO\3gbfile.iso");
    await GetChecksum(4,@"E:\Files\ISO\10mbfile.iso");
}

And the GetChecksum function:

static async Task<string> GetChecksum(int index,string file)
{
    using (FileStream stream = File.OpenRead(file))
    {
        SHA256Managed sha = new SHA256Managed();
        Task<byte[]> checksum = sha.ComputeHashAsync(stream, 1200000);
        var ret = await checksum;
        System.Console.WriteLine($"{index} -> {file}");
        var hash = BitConverter.ToString(ret).Replace("-", String.Empty);
        System.Console.WriteLine($" ::{hash}");
        return hash;
    }
}

According to this article: https://msdn.microsoft.com/en-us/library/hh696703.aspx

Which states:

The method creates and starts three tasks of type Task<TResult>, where TResult is an integer. As each task finishes, DisplayResults displays the task's URL and the length of the downloaded contents. Because the tasks are running asynchronously, the order in which the results appear might differ from the order in which they were declared.

However, that is not what I experience with this example. I see each one finishing in the order they were called. I realize this example is not using parallel processing, which I assume would force it to use a single processor, but given that the last one takes 2 seconds to process and the first one takes 2 minutes, I would still expect the smallest one to finish first.

Can somebody explain this behavior? I just want to understand what's going on behind the scenes with async and await when used like this.


Solution

When you call it like this:

await GetChecksum(1,@"E:\Files\ISO\5gbfile.iso");
await GetChecksum(2,@"E:\Files\ISO\4gbfile.iso");
await GetChecksum(3,@"E:\Files\ISO\3gbfile.iso");
await GetChecksum(4,@"E:\Files\ISO\10mbfile.iso");

It creates the first task, then waits for it to complete, then creates the second task, then waits for it to complete, etc.

When you call it this way:

Task<string> task1 = GetChecksum(1,@"E:\Files\ISO\5gbfile.iso");
Task<string> task2 = GetChecksum(2,@"E:\Files\ISO\4gbfile.iso");
Task<string> task3 = GetChecksum(3,@"E:\Files\ISO\3gbfile.iso");
Task<string> task4 = GetChecksum(4,@"E:\Files\ISO\10mbfile.iso");

string checksum1 = await task1;
string checksum2 = await task2;
string checksum3 = await task3;
string checksum4 = await task4;

It creates all four tasks up front, so they run concurrently, then awaits task1, then task2, and so on. The tasks themselves can finish in any order, so the small file can complete first, but the statements after each await still run in the order written. Syntax matters: at an await, the method stops executing further statements until the awaited task completes.
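
The difference between the two call patterns can be seen in a small, self-contained sketch. FakeChecksum is a hypothetical stand-in for GetChecksum: it uses Task.Delay to simulate hashing time (the delays are arbitrary assumptions) and records the order in which calls finish. Task.WhenAll is one possible way to await all the started tasks together; it is not part of the original answer. An async Main requires C# 7.1 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitOrderDemo
{
    // Hypothetical stand-in for GetChecksum: Task.Delay simulates the hashing time.
    static async Task<int> FakeChecksum(int index, int milliseconds, List<int> finished)
    {
        await Task.Delay(milliseconds);
        lock (finished) { finished.Add(index); } // record completion order
        return index;
    }

    // Awaits each call before making the next one, as in the question.
    public static async Task<List<int>> RunSequentially()
    {
        var finished = new List<int>();
        await FakeChecksum(1, 300, finished);
        await FakeChecksum(2, 200, finished);
        await FakeChecksum(3, 100, finished);
        return finished;
    }

    // Starts every task first, then awaits them all together.
    public static async Task<List<int>> RunConcurrently()
    {
        var finished = new List<int>();
        Task<int>[] tasks =
        {
            FakeChecksum(1, 300, finished),
            FakeChecksum(2, 200, finished),
            FakeChecksum(3, 100, finished),
        };
        await Task.WhenAll(tasks);
        return finished;
    }

    public static async Task Main()
    {
        Console.WriteLine("sequential: " + string.Join(", ", await RunSequentially()));
        Console.WriteLine("concurrent: " + string.Join(", ", await RunConcurrently()));
    }
}
```

The sequential run always finishes in call order (1, 2, 3), because each call is not even started until the previous one has completed. In the concurrent run the shortest delay would normally finish first, although the exact order depends on timing.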

Licensed under: CC-BY-SA with attribution