Question

I have a few different ways of uploading entire directories to Amazon S3 within my application, depending on which options are selected. Currently one of the options uploads multiple directories in parallel. I'm not sure this is a good idea, since in some cases it sped up the upload and in other cases it slowed it down. The speedup appears to happen when the batch contains many small directories, but things slow down when there are large directories in the batch. I'm using the parallel ForEach loop seen below with the AWS SDK's TransferUtility.UploadDirectoryAsync() method, like so:

Parallel.ForEach(dirs, myParallelOptions,
    async dir => { await MyUploadMethodAsync(dir); });

Where the TransferUtility.UploadDirectoryAsync() call is inside MyUploadMethodAsync(). The TransferUtility's upload methods already upload the parts of a single file in parallel (when the file is large enough), so parallelizing across directories as well may be overkill. Obviously we are still limited by the available bandwidth, so this might be a waste and I should just use a regular foreach loop with UploadDirectoryAsync(). Can anyone provide some insight into whether this is a bad case for parallelization?
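
For reference, MyUploadMethodAsync is roughly shaped like this (a simplified sketch; the client setup and bucket name below are placeholders, and the real method has more options):

using Amazon.S3;
using Amazon.S3.Transfer;
using System.IO;
using System.Threading.Tasks;

IAmazonS3 s3Client = new AmazonS3Client();   // assumed: credentials/region come from config
string bucketName = "my-bucket";             // hypothetical bucket name

async Task MyUploadMethodAsync(string dir)
{
    // TransferUtility already parallelizes multipart uploads of large files internally
    var transferUtility = new TransferUtility(s3Client);
    await transferUtility.UploadDirectoryAsync(new TransferUtilityUploadDirectoryRequest
    {
        BucketName = bucketName,
        Directory = dir,
        SearchOption = SearchOption.AllDirectories
    });
}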

Solution

Did you actually test this? The way you're using it, Parallel.ForEach may return well before any of the MyUploadMethodAsync calls have completed, because the async lambda compiles to an async void delegate, which the loop has no way to wait on:

Parallel.ForEach(dirs, myParallelOptions,
    async dir => { await MyUploadMethodAsync(dir); });
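
You can see this with a minimal repro (Task.Delay stands in for the real uploads; the counter is just for illustration):

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int done = 0;
Parallel.ForEach(Enumerable.Range(0, 4), async i =>
{
    await Task.Delay(1000);              // simulate an upload
    Interlocked.Increment(ref done);
});
// Parallel.ForEach only waits for the lambdas to reach their first await,
// so this almost always prints 0.
Console.WriteLine(done);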

Parallel.ForEach is suited for CPU-bound work. For I/O-bound work like these uploads, you are probably looking for something like this:

var tasks = dirs.Select(dir => MyUploadMethodAsync(dir));
await Task.WhenAll(tasks);
// or Task.WaitAll(tasks.ToArray()) if you need a blocking wait
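
If you still want to cap how many directories upload at once (the job MaxDegreeOfParallelism was doing in your myParallelOptions), one common pattern is a SemaphoreSlim throttle. A sketch, assuming MyUploadMethodAsync as above and a hypothetical cap of 4:

using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var throttle = new SemaphoreSlim(4);     // at most 4 directory uploads in flight
var throttled = dirs.Select(async dir =>
{
    await throttle.WaitAsync();
    try { await MyUploadMethodAsync(dir); }
    finally { throttle.Release(); }
});
await Task.WhenAll(throttled);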