Problem

I want to download a large number of files with HttpClient, perform some time-consuming (but not expensive) computation on them, and then add each result to my database after running a query that confirms it is not already there.

How should I structure this conceptually? Just the locations of the awaits and the like would be helpful.

I currently have the following:

  1. Get the list of addresses.
  2. For each address, add a Task (await the web page download, then continue processing) to a list of Tasks.
  3. Loop over that list with foreach: await each element, then add its result to the database.

However, this seems to essentially run everything serially.
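One likely reason this ends up serial, assuming the list of Tasks comes from a lazy LINQ `Select` that is then walked with `foreach`, is that each Task is only created (and therefore only started) when the loop reaches it, and it is awaited before the next one is created. The following is a minimal C# sketch of that difference, not the asker's actual code: `FakeDownloadAndProcessAsync` is a hypothetical stand-in that uses `Task.Delay` instead of a real download and counts how many calls are in flight at once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Counters used to observe how many simulated downloads overlap.
int inFlight = 0;
int maxInFlight = 0;

// Hypothetical stand-in for "download, then process".
async Task<string> FakeDownloadAndProcessAsync(string address)
{
    int now = Interlocked.Increment(ref inFlight);
    // Record the peak concurrency observed so far (lock-free max update).
    int seen;
    while (now > (seen = Volatile.Read(ref maxInFlight)))
        Interlocked.CompareExchange(ref maxInFlight, now, seen);

    await Task.Delay(100);               // simulated download
    Interlocked.Decrement(ref inFlight);
    return address.ToUpperInvariant();   // simulated processing
}

var addresses = Enumerable.Range(1, 5).Select(i => $"page{i}").ToList();

// Serial: Select is lazy, so each Task starts only when foreach asks for it,
// and it finishes before the next one is even created.
var results = new List<string>();
foreach (var task in addresses.Select(a => FakeDownloadAndProcessAsync(a)))
    results.Add(await task);             // the database insert would go here
int serialPeak = maxInFlight;

// Concurrent: ToList() runs the query immediately, so every download has
// started before the first await; the loop then just collects results.
maxInFlight = 0;
var started = addresses.Select(a => FakeDownloadAndProcessAsync(a)).ToList();
var concurrentResults = new List<string>();
foreach (var task in started)
    concurrentResults.Add(await task);
int concurrentPeak = maxInFlight;

Console.WriteLine($"lazy peak: {serialPeak}, materialized peak: {concurrentPeak}");
```

Run as a top-level program (C# 9 or later), it should report a peak of 1 for the lazy version and 5 for the version that calls `ToList()` first. In the second case the sequential `await` in the loop only serializes the per-item follow-up work, such as the database insert, not the downloads themselves.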

How should this be designed?

Solution

I would set up a pipeline using TPL Dataflow. You post the addresses to the first block, and the actors (pipeline stages) are:

  1. Web page download
  2. Processing
  3. Add to DB

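Use async wherever you can (as long as the operation is truly asynchronous), and set a high MaxDegreeOfParallelism (the default for execution blocks is 1, i.e. one item at a time) so that the TPL, rather than an artificially low limit, decides how many items actually run at once.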
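A minimal sketch of that three-stage pipeline, assuming the TPL Dataflow library (`System.Threading.Tasks.Dataflow`; depending on the target framework it may need to be added as a NuGet package). To keep it self-contained, `Task.Delay` stands in for the `HttpClient` download, the page length stands in for the real computation, a `Dictionary` is a hypothetical stand-in for the database, and the addresses are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical in-memory "database". Only the save block touches it.
var database = new Dictionary<string, int>();

// Stage 1: download (I/O-bound, async). Task.Delay stands in for
// HttpClient.GetStringAsync(address) so the sketch runs offline.
var download = new TransformBlock<string, (string Address, string Page)>(
    async address =>
    {
        await Task.Delay(50);
        return (address, $"<html>{address}</html>");
    },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });

// Stage 2: processing (CPU-bound but cheap). Page length is a placeholder
// for the real computation.
var process = new TransformBlock<(string Address, string Page), (string Address, int Result)>(
    item => (item.Address, item.Page.Length),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

// Stage 3: "query, then insert if missing". The default MaxDegreeOfParallelism
// of 1 keeps these check-then-insert steps from racing each other.
var save = new ActionBlock<(string Address, int Result)>(item =>
{
    if (!database.ContainsKey(item.Address))   // the "already there?" query
        database.Add(item.Address, item.Result);
});

// Wire the stages together so completion flows from download to save.
var link = new DataflowLinkOptions { PropagateCompletion = true };
download.LinkTo(process, link);
process.LinkTo(save, link);

// Post the addresses (one duplicate on purpose), signal that no more input
// is coming, and wait for the last stage to drain.
var addresses = new[] { "a.example", "b.example", "c.example", "a.example" };
foreach (var url in addresses)
    download.Post(url);
download.Complete();
await save.Completion;

Console.WriteLine($"{database.Count} rows saved");
```

The design choice here is that the download and processing blocks may work on many items at once, while the save block keeps the default limit of 1, so the "is it already there?" query and the insert never run concurrently with themselves. Because `a.example` is posted twice, the sketch should save 3 rows.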

Other tips

I would start all the downloads/processing running concurrently and then await all of them to complete. The code would look something like this:

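using System.Linq;
using System.Threading.Tasks;

// start every Task now: ToList() forces the lazy Select to run immediately,
// so all downloads/processing are already in flight ("hot")
var tasks = myCollection.Select(x => DownloadAndProcessAsync(x)).ToList();

// await the completion of all Tasks
await Task.WhenAll(tasks);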
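Extending that approach to the full task in the question, one option is to collect the results from `Task.WhenAll` and then run the "not already there?" check and the insert one item at a time. This is a sketch under stated assumptions: the body of `DownloadAndProcessAsync`, the addresses, and the `Dictionary` standing in for the database are all hypothetical; a real version would download with `HttpClient` and query the actual database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical DownloadAndProcessAsync: a delay in place of the HttpClient
// download, then a trivial "computation" on the result.
async Task<(string Address, int Result)> DownloadAndProcessAsync(string address)
{
    await Task.Delay(50);                     // simulated download
    return (address, address.Length * 10);    // simulated processing
}

var myCollection = new List<string> { "a.example", "bb.example", "a.example" };

// Start every download now rather than when WhenAll enumerates the query.
var tasks = myCollection.Select(x => DownloadAndProcessAsync(x)).ToList();

// WhenAll returns the results in the same order as the input tasks.
(string Address, int Result)[] results = await Task.WhenAll(tasks);

// Hypothetical in-memory "database": inserts run sequentially after all
// downloads finish, so the existence check is never raced.
var database = new Dictionary<string, int>();
foreach (var item in results)
{
    if (!database.ContainsKey(item.Address))
        database.Add(item.Address, item.Result);
}

Console.WriteLine($"{results.Length} results, {database.Count} rows saved");
```

This sketch should report 3 results and 2 saved rows, since `a.example` appears twice. A trade-off compared with the Dataflow pipeline is that nothing is saved until every download has finished.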
License: CC-BY-SA with attribution