Question

I want to download a large number of files with HttpClient, perform some time-consuming but not expensive computation on each of them, and then add each result to my database after running a query that shows it is not already there.

How can I do this conceptually? (Just the locations of the awaits and the like would be helpful.)

I currently have the following:

Get the list of addresses; for each one, add a Task (await the web page download, then continue processing) to a list of Tasks; then, for each element of that list, await it and add its result to the database.

However, this seems to run essentially serially.

How should this be designed?


Solution

I would set up a pipeline using TPL Dataflow. You post the addresses to the first block, and the blocks (actors) are:

  1. Web page download
  2. Processing
  3. Add to DB

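Use async wherever you can (as long as the operation is truly asynchronous), and raise MaxDegreeOfParallelism above its default of 1 on the blocks that should run concurrently. A high value (or DataflowBlockOptions.Unbounded) means the option no longer caps concurrency, so the underlying task scheduler decides how many items actually run at once.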
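A minimal sketch of that three-block pipeline follows. TPL Dataflow lives in the System.Threading.Tasks.Dataflow namespace; depending on your target framework you may need to add the NuGet package of the same name. The download is simulated with Task.Delay instead of a real HttpClient call, and DownloadAsync, Compute, SaveIfMissingAsync and the in-memory Dictionary standing in for the database are all hypothetical. The MaxDegreeOfParallelism values are arbitrary choices for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical in-memory stand-in for the real database.
var database = new Dictionary<string, int>();

// Block 1: download. I/O-bound, so allow several downloads in flight at once.
var download = new TransformBlock<string, (string Url, string Body)>(
    async url => (url, await DownloadAsync(url)),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 });

// Block 2: the time-consuming computation, roughly one item per core.
var process = new TransformBlock<(string Url, string Body), (string Url, int Result)>(
    page => (page.Url, Compute(page.Body)),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

// Block 3: database write. Left at the default MaxDegreeOfParallelism of 1,
// so only one check-then-insert runs at a time.
var save = new ActionBlock<(string Url, int Result)>(
    item => SaveIfMissingAsync(item.Url, item.Result));

// Completion (and faults) flow from each block to the next.
var link = new DataflowLinkOptions { PropagateCompletion = true };
download.LinkTo(process, link);
process.LinkTo(save, link);

// Hypothetical input; the duplicate address should only be stored once.
string[] urls = { "https://example.com/a", "https://example.com/b", "https://example.com/a" };
foreach (var url in urls)
    download.Post(url);

download.Complete();      // signal that no more addresses are coming
await save.Completion;    // the only await needed: wait for the last block to drain

Console.WriteLine(database.Count); // prints 2

// Hypothetical stand-in for httpClient.GetStringAsync(url): simulates network latency.
async Task<string> DownloadAsync(string url)
{
    await Task.Delay(100);
    return "content of " + url;
}

// Hypothetical stand-in for the time-consuming computation.
int Compute(string body) => body.Length;

// Hypothetical stand-in for "query, then insert if not already there".
// A real implementation would await the database query and insert here.
Task SaveIfMissingAsync(string url, int result)
{
    if (!database.ContainsKey(url))   // the "is it already there?" query
        database.Add(url, result);    // the insert
    return Task.CompletedTask;
}
```

Keeping the database block single-threaded is a deliberate choice in this sketch: with one writer, the query-then-insert check cannot race with a concurrent insert of the same address, while downloads and processing still overlap.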

Other tips

I would start all the downloads/processing concurrently and then await them all together, instead of awaiting each one in turn. The code would look something like this:

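using System.Linq;
using System.Threading.Tasks;

// Start one Task per item; ToList() forces the lazy Select to run now,
// so all the Tasks are "hot" (in flight concurrently) before we await
var tasks = myCollection.Select(x => DownloadAndProcessAsync(x)).ToList();

// asynchronously wait until every Task has completed
await Task.WhenAll(tasks);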
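A self-contained sketch of the same approach that also covers the database step from the question. It assumes the database writes can wait until all downloads finish. DownloadAndProcessAsync is given a hypothetical body that simulates the download with Task.Delay rather than calling HttpClient, and a Dictionary stands in for the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var database = new Dictionary<string, int>();   // hypothetical stand-in for the real database
string[] urls = { "https://example.com/a", "https://example.com/b", "https://example.com/a" };

// 1. Start every download+process Task up front; ToArray() forces the lazy Select
//    to run now, so all the Tasks are in flight at the same time.
Task<(string Url, int Result)>[] tasks = urls.Select(url => DownloadAndProcessAsync(url)).ToArray();

// 2. A single await for the whole batch instead of one await per item.
(string Url, int Result)[] results = await Task.WhenAll(tasks);

// 3. Check-then-insert sequentially, so two copies of the same address
//    cannot both pass the "not already there" check.
foreach (var item in results)
{
    if (!database.ContainsKey(item.Url))
        database.Add(item.Url, item.Result);
}

Console.WriteLine(database.Count); // prints 2

// Hypothetical helper: download (simulated) and then process one address.
async Task<(string Url, int Result)> DownloadAndProcessAsync(string url)
{
    await Task.Delay(100);               // stands in for await httpClient.GetStringAsync(url)
    string body = "content of " + url;
    return (url, body.Length);           // stands in for the time-consuming computation
}
```

The only awaits are the one inside DownloadAndProcessAsync and the single await Task.WhenAll; the inserts then run one at a time, which keeps the duplicate check simple.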

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow