Most Optimal TPL Dataflow Design?

https://stackoverflow.com/questions/11242144

17-06-2021
Question

I would like to ask for input on how best to design an architecture using TPL Dataflow. I have not written any code yet, so there is no sample code I can post. I am not looking for code either (unless volunteered), but assistance with the design would be much appreciated:

The requirements are as follows:

I have 3 core datablocks that are dependent on each other in specific ways. Datablock1 is a producer that produces objects of type Foo1. Datablock2 is supposed to subscribe to Foo1 objects (from Datablock1) and potentially (not upon each and every Foo1, subject to a specific function) produce Foo2 objects that it stores in an output queue for other datablocks to consume. Datablock3 also consumes Foo1 objects (from Datablock1) and potentially produces Foo3 objects that Datablock2 consumes and transforms into Foo2 objects.

In summary, here are the datablocks and what they each produce and consume:

Datablock1: Produces(Foo1), Consumes(Nothing)
Datablock2: Produces(Foo2), Consumes(Foo1, Foo3)
Datablock3: Produces(Foo3), Consumes(Foo1)

An additional requirement is that the same Foo1 objects are processed at about the same time in Datablock2 and Datablock3. It would be OK if Foo1 objects are first consumed by Datablock2 and then, once Datablock2 has done its work, the very same Foo1 objects are posted to Datablock3 for it to do its work. Foo2 objects from Datablock2 can result from operations on either Foo1 objects or Foo3 objects.

I hope this makes sense, I am happy to explain more if it is still unclear.

My first idea was to create a TPL Dataflow block for each of the 3 datablocks and to make them handle incoming streams of different object types. Another idea is to split up the datablocks and have each one handle a stream of only a single object type. What do you recommend, or is there an even better solution that may work?

Svick has already helped with Datablock1, and it is already operational. I am just stuck on how to go about moving my current environment (as described above) to TPL Dataflow.

Any ideas or pointers are much appreciated.

Solution

Let's split this problem in three and solve each independently.

The first one is how to produce an item conditionally. I think the best option is to use TransformManyBlock and let your function return a collection with one or zero items.
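A minimal sketch of the zero-or-one-items idea, assuming hypothetical record types Foo1 and Foo2 and a made-up ShouldProduce predicate standing in for the "specific function" from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical message types; the real Foo1/Foo2 are not shown in the question.
public record Foo1(int Value);
public record Foo2(int Value);

public static class Program
{
    // Assumed placeholder for the condition that decides whether a Foo2 is produced.
    private static bool ShouldProduce(Foo1 input) => input.Value % 2 == 0;

    public static async Task Main()
    {
        // Return a one-element collection to produce an item, an empty one to produce nothing.
        Func<Foo1, IEnumerable<Foo2>> transform = input =>
            ShouldProduce(input)
                ? new[] { new Foo2(input.Value) }
                : Array.Empty<Foo2>();

        var block2 = new TransformManyBlock<Foo1, Foo2>(transform);
        var sink = new ActionBlock<Foo2>(foo2 => Console.WriteLine($"Got Foo2 {foo2.Value}"));

        block2.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true });

        for (var i = 1; i <= 4; i++)
        {
            block2.Post(new Foo1(i));
        }

        block2.Complete();
        await sink.Completion;
    }
}
```

Only the inputs that pass the predicate reach the sink; nothing is left behind in the block's output buffer for the others.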

Another option would be to have the source block return null when you don't want to produce anything, and to link the two blocks conditionally (using a predicate), so that nulls are ignored. But if you do that, you also have to link the source to a null target (DataflowBlock.NullTarget<T>()), so that the nulls don't stay in its output buffer.
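The conditional-link variant might look roughly like this. The Foo1 and Foo2 records and the even/odd condition are assumptions made up for illustration; the order of the two LinkTo calls matters, because the source offers each item to its links in the order they were added:

```csharp
#nullable enable
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical message types for illustration only.
public record Foo1(int Value);
public record Foo2(int Value);

public static class Program
{
    public static async Task Main()
    {
        // Returns null when no Foo2 should be produced (assumed condition: odd values).
        var source = new TransformBlock<Foo1, Foo2?>(input =>
            input.Value % 2 == 0 ? new Foo2(input.Value) : null);

        var target = new ActionBlock<Foo2?>(foo2 => Console.WriteLine($"Got Foo2 {foo2!.Value}"));

        // First link: only non-null items are accepted by the real target.
        source.LinkTo(
            target,
            new DataflowLinkOptions { PropagateCompletion = true },
            item => item != null);

        // Second link: everything the predicate rejected is discarded here,
        // so nulls do not block the source's output buffer.
        source.LinkTo(DataflowBlock.NullTarget<Foo2?>());

        for (var i = 1; i <= 4; i++)
        {
            source.Post(new Foo1(i));
        }

        source.Complete();
        await target.Completion;
    }
}
```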

The second problem is how to send Foo1s to both block #2 and block #3. I can see two ways here:

1. Use BroadcastBlock linked to both target blocks (#2 and #3). Be careful with this, because BroadcastBlock doesn't have an output queue: it only holds the most recent item, so if a target block postpones an item, it may never get to process it. Because of this, you shouldn't set BoundedCapacity of blocks #2 and #3 in this case. With the default (unbounded) capacity, they will never postpone and all messages will be processed by both blocks.
2. After block #2 has processed a Foo1, manually Post() it (or better, SendAsync() it) to block #3.
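The BroadcastBlock variant can be sketched as follows. Foo1 is a hypothetical record, the two ActionBlocks stand in for blocks #2 and #3, and the identity cloning function assumes Foo1 is immutable and safe to share between both consumers:

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical message type for illustration only.
public record Foo1(int Value);

public static class Program
{
    public static async Task Main()
    {
        // The cloning function just returns the same instance (assumes Foo1 is immutable).
        var broadcast = new BroadcastBlock<Foo1>(item => item);

        // No BoundedCapacity is set, so neither target ever postpones an offered item.
        var block2 = new ActionBlock<Foo1>(foo1 => Console.WriteLine($"Block 2 saw {foo1.Value}"));
        var block3 = new ActionBlock<Foo1>(foo1 => Console.WriteLine($"Block 3 saw {foo1.Value}"));

        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        broadcast.LinkTo(block2, linkOptions);
        broadcast.LinkTo(block3, linkOptions);

        for (var i = 1; i <= 3; i++)
        {
            broadcast.Post(new Foo1(i));
        }

        broadcast.Complete();
        await Task.WhenAll(block2.Completion, block3.Completion);
    }
}
```

The relative order in which the two blocks print their lines is not deterministic, which matches the caveat that independent blocks have no ordering guarantees.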

I'm not sure what exactly “at about the same time” means, but in general, TPL Dataflow doesn't make any guarantees about the order of processing between independent blocks. You can alter the priority of different blocks by using a custom TaskScheduler, but I'm not sure that would be useful here.

The last and most complicated problem is how to process items of different types in a single block. There are several ways to do this, though I'm not sure which will be best for you:

1. Don't process them in a single block. Have one TransformBlock<Foo1, Foo2> and one TransformBlock<Foo3, Foo2>. You can then link them both to a single BufferBlock<Foo2>.
2. As you suggested, use BatchedJoinBlock<Foo1, Foo3> with a batchSize of 1. This means the resulting Tuple<IList<Foo1>, IList<Foo3>> will always contain either one Foo1 or one Foo3.
3. Enhance the previous solution by linking the BatchedJoinBlock to a TransformBlock that produces a more suitable type. That could be either Tuple<Foo1, Foo3> (one of the items will always be null), or something like the F# Choice<Foo1, Foo3>, which ensures that only one of the two is set.
4. Create a new block type from scratch that does exactly what you want. It should implement ISourceBlock<Foo2> and also have two properties: Target1 of type ITargetBlock<Foo1> and Target2 of type ITargetBlock<Foo3>, like the built-in join blocks.
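As a rough illustration of the BatchedJoinBlock approach with a follow-up TransformBlock, here is a sketch that assumes hypothetical Foo1, Foo2 and Foo3 records and invents the Foo2 payload purely for demonstration:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical message types for illustration only.
public record Foo1(int Value);
public record Foo2(string Origin);
public record Foo3(int Value);

public static class Program
{
    public static async Task Main()
    {
        // With a batch size of 1, each output tuple holds exactly one item in total:
        // either one Foo1 in Item1 or one Foo3 in Item2.
        var join = new BatchedJoinBlock<Foo1, Foo3>(1);

        var toFoo2 = new TransformBlock<Tuple<IList<Foo1>, IList<Foo3>>, Foo2>(batch =>
            batch.Item1.Count == 1
                ? new Foo2($"from Foo1 {batch.Item1[0].Value}")
                : new Foo2($"from Foo3 {batch.Item2[0].Value}"));

        var sink = new ActionBlock<Foo2>(foo2 => Console.WriteLine(foo2.Origin));

        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        join.LinkTo(toFoo2, linkOptions);
        toFoo2.LinkTo(sink, linkOptions);

        // The two inputs arrive through separate targets of the same block.
        join.Target1.Post(new Foo1(1));
        join.Target2.Post(new Foo3(2));

        join.Complete();
        await sink.Completion;
    }
}
```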

With options #1 and #3, you could also encapsulate the blocks into a single custom block, that looks like the block from #4 from the outside, so that it's more easily reusable.
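One possible shape for such a wrapper around option #1 is sketched below. The class name Foo2Producer, the record types and the Foo2 payload are all made up for illustration. To stay short, the wrapper exposes its output as a property instead of implementing ISourceBlock<Foo2> itself, and it does not forward faults from the inner blocks, so it is a simplification of what the answer describes:

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical message types for illustration only.
public record Foo1(int Value);
public record Foo2(string Origin);
public record Foo3(int Value);

// Hypothetical wrapper: two typed inputs, one Foo2 output.
public sealed class Foo2Producer
{
    private readonly TransformBlock<Foo1, Foo2> _fromFoo1;
    private readonly TransformBlock<Foo3, Foo2> _fromFoo3;
    private readonly BufferBlock<Foo2> _output = new BufferBlock<Foo2>();

    public Foo2Producer()
    {
        _fromFoo1 = new TransformBlock<Foo1, Foo2>(foo1 => new Foo2($"from Foo1 {foo1.Value}"));
        _fromFoo3 = new TransformBlock<Foo3, Foo2>(foo3 => new Foo2($"from Foo3 {foo3.Value}"));

        // Completion is not propagated through the links: otherwise the first
        // transform to finish would complete the shared buffer too early.
        _fromFoo1.LinkTo(_output);
        _fromFoo3.LinkTo(_output);

        // Complete the output only after both transforms have finished.
        Task.WhenAll(_fromFoo1.Completion, _fromFoo3.Completion)
            .ContinueWith(_ => _output.Complete());
    }

    public ITargetBlock<Foo1> Target1 => _fromFoo1;
    public ITargetBlock<Foo3> Target2 => _fromFoo3;
    public ISourceBlock<Foo2> Output => _output;

    public void Complete()
    {
        _fromFoo1.Complete();
        _fromFoo3.Complete();
    }
}

public static class Program
{
    public static async Task Main()
    {
        var producer = new Foo2Producer();
        var sink = new ActionBlock<Foo2>(foo2 => Console.WriteLine(foo2.Origin));
        producer.Output.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true });

        producer.Target1.Post(new Foo1(1));
        producer.Target2.Post(new Foo3(2));

        producer.Complete();
        await sink.Completion;
    }
}
```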

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow