TPL Dataflow, alternative to JoinBlock limitations?

Question 1

One way to do this is to use a BatchBlock with Greedy set to false. In this configuration, the block does nothing until n items from n different source blocks are waiting to be consumed by it (where n is the batch size you specify when creating the BatchBlock). When that happens, it consumes all n items at once and produces an array containing all of them.
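The following is a minimal sketch of that configuration, not a definitive implementation. The two BufferBlock sources, the class name NonGreedyBatchDemo and the method name JoinOnceAsync are hypothetical and only serve the illustration; it assumes the System.Threading.Tasks.Dataflow package is referenced.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class NonGreedyBatchDemo
{
    // Hypothetical helper: joins one item from each of two sources.
    public static async Task<int[]> JoinOnceAsync()
    {
        // Two independent sources (assumed for the illustration).
        var source1 = new BufferBlock<int>();
        var source2 = new BufferBlock<int>();

        // Batch size 2, non-greedy: the block postpones offered items
        // until two different sources each have an item available.
        var batch = new BatchBlock<int>(2,
            new GroupingDataflowBlockOptions { Greedy = false });

        source1.LinkTo(batch);
        source2.LinkTo(batch);

        // Two items from the same source are not enough to form a batch.
        source1.Post(1);
        source1.Post(2);

        // An item from the second source makes a batch possible.
        source2.Post(10);

        // The batch holds one item from each source; the second item
        // posted to source1 stays behind in source1.
        return await batch.ReceiveAsync().ConfigureAwait(false);
    }
}
```

The returned array should contain the values 1 and 10. Which position each value ends up in is not something this sketch relies on.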

One caveat with this solution is that the resulting array carries no information about where each item came from: you won't know which item came from which source. I also don't know how its performance compares with JoinBlock; you'll have to measure that yourself. (I wouldn't be surprised if using BatchBlock this way turned out to be slower, because of the overhead of non-greedy consumption.)
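If you need to know the origin of each item, one possible workaround (my own assumption, not something the library does for you) is to tag every item with a source index before it reaches the BatchBlock, and sort the array by that index afterwards. In the sketch below the names TaggedBatchDemo, JoinTaggedAsync, tagNumbers and tagTexts are hypothetical, and the values are boxed as object because a BatchBlock accepts only one item type.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class TaggedBatchDemo
{
    // Hypothetical helper: joins one int and one string, tagged by origin.
    public static async Task<(int Source, object Value)[]> JoinTaggedAsync(
        int number, string text)
    {
        // Each tagging block attaches a fixed source index to its items.
        var tagNumbers = new TransformBlock<int, (int Source, object Value)>(
            n => (0, (object)n));
        var tagTexts = new TransformBlock<string, (int Source, object Value)>(
            s => (1, (object)s));

        // Non-greedy batch of two: one item from each tagging block.
        var batch = new BatchBlock<(int Source, object Value)>(2,
            new GroupingDataflowBlockOptions { Greedy = false });

        tagNumbers.LinkTo(batch);
        tagTexts.LinkTo(batch);

        tagNumbers.Post(number);
        tagTexts.Post(text);

        var items = await batch.ReceiveAsync().ConfigureAwait(false);

        // Order the array by source index so that position i holds
        // the item that came from source i.
        Array.Sort(items, (a, b) => a.Source.CompareTo(b.Source));
        return items;
    }
}
```

After sorting, the element at index 0 holds the value from the first source and the element at index 1 holds the value from the second one. The price is the boxing and the casts needed on the consuming side.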

Question 2

If you want to perform multiple parallel operations for each item, it makes more sense IMHO to perform these operations inside a single block, instead of splitting them into multiple blocks and then trying to join the independent results into a single object again. So my suggestion is to do something like this:

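var block = new TransformBlock<MyClass, MyClass>(async item =>
{
    // Start both calculations on the thread pool so that they run concurrently.
    Task<SomeType1> task1 = Task.Run(() => CalculateProperty1(item.Id));
    Task<SomeType2> task2 = Task.Run(() => CalculateProperty2(item.Id));
    // Asynchronously wait for both calculations to complete.
    await Task.WhenAll(task1, task2).ConfigureAwait(false);
    // Both tasks have completed, so reading Result does not block.
    item.Property1 = task1.Result;
    item.Property2 = task2.Result;
    return item;
}, new ExecutionDataflowBlockOptions
{
    // Process up to two items at the same time.
    MaxDegreeOfParallelism = 2
});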

In the above example, items of type MyClass are passed through a TransformBlock. The properties Property1 and Property2 of each item are calculated in parallel, using a separate Task for each property. Then both tasks are awaited, and when both are complete the results are assigned to the properties of the item. Finally the processed item is returned.

The only thing you should be aware of with this approach is that the overall degree of parallelism will be the product of the number of internal parallel operations and the MaxDegreeOfParallelism option of the block. So in the above example the degree of parallelism will be 2 x 2 = 4. To be precise, this is the maximum degree of parallelism, because one of the two internal calculations may finish before the other. So while the block is busy processing two items at once, the actual degree of parallelism at any given moment could be anything between 2 and 4.
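If you want to observe this limit yourself, here is a rough sketch that counts how many calculations are running at the same time and records the highest count seen. The class ParallelismProbe, the Work method and the 50 ms sleep are all hypothetical stand-ins for the real calculations, and the peak you actually measure depends on how many thread-pool threads happen to be available.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ParallelismProbe
{
    private static int _current; // calculations running right now
    private static int _peak;    // highest value of _current observed

    // Hypothetical stand-in for CalculateProperty1 / CalculateProperty2.
    private static int Work(int id)
    {
        int now = Interlocked.Increment(ref _current);
        int seen;
        // Raise the recorded peak if this call pushed the count higher.
        while (now > (seen = Volatile.Read(ref _peak)))
            Interlocked.CompareExchange(ref _peak, now, seen);
        Thread.Sleep(50); // simulate some work
        Interlocked.Decrement(ref _current);
        return id;
    }

    public static async Task<int> MeasurePeakAsync(int itemCount)
    {
        _current = 0;
        _peak = 0;

        var block = new TransformBlock<int, int>(async id =>
        {
            Task<int> task1 = Task.Run(() => Work(id));
            Task<int> task2 = Task.Run(() => Work(id));
            await Task.WhenAll(task1, task2).ConfigureAwait(false);
            return task1.Result + task2.Result;
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 });

        // Discard the output so that the block is able to complete.
        block.LinkTo(DataflowBlock.NullTarget<int>());

        for (int i = 0; i < itemCount; i++) block.Post(i);
        block.Complete();
        await block.Completion.ConfigureAwait(false);

        return _peak;
    }
}
```

Calling ParallelismProbe.MeasurePeakAsync(20) should never return more than 4, since at most two items are in flight and each starts two calculations. It may return less when the thread pool is busy.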