Question

I am creating a Java package that offers an API based on the pipeline pattern. That is, I have a series of steps that can be plugged together in any combination, provided each step's input matches the output of the step before it. The initial source and the final sink are fixed.

Unfortunately, I have a step that feeds its output into two later steps. That is, step n outputs a pair of values: one is used by step n+1 and the other by step n+m. Step n+m, however, requires a pair of values, where the first comes from step n+m-1 and the second from step n.

In pseudocode this looks like:

class Pipeline {
    Stuff runPipeline(Stuff input, List<Step> steps) {
        // assert that the output of every step matches the input of the one after it

        for (Step current : steps) {
            input = current.execute(input);
        }

        return input;
    }
}

The problematic steps look like:

class StepN implements Step {
    Pair<Stuff, Stuff> execute(Stuff input) {
        // compute two somethings
        // something1 goes to step n+1, something2 goes to step n+m
        return new Pair<>(something1, something2);
    }
}

class StepNplusM implements Step {
    Stuff execute(Pair<Stuff, Stuff> input) {
        // compute something
        // something1 comes from step n+m-1, something2 comes from step n
        return computedSomething;
    }
}

Currently I am using a static variable for something2. Obviously this is a very poor solution, but I am unsure how to do it properly. It is quite possible that we will get more such steps in the future. :(

I have considered splitting StepN up, so that there is a separate step for something2 which can be run when it is needed. Unfortunately, both values are retrieved from an external service outside of our control. Computing them is costly for this external service, so I do not want to do it twice.

I have also considered creating some kind of data store that every step can write to and read from. However, this breaks the pattern and introduces global state.

I also considered changing every step between step n and step n+m so that they accept Pairs as well (via overloading) and just forward something2. But changing basically every step every time one of them needs an additional input doesn't really cut it either.

The last thing I considered is to allow multiple paths through the pipeline. This is less bad than the other ideas, but it introduces a lot of additional complexity which isn't needed for anything else.


Solution

I can think of two solutions, each of which has its drawbacks.

Firstly, you can pass around a big Context object that stores all the required state (i.e. it contains the current Stuff plus something2). This is the accepted solution for this related question. That answer suggests making each stage declare its input requirements by accepting an argument typed as an interface that exposes just what the stage needs. The Context class implements all the required interfaces, which preserves encapsulation. The downside (in my experience) is that it can be hard to know when to make a new interface or when to add to an existing one, and the Context always ends up being huge.
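A minimal sketch of the Context idea, to make it concrete. Everything here is an assumption made for illustration: String stands in for Stuff, the interface names (HasCurrent, HasSomething2, CanStoreSomething2) and MiddleStep are hypothetical, and the string manipulation stands in for the real computations. The point is only that one context per pipeline run replaces the static variable, and each step sees just the slice it declares.

```java
// Hypothetical role interfaces: each step asks only for what it needs.
interface HasCurrent {
    String current();
    void setCurrent(String value);
}

interface HasSomething2 {
    String something2();
}

interface CanStoreSomething2 {
    void storeSomething2(String value);
}

// One context per pipeline run implements every role interface (no static state).
final class PipelineContext implements HasCurrent, HasSomething2, CanStoreSomething2 {
    private String current;
    private String something2;

    PipelineContext(String input) { this.current = input; }

    public String current() { return current; }
    public void setCurrent(String value) { this.current = value; }
    public String something2() { return something2; }
    public void storeSomething2(String value) { this.something2 = value; }
}

// StepN needs the current value and somewhere to park something2.
final class StepN {
    <C extends HasCurrent & CanStoreSomething2> void execute(C ctx) {
        String in = ctx.current();
        ctx.setCurrent(in + "-s1");       // stand-in for something1
        ctx.storeSomething2(in + "-s2");  // stand-in for something2
    }
}

// An intermediate step only sees the current value; it cannot touch something2.
final class MiddleStep {
    void execute(HasCurrent ctx) {
        ctx.setCurrent(ctx.current().toUpperCase());
    }
}

// StepNplusM needs the current value plus the parked something2.
final class StepNplusM {
    <C extends HasCurrent & HasSomething2> void execute(C ctx) {
        ctx.setCurrent(ctx.current() + "+" + ctx.something2());
    }
}

final class ContextDemo {
    static String run(String input) {
        PipelineContext ctx = new PipelineContext(input);
        new StepN().execute(ctx);
        new MiddleStep().execute(ctx);
        new StepNplusM().execute(ctx);
        return ctx.current();
    }
}
```

Calling ContextDemo.run("x") walks the three steps in order. Note that this sketch does not check step ordering at compile time: nothing stops StepNplusM from running before something2 has been stored.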

Secondly, you can model your stages as functions where each function again declares its input requirements by the type of its argument. You can then only chain the functions together if the requirements have been met (e.g. the compiler won't let you put StepNplusM before StepN). This is the service/filter composition approach taken by Twitter's Finagle library. The downside is that you'll need to change the signature of intervening steps to pass through data that they don't care about. However, if you're willing to replace the for-loop with code that calls each step explicitly, you could do something similar to the accepted answer on this question.
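As a rough illustration of the second option, the sketch below shows both variants: calling each step explicitly, so that something2 is just a local variable, and composing typed steps, where a small helper carries the extra value past a step that does not care about it. This is not Finagle's API; the generic Step interface, its then method, the passThrough helper, and the use of String in place of Stuff are all assumptions made up for this example.

```java
// Hypothetical typed step: input and output types are part of the signature.
interface Step<I, O> {
    O execute(I input);

    // Chaining compiles only when the next step accepts this step's output.
    default <R> Step<I, R> then(Step<? super O, ? extends R> next) {
        return input -> next.execute(execute(input));
    }
}

final class Pair<A, B> {
    final A first;
    final B second;

    Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }
}

final class Steps {
    // Stand-ins for the real steps; String plays the role of Stuff.
    static final Step<String, Pair<String, String>> STEP_N =
            in -> new Pair<>(in + "-s1", in + "-s2");
    static final Step<String, String> MIDDLE = s -> s.toUpperCase();
    static final Step<Pair<String, String>, String> STEP_N_PLUS_M =
            p -> p.first + "+" + p.second;

    // Wraps a step so it transforms the first element and forwards the second unchanged.
    static <A, B, C> Step<Pair<A, C>, Pair<B, C>> passThrough(Step<A, B> step) {
        return p -> new Pair<>(step.execute(p.first), p.second);
    }
}

final class FunctionalPipelineDemo {
    // Variant 1: explicit calls instead of a loop; something2 lives in a local.
    static String runExplicit(String input) {
        Pair<String, String> fromN = Steps.STEP_N.execute(input);
        String afterMiddle = Steps.MIDDLE.execute(fromN.first);
        return Steps.STEP_N_PLUS_M.execute(new Pair<>(afterMiddle, fromN.second));
    }

    // Variant 2: composed steps; the wrapper forwards something2, MIDDLE is unchanged.
    static String runComposed(String input) {
        Step<Pair<String, String>, Pair<String, String>> carried =
                Steps.passThrough(Steps.MIDDLE);
        Step<String, String> pipeline =
                Steps.STEP_N.then(carried).then(Steps.STEP_N_PLUS_M);
        return pipeline.execute(input);
    }
}
```

In the composed variant, writing Steps.STEP_N_PLUS_M.then(Steps.STEP_N) would be rejected by the compiler, because STEP_N expects a String rather than the output of STEP_N_PLUS_M wrapped in a Pair. The trade-off is that the pipeline is no longer a plain List<Step> that can be looped over.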

Licensed under: CC-BY-SA with attribution