Question

Suppose you were building a system that receives data on one end and sends filtered data on the other end.

The system is a chain of nodes, each receiving data from the node before and sending filtered data to the next one.

You have two approaches to choose from regarding the way the nodes would be connected:

A- Each node holds a reference to the next one through a particular interface.

B- Each node exposes an event, and the next node in line would be registered to that event, notified when that event is fired (when the previous node in line sends data).

In both cases, an outside entity connects the nodes to one another; they have no direct knowledge of each other.

Currently, I think events may be better, because then a node doesn't care what the next node does; it only cares what data it has to send. That makes it easy to insert nodes in the middle.

There is a possibility nodes would need to be inserted in the middle, in the future.

Taking that into account, which approach would you prefer, and why? What are the pros and cons of each approach?


Solution

It depends on the layering of the design. If "nodes" are all in the same layer conceptually, then I would prefer references over events, especially read-only references. References are simpler and easier to trace with static analysis tools:

  • References can be read-only fields, whereas event fields are always mutable.
  • A reference field refers to a single object instance, whereas event fields contain a linked list of delegates.
  • A reference field is strongly typed. An event field is weakly typed, as any method with a matching signature can be hooked to the event. Stronger typing can reduce errors, because they are caught at compile time, and it makes it easier to browse or analyze the compiled output with analysis tools.
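As a rough C# sketch of the two wiring styles (`IDataSink`, `ReferenceNode`, `EventNode`, `ActionSink` and the trimming "filter" are illustrative names, not from the question):

```csharp
using System;

// Approach A: each node holds a read-only reference to the next node
// through an interface, so the link is immutable and strongly typed.
interface IDataSink
{
    void Receive(string data);
}

class ReferenceNode : IDataSink
{
    private readonly IDataSink _next;   // read-only field, single target

    public ReferenceNode(IDataSink next) { _next = next; }

    public void Receive(string data)
    {
        // real filtering would happen here; this sketch just forwards trimmed data
        _next.Receive(data.Trim());
    }
}

// Tiny helper sink for the end of a reference chain.
class ActionSink : IDataSink
{
    private readonly Action<string> _action;
    public ActionSink(Action<string> action) { _action = action; }
    public void Receive(string data) { _action(data); }
}

// Approach B: each node exposes an event; the wiring code subscribes
// the next node. The event field is a mutable list of delegates.
class EventNode
{
    public event Action<string> DataReady;

    public void Receive(string data)
    {
        DataReady?.Invoke(data.Trim());
    }
}
```

In both cases an outside entity does the wiring: `new ReferenceNode(next)` for A, or `node.DataReady += next.Receive;` for B.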

If some nodes are in a lower-level layer, I would use events for communication from the lower to the upper layer, in order to avoid references from a lower to a higher layer.

Your question is vague, and there is not enough information to make specific recommendations.

Other tips

Identifying the pattern.

This pattern is actually pretty smart. In fact, it's so useful that Microsoft has written a substantial and well-liked library called Reactive Extensions that solves this using a technique called functional reactive programming. The pattern you're trying to name is called an Observable.

Fundamentally, it generalizes an IEnumerable. You have a collection of items you'd like to stream through a series of transformations like mapping, filtering, aggregating and so on. However, you also fundamentally have a Task, since you have a time-dependent action whose result is not immediately available.

Basically we have a chart that goes something like this:

          | Singular      | Plural
----------+---------------+-------------------
Spatial   | Value         | Enumerable<Value>
Temporal  | Task<Value>   | Observable<Value>

Conceptually, an observable is what models your problem. There has been a lot of research into this, and you should really avoid reinventing the wheel if possible. The flow of using an observable is something like this:

The workflow when using an observable

I link to actual examples later.

  • You start with a source - for example a timer, a stream of data packets, or a function that makes a request every few seconds or back-to-back, and so on.
  • Then, transformations are applied at each step - you can take an existing observable and .Select it into a new mapped observable, or .Where it into a filtered one. You can also debounce it, flatMap it, and perform other interesting transformations. This is like LINQ, only LINQ represents a single deferred one-time query, whereas an Observable represents an ongoing process.
  • You can unwrap the observable by subscribing to it, executing a function on its end.
  • You can fork an observable, creating not only "lists of nodes" but whole trees.

This concept of a functional observable that chains is not only more powerful than events and composes well (you can merge observables, test them easily, and so on) - it's also very simple. Your code can look something like:

GetDataItems()                        // returns a new observable
    .Where(IWantItemsForTheNextRound) // filtering logic
    .Select(Mapper)                   // mapping logic
    ...                               // any more transformations

Just like working with IQueryable in LINQ, the earlier transformations don't know or care about what happens downstream. Like LINQ, you get a strong abstraction over processing a collection, only this time there's time involved.

You can insert a node in the middle by exposing a hook for it and injecting a function that does something with that hook; if you flat-map on it, you can put arbitrarily complicated logic there and extend it as much as you want.
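To make that concrete without pulling in Rx itself, here is a toy push-based source in C# (the `Source` class and its `Where`/`Select` are my own minimal stand-ins for the real Rx operators, not the Rx API):

```csharp
using System;
using System.Collections.Generic;

// A toy push-based observable: subscribing registers a callback,
// and each operator returns a new source wrapping the previous one.
class Source<T>
{
    private readonly List<Action<T>> _subscribers = new List<Action<T>>();

    public void Subscribe(Action<T> onNext) => _subscribers.Add(onNext);

    public void Push(T value)
    {
        foreach (var subscriber in _subscribers) subscriber(value);
    }

    // Filtering stage: forwards only values matching the predicate.
    public Source<T> Where(Func<T, bool> predicate)
    {
        var next = new Source<T>();
        Subscribe(v => { if (predicate(v)) next.Push(v); });
        return next;
    }

    // Mapping stage: forwards transformed values.
    public Source<R> Select<R>(Func<T, R> mapper)
    {
        var next = new Source<R>();
        Subscribe(v => next.Push(mapper(v)));
        return next;
    }
}
```

Inserting a node in the middle then just means adding another `.Where` or `.Select` between existing stages; no existing stage has to change.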

Actual theory and reading.

There is plenty of theoretical material about why this approach is very robust in KrisKowal's gtor or Erik Meijer's Duality and the end of reactive.

In a single application references are easier. Above all, they will allow you to process data asynchronously.

If your nodes can be located in different applications (or even on different computers), you'll need an event-based pattern like a message queue. Within each single application, this pattern can be implemented with the help of .NET events.
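As a minimal in-process sketch of the queue idea (`QueueStage` is my own name; a real multi-machine deployment would use an actual message broker rather than a `BlockingCollection`):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Producer and consumer know only the queue, not each other, so either
// side could later move behind a real broker without the other changing.
class QueueStage
{
    private readonly BlockingCollection<string> _queue =
        new BlockingCollection<string>();

    public void Send(string message) => _queue.Add(message);

    public void Complete() => _queue.CompleteAdding();

    // Runs the consumer on a background task until the queue is completed.
    public Task RunConsumer(Action<string> handle) =>
        Task.Run(() =>
        {
            foreach (var message in _queue.GetConsumingEnumerable())
                handle(message);
        });
}
```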

So, if you don't plan to scale your application, it's easier to use references instead of events.

This is a tricky question :) If I really had to choose from only these two options, I would choose A for the simpler form of your task and B for the extended one. That is because events do not really make sense as long as you are calling exactly one function.

The primary responsibility of your nodes is to filter the data they receive. Still, when you change the structure of your graph, you also have to change your node class, even though the change has nothing to do with its primary function.

The way around this is to conform to the Single Responsibility Principle (SRP). Make the node's sole responsibility the filtering, and create a class that transfers the data between nodes.

So here is option C:

interface InputNode
{
  DataBuffer getData();
}

interface OutputNode
{
  void setData(DataBuffer data);
}

class FilterNode : InputNode, OutputNode
{
  ...
}

Implement this node so that its sole responsibility is the filtering. Make it absolutely unaware that other nodes exist at all! And this is the interface of the class that transfers the data between nodes:

interface DataTransfer
{
  InputNode Input {get; set;}
  OutputNode Output {get; set;}
  void transfer();
}

And this is how your graph could look:

class Graph
{
  private List<DataTransfer> transfers;
  private FilterNode first, last; // endpoints of the chain

  public Graph()
  {
    //initialize nodes and transfers
    ...
  }

  public void setData(DataBuffer data)
  {
    first.setData(data);
  }

  public void process()
  {
    foreach(var transfer in transfers)
      transfer.transfer();
  }

  public DataBuffer getData()
  {
    return last.getData();
  }
}

If you want to extend this solution as you described, you will need to change the DataTransfer interface to have a list of output nodes, and you will need to change the graph representation so that it can do the branching.
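For illustration, the branching version might look like this sketch, using string in place of DataBuffer to keep it self-contained (`BroadcastTransfer`, `BufferNode` and the property names are mine, not from the answer):

```csharp
using System;
using System.Collections.Generic;

interface InputNode  { string getData(); }
interface OutputNode { void setData(string data); }

// A branching transfer: reads from one input node and pushes the same
// data to every registered output node.
class BroadcastTransfer
{
    public InputNode Input { get; set; }
    public List<OutputNode> Outputs { get; } = new List<OutputNode>();

    public void transfer()
    {
        var data = Input.getData();
        foreach (var output in Outputs)
            output.setData(data);
    }
}

// A trivial node that just stores data (stands in for a real filter).
class BufferNode : InputNode, OutputNode
{
    private string _data;
    public string getData() => _data;
    public void setData(string data) => _data = data;
}
```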

Licensed under: CC-BY-SA with attribution