Data handling in SRP (single responsibility principle)

https://softwareengineering.stackexchange.com/questions/414770

14-03-2021

Question

TL;DR: Robert C. Martin advises here: http://blog.cleancoder.com/uncle-bob/2014/05/08/SingleReponsibilityPrinciple.html to split multi-responsibility classes into "single reason to change" classes. He does not explain how properties (data) used by more than one of the split parts should be organized, stored, or passed on. I try to illustrate the problem with an example and discuss possible solutions.

Accepted answer: Doc Brown pointed out that my approach is similar to the "pipes & filters" architecture. He suggested a functional approach in which each single-responsibility class receives the data it operates on as a parameter. This is organized by overseeing methods, each of which handles one business case. Read and upvote his great answer below for details.

Original question

I have read a lot about separating methods and functions into "single reason to change" classes. But when it comes to data, I find very little detail. I am not sure where to put the data that each separated part needs as input and provides as output.

An example scenario:

I have a data source, a list of some kind, maybe in CSV or Excel format. Requirements could be:

import the file (Import)
check the list for certain conditions (business logic, validation)
calculate stuff based on a few fields (business logic)
convert the list to a different format, e.g. json (business logic, conversion)
write the changed list to the excel file (Output)
write the changed list to the json (Output)

I would divide the responsibilities into the following parts (let's call them "Helpers"):

Import/Export of the List -> ListSerializer (produces List from disk)
Check the List -> ListChecker (takes List, produces ErrorList)
Apply business logic to the List -> ListProcessor (takes List and ErrorList, produces List (with changes))
Convert the List to JSONList -> ListToJSONListConverter (takes List (with changes), produces JSONList)
Export the JSONList to file ->JSONListSerializer (takes JSONList, writes to disk)

AFAIK this is a classical SRP design: a change to the list format, the check logic, the business logic, or the JSON format requires changing very few classes.

Now I want to implement it. All classes get interfaces that expose their functions. An orchestrating class (let's call it the "Superior") is created to handle user input and other triggers and to call the subroutines. This class needs to know the interface of every Helper (so it can call their methods) and must be initialized with concrete objects (implementations of the interfaces) by the init logic; let's call that the "Boss". Basic dependency injection stuff.
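The wiring described above could be sketched roughly like this in C#. It is only a minimal illustration under assumptions: the `Row` type, the method names `Load`, `Check`, `HandleTask` and `Hire`, the in-memory serializer, and the validation rule are all hypothetical and not taken from the question; only two of the five Helpers are shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type; the question does not say what a list entry holds.
public record Row(string Name, decimal Amount);

// One interface per Helper, exposing only what that Helper does.
public interface IListSerializer { List<Row> Load(string path); }
public interface IListChecker { List<string> Check(IReadOnlyList<Row> rows); }

// Stand-in implementation so the sketch runs without touching the disk.
public class InMemoryListSerializer : IListSerializer
{
    public List<Row> Load(string path) =>
        new List<Row> { new Row("a", 10m), new Row("", -1m) };
}

// Assumed rule: a row needs a name and a non-negative amount.
public class ListChecker : IListChecker
{
    public List<string> Check(IReadOnlyList<Row> rows) =>
        rows.Where(r => r.Name.Length == 0 || r.Amount < 0)
            .Select(r => $"invalid row: '{r.Name}'")
            .ToList();
}

// The Superior knows only the interfaces, never the concrete classes.
public class Superior
{
    private readonly IListSerializer _serializer;
    private readonly IListChecker _checker;

    public Superior(IListSerializer serializer, IListChecker checker)
    {
        _serializer = serializer;
        _checker = checker;
    }

    public List<string> HandleTask(string path) =>
        _checker.Check(_serializer.Load(path));
}

// The Boss is the only place that names concrete classes.
public static class Boss
{
    public static Superior Hire() =>
        new Superior(new InMemoryListSerializer(), new ListChecker());
}
```

With these stand-ins, `Boss.Hire().HandleTask("any.csv")` returns a single error message, for the second row.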

Basically, the Boss creates the Helpers and the Superior and tells the Superior: these are your Helpers, Jack ListSerializer as your ListSerializer, Amy ListChecker as your ListChecker, ... When a new task arrives, the Superior gets notified and splits and delegates the task to the Helpers. Let's take that picture a little further:

Now there are multiple variants of how things could be done:

1st: The orchestrating class manages all data storage
Superior: Mr. ListSerializer, please load the list.
Jack the ListSerializer: Superior, I have completed the task, here is the List.
Superior: Thanks. Mrs. ListChecker, please check this list, it is right here.
...

2nd: Every data producer stores its own output, and the others know how to retrieve it
Superior: Mr. ListSerializer, please load the list.
Jack the ListSerializer: Superior, I have completed the task.
Superior: Thanks. Mrs. ListChecker, please check the list (you can find it at the ListSerializer's desk).
...

3rd: One big variable data storage, where the Helpers know how to access the info relevant to them
Superior: Mr. ListSerializer, please load the list. Here is the magic drawer to put your results in.
Jack the ListSerializer: Superior, I have completed the task.
Superior: Thanks. Mrs. ListChecker, please check the list, you can find it in the magic drawer. Please put your results in there, too.
...

Every variant has its advantages, but each is bad to some degree:

1st: The Superior needs to know the data objects up front; one could argue for using interfaces for that. Storing data centrally does not feel very object-oriented, but one could try to live with it.

2nd: Feels more object-oriented; the Superior doesn't need to know the details. The downside is the additional coupling between the Helpers, which need to know each other.

3rd: The most decoupled; no Helper needs to know any other, only how to retrieve data from and store data in the all-knowing data storage object.
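To make the three variants concrete, here is a rough C# sketch of how the data could travel in each of them. Everything in it is an assumption for illustration: the list is reduced to a `List<int>`, the check rule (negative values are errors) is invented, and the class names `Variant1`, `StoringLoader`, `LoaderAwareChecker` and `Variant3` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Variant 1: helpers return their result; the Superior passes it on.
public static class Variant1
{
    public static List<int> Load() => new List<int> { 3, -1, 7 };

    public static List<string> Check(IReadOnlyList<int> list) =>
        list.Where(x => x < 0).Select(x => $"negative value {x}").ToList();

    // This is what the Superior would do.
    public static List<string> Run() => Check(Load());
}

// Variant 2: the producer keeps its output; the consumer must know the producer.
public class StoringLoader
{
    public List<int> Result { get; private set; } = new List<int>();
    public void Load() => Result = new List<int> { 3, -1, 7 };
}

public class LoaderAwareChecker
{
    private readonly StoringLoader _loader; // coupling between two Helpers
    public LoaderAwareChecker(StoringLoader loader) => _loader = loader;
    public List<string> Check() => Variant1.Check(_loader.Result);
}

// Variant 3: every helper reads from and writes to a shared, untyped "magic drawer".
public static class Variant3
{
    public static void Load(Dictionary<string, object> drawer) =>
        drawer["list"] = new List<int> { 3, -1, 7 };

    // Throws at run time if nobody put "list" into the drawer first.
    public static void Check(Dictionary<string, object> drawer) =>
        drawer["errors"] = Variant1.Check((List<int>)drawer["list"]);
}
```

In this sketch only variant 1 makes the order and the types of the data flow visible in the signatures; variants 2 and 3 hide them in object state or in string keys.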

Does the SRP (or SOLID) provide a guideline for that? What are best practices in that matter?

Edit

I refer to Robert C. Martin's article at http://blog.cleancoder.com/uncle-bob/2014/05/08/SingleReponsibilityPrinciple.html, which, as I understand it, advocates such a split in its example. But the example does not cover how the classes interact or how the data or objects are passed on.

Edit 2: Added the TL;DR and the accepted answer.

Solution

Your approach is fine, it is a variant of the well-known "pipes & filters" architecture.

I suggest your "orchestrating class" should have one core method which looks like this (in C#):

// Each step receives its input as a parameter and returns its output.
var originalList = new ListDeserializer(fileName).ReadList();
var errorList = new ListChecker().Check(originalList);
var improvedList = new ListProcessor().Process(originalList, errorList);
var jsonList = new ListToJSONListConverter().Convert(improvedList);
// The last step writes the result; nothing is kept in shared state.
new JSONListSerializer().Serialize(jsonList);

I guess that is what you have in mind with your approach #1. It has the advantage that you can unit test each step of the pipeline (I would not call them "Helpers") in isolation. There is no global storage with undesirable side effects (like in your approach #3), and no coupling between the processing steps which could prevent unit testing (like in your approach #2). And this is definitely not "procedural design" (#3 would probably deserve that name) - quite the opposite, it is a functional approach.

Note also that if you manage to keep any real business logic out of the "orchestrating class", there is probably no need to unit test that class in isolation or to introduce interfaces to mock out all the processing steps. I would probably implement only an integration test for that class and "mock out" only the IO, if the real IO gets in the way of that test.
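One way to read "mock out only the IO" is to hand the orchestrating method a `TextReader` and a `TextWriter` instead of file names, so an integration test can run the real steps against in-memory strings. The following is a sketch under that assumption, not the code from the answer: the `Pipeline` class, its placeholder rules (non-numeric entries are errors) and the hand-rolled JSON output are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class Pipeline
{
    // Orchestrating method: only the order of the steps, no business rules.
    // The IO is passed in, so a test can substitute in-memory streams.
    public static void Run(TextReader input, TextWriter output)
    {
        var originalList = ReadList(input);
        var errorList = Check(originalList);
        var improvedList = Process(originalList, errorList);
        output.Write(ToJson(improvedList));
    }

    static List<string> ReadList(TextReader input) =>
        input.ReadToEnd()
             .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
             .Select(line => line.Trim())
             .ToList();

    // Placeholder rule: entries that are not whole numbers count as errors.
    static List<string> Check(IReadOnlyList<string> list) =>
        list.Where(entry => !int.TryParse(entry, out _)).ToList();

    // Placeholder logic: drop the faulty entries and parse the rest.
    static List<int> Process(IReadOnlyList<string> list, IReadOnlyList<string> errors) =>
        list.Where(entry => !errors.Contains(entry))
            .Select(entry => int.Parse(entry))
            .ToList();

    static string ToJson(IReadOnlyList<int> list) =>
        "[" + string.Join(",", list) + "]";
}

public static class PipelineIntegrationTest
{
    // Runs all real steps; only the file IO is replaced by strings.
    public static string RunOnSample()
    {
        var output = new StringWriter();
        Pipeline.Run(new StringReader("1\nabc\n2\n"), output);
        return output.ToString();
    }
}
```

For the sample input, `PipelineIntegrationTest.RunOnSample()` returns `[1,2]`; a test framework assertion on that string would cover the whole pipeline without any mocking interfaces for the individual steps.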

Yes, there is a clear distinction here between "processing" objects like a "ListProcessor" and "data objects" like the "improvedList" or its items. You wrote this does not look very OO to you, but in my experience, starting with "anemic data objects" is fine. When the program grows, one will usually find logic in the code which can be refactored into methods of those list items, because they fit naturally there. Then they may be used inside a method like ListProcessor.Process(), or reused elsewhere. And that will transform your "data objects" into real "business objects" over time.
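As a small illustration of that refactoring, assume a list item with a name, a quantity, and a unit price (all hypothetical, as is the simplified `Process` signature). Logic that first lived inside the processor, such as the line total and a validity rule, moves onto the item once it clearly belongs there:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Started out as an "anemic" data object: just the three values.
public record ListItem(string Name, int Quantity, decimal UnitPrice)
{
    // Moved here from the processor because it only needs the item's own data.
    public decimal Total => Quantity * UnitPrice;

    // Assumed rule, for illustration only.
    public bool IsValid => Name.Length > 0 && Quantity > 0 && UnitPrice >= 0;
}

// The "processing" object stays separate and uses the item's methods.
public class ListProcessor
{
    public decimal Process(IEnumerable<ListItem> items) =>
        items.Where(item => item.IsValid).Sum(item => item.Total);
}
```

`Total` and `IsValid` can now be reused by other processing steps without going through `ListProcessor`.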

Note that one has to work here on two different levels of abstraction: at the lower level, you create small "business objects" around your data, with their own methods which operate mostly on that data. On the higher level, you make a separation between "processing functions (=functor objects)" and the data processed by them. Maybe that's not purist OO anymore, but it is definitely not worse. In my experience, this kind of design is easier to handle than an approach which mixes high-level functions into low-level business objects just to make them fit some purist, religious understanding of what OO design must look like.

Other tips

What you are doing (as are many other people) is called procedural design, the thing we were supposed to drop in favor of object orientation a couple of decades ago. Because of this different context, most of the words and design elements you are using are completely off.

Just to be clear, I don't think it's your fault. Most of what we find online still promotes procedural thinking. We just don't have enough good material, and unfortunately that includes some of the authors who came up with this stuff.

Back to your question: There are no data objects. In fact "data objects" is an oxymoron. Objects are supposed to include the behavior applicable to the data they contain, and the data should not be visible at all. And that is only the absolute bare minimum.

There are no "helpers" or "orchestration" / "boss" (i.e. god) objects either. The term "responsibility" is also used incorrectly. It doesn't mean a technical step that needs to be done. In the context of object orientation, it specifically refers to a business responsibility, i.e. something that comes directly from the requirements and has at least some value on its own.

Your usage of "requirements" is also a bit off. What you listed are not requirements, but technical steps. A requirement would be something like: I want to open the sales data file and generate the monthly report for marketing. The cool thing is, you can (and absolutely should) use that as the basis of your design.
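One possible reading of this advice, sketched in C# under stated assumptions: the class is named after the business concept from the example requirement, keeps its data private, and offers the monthly report as its behavior. The `SalesData` name, the `yyyy-MM-dd;amount` line format, and the report layout are all invented here; the answer does not specify any of them.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class SalesData
{
    // The parsed data is private; callers never see the list itself.
    private readonly List<(DateTime Date, decimal Amount)> _sales;

    private SalesData(List<(DateTime Date, decimal Amount)> sales) => _sales = sales;

    // Hypothetical input format: one "yyyy-MM-dd;amount" entry per line.
    public static SalesData Parse(IEnumerable<string> lines)
    {
        var sales = new List<(DateTime Date, decimal Amount)>();
        foreach (var line in lines)
        {
            var parts = line.Split(';');
            var date = DateTime.ParseExact(parts[0], "yyyy-MM-dd", CultureInfo.InvariantCulture);
            var amount = decimal.Parse(parts[1], CultureInfo.InvariantCulture);
            sales.Add((date, amount));
        }
        return new SalesData(sales);
    }

    // The business responsibility, taken straight from the requirement.
    public string MonthlyReport(int year, int month)
    {
        var total = _sales
            .Where(s => s.Date.Year == year && s.Date.Month == month)
            .Sum(s => s.Amount);
        var formatted = total.ToString("0.00", CultureInfo.InvariantCulture);
        return $"{year}-{month:D2}: {formatted}";
    }
}
```

Given the lines `2021-03-01;10.50`, `2021-03-14;4.50` and `2021-04-01;99`, calling `MonthlyReport(2021, 3)` returns `2021-03: 15.00`. Whether parsing and reporting should live in one class is exactly the point on which this answer and the accepted one disagree.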

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange