How to apply SRP in a loop where two things need to be done?

https://softwareengineering.stackexchange.com/questions/354865

17-01-2021
|

Pergunta

Is there a way to apply the Single Responsibility Principle to a function where two things need to occur in a loop in order to not need to iterate twice?

For example, suppose I have a function like the following (in pseudo-Python):

def process_stuff(list):
    for item in list:
        process_one_way(item)
        process_another_way(item)

In this function, the data needs to be processed twice in two completely separate ways (in particularly, an example I recently worked with involved both creating some objects from the data in the list, and then also generating separate JSON for a different purpose). I feel like this violates the SRP because this function is doing two completely separate things with the data. A natural way to separate them would be:

def process_stuff(list):
    process_one_way(list)
    process_another_way(list)

def process_one_way(list):
    for item in list:
        # process

def process_another_way(list):
    for item in list:
        # process

... but that involves iterating through the list twice, which seems wasteful. Is there a better way to separate these two tasks, or am I misunderstanding what the intention of the SRP is?

Solução

The SRP does not say a function should only do one thing. If that was the case you couldn't write an application or system which does more than one single thing!

What you should consider is: Will the two operations always be performed on the same collections at the same time? Then they belong in the same loop. Take this example:

def end_of_year_stuff(customers):
    for customer in customers:
        send_christmas_card(customer)
        calculate_for_financial_statement(customer)

Here it is kind of by accident these two things happen at the same time. The company might change its financial year without Christmas being moved, or Christmas might be called off without it affecting the financial year. Or you might decide to exclude customers in Agrabah from the Christmas card, but they still have to be considered for the financial reports. In short, the two concerns might change independently and therefore belong in separate loops.

Now take this example:

   def roll_logs(logfiles):
       for logfile in logfiles:
          compress_file(logfile)
          move_to_backup_drive(logfile)

In this case the two operations belong together in the same loop, since you want to make sure these two things happen together.

The important thing to realize is SRP depends on the meaning and purpose of the functions and operations involved. So by using "semantic-free" names like "process_stuff" and "process_one_way" in the example, it is impossible to say anything about whether the code violates SRP or not.

Outras dicas

I'd argue, that the code provided is very close to the SRP, way closer than most production code I've seen (sad, very sad ;) ). The one thing you are doing with the data is processing. There will always be methods that require more than one step and they'd still adhere to the SRP. At the other hand you could argue that your method indeed does two things: Processing the list and processing an item from the list.

You could - however - improve your solution. Just extract the part processing the item. This method does one thing: Processing the item. Disregarding if it was one step or two. And now your outer method really only does one thing: Processing the list.

def process_stuff(list):
    for item in list:
        process_item(item)

def process_item(item):
    process_one_way(item)
    process_another_way(item)

You already have the concerns separated in different functions. As you go higher up in the call stack you will find bigger functions/classes/components that represent operations on a higher abstraction level. The point is to hide implementation details while maintaining single concepts on each level. Not to end up with functions that have one line only.

What you would want to consider is if it would ever make sense to perform the one operation separate from the other way operation. If so, you should indeed split them. Iteration is not likely to be expensive compared to the repeated operations, for performance reasons it probably does not make sense to put them in a single loop.

This depends on how do you separate responsibilities.

If it is meaningful to have two separate loops which have other independent uses, create them, and combine them when needed:

def peel_potatoes(potatoes):
  for potato in potatoes:
    peel(potato)

def dice_vegetables(vegetables):
  for veg in vegetables:
    slice_and_dice(veg)

def prepare_potatoes(potatoes):
  peel_potatoes(potatoes)
  dice_vegetables(potatoes)

If you don't have the above separate cases, create one procedure that does the complete step, and then apply it in loop:

def dress_up(person):
  put_on_pants(person)
  put_on_shirt(person)

for person in people:
  dress_up(person)

A clean design is usually worth paying for by some small performance hit; with the loops like the above, it's not even the case.

(A more advanced case, when processing of a single item can run asynchronously, usually involves queues combined in a similar way.)

Being a single responsibility is based on perspective and scope. We're not going to break everything down to the flipping of a particular bit.

In your example, your loop could be a single transaction, that requires to separate functions to be performed in a particular order and possibly have a need to roll all of them back if the entire transaction cannot execute properly for a single item or the entire list.

A transfer needs to be treated like a single action from the bank's perspective. If there isn't enough money in Account A, you can't put money in Account B. Each account sees the result individually.

Banks need to process interest payments on all accounts, so they can make sure they have enough funds to cover all of them. Again, from the bank's perspective, this is a single process, but each account has it's own deposit. Someone is going to hit a button and make it all happen.

There may be a simpler way to think about it.

def mapc(lis, fun)
   for item in lis
      fun(item)

At that point, you can define a particular item processor, say, foo(), and then

def foo(item)
   do_one_thing(item)
   do_other_thing(item)

mapc(my_table, foo)

Note 1: The name "mapc" comes from LISP in ancient times. It is chosen for its mnemonic value.

Note 2: Expect to have to explain this to some of your coworkers. They haven't read Abelson & Sussman's "Structure and Interpretation of Computer Programs", and the idea of separating the loop from the processing done in the loop may be utterly alien to them.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a softwareengineering.stackexchange