Software-design for algorithm engineering

https://softwareengineering.stackexchange.com/questions/423588

24-03-2021
|

Pergunta

I'm currently working on an program that solves a graph optimization problem.
I know the "standard" software-design principles like information hiding, modularization, etc. What I'm struggling with here is that I feel like this use case rules out most of them as not applicable.

For example, I constantly have to use the same ~5 lists, stacks and sets for the algorithm, so it doesn't make sense to pass them as an argument to every function redundantly. My current solution is that I just keep those in global scope.

Also what isn't clear to me is how to use OOP in this case. I abstracted some datastructures I had to come up with into their own class, but many other things don't appear to make a useful class.

My questions are the following:

Does this mean the OOP paradigm is not a good fit for this category of problems? Or am I simply not experienced enough with OOP to utilize for this domain?
Is their a different way of structuring such programs which will fit better?
is there a name for such programs which basically just implement one algorithm? Maybe "Optimization programming"?

Solução

Does this mean the OOP paradigm is not a good fit for this category of problems? Or am I simply not experienced enough with OOP to utilize for this domain?

You can use the OOP paradigm to encapsulate the algorithmic logic, i.e. define a single class that "wraps" the entire algorithm. Those global lists can be set up as member variables within the class, to allow the free access you are looking for while avoiding pollution of the global namespace. If there are subroutines or repeated actions within the algorithm, those can be implemented as private methods. The point is to keep the messiness of the algorithm private so that others can use your class without worrying about its internal details, and so that you have a modular unit of code that can be tested and versioned independently. You also have a unit of storage, meaning that you can create multiple instances, serialize, etc. if you wanted to run on a distributed, parallel, or fault-tolerant system.

Is their a different way of structuring such programs which will fit better?

From the outside of the class, the algorithm's details should be kept private. From the inside, it is best to structure the logic as closely as possible to the specification itself, so that it is easy to review the code for correctness by comparing each function to its corresponding specification. You should usually only expose one or two public entry points.

is there a name for such programs which basically just implement one algorithm? Maybe "Optimization programming"?

It's called "implementing an algorithm."

Outras dicas

Depending on your long term needs, keeping those data structures in global variables might be just right if you're using the program in isolation. As soon as you want to use the algorithm as part of another program or want to make it available to other users as part of a library, put the various data structures into instance variables of the class that embodies the algorithm. Objects in OOP don't have to be representations of real life entities, they might just as well represent algorithms or even parts of the data structures needed to implement those algorithms.

Let me guess: this is a program

only for academic purposes, which will probably have a restricted life time
in the range of 2K lines of code up to 20K at most
with just one developer (you)?

Then using the restricted "amount of OOP" you already did ("abstracting some datastructures into classes") is probably what makes sense most. You can introduce some more classes to group some modules together, but if there is just one "main module" which requires five lists, stacks and sets, make those member variables of the corresponding class and consider them as global in the scope of that module, that's fine.

OOP is not an end in itself, it is a means to an end. It is for building larger software, isolating blocks from each other, keeping software maintainable and evolvable even when it grows over a longer period of time, and aims for a long-term scalable development process. Your kind of program does not seem to fit into that picture. If there is only one main algorithm inside, there is most probably just one "main module" required, and the program is simply too small to use more. So use from OOP what's useful for your case, and forget about the rest.

Said that, even in a relatively small optimization program as you scetched it, there will usually be three layers at minimum:

a data layer (you will most probably need a specialized graph structure, maybe with some general, basic operations for graphs)
a "business logic" layer (your main algorithm)
a user interface layer (either a command line interface, or maybe a simple GUI)

I would recommend to keep these separated, and use OO means like classes and objects to separate them. This will usually make it easier to add automated tests to this system, which might be considered as a fourth layer, and a fourth opportunity to use classes and objects.

With growing program size, I can envision that you find more opportunities for using classes and objects:

different algorithms: give each one a class of its own
exchangeable algorithms with the same input and output: define a common interface, derive each algorithm class from it
shared code and infrastructure for the different algorithms: refactor those to the data layer, if possible, or into helper classes
algorithms which are similar to a larger extent, except one inner part, which behaves differently: make use of the strategy pattern to encapsulate and exchange that inner part

But if you are currently not at that point, don't try to force more "OO" into your program than necessary.

Regarding the "name question": I don't think there is a special one, "small program with restricted life time" is the only name which comes into my mind.

I subscribe to John Wu's answer. Let me focus on this part which seems to capture your misunderstanding well:

I constantly have to use the same ~5 lists, stacks and sets for the algorithm, so it doesn't make sense to pass them as an argument to every function redundantly.

You imply that there is some cost in passing the same piece if information to multiple functions rather then letting them all reach out for it when they need it. This is typically not true.

While technically there is a performance benefit to using global variables, it is, in nearly 100% of the cases, meaningless. You are passing a reference. That is a couple of extra bytes pushed on the stack when the call is made and then you do your heavy processing. We can forget about this aspect.

The benefit of passing all input data to a function is that it makes it easier to get your head around the logic. Anything the function uses is clearly declared as input and there are no secrets. A secret would be the function using some external data source in line 6386 that is not obviously a part of the logic. "This is unsuspected behavior, how can it be? I am passing the same arguments and yet I get a different outcome. Ahhh! It is driving me nuts!"

(spelling out the code, two days later)

"O, look here... It is using this global data. Who made this?! What more is it accessing I do not know about? O, wait... I made this, about a year ago."

If those ~5 collections represent common, coherent input data you can package them in a new class, create an instance and pass that as a single argument to you functions. Just always try to make sure the signature of a method tells the whole story: what it does, what it needs and what it may change.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a softwareengineering.stackexchange