Question

I know function names can be very expressive, and it can therefore be tempting to split a program up into individual functions and call them from one large, "oversee-able" functions file.

However, is this actually effective for a programmer? Since each function usually relies on some input provided by a previous function - a returned value, a global variable, or whatever else your language uses - it often only works, and only makes sense, in the context of that previous function. A change to a previous function would often obviously destroy functionality in the second function.

When a program is split up into functions that are then jumbled together in a functions file, the effect that changes have is not necessarily very clear. So is it a good idea to split things up in this way?


Solution

Since each function usually relies on some input provided by a previous function - a returned value, a global variable, or whatever else your language uses - it often only works, and only makes sense, in the context of that previous function.

While I always favor compact, self-contained functions over huge, sprawling if/else trees, it is true that real-world programs are seldom as nice and neat (untangled) as we would wish. Application state (global or not) is generally unavoidable.

I believe that it is also true that there are diminishing returns to splitting things up into smaller and smaller functions. For a logical extreme, take a look at the book Clean Code by Robert C. Martin. Not everyone will agree with me, but I find many of his examples to actually be slightly less comprehensible after his refactoring. (Maybe I don't write enough Java to appreciate them?)

Having said that, I do personally believe that the function is the most powerful abstraction available to us. Sure, functions can be used incorrectly. There is absolutely nothing stopping you from writing terrible, tangled code with a bunch of tiny functions, even without those evil weapons, the global variables. But on the whole, functions are a force of goodness and light.

When practically possible, heed the standard wisdom of the crowd in creating "clean" functions:

  • Do not modify any external state within your functions.
  • A function should be deterministic: the same input must always produce the same output.

Just these two rules will avoid 90% of the most common design headaches. They also make it easy to write tests for your functions!
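
For a rough illustration, here is a minimal Python sketch of those two rules (the functions and names are invented for the example): a "clean" function takes everything it needs as parameters and touches nothing outside itself, which is exactly what makes it trivial to test.

    TAX_RATE = 0.25  # module-level state that a "clean" function should not rely on

    def add_tax_impure(price):
        # Not deterministic in its inputs alone: the result silently
        # changes if TAX_RATE changes somewhere else.
        return price * (1 + TAX_RATE)

    def add_tax(price, rate):
        # Deterministic and self-contained: same inputs, same output,
        # no reads or writes outside the function.
        return price * (1 + rate)

    # Because nothing is hidden, a test is a one-liner:
    assert add_tax(100, 0.25) == 125.0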

Beyond that, just try your hardest to make only "clean" functions when you can. Depending on the application, this can actually be a very difficult exercise at first. But one does get better at it. As with most things, it's a real craft and it takes experience (programming experience in general, and experience with that particular project) to get it just right.

Untangled code doesn't just happen. It's hard work and an art...and totally worth it.

Other tips

As someone who recently did just that - refactored a few long sheets of code into a set of small functions - I can attest that it is effective.

I assume you have heard of the Unix principle: "do one thing and do it well". It helps immensely.

A short function is easy to read, easy to reason about, and easy to test. Importantly, it can usually be tested in the absence of most other functions, while mocking only a few, if any, of the objects it depends on.

When you write a short function, you are forced to think about what exactly it does, to come up with a proper name (doIt() does not cut it), good parameter names, etc. This makes your code more self-descriptive, and it makes you understand the code better. It also gives you higher-level concepts with which to describe your program.

Splitting complex state transitions into short functions makes you think about the data flow of the program, and untangle it as much as possible. This leads to fewer data dependencies, often shorter lifetimes of some data (important when the data is huge), and fewer errors like data races, state updates clobbering each other, etc.
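
A rough sketch (the pipeline and names below are invented): passing data explicitly from one short function to the next makes the flow visible and keeps intermediate results short-lived.

    def parse_records(text):
        # Each step receives its input explicitly and returns its result...
        return [line.split(",") for line in text.splitlines() if line]

    def keep_valid(records):
        return [r for r in records if len(r) == 3]

    def total_amount(records):
        return sum(float(amount) for _, _, amount in records)

    def report(text):
        # ...so the data flow is plain function composition, and nothing
        # lives longer than it has to.
        return total_amount(keep_valid(parse_records(text)))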

Factoring any fragment of code larger than a screenful out into its own function also makes you notice and factor out copy-pasted code, even when it has been slightly modified. Finding the common pattern helps understanding.

Sometimes you end up with functions you instantly recognize, because they do (almost) the same thing as some library function already does. You slash the line count, and reuse someone else's existing work (and any upcoming improvements and bugfixes).
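
For instance (an invented example), a hand-rolled counting loop extracted from a monolith often turns out to be a standard-library call:

    from collections import Counter

    # The fragment you pulled out of the wall of code...
    def count_words(words):
        counts = {}
        for word in words:
            counts[word] = counts.get(word, 0) + 1
        return counts

    # ...which, once named and isolated, you recognize as an existing facility:
    def count_words_std(words):
        return dict(Counter(words))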

Conversely, the functions you extracted can be reused in your other programs. You cannot reuse a fragment of monolithic code, except by copy-pasting it.

Many people struggle to write short functions, though. They tend to write a long monolithic script that does the whole thing, as they would write a whole chapter of prose. One of the remedies is not to just write code, but to immediately run it.

If you have a dynamic language like Python or JS, or even something like Scala, have a REPL open at all times. When you come up with a fragment of code, try it in the REPL. Will you need dependent objects for it? Write a function that provides each dependent object. Once you have come up with a code fragment that produces something meaningful, wrap it in a function, too; you will need it at the next step. This way you end up with a set of functions that compose into the whole program you needed, and you have already run most of it!
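
A rough sketch of what that might look like in a Python session (the fragments and names are hypothetical):

    >>> # Try a fragment directly...
    >>> line = "2024-05-01,widget,3"
    >>> line.split(",")
    ['2024-05-01', 'widget', '3']
    >>> # ...it works, so wrap it in a function you will need at the next step:
    >>> def parse_line(line):
    ...     date, item, qty = line.split(",")
    ...     return item, int(qty)
    ...
    >>> parse_line("2024-05-01,widget,3")
    ('widget', 3)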

With languages that don't readily provide a REPL, things are harder. Here test-driven development may help: you cannot freely play with your code fragments in a REPL, but you can run them cheaply as tests. As you go, you end up with the whole program and with a set of tests verifying and explaining it. With a monolith, this is impossible.
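
For example (a hypothetical pytest-style sketch), each fragment becomes a cheap, repeatable check instead of a REPL experiment:

    # test_parse.py - the fragment and its check live side by side.
    def parse_line(line):
        date, item, qty = line.split(",")
        return item, int(qty)

    def test_parse_line():
        assert parse_line("2024-05-01,widget,3") == ("widget", 3)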

If your program is a throwaway script, not important enough even to be peer-reviewed before running in production, a wall of code may be fine.

Otherwise, factoring your code into smaller pieces pays off every time you have to extend it, modify it, review someone else's changes to it, or just consult it.

Yes, this is generally a good idea. Smaller functions are usually easier to read and understand, and are potentially reusable in different contexts. It is true that the process of refactoring can sometimes break things, but if you do it properly (fixing things as they break, and using unit tests to help detect what breaks) then the end result is usually well worth it.

Since each function usually relies on some input provided by a previous function - a returned value, a global variable, or whatever else your language uses - it often only works, and only makes sense, in the context of that previous function.

^ This statement is true even if you leave your code as one big function! Instead of small functions depending on each other, you have lines of code within a big function depending on each other. This is much worse because there is no way to indicate which variables apply to what code in what stage of the execution; everything in local scope is available to everything in the function.

If you were to break it down into a series of smaller functions, you can define constraints on which functions work with what dependencies, by declaring them as parameters and taking advantage of each function's local scope. Thus you can break a larger problem into a series of smaller problems that (1) have clear breaks between them, making each one easier to understand, and (2) have a clear relationship, defined in code, that tells you how the problem is broken down into subproblems.
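
A small, invented illustration: in the decomposed version the parameters state exactly which data each subproblem depends on, whereas the monolith leaves every local visible to every line.

    # Monolithic version: every local is visible to everything below it.
    def process_order_monolith(order):
        subtotal = sum(item["price"] * item["qty"] for item in order["items"])
        discount = 0.1 * subtotal if subtotal > 100 else 0.0
        return subtotal - discount

    # Decomposed version: each function declares what it needs as parameters.
    def subtotal_of(items):
        return sum(item["price"] * item["qty"] for item in items)

    def discount_for(subtotal):
        return 0.1 * subtotal if subtotal > 100 else 0.0

    def process_order(order):
        subtotal = subtotal_of(order["items"])
        return subtotal - discount_for(subtotal)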

This is far, far preferable to long functions that cover many pages and require the programmer to scroll up and down to figure out how they work. After years of programming, I can tell you from experience that those are the functions that tend to be full of the nastiest bugs.

it often only works, and only makes sense, in the context of that previous function. A change to a previous function would often obviously destroy functionality in the second function.

If that is the case, your functions are badly designed. They do not have clear responsibilities or contracts and don't represent good abstractions.

Well-designed functions do one thing, which is conceptually easy to understand from the function name without knowing the implementation details, and they are mostly isolated from the rest of the application.

You'll find that if you have such functions that are true abstractions of the functionality they implement, they will much more often be reusable in different contexts, and at the same time less likely to have undesired side effects when they are changed.

With a badly designed function, any change you make to it has you worried about breaking things in 5 other places it's called from. With a well designed function, any change you make to it will be something (such as a bugfix) that you want to apply to the 5 other places it's called from.

However, is this actually effective for a programmer?

Functional programming languages do everything with functions. Even a language with little support for functional programming should be able to get pretty darn close. So, yes.

A change to a previous function would often obviously destroy functionality in the second function.

Then don't do that. Leave the old function there, and make a new one, possibly using the old function to do most of the work and just tweaking the output.

The general philosophy for functional programming languages is like building up a vocabulary of jargon words and then writing something with them. And how do you get the jargon words? Well, you write definitions in terms of other words you already have. Instead of thinking of it as a giant piece of existing code that needs to be broken up in random places and shoved into functions, think of it as creating a vocabulary of functions that makes writing the top level function easier.
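
A tiny sketch of that idea (all names invented): the old function stays untouched, a new "word" is defined in terms of it, and the top-level function reads almost like a sentence.

    # The old function is left alone.
    def total_price(items):
        return sum(items)

    # A new "word" defined in terms of the old one, just tweaking the output.
    def total_price_with_vat(items, vat=0.25):
        return total_price(items) * (1 + vat)

    # The top-level function is written in the vocabulary built so far.
    def invoice_total(items):
        return round(total_price_with_vat(items), 2)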

Licensed under: CC-BY-SA with attribution
Not affiliated with softwareengineering.stackexchange