Clean Code: A function with many arguments should become an object? (Martin's Clean Code)

https://softwareengineering.stackexchange.com/questions/415773

clean-code

15-03-2021
|

Question

I came across reading a blogpost (this one on medium.com) about some principles of writing clean code and referring to a book by Robert C. Martin. Let me cite a particular paragraph from the blogpost:

A function shouldn’t have more than 3 arguments. Keep it as low as possible. When a function seems to need more than two or three arguments, it is likely that some of those arguments ought to be wrapped into a class of their own. Reducing the number of arguments by creating objects out of them may seem like cheating, but it’s not.

I definitely see that there is a point to it and indeed, I often am unsatisfied with the number of arguments functions in my code have. I usually try to set as many arguments as possible to meaningful default values but still, this really doesn't look like clean code.
However, I also don't feel perfectly comfortable with the suggestion quoted above. In particular because

As far as understand things, classes are supposed to represent meaningful objects instead of replacing functions.
Many functions in standard libraries have multiple arguments. For example, in the Python library Pandas, the function plot for a DataFrame object has 37 arguments plus accepts a variety of further keyword arguments passed to a lower level function (the Matplotlib plot function to be precise). So it seems that the above mentioned principle was not applied by the Pandas developers. Indeed, having an object Plot() seems to be one of the cases I mentioned in the first bullet where you have some class that doesn't seem to represent a meaningful object.

I know this is kind of meta but what do you think about it? What is best practice here? Is there some general advice you can give?

(Just to make clear, coding is a minor part of my job and I would by no means call myself a professional programmer. Most of the coding I do is done in my spare time and I believe my coding abilities therefore are at most medium level but quite far away from high level.)

Solution

As far as understand things, classes are supposed to represent meaningful objects instead of replacing functions.

While a generally advisable guideline, that is not an absolute rule. While fully anemic designs are generally frowned upon, that doesn't mean there is a zero-tolerance for data-only classes (e.g. DTOs) or function bags (e.g. what is generally referred to as helper methods/classes).

When a function seems to need more than two or three arguments, it is likely that some of those arguments ought to be wrapped into a class of their own.

The suggestion here is to wrap the data in a DTO-like class. It doesn't specify (nor does it exclude) whether this class should have any methods/logic inside of it as well.

Many functions in standard libraries have multiple arguments. For example, in the Python library Pandas, the function plot for a DataFrame object has 37 arguments plus accepts a variety of further keyword arguments passed to a lower level function (the Matplotlib plot function to be precise).

Your argument is a false negation. Just because it's advisable to do X, doesn't mean that not doing X will always lead to a bad result. It's generally advisable to not cut your body, but that doesn't mean that surgeons shouldn't use scalpels.

Maybe the many arguments were warranted for your library. Maybe the library has flaws, one of which being the argument lists. Maybe the library author disagrees that many arguments are an issue.

None of these considerations contradict the validity of the advice your referenced blog post is giving.

I know this is kind of meta but what do you think about it? What is best practice here? Is there some general advice you can give?

I mostly agree with the advice. Keeping code simple, well, keeps things simple when you have to maintain that code later on. It sounds tautological but it's valid, and when incrementally building software it's so very easy to succumb to the "one more argument" trap that usually creates these many-argument-functions.

That being said, this is a guideline, not an absolute rule. There will be cases where arguments cannot be logically grouped together, or where the amount of arguments makes actual sense.

No meaningfully valuable suggestion can account for every possible use case at every point in time, and I would advise anyone to not try and hold any clean coding guideline to such a standard.

OTHER TIPS

While this recommendation revolves about arguments, it's not fundamentally about arguments.

The key point is this:

"it is likely that some of those arguments ought to be wrapped into a class of their own"

Why? Usually, when you have a large number of arguments, some of those arguments will be more closely related than others. E.g., they will be related to the same concept, and in the method itself, there will be some logic that manipulates those clusters of arguments to achieve something. The problem is, that concept is not made explicit, and the logic related to it is not clearly delineated. What you likely have in there is code that intermingles different concepts, or different levels of abstraction. Mixes responsibilities on a local level, if you will.

You've said:

"As far as understand things, classes are supposed to represent meaningful objects"

Yeah! A class is an explicit representation of some concept. And when you have a large number of parameters, it's worth checking if there's a meaningful/useful concept in there that's currently not represented, and if it should be.

So it's not just about bundling parameters together.

E.g., inside the method, there's probably going to be a code block that only uses a couple of parameters with a comment on top of it explaining what it does. Or maybe there will be a block comprised of one or more if-conditionals. And the details of that block are not going to be the main point of the method. You can probably do the Extract Method refactoring and get a cleaner, easier to understand code in the original method - code that more succinctly expresses what the method actually does. But then you should ask yourself, does the newly extracted method really belong to the same class? Maybe it does, but maybe it would be better to relocate it.

Look at this method; this is a contrived example, but think of something like a 2D game - this checks if two game entities have collided (passed through each other) and produces a result that describes the collision and how to resolve it (how to reposition the two entities). In reality, this would probably need to take in additional parameters, but let's keep it relatively simple.

// Takes two axis-aligned rectangles representing the bounding boxes 
// of two entities and returns a CollisionInfo instance.
CollisionInfo ResolveColision(
  double left1, double top1, double right1, double bottom1,
  double left2, double top2, double right2, double bottom2)
{
  // Prepare some variables to store stuff
  // (... omitted ...)

  // Compute the intersection
  // (... a wall of code ...)
  // (... a wall of code ...)
  // (... a wall of code ...)
  // (... a wall of code ...)
  // (... a wall of code ...)
  // (... a wall of code ...)
  // (... a wall of code ...)
  //           ...
  // (... a wall of code ...)
  // (... a wall of code ...)
  // (... a wall of code ...)
  // (... a wall of code ...)
  // (... a wall of code ...)
  // (... a wall of code ...)
  // (... a wall of code ...)

  // Figure out how to resolve the collision
  // (... omitted ...)

  // Create the output data structure
  var collisionInfo = // (... omitted ...)

  return collisionInfo;
}

If you look at this code, there are hints that there are some concepts in there that lack explicit representation. E.g., the description of the method mentions axis-aligned bounding rectangles. The parameter list is formatted in two rows in a way that looks deliberate, and the suffixes in the names of the parameters indicate that there are really two objects there, not eight. Then (if you're lucky) there's a comment explaining what that wall of code is doing.

Well, let's create a class to represent an axis-aligned bounding rectangle, let's give it an Intersect method, and let's put that wall of code in there.

This may require some cleanup, because the different parts of the code probably depend on each other in subtle ways (variables may be reused, one part of the code may make assumptions about the previous section, etc.)

But after the separation, I can be a bit more declarative - I just want to tell the code to compute the intersection; I don't care how:

// The method is now essentially self-documenting; 
// that documentation comment from before is now redundant
CollisionInfo ResolveColision(BoundingRect rect1, BoundingRect rect2)
{
  BoundingRect intersection = rect1.Intersect(rect2);
  
  // Use 'intersection' to figure out how to resolve the collision
  // (... omitted ...)
  
  return new CollisionInfo(/* ... omitted ... */);
}

// Sometimes, you may choose to keep the original method as an 
// overload, for convenience, or for backward compatibility. 
// But this just delegates to the previous method, so it's not a 
// problem in terms of readability, maintenance, etc.
CollisionInfo ResolveColision(
  double left1, double top1, double right1, double bottom1,
  double left2, double top2, double right2, double bottom2)
{
  return ResolveCollision(
    new BoundingRectangle(left1, top1, right1, bottom1)
    new BoundingRectangle(left2, top2, right2, bottom2));
}

Besides, now I can reuse the BoundingRect class, and its Intersect method, elsewhere, and I can change the implementation of Intersect without affecting the code that calls it.

"Many functions in standard libraries have multiple arguments."

The prevalence of something doesn't mean that it's an example of good design or that it should be emulated. Again, sometimes a large(r) number of arguments is the way to go, but first check if that's really true. Look for those implicit concepts, consider ease of use, etc.

P.S. In OOP, sometimes you'll have ordinary, dumb data structures with no behavior. These will also be represented by classes, but they'll have only getters and setters, with a few or no methods. These can appear in a number of places, notably, at the boundaries of the application - where you communicate with the UI, or the web, or the DB, etc. Or maybe in a part of the codebase that's written in a more functional style. So sometimes refactoring a large parameter list into a smaller number of objects will not be due to refining the OO design of your object model, but it will happen in order to confirm to external interfaces, or simply for convenience.

As far as understand things, classes are supposed to represent meaningful objects instead of replacing functions.

Mr. Martin is not suggesting that objects replace functions. He's suggesting that an object can replace groups of parameters.

drawRectangle(x1, y1, x2, y2, COLOR.black);

becomes:

drawRectangle(position1, position2, COLOR.black);

What did that really cost? We had to come up with a meaningful name. If you want to create readable code, spend time coming up with good names.

Now sure, you could do this:

new Rectangle(x1, y1, x2, y2, COLOR.black).draw();

but that doesn't fix the long parameter list. Instead do this:

new Rectangle(position1, position2, COLOR.black).draw();

Many functions in standard libraries have multiple arguments.

You don't have to access libraries directly. If you can design better ways to access them then you can create code that's easier to look at. Remember, we're all getting better at this. Don't let the past tie you down.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange