Return considered harmful? Can code be functional without it?

https://softwareengineering.stackexchange.com/questions/365829

29-01-2021

Question

OK, so the title is a little clickbaity but seriously I've been on a tell, don't ask kick for a while. I like how it encourages methods to be used as messages in true object-oriented fashion. But this has a nagging problem that has been rattling about in my head.

I have come to suspect that well-written code can follow OO principles and functional principles at the same time. I'm trying to reconcile these ideas and the big sticking point that I've landed on is return.

A pure function has two qualities:

Calling it repeatedly with the same inputs always gives the same result. This implies that it is immutable. Its state is set only once.
It produces no side effects. The only change caused by calling it is producing the result.
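To make the two qualities concrete, here is a minimal sketch contrasting a pure function with an impure one. The names add_tax and add_to_total are hypothetical and exist only for illustration.

```python
# Hypothetical names, for illustration only.

def add_tax(price: float, rate: float) -> float:
    """Pure: the result depends only on the arguments and nothing else changes."""
    return price * (1 + rate)


_running_total = 0


def add_to_total(price: int) -> int:
    """Impure: reads and mutates module-level state, so repeated calls differ."""
    global _running_total
    _running_total += price
    return _running_total
```

Calling add_tax twice with the same arguments gives the same answer both times; calling add_to_total twice with the same argument does not.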

So, how does one go about being purely functional if you've sworn off using return as your way of communicating results?

The tell, don't ask idea works by using what some would consider a side effect. When I deal with an object I don't ask it about its internal state. I tell it what I need to be done and it uses its internal state to figure out what to do with what I've told it to do. Once I tell it, I don't ask what it did. I just expect it to have done something about what it was told to do.

I think of Tell, Don't Ask as more than just a different name for encapsulation. When I use return I have no idea what called me. I can't speak its protocol; I have to force it to deal with my protocol, which in many cases gets expressed as the internal state. Even if what is exposed isn't exactly state, it's usually just some calculation performed on state and input args. Having an interface to respond through affords the chance to massage the results into something more meaningful than internal state or calculations. That is message passing. See this example.

Way back in the day, when disk drives actually had disks in them and a thumb drive was what you did in the car when the wheel was too cold to touch with your fingers, I was taught how annoying people consider functions that have out parameters. void swap(int *first, int *second) seemed so handy but we were encouraged to write functions that returned the results. So I took this to heart on faith and started following it.

But now I see people building architectures where objects let how they were constructed control where they send their results. Here's an example implementation. Injecting the output port object seems a bit like the out parameter idea all over again. But that's how tell-don't-ask objects tell other objects what they've done.
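As a rough sketch of what such an injected output port might look like (this is not the linked implementation; Greeter, GreetingOutputPort and RecordingPresenter are hypothetical names), the object is handed its destination at construction time and reports results by telling that destination, not by returning:

```python
from typing import Protocol


class GreetingOutputPort(Protocol):
    """Hypothetical port: whatever receives the result must speak this protocol."""

    def present(self, greeting: str) -> None: ...


class Greeter:
    def __init__(self, output: GreetingOutputPort) -> None:
        # Where results go is decided once, at construction time.
        self._output = output

    def greet(self, name: str) -> None:
        # No return value: the result is told to the output port.
        self._output.present(f"Hello, {name}!")


class RecordingPresenter:
    """One possible receiver; it simply records what it is told."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def present(self, greeting: str) -> None:
        self.lines.append(greeting)


presenter = RecordingPresenter()
Greeter(presenter).greet("Ada")
```

Note that the receiver here changes its own state when told something, which is exactly the kind of effect the question is worried about.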

When I first learned about side effects I thought of it like the output parameter. We were being told not to surprise people by having some of the work happen in a surprising way, that is, by not following the return result convention. Now sure, I know there's a pile of parallel asynchronous threading issues that side effects muck about with but return is really just a convention that has you leave the result pushed on the stack so whatever called you can pop it off later. That's all it really is.

What I'm really trying to ask:

Is return the only way to avoid all that side effect misery and get thread safety without locks, etc.? Or can I follow tell, don't ask in a purely functional way?

Solution

If a function doesn't have any side effects and it doesn't return anything, then the function is useless. It is as simple as that.

But I guess you can use some cheats if you want to follow the letter of the rules and ignore the underlying reasoning. For example using an out parameter is strictly speaking not using a return. But it still does precisely the same as a return, just in a more convoluted way. So if you believe return is bad for a reason, then using an out parameter is clearly bad for the same underlying reasons.

You can use more convoluted cheats. E.g. Haskell is famous for the IO monad trick where you can have side effects in practice, but still not strictly speaking have side effects from a theoretical viewpoint. Continuation-passing style is another trick, which will let you avoid returns at the price of turning your code into spaghetti.

The bottom line is, absent silly tricks, the two principles of side-effect free functions and "no returns" are simply not compatible. Furthermore I will point out both of them are really bad principles (dogmas really) in the first place, but that is a different discussion.

Rules like "tell, don't ask" or "no side effects" cannot be applied universally. You always have to consider the context. A program with no side effects is literally useless. Even pure functional languages acknowledge that. Rather they strive to separate the pure parts of the code from the ones with side-effects. The point of the State or IO monads in Haskell is not that you avoid side effects - because you can't - but that the presence of side effects is explicitly indicated by the function signature.
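Python has no type-level marker for effects the way Haskell does, but the separation itself can be sketched by convention. In this illustrative example (summarize and main are hypothetical names), all computation sits in a pure function and the single side effect is pushed to the edge of the program:

```python
def summarize(scores: dict[str, int]) -> str:
    """Pure core: builds the report text from its input and touches nothing else."""
    lines = [f"{name}: {score}" for name, score in sorted(scores.items())]
    return "\n".join(lines)


def main() -> None:
    """Effectful shell: the only place that talks to the outside world."""
    print(summarize({"bob": 7, "ada": 9}))


if __name__ == "__main__":
    main()
```

Here the split is only a naming and structure convention; nothing in the signature of main enforces it, which is the part Haskell's types add.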

The tell-don't-ask rule applies to a different kind of architecture - the style where objects in the program are independent "actors" communicating with each other. Each actor is basically autonomous and encapsulated. You can send it a message and it decides how to react to it, but you cannot examine the internal state of the actor from the outside. This means you cannot tell if a message changes the internal state of the actor/object. State and side effects are hidden by design.

OTHER TIPS

Tell, Don't Ask comes with some fundamental assumptions:

You're using objects.
Your objects have state.
The state of your objects affects their behavior.

None of these things apply to pure functions.

So let's review why we have the rule "Tell, Don't Ask." This rule is a warning and a reminder. It can be summarized like this:

Allow your class to manage its own state. Don't ask it for its state, and then take action based on that state. Tell the class what you want, and let it decide what to do based on its own state.

To put it another way, classes are solely responsible for maintaining their own state and acting on it. This is what encapsulation is all about.

From Fowler:

Tell-Don't-Ask is a principle that helps people remember that object-orientation is about bundling data with the functions that operate on that data. It reminds us that rather than asking an object for data and acting on that data, we should instead tell an object what to do. This encourages us to move behavior into an object to go with the data.

To reiterate, none of this has anything to do with pure functions, or even impure ones unless you're exposing a class's state to the outside world. Examples:

TDA Violation

var color = trafficLight.Color;
var elapsed = trafficLight.Elapsed;
if (color == Color.Red && elapsed > TimeSpan.FromMinutes(2))
    trafficLight.ChangeColor(Color.Green);

Not a TDA Violation

var result = trafficLight.ChangeColor(Color.Green);

var result = await trafficLight.ChangeColorWhenReady(Color.Green);

In both of the latter examples, the traffic light retains control of its state and its actions.
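For comparison, here is a minimal Python sketch of a light that keeps that decision to itself. The class, the two-minute rule and the boolean result are assumptions made for illustration, not part of the snippets above.

```python
from enum import Enum


class Color(Enum):
    RED = "red"
    GREEN = "green"


class TrafficLight:
    """Hypothetical light that owns both its state and the rules about it."""

    MIN_RED_SECONDS = 120  # assumed rule: stay red for more than two minutes

    def __init__(self) -> None:
        self._color = Color.RED
        self._elapsed = 0.0

    def tick(self, seconds: float) -> None:
        self._elapsed += seconds

    def change_color(self, requested: Color) -> bool:
        """Tell the light what we want; it decides whether to comply."""
        if self._color is Color.RED and self._elapsed <= self.MIN_RED_SECONDS:
            return False  # request refused; internal state is never exposed
        self._color = requested
        self._elapsed = 0.0
        return True
```

The caller still gets a result back, but it is an answer to the request and not a view of the light's state.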

When I deal with an object I don't ask it about its internal state. I tell it what I need to be done and it uses its internal state to figure out what to do with what I've told it to do.

You don't only ask for its internal state, you don't ask if it has an internal state at all either.

Also, "tell, don't ask!" does not imply not getting a result in the form of a return value (provided by a return statement inside the method). It just implies "I don't care how you do it, but do that processing!" And sometimes you immediately want the processing's result...

If you consider return as "harmful" (to stay in your picture), then instead of making a function like

ResultType f(InputType inputValue)
{
     // ...
     return result;
}

build it in a message-passing manner:

void f(InputType inputValue, Action<ResultType> g)
{
     // ...
     g(result);
}

As long as f and g are side-effect free, chaining them together will be side-effect free as well. I think this style is similar to what is also called Continuation-passing style.
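The same shape can be sketched in Python. In this illustrative example (square and increment are hypothetical names), each function hands its result to a continuation and never returns it:

```python
from typing import Callable


def square(x: int, then: Callable[[int], None]) -> None:
    # The result is passed forward to the continuation, not returned.
    then(x * x)


def increment(x: int, then: Callable[[int], None]) -> None:
    then(x + 1)


results: list[int] = []

# Chain: square 3, then increment the square, then hand the final value on.
square(3, lambda squared: increment(squared, results.append))
```

The last continuation has to put the value somewhere, and here it appends to a list. That final step is an effect, so the chain is only as side-effect free as its last link.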

Whether this really leads to "better" programs is debatable, since it breaks some conventions. The German software engineer Ralf Westphal built a whole programming model around this, which he called "Event Based Components", with a modeling technique he calls "Flow Design".

To see some examples, start in the "Translating to Events" section of this blog entry. For the full approach, I recommend his e-book "Messaging as a Programming model - Doing OOP as if you meant it".

Message passing is inherently effectful. If you tell an object to do something, you expect it to have an effect on something. If the message handler was pure, you would not need to send it a message.

In distributed actor systems, the result of an operation is usually sent as a message back to the sender of the original request. The sender of the message is either implicitly made available by the actor runtime, or it is (by convention) explicitly passed as a part of the message. In synchronous message passing, a single response is akin to a return statement. In asynchronous message passing, using response messages is particularly useful as it allows for concurrent processing in multiple actors while still delivering results.

Passing the "sender" to which the result should be delivered explicitly basically models continuation passing style or the dreaded out parameters - except that it passes messages to them instead of mutating them directly.
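A toy, single-threaded sketch of the explicit reply-to convention follows, using plain queues as mailboxes. It does not use any real actor runtime, and Add and adder_actor are hypothetical names.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass(frozen=True)
class Add:
    """Request message that carries the mailbox the answer should go to."""

    a: int
    b: int
    reply_to: "Queue[int]"


def adder_actor(mailbox: "Queue[Add]") -> None:
    """Handle every pending message; answers are sent, never returned."""
    while not mailbox.empty():
        message = mailbox.get()
        message.reply_to.put(message.a + message.b)


adder_inbox: "Queue[Add]" = Queue()
my_inbox: "Queue[int]" = Queue()

adder_inbox.put(Add(2, 3, reply_to=my_inbox))
adder_actor(adder_inbox)  # a real runtime would schedule this for us
```

The requester later finds the answer in its own mailbox, which is what makes the pattern work across threads or machines.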

This entire question strikes me as a 'level violation'.

You have (at least) the following levels in a major project:

The system level e.g. e-commerce platform
The sub-system level e.g. user validation: server, AD, front-end
The individual program level e.g. one of the components in the above
The Actor/Module level [this gets murky depending on language]
The method/function level.

And so on down to individual tokens.

There isn't really any need for an entity at the method/function level not to return (even if it just returns this). And there isn't (in your description) any need for an entity at the Actor level to return anything (depending on language that may not even be possible). I think the confusion is in conflating those two levels, and I would argue that they should be reasoned about distinctly (even if any given object actually spans multiple levels).

You mention that you want to conform to both the OOP principle of "tell, don't ask" and the functional principle of pure functions, but I don't quite see how that led you to eschew the return statement.

A relatively common alternative way of following both these principles is to go all-in on the return statements and use immutable objects with getters only. The approach then is to have some of the getters return a similar object with a new state, as opposed to changing the state of the original object.

One example of this approach is in the Python builtin tuple and frozenset data types. Here's a typical usage of a frozenset:

small_digits = frozenset([0, 1, 2, 3, 4])
big_digits = frozenset([5, 6, 7, 8, 9])
all_digits = small_digits.union(big_digits)

print("small:", small_digits)
print("big:", big_digits)
print("all:", all_digits)

Which will print the following, demonstrating that the union method creates a new frozenset with its own state without affecting the old objects:

small: frozenset({0, 1, 2, 3, 4})
big: frozenset({5, 6, 7, 8, 9})
all: frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9})

Another extensive example of similar immutable data structures is Facebook's Immutable.js library. In both cases you start with these building blocks and can build higher-level domain objects that follow the same principles, achieving a functional OOP approach, which helps you encapsulate the data and reason about it more easily. And the immutability also lets you reap the benefit of being able to share such objects between threads without having to worry about locks.
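A higher-level domain object built the same way might look like the following sketch, which uses a frozen dataclass from the standard library. Account and deposit are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    """Hypothetical immutable domain object."""

    owner: str
    balance: int = 0

    def deposit(self, amount: int) -> "Account":
        # Return a new Account; the original is left untouched.
        return replace(self, balance=self.balance + amount)


before = Account("Ada", 100)
after = before.deposit(50)
```

Because before can never change, it can be shared between threads freely; anyone who wants the new state has to hold on to after.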

I have come to suspect that well-written code can follow OO principles and functional principles at the same time. I'm trying to reconcile these ideas and the big sticking point that I've landed on is return.

I've been trying my best to reconcile some of the benefits of, more specifically, imperative and functional programming (naturally not getting all the benefits whatsoever, but trying to get the lion's share of both), though return is actually fundamental to doing that in a straightforward fashion for me in many cases.

With respect to trying to avoid return statements outright, I tried to mull over this for the past hour or so and basically stack overflowed my brain a number of times. I can see the appeal of it in terms of enforcing the strongest level of encapsulation and information hiding in favor of very autonomous objects that are merely told what to do, and I do like exploring the extremities of ideas if only to try to get a better understanding of how they work.

If we use the traffic light example, then immediately a naive attempt would want to give such a traffic light knowledge of the entire world that surrounds it, and that would certainly be undesirable from a coupling perspective. So if I understand correctly you abstract that away and decouple in favor of generalizing the concept of I/O ports which further propagate messages and requests, not data, through the pipeline, and basically inject these objects with the desired interactions/requests among each other while oblivious to each other.

The Nodal Pipeline

And that diagram is about as far as I got trying to sketch this out (and while simple, I had to keep changing it and rethinking it). Immediately I tend to think a design with this level of decoupling and abstraction would find its way becoming very difficult to reason about in code form, because the orchestrator(s) who wire all these things up for a complex world might find it very difficult to keep track of all these interactions and requests in order to create the desired pipeline. In visual form, however, it might be reasonably straightforward to just draw these things out as a graph and link everything up and see things happening interactively.

In terms of side effects, I could see this being free of "side effects" in the sense that these requests could, on the call stack, lead to a chain of commands for each thread to perform (I don't count this as a "side effect" in a pragmatic sense as it is not altering any state relevant to the outside world until such commands are actually executed -- the practical goal to me in most software is not to eliminate side effects but defer and centralize them). And furthermore the command execution might output a new world as opposed to mutating the existing one. My brain is really taxed just trying to comprehend all this however, absent any attempt at prototyping these ideas. I also didn't try to tackle how to pass parameters along with the requests in favor of just trying a timid approach at first of thinking of all of these requests as nullary functions with a uniform signature/interface.
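One way to read "defer and centralize" is sketched below, under the assumption that the world is a plain dictionary and a command is a function from an old world to a new one. All names are hypothetical, and the sketch leans on return freely, so it illustrates only the deferral, not the no-return idea.

```python
from typing import Callable

# Assumed representation: a command maps an old world to a new world.
Command = Callable[[dict], dict]


def set_light(color: str) -> Command:
    return lambda world: {**world, "light": color}


def start_walking(world: dict) -> dict:
    return {**world, "pedestrian": "walking"}


def on_timer_elapsed() -> list[Command]:
    """Describe what should happen; nothing is executed here."""
    return [set_light("red"), start_walking]


def run(world: dict, commands: list[Command]) -> dict:
    """The one central place where commands are applied, producing a new world."""
    for command in commands:
        world = command(world)
    return world


old_world = {"light": "green", "pedestrian": "waiting"}
new_world = run(old_world, on_timer_elapsed())
```

The old world is left as it was, so another thread could keep reading it while the new one is being built.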

How it Works

So to clarify I was imagining how you actually program this. The way I was seeing it working was actually the diagram above capturing the user-end (programmer's) workflow. You can drag a traffic light into the world, drag a timer, give it an elapsed period (upon "constructing" it). The timer has an On Interval event (output port), you can connect that to the traffic light so that on such events, it's telling the light to cycle through its colors.

The traffic light might then, on switching to certain colors, emit outputs (events) like, On Red, at which point we might drag a pedestrian into our world and make that event tell the pedestrian to start walking... or we might drag birds into our scene and make it so when the light turns red, we tell birds to start flying and flapping their wings... or maybe when the light turns red, we tell a bomb to explode -- whatever we want, and with the objects being completely oblivious to each other, and doing nothing but indirectly telling each other what to do through this abstract input/output concept.

And they fully encapsulate their state and reveal nothing about it (unless these "events" are considered TMI, at which point I'd have to rethink things a lot), they tell each other things to do indirectly, they don't ask. And they're uber decoupled. Nothing knows about anything except this generalized input/output port abstraction.
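The wiring described above could be prototyped roughly as follows. This is a sketch under stated assumptions: OutputPort is a hypothetical abstraction, every request is a nullary call, and the objects mutate their own private state, so it shows the decoupling but not the mutation-free evaluation.

```python
from typing import Callable


class OutputPort:
    """Hypothetical port: fans a nullary request out to whatever is connected."""

    def __init__(self) -> None:
        self._listeners: list[Callable[[], None]] = []

    def connect(self, listener: Callable[[], None]) -> None:
        self._listeners.append(listener)

    def emit(self) -> None:
        for listener in self._listeners:
            listener()


class Timer:
    def __init__(self) -> None:
        self.on_interval = OutputPort()

    def fire(self) -> None:
        self.on_interval.emit()


class TrafficLight:
    _CYCLE = ("green", "yellow", "red")

    def __init__(self) -> None:
        self._index = 0  # starts on green; never exposed
        self.on_red = OutputPort()

    def cycle(self) -> None:
        self._index = (self._index + 1) % len(self._CYCLE)
        if self._CYCLE[self._index] == "red":
            self.on_red.emit()


class Pedestrian:
    def __init__(self) -> None:
        self._walking = False
        self.on_started_walking = OutputPort()

    def start_walking(self) -> None:
        self._walking = True
        self.on_started_walking.emit()


# The orchestrator is the only code that knows who talks to whom.
timer, light, pedestrian = Timer(), TrafficLight(), Pedestrian()
timer.on_interval.connect(light.cycle)
light.on_red.connect(pedestrian.start_walking)

log: list[str] = []
pedestrian.on_started_walking.connect(lambda: log.append("walking"))

timer.fire()  # green -> yellow
timer.fire()  # yellow -> red, which tells the pedestrian to walk
```

None of the three classes references another; swapping the pedestrian for birds or a bomb only changes the wiring lines at the bottom.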

Practical Use Cases?

I could see this type of thing being useful as a high-level domain-specific embedded language in certain domains to orchestrate all these autonomous objects which know nothing about the surrounding world, expose nothing of their internal state post construction, and basically just propagate requests among each other which we can change and tweak to our hearts' content. At the moment I feel like this is very domain-specific, or maybe I just haven't put enough thought into it, because it's very difficult for me to wrap my brain around with the types of things I regularly develop (I often work with rather low-mid-level code) if I were to interpret Tell, Don't Ask to such extremities and want the strongest level of encapsulation imaginable. But if we're working with high-level abstractions in a specific domain, this might be a very useful way to program it and express how things interact with each other in a rather uniform fashion that doesn't get muddled up in the state, or computations/outputs, of its objects, with uber decoupling of a kind where even the analogical caller need not know much, if anything, about its callee, or vice versa.

Signals and Slots

This design looked oddly familiar to me until I realized it's basically signals and slots, if we don't take a lot of the nuances of how it's implemented into account. The main question to me is how effectively we can program these individual nodes (objects) in the graph as strictly adhering to Tell, Don't Ask, taken to the degree of avoiding return statements, and whether we can evaluate said graph without mutations (in parallel, e.g., absent locking). That's where the magical benefits are: not in how we wire these things together, potentially, but in how they can be implemented to this degree of encapsulation absent mutations. Both of these seem feasible to me, but I'm not sure how widely applicable it would be, and that's where I'm a bit stumped trying to work through potential use cases.

I clearly see a lack of certainty here. It seems that "side effect" is a well-known and commonly understood term, but in reality it's not. Depending on your definitions (which are actually missing in the OP), side effects might be totally necessary (as @JacquesB managed to explain) or mercilessly unacceptable. Or, taking one step towards clarification, it is necessary to distinguish between desired side effects that one doesn't want to hide (this is where Haskell's famous IO comes in: it's nothing but a way to be explicit) and undesired side effects that result from bugs and the like. Those are quite different problems and thus require different reasoning.

So, I suggest starting by rephrasing the question yourself: "How do we define a side effect, and what do the given definition(s) say about its interrelation with the "return" statement?".

Licensed under: CC-BY-SA with attribution
