Should I use AND in a function name?

https://softwareengineering.stackexchange.com/questions/404195

06-03-2021
|

Question

I am in a predicament. I am working in a repository that makes a lot of database changes within its functions.

The function I am dealing with returns responseids (but from a transformation, not from any database action). However, as a side effect, it adds an object containing those responseIds to the database.

So should I name it:

getResponseIds: This highlights the return values. It is a very functional way of thinking, but obviously if I have function postToDB it makes no sense to use getStatusOfPost
addResponseIdToDB: This highlights the side effect, although I truly think that many of my functions just operate on the database (and tend not to return anything)
getAndAddResponseIdsToDB: Very informative, but very long.

What are the pros and cons on the suggestions above? Or can you make a better suggestion yourself?

Solution

A function name that contains and is at the wrong level of abstraction.

I lean towards addResponseIdToDB() because otherwise the ‘side effect’ is a complete surprise. However:

responseIds = addResponseIdToDB();

doesn’t leave anyone surprised.

The command query responsibility segregation principle argues that this should not be the only way to get the responseId object. There should also be a query that does not change the DB that can get this object.

Contrary to Bertrand Meyer, I don't believe this means the add has to return void. Just means an equivalent pure query should exist and be easy to find so the DB doesn't get needlessly abused by use of state changing queries.

_{Given that getResponseIds() should exist and not talk to the database, the best name for a method that does both is actually addToDB(getResponseId()). But that's just if you want to get all functional composition about it.}

OTHER TIPS

As others have mentioned, using and in a function name, automatically implies that a function is doing at least two things, which usually indicates that it's doing too much or is doing something at the wrong place.

Using the and clause in a function name however, in my experience, can make sense when there are some steps in a certain flow that need to be performed in sequence or when the whole computation becomes meaningful.

In numerical computing this can make a lot of sense:

normalizeAndInvert(Matrix m)

I mean, who knows, right? If you need to compute some... say, lighting trajectories in computer graphics and at a certain stage you need to normalize and invert a certain matrix, and you find yourself constantly writing:

m = normalize(m), followed by invert(m), just as a simple example, introducing the abstraction of normalizing and inverting, can be good from a readability perspective.

Speaking semantically, stuff like divideAndConquer() etc, is probably discouraged to be explicitly written out, but, well, the And there is basically necessary.

I would say it's fine in general, but isn't acceptable in all cases.

For example, one could argue against and in a function name based on the single responsibility principle aka the S in SOLID - because the and could imply multiple responsibilities. If the and really does mean the function is doing two things, then you probably want to think carefully about what's happening.

An example of a library that utilizes and in function names can be found in Java's concurrency and here and is actually a very important part of what's going on, and closely matches what you describe in your post where the state is changed and state is returned. So clearly there are some people (and some use cases) where it's deemed acceptable.

Indeed, that's a difficult question to answer as numerous comments can attest. People seem to have contradictory opinions and advice on what is a good name.

I'd like to add my 2 cents to this 2-month old thread because I think I can add more colors to what was already suggested.

Naming is a process

All of this reminds me of the excellent 6 steps guide: Naming as a process.

A poor function name is one that's surprising and you realize you can't trust. Good naming is hard to get right at first shot. It becomes easier with experience.

6 iterative steps to build a good name

Replace surprising name with obvious nonsense like appleSauce(). Sounds silly, but it's temporary and it make it obvious that the name can't be trusted.
Get to an honest name, based on what you understand from what the function does. Say you didn't understood yet the insertion in DB part, getResponseIdsAndProbablyUseDb would be an option.
Get to completely honest, so the function name says everything the function does (getAndAddResponseIdsToDB or getResponsesButAlsoAddCacheOfTheResponsesToTheDb from @Fattie are great examples)
Get to "does the right thing" which is basically the part where you split your function along the "AND". So you actually have getResponseIds and addResponseIdsToDb.
Get to an "Intention Revealing" name which solves the point of "I always want to get the response ID I inserted in DB after I do so". Stop thinking about the implementation detail and build an abstraction that uses the 2 smallest functions to build something else. This is the higher-abstraction level mentioned by @candied_orange. For example createResponseIds could do it and will be the composition of getResponseIds and addResponseIdsToDb.
Get to your Domain abstraction. This is hard, it depends on your business. It takes time listening to your business language to get right. Eventually, you may end up with the concept of a Response and Response.createIds() would make sense. Or maybe ResponseId is something valuable and you would have a factory to create many. Insertion in DB would be an implementation detail of a Repository, etc. Yes, the design would be more abstract. It would be richer and let you express more. No-one outside of your team can tell you what the correct Domain abstraction should be in your situation. It depends.

In your case, you've already figured out 1 and 2, so you should probably at least go to 3 (use "AND") or further. But going further is not just a question of naming, you'd actually split the responsibilities.

Thus, the different suggestions here are valid

Essentially:

Yes, surprising function names are terrible and no-one wants to deal with that
Yes, "AND" is acceptable in a function name because it's better than a misleading name
Yes, the level of abstraction is a related concept and it matters

You don't have to go for stage 6 right away, it's fine to stay at stage 2, 3 or 4 until you know better!

I hope that would be helpful in giving a different perspective on what looks like contradictory advise. I do believe that everyone aims for the same goal (non-surprising names) but stop at different stages, based on their own experience.

Happy to answer if you have any questions :)

Your function does two things:

Gets a set of IDs via a transform and returns them to the caller,
Writes that set of IDs to a database.

Remember the single responsibility principle here: you should aim for a function to have one responsibility, not two. You have two responsibilities, so have two functions:

getResponseIds - Gets a set of IDs via a transform and returns them to the caller
addResponseIdToDB - Takes a set of IDs and writes them to a database.

And the whole issue of what to call a single function with multiple responsibilities goes away and the temptation to put and in the function names goes away too.

As an added bonus, because the same function is no longer responsible for two unrelated actions, getResponseIds could be moved out of your repo code (where it doesn't belong as it's not doing any DB-related activity), into a separate business-level class/module etc.

Having a composite function name is acceptable, so which naming scheme you use depends on your context. There are purists who will tell you to break it up, but I will argue otherwise.

There are two reasons to try to avoid such things:

Purity - A function which does two things should be converted to two functions which do one thing each.
Lexical size - a bigUglyCompositeFunctionName gets hard to read.

The purity argument is typically the one that is focused on. As a vague general heuristic, breaking up functions is a good thing. It helps compartmentalize what is going on, making it easier to understand. You are less likely to do "clever" tricks with sharing of variables which lead to coupling between the two behaviors.

So the thing to ask yourself is "should I do it anyways?" I say breaking things up is a vague heuristic, so you really only need a decent argument to go against it.

One major reason is that its beneficial to think of the two operations as one. The end-all-ultimate example of this is found in atomic operations: compare_and_set. compare_and_set is a fundamental cornerstone of atomics (which are a way of doing multithreading without locks) which is successful enough that many are willing to think of it as the cornerstone. There's an "and" in that name. There's a very good reason for this. The entire point of this operation is that it does the comparison and the setting as one indivisible operation. If you broke it up into compare() and set(), you'd defeat the entire reason the function existed in the first place, because two compares() could occur back to back before the set()s.

We also see this in performance. My favorite example is fastInvSqrt. This is a famous algorithm from Quake which calculates 1/sqrt(x). We combine the "inverse" and "square root" operations because doing so gives us a massive performance boost. They do a Newton's approximation which does both operations in a single step (and happens to do it in integer math rather than floating point, which mattered in the era). If you were to do inverse(sqrt(x)), it would be much slower!

There are some times where the result is more clear. I've written several APIs involving threading where one has to be dreadfully careful about things like deadlocks. I didn't want my user to be aware of the internal implementation details (especially since I might change them), so I wrote the API with a few "and" functions which prevented the user from ever having to hold a lock for more than one function call. This means they never needed to know how I was handling the multithreaded synchronization under the hood. In fact, most of my users weren't even aware they were in a multithreaded environment!

So while the general rule is to break things up, there can always be a reason to couple them together. For example, can your user break your architecture by adding to a database multiple times without the "gets" inbetween? If each of the responses logged in the DB needs to have a unique identifier, you could get in trouble if the user got to do them willy nilly. Likewise, would you ever want to let a user "get" without logging the results in the DB? If the answer is "yes" then break them up so that the user can access the "get" function. However, if the security of your application is broken if the user can "get" without the result being logged in the DB, you really should keep the functions together.

Now as for the lexical issues, your function name should describe what the user needs to know about it. Let's start with the "get" part. Its pretty easy to remove the "get" from a function name, because all of your examples will show an assignment: int id = addResponseIdToDB()". In all places where the function is used, you will end up documenting the fact that the function returned a value.

Likewise "add" can be optional. We use the term "side effect" as an all encompassing term, but there's different shades of it. If the DB entries are merely a log, there may be no reason to highlight it. We never see functions like playMusicAndSendPersonalInformationToOurCorporateServers. It's just playMusic, and if you're lucky, the documentation may mention something about a socket connection over the internet. On the other hand, if users are expected to call this function for the purpose of adding things to the DB, then "add" is essential. Don't take it out.

Now, I've written a lot of reasons to do every possible combination of what you are asking. I've intentionally not answered the question because there's a choice to be made. It's not a rule to follow. You're crafting an API.

That being said, my instinct is that addResponseIdToDB is likely to be the best answer. In most cases the "get" portion is an obvious enough side effect that it doesn't earn the extra noise caused by typing it everywhere. However, there's a few places where I might consider it important:

If the "get" is expensive -- If you have to do a "get" that involves fetching something over the internet and then adding it to a local cache DB, by all means the "get" is important. It is the thing the user is trying to do.
If you need to make it clear that the user needs to pay attention to the value. If the user wants to use the variable, they'll realize the API returns it, and they'll use it. However, you want to watch out for the user that doesn't know they need it. For example, if you have to "close" this operation by ID later to free up memory, you may want to draw attention to the fact that you're doing something. In these cases, you might want to look at a verb other than "get." "get" often implies idempotent functions (calling it again doesn't do anything). If it returns resources, other verbs like "create" or "acquire" are nice. This particular failure mechanism is a major issue in C when doing exception handling. C lacks the catch/throw mechanism of C++, so it relies on return codes. Developers are famous for simply failing to check these return codes and getting into really bad situations like buffer overflows because of it.
Symmetry -- Sometimes you design an API to have a lexical symmetry to it. You set up the words so that they come in pairs, or other patterns, and craft the commands so that it is easy to visually identify whether the pattern has been followed or not. I'd say this is rare, but its not unheard of. There's a reason closing XML tags repeat the tag name (such as <foo> and </foo>).

What is the ultimate intent of this method? Why add these transformed ids to the DB, is it for the purpose of caching? I'll work under that assumption.

It sounds like the intent of the method is really just to get the transformed response ids (either doing the transform or getting them from a cache), and so there's your method name:

getTransformedResponseIds or less verbosely getResponseIds()

A method with this name could cache or maybe not, and that can be documented, but your method name shouldn't tie you to a specific implementation; What if you decide to stop using a DB and instead cache them elsewhere such as just temporarily in memory?

Side effects should be just that, side effects, they should probably be documented but they shouldn't really impact the core intent (or success) or name of the function. If getting the value from the cache fails (or if you configure to skip caching) that shouldn't be a critical issue, the method should just transparently re-calculate, cache (or not), and return the new value.

Sometimes using an "AND" is helpful to convey intent such as with the Java methods getAndIncrement and incrementAndGet so that's certainly not off the table.

You say that the function returns something "from a transformation, not from any db action".

This suggests that the function behaves more like a constructor than a getter. However, constructors also shouldn't cause side-effects (thanks for the link, @MechMK1).

Factory methods, on the other hand, already presuppose some level of messiness. For example, one of the motivations for using factory methods is that a factory method:

Allows for more readable code in cases where multiple constructors exist, each for a different reason. - wikipedia

Likewise,

If you need to do some work with side effects, a factory method can be named in such a way that it's more clear what those side effects are. - ruakh

Consider making the function a factory method, using the term "logged" to alert programmers to the database storage:

Declaration: createLoggedResponseid(...) {...} // log object to db
Call: responseid = createLoggedResponseid(...);

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange