Should one prefer a generic version of a function, even if it's not re-used (yet)?

https://softwareengineering.stackexchange.com/questions/415538

15-03-2021

Question

YAGNI might tell us that, in the implementation below, the generic version is not needed as long as the function is only used once.

But to me personally, it seems, the generic version is more readable because I'm not distracted by all the possibilities the special class has, but are not used. The generic version is exposing less complexity to the algorithm. (What it actually does is not that important for this example.)

Specialized version:

enum class Thing {
    A, B, C, D, E
    // Many member functions here
}

enum class Category {
    Foo, Bar
    // Many member functions here
}

val thingCategories = mapOf(
    Thing.A to Category.Foo,
    Thing.B to Category.Bar,
    Thing.C to Category.Foo
)

fun countUniqueThingCategories(xs: Iterable<Thing>): Int {
    val (mapped, nonMapped) = xs.partition { it in thingCategories }
    return mapped.map { thingCategories[it] }.distinct().count() + nonMapped.distinct().count()
}

fun main() {
    val things = listOf(Thing.A, Thing.C, Thing.D, Thing.B, Thing.A, Thing.E)
    println(countUniqueThingCategories(things))
}

Generic version:

enum class Thing {
    A, B, C, D, E
    // Many member functions here
}

enum class Category {
    Foo, Bar
    // Many member functions here
}

val thingCategories = mapOf(
    Thing.A to Category.Foo,
    Thing.B to Category.Bar,
    Thing.C to Category.Foo
)

fun <T, U> countUniqueWithMapping(xs: Iterable<T>, mapping: Map<T, U>): Int {
    val (mapped, nonMapped) = xs.partition { it in mapping }
    return mapped.map { mapping[it] }.distinct().count() + nonMapped.distinct().count()
}

fun countUniqueThingCategories(xs: Iterable<Thing>) = countUniqueWithMapping(xs, thingCategories)

fun main() {
    val things = listOf(Thing.A, Thing.C, Thing.D, Thing.B, Thing.A, Thing.E)
    println(countUniqueWithMapping(things, thingCategories))
}

Which version would you prefer to find when maintaining a project?

Solution

There are definitely cases where solving a more general problem than required makes code easier to read, to reason about, and to maintain. The simplest example I can think of is code that deals with input data consisting of four or five similar attributes, where the processing code gets duplicated up to five times because the responsible developer is too inexperienced or too lazy to refactor it to use an array and a loop. Solving the more general problem of processing "N" pieces of data, though only five are required, is definitely a good idea there.
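
The duplication described above might look like the following in Kotlin, the language of the question. This is only a sketch: the `Reading` type, its fields, and the clamping step are hypothetical, invented purely for illustration.

```kotlin
// Hypothetical input record with five similar attributes (names invented for illustration).
data class Reading(val s1: Double, val s2: Double, val s3: Double, val s4: Double, val s5: Double)

// Duplicated version: the same clamping step is written out once per attribute.
fun clampDuplicated(r: Reading): List<Double> {
    val a = if (r.s1 < 0.0) 0.0 else r.s1
    val b = if (r.s2 < 0.0) 0.0 else r.s2
    val c = if (r.s3 < 0.0) 0.0 else r.s3
    val d = if (r.s4 < 0.0) 0.0 else r.s4
    val e = if (r.s5 < 0.0) 0.0 else r.s5
    return listOf(a, b, c, d, e)
}

// General version: handles N values, although only five are needed today.
fun clampAll(values: List<Double>): List<Double> = values.map { if (it < 0.0) 0.0 else it }

fun main() {
    val r = Reading(1.5, -2.0, 0.0, 3.0, -0.5)
    println(clampDuplicated(r))
    println(clampAll(listOf(r.s1, r.s2, r.s3, r.s4, r.s5)))
}
```

Both calls print the same list; the general version simply has one place to read and one place to change.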

But let's talk about "generics": a generic approach sometimes has the potential to split a complex function into a less complex one plus a separate data type. If that helps to move some of the data type's complexity out of the function, it can indeed improve readability and maintainability, even if the function and the data type are not used elsewhere.

That said, I fail to see why the generic version of the function in the question's specific example should fall into this category. Maybe it is because the example is in Kotlin (which I have never worked with), maybe it is because it is too artificial and contrived. But I don't really see "less complexity to the algorithm" in the generic version. And for the non-generic version, I cannot say that it "distracts me by all the possibilities the special class".

So find a better example, and I may vote for the generic version. But not this time.

OTHER TIPS

There has been some back and forth in the comments, and my feedback generally boils down to the same argument every time:

Your problem, as you describe it, always starts from the really unusual assumption that you don't know what it is you should be doing and would just interact with random available things just because they're available.

That means your development process is that of an unguided projectile. The solution does not come from a design pattern, it comes from refocusing yourself as a developer.

A developer who cannot be trusted to understand (and stick to) what they need to do, can also not be trusted to appropriately judge and implement a design pattern. Therefore, the source of your question renders the direct answer to your question moot.

Disclaimer: I'm not a Kotlin dev, but I infer that the concept of generics is the same.

The generic version is exposing less complexity to the algorithm.

This doesn't quite make sense to me. When comparing a generic and a non-generic class that are otherwise equivalent, the manipulated object will be manipulated the same way, regardless of whether its type is concrete or generic. If it's not the same, then the classes are not equivalent.

But to me personally, it seems, the generic version is more readable because I'm not distracted by all the possibilities the special class has, but are not used.

Readability can be lowered by either being too vague (abstract) or too specific (concrete). What is the most readable is incredibly contextual and cannot be answered universally.

But I do want to point out that, contrary to your assertion, additional abstractions can most definitely complicate a codebase.

Should one prefer a generic version of a function, even if it's not re-used (yet)?

While I admit this is an oversimplified response, if one should always prefer a generic version, then you would effectively need to make every property/field in every class (or method) generic and you essentially throw out the concept of having any hardcoded type in any class (or method signature), other than the class type (or method return type) itself.

Clearly, that is overkill, and would be immensely unreadable to boot.

Though it seems tautological, generics should be favored where appropriate to do so.

A simple example is a list, whose operations wholeheartedly do not care about the content of the elements themselves, and therefore it is clear from the start that generics are desirable.

In short, generics make sense when you know for a fact that you do not care (nor will ever care) about the concrete type.

I'm unsure if Kotlin has generic type constraints, where you can pre-emptively limit your generic type to inherit from a given base class or implement a certain interface, but the reasoning on whether to use generics is the same even when using such a constraint.
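
For what it's worth, Kotlin does support such constraints, in the form of upper bounds on type parameters. A minimal sketch follows; the `Named` interface, the two data classes, and `longestName` are hypothetical names chosen for illustration.

```kotlin
// Hypothetical interface used as an upper bound.
interface Named {
    val name: String
}

data class Person(override val name: String, val age: Int) : Named
data class City(override val name: String) : Named

// T is constrained to subtypes of Named, so `name` is available inside the function,
// while the caller still gets back the concrete type T.
fun <T : Named> longestName(items: List<T>): T? = items.maxByOrNull { it.name.length }

fun main() {
    val people = listOf(Person("Ann", 31), Person("Bartholomew", 52))
    val winner: Person? = longestName(people) // typed as Person?, not Named?
    println(winner?.age)
}
```

The bound lets the function use what it needs from the type (here, `name`) and nothing more, which is the same trade-off discussed above.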

because I'm not distracted by all the possibilities the special class has

This justification raises a question mark for me. There's an apparent incongruence between the problem and its solution.

Your question seems to be the digital equivalent of covering up the buttons in your car that you don't intend to use today. It's summer, so you won't use the seat heater today, and then you'd rather just wall off the seat heater buttons so you "don't get distracted by the possibility of using the seat heater".

I know that's a silly analogy, but it's essentially the same argument. It relies on an inherent inability to focus on the things that are relevant without actively removing everything else from view.

If you struggle handling an object without fully utilizing everything it has to offer, that would be problematic for you in many situations, not just cases where generics could "solve" (or rather hide) the issue for you.
Would you then also not advocate for favoring the base object type (object in C#, unsure what it is for Kotlin) so you don't get "distracted" by being able to use a concrete class' features when you don't need them? That would lead to significant polymorphism abuse in your codebase, or to needing to backtrack and refactor when you do realize that you need access to the more concrete type further down the line.
This can also lead to being oversensitive to interface segregation if you err on the side of actively removing access to any method/property you don't currently need. Just because an interface has more than one method doesn't mean that you must use all methods provided at all times. A calculator has many different operations but you don't have to use all of them for every calculation.

This isn't meant to critique you, but rather to help you identify the underlying source of your concern, and address it in a way that better fits the problem domain. Using generics to hide information seems to be a patchwork solution to a different problem, and patchwork solutions tend to either not solve a problem completely or have unintended further consequences.

I do think it depends. If you are the only developer who will touch the code, I would write a generic only if you can actually foresee, with a probability of, say, more than 50%, that it will be used for multiple types. Otherwise, should that usage become necessary unexpectedly, you know how to rewrite your function as a generic, and your unit tests should be able to validate the generic for your original as well as the new uses. So I'd retain the simple form.

If there are multiple developers, some of whom will be altering the code or augmenting it later, the question to ask is: "Even if I write it as a generic, are future developers even going to know the function is there?" I've seen situations in a multiple-developer environment involving a large code base, where developers have written new functions because they never even knew that an applicable one was already present. If you are sure they would know about it, then maybe write it as a generic.

Finally, it took me years ("OK, Boomer", c'est moi) to learn that, when I wrote extra or more versatile code to accommodate what I imagined might be future enhancements, those future enhancements rarely got made. The later enhancements that actually became necessary (say, due to customer demand) were almost never the ones I envisioned. "KISS" is really a good rule.

Do you know what exactly your function is going to be working with, or does it even matter?

In the cases of looking at various collections, like a list for example, knowing what the items are within the list may be completely irrelevant (as far as the list itself is concerned). For example, if you want to be able to sort the list, the only thing that would really matter is that the items given to the list can be compared, so this is a case where having a generic can be an advantage.
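
As a small sketch of that idea in Kotlin: the hypothetical helper below (`isSorted` is a name invented here) asks only that the elements be comparable with each other and knows nothing else about them.

```kotlin
// The only requirement on T is that its values can be compared with each other.
fun <T : Comparable<T>> isSorted(items: List<T>): Boolean =
    items.zipWithNext().all { (a, b) -> a <= b }

fun main() {
    println(isSorted(listOf(1, 2, 2, 5)))       // Int elements
    println(isSorted(listOf("pear", "apple")))  // String elements
}
```

The same function body serves both element types because the algorithm never looks past the comparison.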

If, on the other hand, you know that you will only be working with one particular type of object/class/type/etc, then a generic could be a hindrance, because now other users could attempt to use what looks like a generic function that isn't actually generic.

What it actually does is not that important for this example

Where you define it is. Sweeping generic utility functions often gain nothing from being part of the class that happens to use them. Try this exercise, please: move the utility functions to a separate class, and the question actually becomes more interesting. Keeping it part of Thing implies a scope, which will make you scratch your head a month or a year from now, wondering why you're using Thing to perform a utility upon Monkey.
What it does is equally important. The value of a generic is implementation-specific and, as I'll explain below, not all generic functions conveniently "work with everything" like your example; see "... equals(...) which is part of every base object ..." below for context and fall-throughs.

Should one prefer a generic version of a function, even if it's not re-used (yet)?

The short:

Yes, generic is better.
Per point above, it makes a lot more sense to make it generic when the utility function lives elsewhere such as MyUtilities.countUniqueWithMapping() instead of inside Thing, which has little to do with this utility other than the fact that it's the only implementation you're using. If it's generic, it's a utility. Organize it as such! :)

However, not all generics are utility functions, so this subjective question deserves a subjective answer.

You're relying on distinct(), which relies on equals(...) which is part of every base object. This is important as it suggests you've used anecdotal evidence to call it "simpler". In practice, it's often not simpler. I'll explain...
- On the contrary, take the generic approach to something like sorting an ArrayList: the comparator ends up being in Comparable, applied to ArrayList, which means that compareTo can require a bunch of type checks and (Thing) casting, since the sort may be on int age ..., weight, height or some property that isn't as easily anticipated as equals(...)
- Another (albeit minor) fall-through (same sorting use-case) is when sorting Objects calls toString(), which by default invokes hashCode(), producing even stranger results that would be less likely when explicit classes are used. I say "minor" because if you're sorting on toString() your generic can still rely on each object fixing this use-case, but non-obvious bugs cost time and we eventually learn to avoid them.
  
  In most scenarios, it's good to prefer generics as the code is more scalable, as long as you prepare or guard for the fall-throughs.
Last, is a question of explicitness. Whenever one codes to generics the program has to be OK with sending "any old Object" into that generic. This will affect stuff like code hinting as well (such as when your IDE recommends an auto-complete based on function signature).
- Let's say instead of Thing, you use AtomicReference<Thing>
- The generic will be OK with generic T being AtomicReference<Thing>
- Do you know off-hand if AtomicReference.equals(...) will work in this scenario and produce the same results as calling distinct() on Thing, or will it fall back to hashCode() and fail?
- To this point: if the IDE suggests passing the AtomicReference and you do so, accidentally forgetting to call AtomicReference<Thing>.get(), nothing detects the wrong type, and you end up working on an unexpected one, since the utility function didn't mandate anything explicit.
- This can be further guarded with more type checks and casting.
  
  The fallthrough cases are endless and mistakes will be made so keep this in mind when over-genericing your code.
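
To make the AtomicReference case concrete, here is a minimal sketch, assuming Kotlin on the JVM. `countDistinct` is a simplified, hypothetical stand-in for a generic helper, not the function from the question. `java.util.concurrent.atomic.AtomicReference` does not override equals(), so two references compare equal only when they are the same instance.

```kotlin
import java.util.concurrent.atomic.AtomicReference

enum class Thing { A, B }

// Simplified stand-in for a generic helper (hypothetical): it accepts any T.
fun <T> countDistinct(xs: Iterable<T>): Int = xs.distinct().count()

fun main() {
    val plain = listOf(Thing.A, Thing.A, Thing.B)
    // Each element is wrapped in its own AtomicReference instance.
    val wrapped = plain.map { AtomicReference(it) }

    println(countDistinct(plain))                     // 2
    println(countDistinct(wrapped))                   // 3: identity equality, no duplicates found
    println(countDistinct(wrapped.map { it.get() }))  // 2: unwrapped first
}
```

The generic signature accepts the wrapped list without complaint, which is exactly the kind of silent fall-through described above.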

In summary, yes, prefer generics, but this will make a lot more sense when the utility function lives elsewhere, such as MyUtilities.countUniqueWithMapping().

P.S. You may even decide to make such generics private to reduce redundancy, while offering public convenience methods that mandate Thing. This would allow you to carefully guard each use-case while writing the function body once and only once. This is often preferred when offering your API to others, as it will help them avoid the same non-obvious fall-throughs mentioned above.
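
A sketch of that arrangement, reusing the names from the question. Making the generic function file-private is an assumption about how one might apply the suggestion in Kotlin, not something the question's code does.

```kotlin
enum class Thing { A, B, C, D, E }
enum class Category { Foo, Bar }

val thingCategories = mapOf(
    Thing.A to Category.Foo,
    Thing.B to Category.Bar,
    Thing.C to Category.Foo
)

// File-private generic implementation, written once.
private fun <T, U> countUniqueWithMapping(xs: Iterable<T>, mapping: Map<T, U>): Int {
    val (mapped, nonMapped) = xs.partition { it in mapping }
    return mapped.map { mapping[it] }.distinct().count() + nonMapped.distinct().count()
}

// Public entry point that mandates Thing, so callers cannot pass an unintended type.
fun countUniqueThingCategories(xs: Iterable<Thing>): Int =
    countUniqueWithMapping(xs, thingCategories)

fun main() {
    val things = listOf(Thing.A, Thing.C, Thing.D, Thing.B, Thing.A, Thing.E)
    println(countUniqueThingCategories(things)) // 4: Foo, Bar, plus unmapped D and E
}
```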

In the meantime, I've learned more about the underlying principle on which this decision can be based, and I also found a better minimal example. It can be found in this article ("Why a generic implementation can be the easier-to-understand solution").

In general, generics should be considered when you are dealing with a relatively large number of implementations/sub-types of a certain family (class/interface), while introducing a processing/organization layer that deals with the family as a whole.

It is a good thing to go for if you are designing a framework and you have pinned the type families that it operates on. It can be somewhat useful as a prototyping technique, but in that case use it very carefully since the prototype may end up collapsing under the weight of the meta code added with the generics.

If you are working on a concrete project try to avoid parametrizing your code with generics. Support functions/classes are always ok to parametrize, but only as long as they fit into the existing type families and their interfaces. The moment you need to add an interface so that you can get a generic into a tool function you are probably overdoing it.

Generics are a slippery slope and I would generally recommend skipping them by default, in contrast to encapsulation (hide everything), which should be applied by default. I think I do understand what you mean by "because I'm not distracted by all the possibilities the special class has": when you are on your own on a big project, it can get overwhelming. Even then, however, you will find pretty fast that it is very easy to overuse generics, ending up with weird delegated constructors and a convoluted type system, not to mention the jungle of interconnected interfaces and Lisp-style chains of brackets. It is very similar to design pattern abuse: sticking too closely to a set of principles moves your codebase away from the problem it should solve, towards a mess of good practices gone wrong because good intentions were applied in the wrong context.

In the case where you are stacking abstractions, generics can be helpful as a way to constrain the derived class and thus ensure that the implementation operates exactly on the type you specified and not on the family of types described by the interface you would use if you skipped generics. This allows you to make the concrete class tightly connected to the object it wraps/operates on, while maintaining the reusability of the implementation that comes with inheritance. In a way, you may say that the concrete implementation gets even more concrete, while the abstract layer can be even more abstract. This approach, however, does not act as a filter on the wrapped generic type, and it comes with some serious design considerations in terms of access control, construction and fragmentation, not to mention the interface set that you would have to build if you want to access the classes that are parametrized with generics.
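
One possible reading of this, as a hedged Kotlin sketch: all names here (`Shape`, `Circle`, `Square`, `Describer`, `CircleDescriber`) are hypothetical. The abstract layer is written against a bounded type parameter, and each concrete subclass pins it to exactly one type.

```kotlin
// Hypothetical type family.
interface Shape { val label: String }
data class Circle(override val label: String, val radius: Double) : Shape
data class Square(override val label: String, val side: Double) : Shape

// Abstract layer: reusable logic written against T, which is bounded by the family.
abstract class Describer<T : Shape> {
    protected abstract fun details(item: T): String
    fun describeAll(items: List<T>): List<String> = items.map { "${it.label}: ${details(it)}" }
}

// Concrete layer: bound to exactly Circle, so `radius` is available without casting.
class CircleDescriber : Describer<Circle>() {
    override fun details(item: Circle) = "radius ${item.radius}"
}

fun main() {
    println(CircleDescriber().describeAll(listOf(Circle("c1", 2.0))))
    // CircleDescriber().describeAll(listOf(Square("s1", 1.0))) // does not compile: Square is not a Circle
}
```

Without the type parameter, `details` would have to accept any `Shape` and cast, which is the looser coupling the paragraph contrasts against.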

In any case, if your code does not undergo regular reviews by colleagues, either plan for a mess and go for generics, or remain conservative and get a feel for whether (and how much) the code base will tolerate compile-time parametrization. Provided you are happy with it, plan and commit to refactoring.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange