Question

Currently I am working on a school project written in C#. Some teammates just started in C# and some are already familiar with C#. Today I had a discussion on whether to use syntactic sugar like this in our code:

private SomeClass _someClass;
public SomeClass SomeClass
{
    set => _someClass = value;
}

// Equivalent to:

private SomeClass _someClass;
public SomeClass SomeClass
{
    set 
    {
        _someClass = value;
    }
}

Or this:

random ??= new Random();

// Equivalent to:

if (random == null)
    random = new Random();

Reasons we discussed for not using syntactic sugar like this were:

  • It is hard to read in general
  • For someone coming from another language e.g. Java, it is harder to see what is going on in the code.

Are these valid reasons? Does this happen in other languages? Are there some measures to decide what is the "cleaner" way of coding?


Solution

I disagree with

It is hard to read in general

especially to "in general". These language features may be hard to read for beginners when they see them the first time, but they were actually added to the language to make code more concise. So after one gets used to them (which should not last longer than using them half a dozen times) they should make the code more readable, not less.

For someone coming from another language e.g. Java, it is harder to see what is going on in the code.

Yes, but is your goal to program Java in C#, or to program C#?

When you decide to use a language, you will be better off learning its idioms, especially the simple ones. When you work with real-world programs, you will encounter these idioms frequently and will have to deal with them, whether you like them or not.

Let me finally add: the ultimate measure for the readability of your code is what your peer reviewer tells you. And whenever I am in the role of a reviewer who stumbles upon a simple language idiom which is new to me, I usually take it as an occasion to learn something new, not as an occasion to tell the other devs what they should not use because I don't want to learn it.

Other Tips

Syntactic Sugar is Good™. What is a while loop if not sugar for goto? Let me tell you what it is: easier to read and less error prone. Yes, you can do less with while than with goto and conditionals... but that does not make it not sugar. It is sugar all the way down (I mean abstractions). We are getting to a point where we are closer to writing what we intend the code to do rather than how to do it. And that is Good™.


"Easy to read code" is not the same as "familiar code".

Familiar code is code that you glance at and say "pft, I know what that is, I have seen it a thousand times". Easy to read code is code that anybody who knows the language can understand, independently of prior exposure or experience.

For example, this is familiar code:

int total = 0;
for (int index = 0; index < list.Count; index++)
{
    total += list[index].Amount;
}

This is easy to read code (also less error prone, and with fewer WTFs per minute):

int total = list.Sum(item => item.Amount);

I think Java does something like this:

int total = list.stream().mapToInt(item -> item.getAmount()).sum();

You will never have an off-by-one error with one of these.

By the way, you may also consider Linq to be sugar:

var total = (from x in list select x.Amount).Sum();

You may also consider it declarative code. It expresses intent, instead of instructions.


By the way, there is more to this than readability. As a rule of thumb, the less code you have to write the better: it means fewer opportunities for error, and it also means more productive developers.

Although this is marginal for some syntactic sugar, it is a decisive factor for others. For example, you do not have to write the iteration over a list yourself; foreach is sugar – usually – for using an IEnumerator. Not always: the compiler can optimize the iteration, and it does so for arrays.

You, of course, can do much more with a for loop or by using IEnumerator directly than with foreach. Similarly, you can do much more with goto. Yet foreach will protect you from an off-by-one error or similar, and the compiler will know it does not have to check bounds.
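For the curious, here is a rough sketch of that desugaring (assuming a List&lt;int&gt;; the code the compiler actually emits differs in details, such as using the struct enumerator directly):

```csharp
using System;
using System.Collections.Generic;

class ForeachDesugar
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        // What you write:
        int total1 = 0;
        foreach (var n in list)
            total1 += n;

        // Roughly what the compiler generates: get an enumerator, loop on
        // MoveNext, read Current, and dispose at the end. There is no index
        // variable, so there is nothing to be off by one with.
        int total2 = 0;
        using (var e = list.GetEnumerator())
        {
            while (e.MoveNext())
                total2 += e.Current;
        }

        Console.WriteLine(total1); // 6
        Console.WriteLine(total2); // 6
    }
}
```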

By the way, please do not try to do the equivalent of async/await yourself, at least not in any production code. Furthermore, have you seen await foreach? These could seem like new capabilities, as they require a lot of work on the part of the compiler. However, the truth is that you could do them without the help of the compiler... except the code to do so is horrible and very easy to get wrong.
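If you have not seen await foreach, here is a minimal sketch (assuming C# 8 and a runtime that provides IAsyncEnumerable&lt;T&gt;, i.e. .NET Core 3.0 or later; CountAsync is a made-up example method):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    // An async iterator: the compiler turns this into a state machine
    // implementing IAsyncEnumerable<int>. Writing that state machine by
    // hand is exactly the kind of horrible code not to attempt yourself.
    static async IAsyncEnumerable<int> CountAsync(int n)
    {
        for (int i = 1; i <= n; i++)
        {
            await Task.Delay(1); // pretend each item takes time to produce
            yield return i;
        }
    }

    static async Task Main()
    {
        int total = 0;
        await foreach (var i in CountAsync(3))
            total += i;
        Console.WriteLine(total); // 1 + 2 + 3 = 6
    }
}
```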


If you understand lambda expressions, you get that after => (-> in Java) comes code. Then expression-bodied members make sense. You are already learning to write properties, aren't you? You could just learn that this is the way you do it.

By the way, it is worth noting that by using expression-bodied members, you are limiting the body to a single expression (no statements); it is in the name. Thus, it is good that we got throw expressions and switch expressions. Arguably a language does not need statements at all; there are languages like that.

Properties are sugar for getter and setter methods. So why is it OK to use properties? If anything mentioned here is a genuinely new capability, it is properties: they are the only way in C# to make a getter and a setter appear as a single member to the type system.
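As a sketch of that sugar (Counter and the hand-written accessor methods are invented for illustration; the compiler really emits methods named get_Count and set_Count behind the property):

```csharp
using System;

class Counter
{
    private int _count;

    // A property: to the type system, a single member...
    public int Count
    {
        get => _count;
        set => _count = value;
    }

    // ...but under the hood it compiles to a pair of accessor methods,
    // conceptually like these hand-written stand-ins:
    public int GetCount() => _count;
    public void SetCount(int value) => _count = value;
}

class Program
{
    static void Main()
    {
        var c = new Counter();
        c.Count = 3;                     // invokes the generated setter
        Console.WriteLine(c.Count);      // invokes the generated getter: 3
        c.SetCount(5);
        Console.WriteLine(c.GetCount()); // 5
    }
}
```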


In defense of ??=...

We have this code:

if (random == null)
{
    random = new Random();
}

We could make it shorter by using ?:

random = random == null ? new Random() : random;

This is sometimes convenient, but it is not really more readable. Plus, we managed to write the variable three times instead of two.

Let us use ?? to reduce it further:

random = random ?? new Random();

This is clearer: less error prone, and still convenient. However, can we write our variable only once, and thus avoid repeating ourselves?

Yes, with ??=:

random ??= new Random();

Hopefully this gives you some appreciation for this feature.
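One detail worth seeing in action: ??= assigns only when the left-hand side is null, so applying it again leaves an already-initialized variable untouched:

```csharp
using System;

class Program
{
    static void Main()
    {
        Random random = null;

        random ??= new Random();  // random was null: a new instance is assigned
        var first = random;

        random ??= new Random();  // random is non-null: nothing happens, and
                                  // the right-hand side is not even evaluated

        Console.WriteLine(ReferenceEquals(first, random)); // True
    }
}
```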

By the way, with the upcoming target-typed new expression, you could write:

random ??= new();

Which is easier to write, even if not necessarily easier to read (you may have to look elsewhere to know the type).


Aside: did you know that lambda expressions (not to be confused with expression-bodied members) are sugar? The compiler makes an anonymous class which exposes a method with the needed signature; all captured variables become fields of that class. Of course, lambda expressions have very limited class-creating capabilities. However, they save you a lot of work, and a lot of horrible code.

Java folks who wrote listeners in Java 5 should remember that it was easier to implement them all in your own class than to make a class for each listener and figure out how to keep encapsulation. Then they got anonymous classes, and afterwards lambda expressions. And what would have been horrible code in Java 5 became easy to write, easy to read, and less error prone... however, it is still making a class under the hood.
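A rough sketch of that desugaring (the class name Closure is invented; the real compiler-generated class has an unspeakable name along the lines of &lt;&gt;c__DisplayClass0_0):

```csharp
using System;

class Program
{
    // The captured local becomes a field of a compiler-generated class.
    class Closure
    {
        public int Threshold;
        public bool IsBig(int x) => x > Threshold;
    }

    static void Main()
    {
        int threshold = 10;

        // What you write: a lambda capturing a local variable.
        Func<int, bool> isBig = x => x > threshold;

        // Conceptually what the compiler generates instead:
        var closure = new Closure { Threshold = threshold };
        Func<int, bool> isBigDesugared = closure.IsBig;

        Console.WriteLine(isBig(42));          // True
        Console.WriteLine(isBigDesugared(42)); // True
    }
}
```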

And we call those functional features. Yeah, we add less capable syntax and call it adding a paradigm. Paradigms are sets of restrictions shown to be good; at least according to Robert "Uncle Bob" Martin in the book Clean Architecture, paradigms provide constraints. We of course have to make them sugary, so that the right thing to do is the easy thing to do.


Java took a lot of syntax from C and C++. Is knowing C++ an excuse to not learn Java? I do not think so. Similarly knowing Java should not be an excuse to not learn C#. Even if some Java folks continue to claim they are the same language. If you are going to ignore the differences, you are going to ignore the differences. The first rule of Tautology Club is the first rule of Tautology Club.

That said, I am not saying that you should use all the sugar, much less enforce it. You need to reach an agreement on what is considered good style for your project (you know, naming conventions and stuff). And that might include that you are not going to use expression-bodied methods, for example. Not because you know Java, but because that is what you agreed on.

Although other answers implicitly touch upon this, the first thing to consider when asking "How readable is this code?" is, Who is the audience for this code?

Here are several potential audiences for your code:

  • Developers who are not C# programmers and do not intend to learn C#. For these people, write in a way that is common to many languages at least one of which the audience would be expected to know. Generally this means avoiding syntactic sugar. It might also mean avoiding pattern matching, list comprehensions, and the like, if those are not common in your audience's most-used languages. This also applies to programming techniques: if your audience uses languages that generally use only for/while/etc. loop structures for iteration, you probably want to avoid recursion and generators, even when those are significantly simpler ways of doing things for those in the know.

  • Developers who don't know C# well but intend to learn it. Here you'd want to write idiomatic C#, using all the non-obscure built-in syntactic sugar, but also add comments explaining how the syntactic sugar works, at least in the places where someone's likely to start reading. You might also take this approach if your audience is mixed between learners and the non-learners above; which way you do it would depend on the expected proportions between the two and how much time people will spend with the code. It would also depend on whether people are just reading the code or they or you are frequently modifying it; if the latter you probably want to tend toward using the language-specific idioms because that will make for easier writing of code.

  • Developers who know C# but don't well know your particular project. Idioms, including syntactic sugar, are often domain- or project-specific. Heavy use of these can make reading (and writing) code faster and easier if the developers are already familiar with those idioms, but the domain and project idioms can be a large barrier for those not familiar with them, even if they're familiar with the language itself. You'd want to stick to techniques obvious to everyone if you expect to have a lot of developers that haven't and won't spend a lot of time with your particular project. (This might be true of open-source software where you expect to need many smallish contributions from a broad range of developers, or on a project you know has a high developer churn rate.)

  • Developers who well know your particular project. Typically this happens in commercial development groups with full-time staff working for months or years on the same codebase. Here you should go all out with developing and using idioms that communicate as quickly and as effectively as possible within the group, even at the expense of a longish learning process for new developers. (This can be mitigated with pair programming, which can also be a very productive tool when you need external expertise for just a short time.)

The core lesson here is that code isn't easy or hard to read in and of itself: it's only easy or hard to read in a particular context. The same code that's easier to read for developers with little knowledge of a particular language or project might be significantly harder to read for developers with extensive experience with that project.

(A corollary of the above is that you should not impose a "standard" set of "this is how it's easier to read" rules on a project with mostly long-term developers because you'll slow them down by doing this. Instead, let them develop whatever they need to make that particular project easier to read for them, regardless of what the standard rules say.)


My favourite example of "clarity" depending on context is quadratic equations. These days anybody who still remembers high-school math finds ax² + bx + c = 0 quite easy to read and understand, but this would have mystified Muhammad ibn Musa al-Khwarizmi, the Persian scholar who developed the techniques we use to solve these equations. There was nobody in that era who would have found our current notation anywhere near as easy to understand as just using "plain language", as al-Khwarizmi wrote it:

If some one says: "You divide ten into two parts: multiply the one by itself; it will be equal to the other taken eighty-one times." Computation: You say, ten less a thing, multiplied by itself, is a hundred plus a square less twenty things, and this is equal to eighty-one things. Separate the twenty things from a hundred and a square, and add them to eighty-one. It will then be a hundred plus a square, which is equal to a hundred and one roots. Halve the roots; the moiety is fifty and a half. Multiply this by itself, it is two thousand five hundred and fifty and a quarter. Subtract from this one hundred; the remainder is two thousand four hundred and fifty and a quarter. Extract the root from this; it is forty-nine and a half. Subtract this from the moiety of the roots, which is fifty and a half. There remains one, and this is one of the two parts.

(I leave it as an exercise for the reader to translate the above into modern mathematical notation.)

There are two kinds of sugar in this example: replacing a pair of braces containing one line with an expression body (=>), and replacing an assignment of the form x = x o y (for a binary operator o) with x o= y. These are widely used in many other contexts, across countless mainstream languages; it would actually be more conspicuous not to use them here if you use them elsewhere. In particular:

  • Treating 1-line blocks differently just because you can is contentious, but if the rest of your codebase/team does it a lot - as I like to - you shouldn't treat it differently when it's a setter.
  • o is, admittedly, more likely to be +, -, * or / than &, | or ??, but it happens.

Knowing what x ?? y means, recognising the o= pattern, and knowing what => does, may well not be universal among good programmers. But I don't think you should code for people who, while charged with maintaining others' code, don't pick up such things quickly. You should be coding for people who, let's be honest, would in many cases be itching to use sugar if you haven't beaten them to it.

To be fair, only so many languages use ?? or =>. A fair few .NET languages have them, whereas scripting languages such as Python and R are another story. But languages are chosen for their specific advantages, and that includes their idioms. I haven't used C# in a while: I miss how useful ?? can be. You should no more eschew it than you would a convenient way your language handles OOP that others often don't.

A final note:

In languages that support getters and setters, they often aren't needed at all, because only the default behaviour is wanted. In some other cases, the getter or setter is surprisingly complicated, for whatever reason, and needs so many lines (like, 2?) that you have to include braces. But what you have here is an intermediate case: you can't just write get; set;, but the next simplest thing has happened. My opinion is that such "explicit but obvious" getters and setters shouldn't distract, and to be that unobtrusive they should do what they can: be expression-bodied. Again, though, with opinions it's more important to be consistent with existing practices.
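For concreteness, here is a sketch of the three cases (the class and member names are invented for illustration):

```csharp
using System;

class Examples
{
    // Default behaviour wanted: an auto-property, no body at all.
    public string Name { get; set; }

    private int _count;

    // The intermediate, "explicit but obvious" case: expression-bodied
    // accessors keep it unobtrusive.
    public int Count
    {
        get => _count;
        set => _count = Math.Max(0, value); // still a single expression
    }

    private string _log = "";

    // Complicated enough to need statements, and therefore braces.
    public string Log
    {
        get { return _log; }
        set
        {
            if (value == null) throw new ArgumentNullException(nameof(value));
            _log = value;
        }
    }
}

class Program
{
    static void Main()
    {
        var e = new Examples { Name = "a", Count = -5 };
        Console.WriteLine(e.Count); // negative values are clamped to 0
    }
}
```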

tl;dr– Probably best to use the syntactic sugar.

Reasons:

  1. Using syntactic sugar is like using acronyms, jargon, shorthand, implicit descriptors, etc., in that it's a situationally dependent trade-off.

  2. Language designers need to thoroughly consider the trade-off; typically, programmers ought to just use language features and stick with idiomatic syntax unless they have cause to introduce a stylistic variation.

  3. Some syntactic sugar is more conceptually precise because it expresses what is intended rather than how to do it.


1: Syntactic sugar is like acronyms, jargon, implicit descriptors, etc.

Seems like syntactic sugar's a larger issue than just in programming. For example, when should we use an acronym vs. spell something out? Or when should we use jargon vs. more descriptive language? Or when should we state something explicitly vs. rely on the reader inferring something implicit?

In general, it's best to spell stuff out, avoid jargon, etc., as a default. Then we simplify when it starts to seem worthwhile. There are judgement calls to be made in that.


2: Language designers ought to worry about syntactic sugar; language users can just go with it as they're downstream of the decision-making.

For language designers, e.g. the folks behind C#, there's probably a question of when to add syntactic sugar to the language; at what point is some common coding structure prevalent enough to justify additional language complexity?

For language users, e.g. C# programmers, it seems like there's less to debate as decisions about such language features are already tackled by the language designers. Choosing to introduce additional rules on top of the basic language, e.g. "Don't use ??=.", would add cognitive complexity.

In the absence of some compelling reason to introduce a style rule like avoiding ??=, it'd seem best to just use it where appropriate, as its existence within the modern C# language has already been decided. Even if this decision were to later be reversed, it wouldn't seem at all difficult to have an IDE-based tool automatically refactor it.


3: Some syntactic sugar is more conceptually precise.

Consider:

  1. The mailbox at 123 ABC Street had its flag up, so it had mail to be picked up.

  2. The mailbox at 123 ABC Street had its flag up, so the mailbox at 123 ABC Street had mail to be picked up.

Both statements are true, and the first is more concise. So, that's one reason to favor the first statement.

But beyond being more concise, the first statement is more clear about the concept that, if a mailbox's flag is up, then it has mail that needs to be picked up. The second statement requires the reader to infer that the second reference to "the mailbox at 123 ABC Street" is a direct equality to arrive at this same conclusion. And while that's not a difficult inference to make, the second statement is still less explicit for it.

For a real-world programming scenario, this StackOverflow question included the code:

private List<Foo> parseResponse(Response<ByteString> response) {
    if (response.status().code() != Status.OK.code() || !response.payload().isPresent()) {
      if (response.status().code() != Status.NOT_FOUND.code() || !response.payload().isPresent()) {
        LOG.error("Cannot fetch recently played, got status code {}", response.status());

where they kept referring to response.status() rather than caching it, e.g.

private List<Foo> parseResponse(Response<ByteString> response) {
    var responseStatus = response.status();
    if (responseStatus.code() != Status.OK.code() || !response.payload().isPresent()) {
      if (responseStatus.code() != Status.NOT_FOUND.code() || !response.payload().isPresent()) {
        LOG.error("Cannot fetch recently played, got status code {}", responseStatus);

Then, of course, other values also aren't cached.

Apparently they were comfortable with the assumption that response.status() would always return the same value across each call. And maybe that was a good, solid assumption in their use case. Or maybe it wasn't; dunno.

Regardless, my point is that the assumption existed; the code is functionally different depending on whether or not that assumption holds.

This has been an issue with C# before. For example, this StackOverflow question looks at a similar issue with C# events. Consider:

public class A
{
    public delegate void OnSomethingHappeningHandler(A sender, object eventArguments);
    public event OnSomethingHappeningHandler OnSomethingHappening;

    protected void Fire_OnSomethingHappening(object eventArguments)
    {
        // ... event-firing code; see versions below ...
    }
}

Versions of the event-firing code:

  1. this.OnSomethingHappening(this, eventArguments);
    

    This is bugged because the event-handler might be null.

  2. if (this.OnSomethingHappening != null)
    {
        this.OnSomethingHappening(this, eventArguments);
    }
    

    This is also bugged. While it checks whether the event-handler is null, it reads the event-handler twice. In multi-threaded scenarios, the event-handler can become null between the null check and the invocation.

    • This is like the issue with response.status() in the prior example.
  3. var handler = this.OnSomethingHappening;
    if (handler != null)
    {
        handler(this, eventArguments);
    }
    

    This is the first non-bugged version. It uses a temporary variable, handler, to avoid getting the event-handler twice, avoiding the inconsistency in the prior version.

    • Temporary variables like handler are like programming pronouns. For example, "it" was a temporary variable for "the mailbox at 123 ABC Street".
  4. this.OnSomethingHappening?.Invoke(this, eventArguments);
    

    This syntax was introduced in C# 6. It means the same thing as above, but it's more concise.

    • Conveniently, being more concise makes it easier to sell to people who don't understand the problem with the bugged version in (2); we can tell them to just go with it because it's more concise without having to explain why it's also better.

Point here is that, if we're comparing

  1. protected void Fire_OnSomethingHappening(object eventArguments)
    {
        if (this.OnSomethingHappening != null)
        {
            this.OnSomethingHappening(this, eventArguments);
        }
    }
    

vs.

  1. protected void Fire_OnSomethingHappening(object eventArguments)
    {
        this.OnSomethingHappening?.Invoke(this, eventArguments);
    }
    

then the syntactic sugar isn't just more concise; it's also more conceptually precise.

The funny issue with conceptual precision is that it often doesn't matter until it does. For example, version (2) of the event-firing method above isn't problematic in single-threaded scenarios; its conceptual looseness was purely academic until we got into multi-threaded scenarios.

This same sort of conceptual distinction comes up in:

  1. if (random == null)
    {
        random = new Random();
    }
    

vs.

  1. random ??= new();
    

The first version is conceptually looser than the second, because:

  1. The code has two references to random – not because it intends to reference two different variables, but because it has to refer to the same variable twice.

  2. The code locally asserts that random is of type Random. Not because this matters; I mean, I think it basically intends to say (in pseudo-code):

    if (random == null)
    {
        random = new typeof(random)();
    }
    

    where typeof(random) being Random is a happenstance tangential to the programmer's intent.

IDEs like Visual Studio let users press Ctrl+R twice to trigger a rename-variable prompt; then the user can type a new variable name that targets the conceptual variable rather than one specific textual reference to it.

Such IDE-level features are great! But my point is that they are workarounds for the inadequacy of the source code at concisely representing the intent. For example, Ctrl+R, Ctrl+R is a workaround for needing to redundantly type the same reference name.

Anyway, my point's that code like

if (random == null)
{
    random = new Random();
}

fails to precisely capture the programmer's intent. So even when this failure to capture the programmer's intent doesn't rise to the level of a behavioral difference, syntax like

random ??= new();

has the additional advantage of being more precise (rather than merely being more concise). And this preciseness can rise to the level of a behavioral difference, with the more precise syntax avoiding bugs, such as in the case of C# event-firing code (as shown above).


Conclusion: Probably just use syntactic sugar when it's available.

While there's a judgement call to be made in creating syntactic sugar, that judgement call seems like an issue for the language designers rather than the language users. Once syntactic sugar's a part of the language, it'd seem more consistent to just use it unless there's cause for a stylistic deviation from the standard language's idioms.

When it comes to coding style, everyone on the team needs to take time to learn the idioms of your language. Idiomatic code is expressed differently depending on whether you are using Ruby, Python, C#, Java, C++, or Go. Part of that is due to the ways the language expresses itself, but a lot of it is established by the way the community around that language has decided what Good Code™ looks like.

So the question remains: when is it OK to deviate from idiomatic code?

Honestly, there are only a couple of real reasons to deviate from the idioms:

  • The idiom makes the code harder to understand than if you wrote it out long form
  • You need different guarantees than the idiom provides

An example is C# Linq statements. You can write them in two different ways, and one can be clearer than the other. The more you try to cram into one Linq statement, the harder it is to read and understand.
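For example (with made-up data), here is the same pipeline in method syntax and in query syntax; which reads better is a judgement call, and either beats one overloaded mega-statement:

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        var words = new[] { "apple", "banana", "cherry", "avocado" };

        // Method syntax:
        var a = words.Where(w => w.StartsWith("a"))
                     .Select(w => w.ToUpper());

        // Query syntax: the same pipeline.
        var b = from w in words
                where w.StartsWith("a")
                select w.ToUpper();

        Console.WriteLine(string.Join(",", a)); // APPLE,AVOCADO
        Console.WriteLine(string.Join(",", b)); // APPLE,AVOCADO
    }
}
```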

Most of the time, when you and your team understand the language idioms, it improves the ability to track what's happening in the code. Since you are used to the language short-hands, you don't have to be distracted with the extra syntax and can focus on the actual logic.

Typically, idioms become established because they play to the programming language's strengths and avoid subtle errors.

Clean Code

What is clean?

Do you get out a bottle of Mr Clean, spray your code down, and wipe it off?

Let's break this down to what you actually want.

The Actual Problem

From your points I roughly see:

  • Understandable by your team members
  • Readable without needing to do mental judo

Essentially you want to arrive at an agreement with your team about:

  • what syntax is allowed,
  • what architectural grammars (design patterns) are allowed.

That is pretty simple to do:

  • We are using C# version X.Y

  • We are not using these particular syntactic forms: => properties, etc...

    • It is best to keep this list short/non-existent.
  • We are going to use Object-Oriented/Logic/Functional/Stream/Predicate/... paradigms.

  • We are familiar with, and can just use these Design Patterns: Fly Weight, Factory, MVC, ...

  • We agree to discuss unusual patterns/syntax with the team before using them in the project.

The We needs to be the Team, not the leaders, not Gary off to the side, not the Business.

Similarly the We will need to agree on things like Source Control, Editors, Task breakdown, etc...

If it's valid syntax, it's not sugar, and you can use it.

Certainly this line:

random ??= new Random();

It can be hard to grok at first glance. But if we google "C# ??= operator", the top search result gives us the full lowdown:

https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/operators/null-coalescing-operator

If you're worried that the code is hard to read:

  • Conduct Peer Reviews
  • Use tooling/linting to promote and modify syntax to your standards

There are lots of ways to write equivalent solutions in code.

I use the following heuristic:

The length of a piece of code should reflect how relevant and how complex it is.

Explain the algorithm to a fellow programmer. If you use one sentence to explain a step, that step is straightforward: try to express it in as few lines as possible. If you explain a part in many sentences, write it in as many lines of code, even if syntactic sugar would allow it to be written in one.

If you can't write a single-sentence step in one line, abstract it out into a function (i.e., write your own syntactic sugar).

When I read a function body, all the setup, data munging, edge-case handling, etc. is not the main purpose of the function. This can be communicated to the reader by keeping it short. Syntactic sugar helps to write such things concisely (add a comment if it becomes unclear). What you're saying to the reader is: this code is just doing the expected setup.

Then, when it comes to the actual algorithm you're implementing, the heart of the function, you want the reader to take their time. This code is likely complex, and it is important to understand it well, especially if they'll be manipulating it. Here, you should write your code so that one line represents one logical step, and avoid the temptation to combine multiple steps even if you can.
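As a sketch of this heuristic (the moving-average task and every name in it are invented for illustration): the setup is compressed into terse, sugary guards, while the heart of the function gets one line per logical step:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static List<double> MovingAverage(IReadOnlyList<double> data, int window)
    {
        // Setup: terse, just the expected guards.
        if (data is null) throw new ArgumentNullException(nameof(data));
        if (window <= 0 || window > data.Count) return new List<double>();

        // The heart of the function: one logical step per line.
        var result = new List<double>();
        double sum = data.Take(window).Sum();
        result.Add(sum / window);
        for (int i = window; i < data.Count; i++)
        {
            sum += data[i];          // the element entering the window
            sum -= data[i - window]; // the element leaving it
            result.Add(sum / window);
        }
        return result;
    }

    static void Main()
    {
        var avg = MovingAverage(new double[] { 2, 4, 6, 8 }, 2);
        Console.WriteLine(string.Join(",", avg)); // 3,5,7
    }
}
```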

Licensed under: CC-BY-SA with attribution
Not affiliated with softwareengineering.stackexchange