Question

In many functional languages (actually all the ones I've ever used) there is no distinction between a statement and an expression, and the last value of each code block is the "return value" of the block. On the other hand, languages not generally considered purely functional usually introduce this distinction.

As an example of what I'm talking about, the following Python code prints None:

def foo():
    5 + 5
print(foo())

while the Scheme code prints 10:

(define (foo) (+ 5 5))
(display (foo))
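To make the contrast concrete within Python itself, here is a small sketch (the function names are made up for illustration). A bare expression inside a def is an expression statement whose value is discarded, while a lambda body is an expression whose value becomes the result, which is close to what the Scheme version does.

```python
def foo_statement():
    5 + 5          # expression statement: the value is computed, then discarded


def foo_return():
    return 5 + 5   # explicit return statement


# A lambda body is an expression; its value is the result of the call.
foo_lambda = lambda: 5 + 5

print(foo_statement())  # None
print(foo_return())     # 10
print(foo_lambda())     # 10
```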

Obviously I'm not interested in subjective answers of people who prefer one style to the other, but objective reasons.

To me it seems the distinction makes the grammar and implementation of the language more complicated (one less obvious example being the exceptions the C++ standard needs for templates and void types, or the introduction of "shortcut if statements" such as the ?: operator in C-influenced languages) without a real benefit - but most likely there's a reason why even new, modern languages still have this distinction.


Solution

Ubiquitous side effects.

If you are in a purely functional language, everything is an expression, even "statements", which return something like () (possibly distinguished by their type, e.g. IO ()).

However, the majority of programming languages permit effects anywhere by default, so sequencing becomes key. These languages therefore bake in special syntax for giving the computer an ordered list of statements, often separated with semicolons.

This isn't the case for pure expressions, which can be evaluated in any order that preserves the expression semantics.

Side-effecting actions are considered special enough that they get their own syntax.
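A minimal Python sketch of why ordering matters once effects are allowed. The helper noisy and the log list are invented for this illustration. Swapping the operands of a pure sum changes nothing observable, but swapping operands that have side effects changes the recorded trace, even though the numeric result is the same.

```python
log = []


def noisy(label, value):
    """Return value, but also record the call (a side effect)."""
    log.append(label)
    return value


# Pure subexpressions: either operand order gives the same result,
# and there is nothing else to observe.
pure_ab = (2 * 3) + (4 * 5)
pure_ba = (4 * 5) + (2 * 3)

# Effectful subexpressions: same sum, but the trace depends on the order.
# Python evaluates the operands of + from left to right.
log.clear()
total_ab = noisy("a", 6) + noisy("b", 20)
trace_ab = list(log)

log.clear()
total_ba = noisy("b", 20) + noisy("a", 6)
trace_ba = list(log)

print(pure_ab == pure_ba)      # True
print(total_ab == total_ba)    # True
print(trace_ab, trace_ba)      # ['a', 'b'] ['b', 'a']
```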

Other answers

First, let me say that I think you're asking two, maybe more, different questions: "Why are some expressions distinguished syntactically from others?" and "Why are the semantics for sequencing what they are?"

For your first question: The sense I get from the many things I've read is that statements are expressions, but a restricted class of expressions that cannot appear as subexpressions in all circumstances, e.g.,

x = 4
y = (x += 1)

The Python code above raises a syntax error because a statement appears in a place where an (unrestricted) expression is expected. I associate statements with side effects, with sequencing, and with the imperative style. I don't know if you consider programming style a subjective answer to your question (style itself certainly is subjective).
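The claim can be checked without crashing the interpreter by handing the source to the built-in compile. This is only a sketch: the helper name is_valid_python is made up, and the second snippet assumes Python 3.8 or later, where the := operator provides an assignment that is an expression.

```python
def is_valid_python(source):
    """Return True if source parses as a Python module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Augmented assignment is a statement, so it cannot be nested in an expression.
augmented = "x = 4\ny = (x += 1)\n"

# Assumes Python 3.8+: the assignment expression := may appear as a subexpression.
walrus = "x = 4\ny = (x := x + 1)\n"

print(is_valid_python(augmented))  # False
print(is_valid_python(walrus))     # True

namespace = {}
exec(walrus, namespace)
print(namespace["x"], namespace["y"])  # 5 5
```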

I'm very interested to hear others' takes on this question, too.

For the second question: Semantics are sometimes decided arbitrarily, but the aim is a reasonable semantics, with different language designers simply differing on what is most reasonable (or most expected). It surprised me to learn that if control reaches the end of a function body in Python, the function returns None, but those are the semantics. Designers have to answer similar questions, such as "What should the type of a while loop be?", "What should the type of an if statement be if it doesn't have an else branch?", and "Where should such statements be allowed syntactically?" (issues can arise if such an if statement is the last statement in a sequence of statements).
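Python's own answers to two of these questions can be observed directly. In the sketch below (function names are illustrative), an if statement without an else is allowed and the function simply falls through to None, while the expression form of if is rejected by the parser unless it has an else branch.

```python
def falls_off_end(flag):
    if flag:
        return "taken"
    # No else branch: control reaches the end of the body here.


def parses_as_expression(source):
    """Return True if source is a valid Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


print(falls_off_end(True))    # taken
print(falls_off_end(False))   # None

print(parses_as_expression("1 if True else 2"))  # True
print(parses_as_expression("1 if True"))         # False
```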

The question is, "why do new languages still have statements and not expressions exclusively?", right?

Programming language designs address different problems, e.g.

  1. simple grammar,
  2. simple implementation,
  3. simple semantics

being among the more theoretical design goals and

  1. execution speed of resulting compiled code
  2. compilation speed
  3. resource consumption of executing programs
  4. ease of use (e.g. simple to read)

being among the more practical ones ...

These design goals have no clear-cut definitions. For example, a short grammar is not necessarily the one with the cleanest structure, so which one is simpler?

(considering your example)

For ease of use or code readability, a language designer might require you to write 'return' in front of the value (or rather the expression) resulting from a function. This is a return statement. If you can leave out the 'return', it is still implied, and it could still be considered a return statement (it just would not be so obvious in the code). If it is considered an expression, this implies substitution semantics, as in Scheme, but probably not Python. From a syntactic standpoint it makes sense to distinguish statements and expressions where 'return' is required.

Looking at machine code (which I haven't done much, so I might be wrong), it seems to me there are only statements, no expressions.

E.g. your example:

ld r1, 5
ld r2, 5
add r3, r1, r2
ret r3

(I'm making this up, obviously)

So for people who like to think in terms of how a (von Neumann) CPU core actually operates, or who want to simplify compilation for such a target architecture, statements are the way to go.

There is also the particularly 'evil' (as in non-functional) assignment statement. It is required for expressing terminating loops without recursion. According to Dijkstra, loops have simpler semantics than recursion (ref. E.W. Dijkstra, "A Discipline of Programming", 1976). A loop also executes faster and consumes less storage than recursion, unless your language optimizes tail recursion (like Scheme).
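The storage point is easy to observe in CPython, which (as an implementation detail assumed here) does not eliminate tail calls. In this sketch, sum_loop and sum_rec are made-up names: the loop relies on assignment to make progress and runs in constant stack space, while the tail-recursive version needs one stack frame per step and hits the interpreter's recursion limit.

```python
import sys


def sum_loop(n):
    """Sum 1..n with a loop; assignment drives it toward termination."""
    total = 0
    while n > 0:
        total += n
        n -= 1
    return total


def sum_rec(n, acc=0):
    """Sum 1..n with a tail call; CPython still keeps every frame alive."""
    if n == 0:
        return acc
    return sum_rec(n - 1, acc + n)


# Pick a depth just beyond what the interpreter allows for recursion.
deep = sys.getrecursionlimit() + 100

loop_result = sum_loop(deep)

try:
    sum_rec(deep)
    recursion_ok = True
except RecursionError:
    recursion_ok = False

print(loop_result == deep * (deep + 1) // 2)
print(recursion_ok)
```

A Scheme implementation is required to run the equivalent of sum_rec in constant space, which is why the caveat about tail recursion matters.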

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow