Why distinguish between expressions and statements?

Question 1

Ubiquitous side effects.

If you are in a purely functional language, everything is an expression. Even "statements" are expressions that return something like () (possibly distinguished by their type, e.g. IO ()).

However, the majority of programming languages by default permit effects anywhere and everywhere, so sequencing becomes key. Such languages therefore bake in special syntax for giving the computer an ordered series of statements, often separated with semicolons.

This isn't the case for pure expressions, which can be evaluated in any order that preserves the expression semantics.

Side-effecting actions are considered such special expressions that they get their own syntax.
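To make the ordering point concrete, here is a minimal Python sketch. The names pure_double, noisy_double and log are hypothetical and exist only for this illustration. It assumes Python's left-to-right evaluation of operands, and shows that swapping pure operands changes nothing, while swapping effectful ones changes what can be observed.

```python
# Hypothetical helpers for illustration only.
log = []

def pure_double(n):
    # No side effects: the result depends only on the argument.
    return 2 * n

def noisy_double(n):
    # Side effect: records the call in `log` before returning.
    log.append(n)
    return 2 * n

# Pure operands: the order of evaluation cannot change the result.
assert pure_double(1) + pure_double(2) == pure_double(2) + pure_double(1)

# Effectful operands: the sums agree, but the recorded traces differ.
log.clear()
left_first = noisy_double(1) + noisy_double(2)
trace_a = list(log)

log.clear()
right_first = noisy_double(2) + noisy_double(1)
trace_b = list(log)

print(left_first == right_first)  # True
print(trace_a, trace_b)           # [1, 2] [2, 1]
```

Once the trace matters (printing, writing files, mutating shared state), the language has to pin down an order, and statement sequencing is one way to spell that order out.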

Question 2

First, let me say that I think you're asking two, maybe more, different questions: "Why are some expressions distinguished syntactically from others?" and "Why are the semantics for sequencing what they are?"

For your first question: The sense I get from the many things I've read is that statements are expressions, but a restricted class of expressions that cannot appear as subexpressions in all circumstances, e.g.,

x = 4
y = (x += 1)

The above Python code will raise a syntax error because a statement appears in a place where an (unrestricted) expression is expected. I associate statements with side effects, with sequencing, and with the imperative style. I don't know if you consider programming style a subjective answer to your question (style itself certainly is subjective).
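One way to check this without crashing a script is to ask Python's built-in compile whether a piece of source parses. This is only a sketch: the helper name parses is made up here, and the last line assumes Python 3.8 or newer, where the separate := assignment expression exists.

```python
def parses(source, mode):
    """Return True if `source` compiles in the given mode ('eval' or 'exec')."""
    try:
        compile(source, "<example>", mode)
        return True
    except SyntaxError:
        return False

# 'eval' mode accepts only a single expression.
print(parses("x + 1", "eval"))    # True
print(parses("x += 1", "eval"))   # False: augmented assignment is a statement

# The snippet from the answer fails even as a whole program.
print(parses("x = 4\ny = (x += 1)", "exec"))  # False

# Python 3.8+ has a distinct assignment *expression* that may be nested.
print(parses("y = (x := 5)", "exec"))         # True
```

Nothing is executed here; compile only parses and compiles the source, so x does not need to be defined.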

I'm very interested to hear others' takes on this question, too.

For the second question: Semantics are sometimes arbitrarily decided, but the aim is a reasonable semantics, with different language designers simply differing on what is most reasonable (or most expected). It surprised me to learn that if control reaches the end of a function body in Python, the function returns None, but those are the semantics. Designers have to answer similar semantics questions, like "What should the type of a while loop be?", "What should the type of an if statement be if it doesn't have an else branch?", and "Where should such statements be allowed syntactically?" (issues can arise if such an if statement is the last statement in a sequence of statements).
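Both points can be seen in Python itself. The sketch below uses made-up names (falls_off_the_end, is_expression). It shows the implicit None, and shows that Python sidesteps the "if without else" question for its conditional expression by requiring the else branch.

```python
def falls_off_the_end():
    total = 1 + 1  # computed, but there is no return statement

# Control reaching the end of the body yields None.
print(falls_off_the_end() is None)  # True

def is_expression(source):
    """Return True if `source` parses as a single Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# The conditional *expression* must have both branches,
# so it always has a value.
print(is_expression("1 if flag else 2"))  # True
print(is_expression("1 if flag"))         # False
```

The if statement, by contrast, may omit else precisely because it is not asked to produce a value.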

Question 3

The question is, "why do new languages still have statements and not expressions exclusively?", right?

Programming language designs address different problems, e.g.

simple grammar,
simple implementation,
simple semantics

being among the more theoretical design goals and

execution speed of resulting compiled code
compilation speed
resource consumption of executing programs
ease of use (e.g. simple to read)

being among the more practical ones ...

These design goals have no clear-cut definitions, e.g. a short grammar is not necessarily the one with the cleanest structure, so which one is simpler?

(considering your example)

For ease of use or code readability, a language designer might require you to write 'return' in front of the value (or rather the expression) resulting from a function. This is a return statement. If you can leave out the 'return', it is still implied, and the construct could still be considered a return statement (it just would not be so obvious in the code). If it is instead considered an expression, this implies substitution semantics, as in Scheme, but probably not Python. From a syntactic standpoint, it makes sense to distinguish statements from expressions where 'return' is required.
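Python happens to offer both spellings, which makes the contrast easy to see. In this sketch the function names and the 5 + 5 body are illustrative choices, not taken from the original question.

```python
def add_with_return():
    # Statement style: the result must be marked with an explicit 'return'.
    return 5 + 5

# Expression style: a lambda body is a single expression,
# and its value is the result with no 'return' keyword.
add_as_expression = lambda: 5 + 5

print(add_with_return())    # 10
print(add_as_expression())  # 10
```

The lambda form accepts only an expression as its body, so statements such as assignments or loops cannot appear there. That is the syntactic distinction at work.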

Looking at machine code (which I didn't do much, so I might be wrong) it seems to me there are only statements, no expressions.

E.g. your example:

ld r1, 5
ld r2, 5
add r3, r1, r2
ret r3

(I'm making this up, obviously)

So for people that like to think in terms of how a (von Neumann) CPU core actually operates, or who want to simplify compilation for such a target architecture, statements are the way.

There is also the particularly 'evil' (as in non-functional) assignment statement. It is required for expressing terminating loops without recursion. According to Dijkstra, loops have simpler semantics than recursion (ref. E.W. Dijkstra, "A Discipline of Programming", 1976). A loop also typically executes faster and consumes less storage than the equivalent recursion, unless your language implementation optimizes tail calls (as Scheme does).
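As a rough illustration of the storage point, here is a sketch in Python, whose reference implementation (CPython) does not eliminate tail calls. The function names are made up. The depth is chosen relative to the interpreter's recursion limit, so the outcome of the recursive call depends on that setting.

```python
import sys

def sum_loop(n):
    # Assignment drives the loop toward termination; storage use is constant.
    total = 0
    while n > 0:
        total += n
        n -= 1
    return total

def sum_rec(n, acc=0):
    # Tail-recursive in form, but each call still gets its own stack frame.
    if n == 0:
        return acc
    return sum_rec(n - 1, acc + n)

deep = sys.getrecursionlimit() * 10

# The loop handles the deep case without trouble.
print(sum_loop(deep) == deep * (deep + 1) // 2)  # True

# The recursive version is expected to exceed the recursion limit.
try:
    sum_rec(deep)
    overflowed = False
except RecursionError:
    overflowed = True
print(overflowed)
```

For small inputs the two functions agree. In a language that guarantees proper tail calls, such as Scheme, the recursive form would run in constant stack space as well.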