Expression Versus Statement

https://stackoverflow.com/questions/19132

09-06-2019
|

Question

I'm asking with regards to c#, but I assume its the same in most other languages.

Does anyone have a good definition of expressions and statements and what the differences are?

Solution

Expression: Something which evaluates to a value. Example: 1+2/x
Statement: A line of code which does something. Example: GOTO 100

In the earliest general-purpose programming languages, like FORTRAN, the distinction was crystal-clear. In FORTRAN, a statement was one unit of execution, a thing that you did. The only reason it wasn't called a "line" was because sometimes it spanned multiple lines. An expression on its own couldn't do anything... you had to assign it to a variable.

1 + 2 / X

is an error in FORTRAN, because it doesn't do anything. You had to do something with that expression:

X = 1 + 2 / X

FORTRAN didn't have a grammar as we know it today—that idea was invented, along with Backus-Naur Form (BNF), as part of the definition of Algol-60. At that point the semantic distinction ("have a value" versus "do something") was enshrined in syntax: one kind of phrase was an expression, and another was a statement, and the parser could tell them apart.

Designers of later languages blurred the distinction: they allowed syntactic expressions to do things, and they allowed syntactic statements that had values. The earliest popular language example that still survives is C. The designers of C realized that no harm was done if you were allowed to evaluate an expression and throw away the result. In C, every syntactic expression can be a made into a statement just by tacking a semicolon along the end:

1 + 2 / x;

is a totally legit statement even though absolutely nothing will happen. Similarly, in C, an expression can have side-effects—it can change something.

1 + 2 / callfunc(12);

because callfunc might just do something useful.

Once you allow any expression to be a statement, you might as well allow the assignment operator (=) inside expressions. That's why C lets you do things like

callfunc(x = 2);

This evaluates the expression x = 2 (assigning the value of 2 to x) and then passes that (the 2) to the function callfunc.

This blurring of expressions and statements occurs in all the C-derivatives (C, C++, C#, and Java), which still have some statements (like while) but which allow almost any expression to be used as a statement (in C# only assignment, call, increment, and decrement expressions may be used as statements; see Scott Wisniewski's answer).

Having two "syntactic categories" (which is the technical name for the sort of thing statements and expressions are) can lead to duplication of effort. For example, C has two forms of conditional, the statement form

if (E) S1; else S2;

and the expression form

E ? E1 : E2

And sometimes people want duplication that isn't there: in standard C, for example, only a statement can declare a new local variable—but this ability is useful enough that the GNU C compiler provides a GNU extension that enables an expression to declare a local variable as well.

Designers of other languages didn't like this kind of duplication, and they saw early on that if expressions can have side effects as well as values, then the syntactic distinction between statements and expressions is not all that useful—so they got rid of it. Haskell, Icon, Lisp, and ML are all languages that don't have syntactic statements—they only have expressions. Even the class structured looping and conditional forms are considered expressions, and they have values—but not very interesting ones.

OTHER TIPS

I would like to make a small correction to Joel's answer above.

C# does not allow all expressions to be used as statements. In particular, only assignment, call, increment, and decrement expressions may be used as statements.

For example, the C# compiler will flag the following code as a syntax error:

1 + 2;

an expression is anything that yields a value: 2 + 2
a statement is one of the basic "blocks" of program execution.

Note that in C, "=" is actually an operator, which does two things:

returns the value of the right hand subexpression.
copies the value of the right hand subexpression into the variable on the left hand side.

Here's an extract from the ANSI C grammar. You can see that C doesn't have many different kinds of statements... the majority of statements in a program are expression statements, i.e. an expression with a semicolon at the end.

statement
    : labeled_statement
    | compound_statement
    | expression_statement
    | selection_statement
    | iteration_statement
    | jump_statement
    ;

expression_statement
    : ';'
    | expression ';'
    ;

http://www.lysator.liu.se/c/ANSI-C-grammar-y.html

An expression is something that returns a value, whereas a statement does not.

For examples:

1 + 2 * 4 * foo.bar()     //Expression
foo.voidFunc(1);          //Statement

The Big Deal between the two is that you can chain expressions together, whereas statements cannot be chained.

You can find this on wikipedia, but expressions are evaluated to some value, while statements have no evaluated value.

Thus, expressions can be used in statements, but not the other way around.

Note that some languages (such as Lisp, and I believe Ruby, and many others) do not differentiate statement vs expression... in such languages, everything is an expression and can be chained with other expressions.

For an explanation of important differences in composability (chainability) of expressions vs statements, my favorite reference is John Backus's Turing award paper, Can programming be liberated from the von Neumann style?.

Imperative languages (Fortran, C, Java, ...) emphasize statements for structuring programs, and have expressions as a sort of after-thought. Functional languages emphasize expressions. Purely functional languages have such powerful expressions than statements can be eliminated altogether.

Simply: an expression evaluates to a value, a statement doesn't.

Expressions can be evaluated to get a value, whereas statements don't return a value (they're of type void).

Function call expressions can also be considered statements of course, but unless the execution environment has a special built-in variable to hold the returned value, there is no way to retrieve it.

Statement-oriented languages require all procedures to be a list of statements. Expression-oriented languages, which is probably all functional languages, are lists of expressions, or in tha case of LISP, one long S-expression that represents a list of expressions.

Although both types can be composed, most expressions can be composed arbitrarily as long as the types match up. Each type of statement has its own way of composing other statements, if they can do that all. Foreach and if statements require either a single statment or that all subordinate statements go in a statement block, one after another, unless the substatements allow for thier own substatements.

Statements can also include expressions, where an expression doesn't really include any statements. One exception, though, would be a lambda expression, which represents a function, and so can include anything a function can iclude unless the language only allows for limited lambdas, like Python's single-expression lambdas.

In an expression-based language, all you need is a single expression for a function since all control structures return a value (a lot of them return NIL). There's no need for a return statement since the last-evaluated expression in the function is the return value.

Some things about expression based languages:

Most important: Everything returns an value

There is no difference between curly brackets and braces for delimiting code blocks and expressions, since everything is an expression. This doesn't prevent lexical scoping though: A local variable could be defined for the expression in which its definition is contained and all statements contained within that, for example.

In an expression based language, everything returns a value. This can be a bit strange at first -- What does (FOR i = 1 TO 10 DO (print i)) return?

Some simple examples:

(1) returns 1
(1 + 1) returns 2
(1 == 1) returns TRUE
(1 == 2) returns FALSE
(IF 1 == 1 THEN 10 ELSE 5) returns 10
(IF 1 == 2 THEN 10 ELSE 5) returns 5

A couple more complex examples:

Some things, such as some function calls, don't really have a meaningful value to return (Things that only produce side effects?). Calling OpenADoor(), FlushTheToilet() or TwiddleYourThumbs() will return some sort of mundane value, such as OK, Done, or Success.
When multiple unlinked expressions are evaluated within one larger expression, the value of the last thing evaluated in the large expression becomes the value of the large expression. To take the example of (FOR i = 1 TO 10 DO (print i)), the value of the for loop is "10", it causes the (print i) expression to be evaluated 10 times, each time returning i as a string. The final time through returns 10, our final answer

It often requires a slight change of mindset to get the most out of an expression based language, since the fact that everything is an expression makes it possible to 'inline' a lot of things

As a quick example:

 FOR i = 1 to (IF MyString == "Hello, World!" THEN 10 ELSE 5) DO
 (
    LotsOfCode
 )

is a perfectly valid replacement for the non expression-based

IF MyString == "Hello, World!" THEN TempVar = 10 ELSE TempVar = 5 
FOR i = 1 TO TempVar DO
(    
    LotsOfCode  
)

In some cases, the layout that expression-based code permits feels much more natural to me

Of course, this can lead to madness. As part of a hobby project in an expression-based scripting language called MaxScript, I managed to come up with this monster line

IF FindSectionStart "rigidifiers" != 0 THEN FOR i = 1 TO (local rigidifier_array = (FOR i = (local NodeStart = FindsectionStart "rigidifiers" + 1) TO (FindSectionEnd(NodeStart) - 1) collect full_array[i])).count DO
(
    LotsOfCode
)

A statement is a special case of an expression, one with void type. The tendency of languages to treat statements differently often causes problems, and it would be better if they were properly generalized.

For example, in C# we have the very useful Func<T1, T2, T3, TResult> overloaded set of generic delegates. But we also have to have a corresponding Action<T1, T2, T3> set as well, and general purpose higher-order programming constantly has to be duplicated to deal with this unfortunate bifurcation.

Trivial example - a function that checks whether a reference is null before calling onto another function:

TResult IfNotNull<TValue, TResult>(TValue value, Func<TValue, TResult> func)
                  where TValue : class
{
    return (value == null) ? default(TValue) : func(value);
}

Could the compiler deal with the possibility of TResult being void? Yes. All it has to do is require that return is followed by an expression that is of type void. The result of default(void) would be of type void, and the func being passed in would need to be of the form Func<TValue, void> (which would be equivalent to Action<TValue>).

A number of other answers imply that you can't chain statements like you can with expressions, but I'm not sure where this idea comes from. We can think of the ; that appears after statements as a binary infix operator, taking two expressions of type void and combining them into a single expression of type void.

Statements -> Instructions to follow sequentially
Expressions -> Evaluation that returns a value

Statements are basically like steps, or instructions in an algorithm, the result of the execution of a statement is the actualization of the instruction pointer (so-called in assembler)

Expressions do not imply and execution order at first sight, their purpose is to evaluate and return a value. In the imperative programming languages the evaluation of an expression has an order, but it is just because of the imperative model, but it is not their essence.

Examples of Statements:

for
goto
return
if

(all of them imply the advance of the line (statement) of execution to another line)

Example of expressions:

2+2

(it doesn't imply the idea of execution, but of the evaluation)

Statements are grammatically complete sentences. Expressions are not. For example

x = 5

reads as "x gets 5." This is a complete sentence. The code

(x + 5)/9.0

reads, "x plus 5 all divided by 9.0." This is not a complete sentence. The statement

while k < 10: 
    print k
    k += 1

is a complete sentence. Notice that the loop header is not; "while k < 10," is a subordinating clause.

Statement,

A statement is a procedural building-block from which all C# programs are constructed. A statement can declare a local variable or constant, call a method, create an object, or assign a value to a variable, property, or field.

A series of statements surrounded by curly braces form a block of code. A method body is one example of a code block.

bool IsPositive(int number)
{
    if (number > 0)
    {
        return true;
    }
    else
    {
        return false;
    }
}

Statements in C# often contain expressions. An expression in C# is a fragment of code containing a literal value, a simple name, or an operator and its operands.

Expression,

An expression is a fragment of code that can be evaluated to a single value, object, method, or namespace. The two simplest types of expressions are literals and simple names. A literal is a constant value that has no name.

int i = 5;
string s = "Hello World";

Both i and s are simple names identifying local variables. When those variables are used in an expression, the value of the variable is retrieved and used for the expression.

I prefer the meaning of statement in the formal logic sense of the word. It is one that changes the state of one or more of the variables in the computation, enabling a true or false statement to be made about their value(s).

I guess there will always be confusion in the computing world and science in general when new terminology or words are introduced, existing words are 'repurposed' or users are ignorant of the existing, established or 'proper' terminology for what they are describing

I am not really satisfied with any of the answers here. I looked at the grammar for C++ (ISO 2008). However maybe for the sake of didactics and programming the answers might suffice to distinguish the two elements (reality looks more complicated though).

A statement consists of zero or more expressions, but can also be other language concepts. This is the Extended Backus Naur form for the grammar (excerpt for statement):

statement:
        labeled-statement
        expression-statement <-- can be zero or more expressions
        compound-statement
        selection-statement
        iteration-statement
        jump-statement
        declaration-statement
        try-block

We can see the other concepts that are considered statements in C++.

expression-statements is self-explaining (a statement can consist of zero or more expressions, read the grammar carefully, it's tricky)
case for example is a labeled-statement
selection-statements are if if/else, case
iteration-statements are while, do...while, for (...)
jump-statements are break, continue, return (can return expression), goto
declaration-statement is the set of declarations
try-block is statement representing try/catch blocks
and there might be some more down the grammar

This is an excerpt showing the expressions part:

expression:
        assignment-expression
        expression "," assignment-expression
assignment-expression:
        conditional-expression
        logical-or-expression assignment-operator initializer-clause
        throw-expression

expressions are or contain often assignments
conditional-expression (sounds misleading) refers to usage of the operators (+, -, *, /, &, |, &&, ||, ...)
throw-expression - uh? the throw clause is an expression too

Here is the summery of one of the simplest answer I found.

originally Answered by Anders Kaseorg

A statement is a complete line of code that performs some action, while an expression is any section of the code that evaluates to a value.

Expressions can be combined “horizontally” into larger expressions using operators, while statements can only be combined “vertically” by writing one after another, or with block constructs.

Every expression can be used as a statement (whose effect is to evaluate the expression and ignore the resulting value), but most statements cannot be used as expressions.

http://www.quora.com/Python-programming-language-1/Whats-the-difference-between-a-statement-and-an-expression-in-Python

To improve on and validate my prior answer, definitions of programming language terms should be explained from computer science type theory when applicable.

An expression has a type other than the Bottom type, i.e. it has a value. A statement has the Unit or Bottom type.

From this it follows that a statement can only have any effect in a program when it creates a side-effect, because it either can not return a value or it only returns the value of the Unit type which is either nonassignable (in some languages such a C's void) or (such as in Scala) can be stored for a delayed evaluation of the statement.

Obviously a @pragma or a /*comment*/ have no type and thus are differentiated from statements. Thus the only type of statement that would have no side-effects would be a non-operation. Non-operation is only useful as a placeholder for future side-effects. Any other action due to a statement would be a side-effect. Again a compiler hint, e.g. @pragma, is not a statement because it has no type.

Most precisely, a statement must have a "side-effect" (i.e. be imperative) and an expression must have a value type (i.e. not the bottom type).

The type of a statement is the unit type, but due to Halting theorem unit is fiction so lets say the bottom type.

Void is not precisely the bottom type (it isn't the subtype of all possible types). It exists in languages that don't have a completely sound type system. That may sound like a snobbish statement, but completeness such as variance annotations are critical to writing extensible software.

Let's see what Wikipedia has to say on this matter.

https://en.wikipedia.org/wiki/Statement_(computer_science)

In computer programming a statement is the smallest standalone element of an imperative programming language that expresses some action to be carried out.

Many languages (e.g. C) make a distinction between statements and definitions, with a statement only containing executable code and a definition declaring an identifier, while an expression evaluates to a value only.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow