Question

A few days back there was a discussion here about whether the expression

i = ++i + 1

invokes UB (Undefined Behavior) or not.

The conclusion was that it invokes UB because the value of 'i' is modified more than once between two sequence points.

I was involved in a discussion with Johannes Schaub in that same thread. According to him

i=(i,i++,i)+1 ------ (1) /* invokes UB as well */

I said (1) does not invoke UB because the side effects of the previous subexpressions are completed at the sequence points introduced by the comma operators ',' between i and i++ and between i++ and i.

Then he gave the following explanation:

"Yes the sequence point after i++ completes all side effects before it, but there is nothing that stops the assignment side effect overlapping with the side effect of i++.The underlying problem is that the side effect of an assignment is not specified to happen after or before the evaluation of both operands of the assignment, and so sequence points cannot do anything with regard to protecting this: Sequence points induce a partial order: Just because there is a sequence point after and before i++ doesn't mean all side effects are sequenced with regard to i.

Also, notice that merely a sequence point means nothing: The order of evaluations isn't dictated by the form of code. It's dictated by semantic rules. In this case, there is no semantic rule saying when the assignment side effect happens with regard to evaluating both of its operands or subexpressions of those operands".

The statement written in "bold" confused me. As far as I know:

"At certain specified points in the execution sequence called sequence points,all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place."

Since the comma operator also specifies execution order, the side effect of i++ has been completed by the time we reach the last i. He (Johannes) would have been right had the order of evaluation been unspecified (but in the case of the comma operator it is well specified).

So I just want to know whether (1) invokes UB or not. Can someone give another valid explanation?

Thanks!


Solution

The C standard says this about assignment operators (C90 6.3.16 or C99 6.5.16 Assignment operators):

The side effect of updating the stored value of the left operand shall occur between the previous and the next sequence point.

It seems to me that in the statement:

i=(i,i++,i)+1;

the sequence point 'previous' to the assignment operator would be the second comma operator and the 'next' sequence point would be the end of the expression. So I'd say that the expression doesn't invoke undefined behavior.
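To make that reading concrete, here is a small sketch in C that writes the same steps as separate statements. The function name assign_comma_steps is made up for illustration. The sketch only shows what the expression would compute if the reading above is right; it does not settle whether the standard guarantees it.

```c
/* Hypothetical helper (not from the original post): performs, as separate
   statements, the steps that the "no UB" reading assigns to
       i = (i, i++, i) + 1;
   Each statement ends in a sequence point, so this version is well defined. */
int assign_comma_steps(int i)
{
    (void)i;         /* first comma operand: evaluated, value discarded     */
    i++;             /* second operand: the increment completes here        */
    int rhs = i + 1; /* third operand is read after the increment, then + 1 */
    i = rhs;         /* the assignment's side effect comes last             */
    return i;
}
```

Called with 0, the function returns 2: the increment makes i equal to 1, the right-hand side becomes 2, and that value is stored.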

However, this expression:

*(some_ptr + i) = (i,i++,i)+1;

would have undefined behavior because the order of evaluation of the two operands of the assignment operator is unspecified. In this case the problem is not when the assignment operator's side effect takes place; the problem is that you don't know whether the value of i used in the left-hand operand will be read before or after the right-hand side is evaluated. This order-of-evaluation problem doesn't occur in the first example because in that expression the value of i isn't actually used on the left-hand side: all that the assignment operator is interested in is the "lvalue-ness" of i.
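One way to sidestep the question is to compute the index in its own statement, so that it is explicit which value of i selects the element. The sketch below uses made-up function names and passes i by pointer only so the caller can see its final value. The two functions correspond to the two orders one might expect; since the original expression has undefined behavior, a real compiler is not limited to these two outcomes.

```c
/* Hypothetical helpers (not from the original post). Each one replaces
       *(some_ptr + i) = (i, i++, i) + 1;
   with statements whose order is explicit. */

/* The index is read BEFORE the right-hand side increments i. */
void store_using_old_index(int *some_ptr, int *i)
{
    int index = *i;            /* read i first                       */
    (*i)++;                    /* the i++ from the right-hand side   */
    some_ptr[index] = *i + 1;  /* store (incremented i) + 1          */
}

/* The index is read AFTER the right-hand side increments i. */
void store_using_new_index(int *some_ptr, int *i)
{
    (*i)++;                    /* the i++ from the right-hand side   */
    int index = *i;            /* read i afterwards                  */
    some_ptr[index] = *i + 1;  /* store (incremented i) + 1          */
}
```

Starting from i equal to 0, the first version writes 2 into element 0 and the second writes 2 into element 1, which is exactly the ambiguity described above.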

But I also think that all this is sketchy enough (and my understanding of the nuances involved is sketchy enough) that I wouldn't be surprised if someone can convince me otherwise (on either count).

OTHER TIPS

I believe that the following expression definitely has undefined behaviour.

i + ((i, i++, i) + 1)

The reason is that the comma operator specifies sequence points between the subexpressions in parentheses but does not specify where in that sequence the evaluation of the left-hand operand of + occurs. One possibility is between the sequence points surrounding i++, and this violates 5/4: i is written to between two sequence points but is also read twice between the same sequence points, not just to determine the value to be stored but also to determine the value of the first operand of the + operator.
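As a rough illustration, even the orderings that can be written out as well-defined statements disagree about the result. The function names below are made up, and the sketch covers only two orderings. The problematic case, where the left-hand read falls between the sequence points around i++, cannot be written as well-defined code, and once the behaviour is undefined the outcome is not limited to these values.

```c
/* Hypothetical helpers (not from the original post) for
       i + ((i, i++, i) + 1)
   with the read of the left operand placed explicitly. */

/* Left operand of + is read before the increment. */
int sum_left_read_first(int i)
{
    int left = i;       /* left operand of +              */
    i++;                /* the i++ inside the parentheses */
    int right = i + 1;  /* last i, plus 1                 */
    return left + right;
}

/* Left operand of + is read after the increment. */
int sum_left_read_last(int i)
{
    i++;                /* the i++ inside the parentheses */
    int right = i + 1;  /* last i, plus 1                 */
    int left = i;       /* left operand of +              */
    return left + right;
}
```

With i starting at 0, the first function returns 2 and the second returns 3.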

This also has undefined behaviour.

i += (i, i++, i) + 1;

Now, I am not so sure about this statement.

i = (i, i++, i) + 1;

Although the same principles apply, i must be "evaluated" as a modifiable lvalue, and that can be done at any time, but I'm not convinced that its value is ever read as part of this. (Or is there another restriction that the expression violates to cause UB?)

The sub-expression (i, i++, i) happens as part of determining the value to be stored and that sub-expression contains a sequence point after the storage of a value to i. I don't see any way that this wouldn't require the side effect of i++ to be complete before the determination of the value to be stored and hence the earliest possible point that the assignment side effect could occur.

After this sequence point i's value is read at most once and only to determine the value that will be stored back to i, so this last part is fine.

i=(i,i++,i)+1 ------ (1) /* invokes UB as well */

It does not invoke undefined behaviour. The side effect of i++ will take place before the next sequence point, which is introduced by the comma following it, and therefore also before the assignment.

Nice language sudoku, though. :-)

edit: There's a more elaborate explanation here.

I was confused at first by Johannes' (litb) statement, but he mentioned that in:

i = (i, ++i, i) +1

< Johannes>
If <a> is the assignment and <n> is the increment, and :s: is a sequence point, then the side effects can be sequenced as follows between sequence points: (i :s: i++<a><n> :s: i) + 1. The value of the scalar i was changed twice between the first and second sequence point here. The order in which the assignment and the increment happen is unspecified, and since there is no sequence point between them, they are not even atomic with respect to each other. This is one ordering permitted by the unspecified ordering of these side effects.

This is different from (i++, i++), because the evaluation order of the two subexpressions is from left to right, and at the sequence point between them, the increment of the previous evaluation shall be complete, and the next increment shall not have yet taken place. This enforces that there is no change of the value of i between two sequence points, which makes (i++, i++) valid.
< /Johannes>

This made me think the sequence mentioned by litb is invalid because as per C99:

6.5.16.1 (2) In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.

i.e. the value of the right operand needs to be known before the assignment side effect (modification of the value stored in the object corresponding to the left operand)

6.5.17 (2) The left operand of a comma operator is evaluated as a void expression; there is a sequence point after its evaluation. Then the right operand is evaluated; the result has its type and value.

i.e. the rightmost operand of the comma operation needs to be evaluated to know the value and type of the comma expression (and the value of the right operand for my example).

So in this case, the 'previous sequence point' for the assignment side effect would, in effect, be the right-most comma operation. The possible sequence mentioned by Johannes is invalid.

Please correct me if I am wrong.


Licensed under: CC-BY-SA with attribution