Found it.
The cause was the conditional statement that immediately follows. The code looks like this:
for ( int i = a; ; ) {
// small amount of calculations, and conditional calls of continue;
if ( expression ) continue;
// calculations1
if ( expression2 ) {
// calculations2
}
// very big amount calculations, and conditional calls of continue;
}
The value of expression2 is almost always false. So I changed it like this:
for ( int i = a; ; ) {
// small amount of calculations, and conditional calls of continue;
// if ( expression ) continue; // don't need this anymore
// calculations1
if ( __builtin_expect( !!(expression2), 0 ) ) { // suppose expression2 == false
// calculations2
}
// very big amount calculations, and conditional calls of continue;
}
This gave the desired 25% speed-up, even a little more, and the behaviour no longer depends on the critical line.
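For anyone who wants to try the hint in isolation, here is a minimal, self-contained sketch. The macro name UNLIKELY and the function sum_with_rare_fixup are made up for illustration and are not from my real code. The rare negative-value branch stands in for calculations2. __builtin_expect is a GCC/Clang builtin, so the macro falls back to the plain condition on other compilers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper macro (not from the original code).
// GCC and Clang provide __builtin_expect; elsewhere use the bare condition.
#if defined(__GNUC__) || defined(__clang__)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define UNLIKELY(x) (x)
#endif

// Sums the absolute values of the input.
// Negative entries are assumed to be rare, like expression2 above.
long sum_with_rare_fixup(const std::vector<int>& values) {
    long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        int v = values[i];
        if (UNLIKELY(v < 0)) {
            // Rare path: the hint tells the compiler this is seldom taken.
            v = -v;
        }
        total += v;
    }
    return total;
}
```

The hint only influences how the compiler arranges the generated code. The result of the function is the same with or without it, so any speed-up has to be confirmed by measuring.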
I'm not sure how to explain it and can't find enough material on branch prediction.
But my guess is that calculations2 should be skipped, yet the compiler doesn't know this and assumes expression2 == true by default. Meanwhile, for the simple continue-check
if ( expression ) continue;
it assumes expression == false and neatly skips calculations2, as has to happen in any case. When the body of the if contains more complicated operations (for example cout), it assumes that expression is true and the trick doesn't work.
If anybody knows of material that explains this behaviour without guesswork, I will be very glad to read it and accept their answer.