Can gcc/g++ tell me when it ignores my register?

https://stackoverflow.com/questions/3500301

29-09-2019
|

Question

When compiling C/C++ codes using gcc/g++, if it ignores my register, can it tell me? For example, in this code

int main()
{
    register int j;
    int k;
    for(k = 0; k < 1000; k++)
        for(j = 0; j < 32000; j++)
            ;
    return 0;
}

j will be used as register, but in this code

int main()
{
    register int j;
    int k;
    for(k = 0; k < 1000; k++)
        for(j = 0; j < 32000; j++)
            ;
    int * a = &j;
    return 0;
}

j will be a normal variable. Can it tell me whether a variable I used register is really stored in a CPU register?

Solution

You can fairly assume that GCC ignores the register keyword except perhaps at -O0. However, it shouldn't make a difference one way or another, and if you are in such depth, you should already be reading the assembly code.

Here is an informative thread on this topic: http://gcc.gnu.org/ml/gcc/2010-05/msg00098.html . Back in the old days, register indeed helped compilers to allocate a variable into registers, but today register allocation can be accomplished optimally, automatically, without hints. The keyword does continue to serve two purposes in C:

In C, it prevents you from taking the address of a variable. Since registers don't have addresses, this restriction can help a simple C compiler. (Simple C++ compilers don't exist.)
A register object cannot be declared restrict. Because restrict pertains to addresses, their intersection is pointless. (C++ does not yet have restrict, and anyway, this rule is a bit trivial.)

For C++, the keyword has been deprecated since C++11 and proposed for removal from the standard revision scheduled for 2017.

Some compilers have used register on parameter declarations to determine the calling convention of functions, with the ABI allowing mixed stack- and register-based parameters. This seems to be nonconforming, it tends to occur with extended syntax like register("A1"), and I don't know whether any such compiler is still in use.

OTHER TIPS

With respect to modern compilation and optimization techniques, the register annotation does not make any sense at all. In your second program you take the address of j, and registers do not have addresses, but one same local or static variable could perfectly well be stored in two different memory locations during its lifetime, or sometimes in memory and sometimes in a register, or not exist at all. Indeed, an optimizing compiler would compile your nested loops as nothing, because they do not have any effects, and simply assign their final values to k and j. And then omit these assignments because the remaining code does not use these values.

You can't get the address of a register in C, plus the compiler can totally ignore you; C99 standard, section 6.7.1 (pdf):

The implementation may treat any register declaration simply as an auto declaration. However, whether or not addressable storage is actually used, the address of any part of an object declared with storage-class specifier register cannot be computed, either explicitly (by use of the unary & operator as discussed in 6.5.3.2) or implicitly (by converting an array name to a pointer as discussed in 6.3.2.1). Thus, the only operator that can be applied to an array declared with storage-class specifier register is sizeof.

Unless you're mucking around on 8-bit AVRs or PICs, the compiler will probably laugh at you thinking you know best and ignore your pleas. Even on them, I've thought I knew better a couple times and found ways to trick the compiler (with some inline asm), but my code exploded because it had to massage a bunch of other data to work around my stubbornness.

This question, and some of the answers, and several other discussions of the 'register' keywords I've seen -- seem to assume implicitly that all locals are mapped either to a specific register, or to a specific memory location on the stack. This was generally true until 15-25 years ago, and it's true if you turn off optimizing, but it's not true at all when standard optimizing is performed. Locals are now seen by optimizers as symbolic names that you use to describe the flow of data, rather than as values that need to be stored in specific locations.

Note: by 'locals' here I mean: scalar variables, of storage class auto (or 'register'), which are never used as the operand of '&'. Compilers can sometimes break up auto structs, unions or arrays into individual 'local' variables, too.

To illustrate this: suppose I write this at the top of a function:

int factor = 8;

.. and then the only use of the factor variable is to multiply by various things:

arr[i + factor*j] = arr[i - factor*k];

In this case - try it if you want - there will be no factor variable. The code analysis will show that factor is always 8, and so all the shifts will turn into <<3. If you did the same thing in 1985 C, factor would get a location on the stack, and there would be multipilies, since the compilers basically worked one statement at a time and didn't remember anything about the values of the variables. Back then programmers would be more likely to use #define factor 8 to get better code in this situation, while maintaining adjustable factor.

If you use -O0 (optimization off) - you will indeed get a variable for factor. This will allow you, for instance, to step over the factor=8 statement, and then change factor to 11 with the debugger, and keep going. In order for this to work, the compiler can't keep anything in registers between statements, except for variables which are assigned to specific registers; and in that case the debugger is informed of this. And it can't try to 'know' anything about the values of variables, since the debugger could change them. In other words, you need the 1985 situation if you want to change local variables while debugging.

Modern compilers generally compile a function as follows:

(1) when a local variable is assigned to more than once in a function, the compiler creates different 'versions' of the variable so that each one is only assigned in one place. All of the 'reads' of the variable refer to a specific version.

(2) Each of these locals is assigned to a 'virtual' register. Intermediate calculation results are also assigned variables/registers; so

  a = b*c + 2*k;

becomes something like

       t1 = b*c;
       t2 = 2;
       t3 = k*t2;
       a = t1 + t3;

(3) The compiler then takes all these operations, and looks for common subexpressions, etc. Since each of the new registers is only ever written once, it is rather easier to rearrange them while maintaining correctness. I won't even start on loop analysis.

(4) The compiler then tries to map all these virtual registers into actual registers in order to generate code. Since each virtual register has a limited lifetime it is possible to reuse actual registers heavily - 't1' in the above is only needed until the add which generates 'a', so it could be held in the same register as 'a'. When there are not enough registers, some of the virtual registers can be allocated to memory -- or -- a value can be held in a certain register, stored to memory for a while, and loaded back into a (possibly) different register later. On a load-store machine, where only values in registers can be used in computations, this second strategy accomodates that nicely.

From the above, this should be clear: it's easy to determine that the virtual register mapped to factor is the same as the constant '8', and so all multiplications by factor are multiplications by 8. Even if factor is modified later, that's a 'new' variable and it doesn't affect prior uses of factor.

Another implication, if you write

 vara = varb;

.. it may or may not be the case that there is a corresponding copy in the code. For instance

int *resultp= ...
int acc = arr[0] + arr[1];
int acc0 = acc;    // save this for later
int more = func(resultp,3)+ func(resultp,-3);
acc += more;         // add some more stuff
if( ...){
    resultp = getptr();
    resultp[0] = acc0;
    resultp[1] = acc;
}

In the above the two 'versions' of acc (initial, and after adding 'more') could be in two different registers, and 'acc0' would then be the same as the inital 'acc'. So no register copy would be needed for 'acc0 =acc'. Another point: the 'resultp' is assigned to twice, and since the second assignment ignores the previous value, there are essentially two distinct 'resultp' variables in the code, and this is easily determined by analysis.

An implication of all this: don't be hesitant to break out complex expressions into smaller ones using additional locals for intermediates, if it makes the code easier to follow. There is basically zero run-time penalty for this, since the optimizer sees the same thing anyway.

If you're interested in learning more you could start here: http://en.wikipedia.org/wiki/Static_single_assignment_form

The point of this answer is to (a) give some idea of how modern compilers work and (b) point out that asking the compiler, if it would be so kind, to put a particular local variable into a register -- doesn't really make sense. Each 'variable' may be seen by the optimizer as several variables, some of which may be heavily used in loops, and others not. Some variables will vanish -- e.g. by being constant; or, sometimes, the temp variable used in a swap. Or calculations not actually used. The compiler is equipped to use the same register for different things in different parts of the code, according to what's actually best on the machine you are compiling for.

The notion of hinting the compiler as to which variables should be in registers assumes that each local variable maps to a register or to a memory location. This was true back when Kernighan + Ritchie designed the C language, but is not true any more.

Regarding the restriction that you can't take the address of a register variable: Clearly, there's no way to implement taking the address of a variable held in a register, but you might ask - since the compiler has discretion to ignore the 'register' - why is this rule in place? Why can't the compiler just ignore the 'register' if I happen to take the address? (as is the case in C++).

Again, you have to go back to the old compiler. The original K+R compiler would parse a local variable declaration, and then immediately decide whether to assign it to a register or not (and if so, which register). Then it would proceed to compile expressions, emitting the assembler for each statement one at a time. If it later found that you were taking the address of a 'register' variable, which had been assigned to a register, there was no way to handle that, since the assignment was, in general, irreversible by then. It was possible, however, to generate an error message and stop compiling.

Bottom line, it appears that 'register' is essentially obsolete:

C++ compilers ignore it completely
C compilers ignore it except to enforce the restriction about & - and possibly don't ignore it at -O0 where it could actually result in allocation as requested. At -O0 you aren't concerned about code speed though.

So, it's basically there now for backward compatibility, and probably on the basis that some implementations could still be using it for 'hints'. I never use it -- and I write real-time DSP code, and spend a fair bit of time looking at generated code and finding ways to make it faster. There are many ways to modify code to make it run faster, and knowing how compilers work is very helpful. It's been a long time indeed since I last found that adding 'register' to be among those ways.

Addendum

I excluded above, from my special definition of 'locals', variables to which & is applied (these are are of course included in the usual sense of the term).

Consider the code below:

void
somefunc()
{
    int h,w;
    int i,j;
    extern int pitch;

    get_hw( &h,&w );  // get shape of array

    for( int i = 0; i < h; i++ ){
        for( int j = 0; j < w; j++ ){
            Arr[i*pitch + j] = generate_func(i,j);
        }
    }
}

This may look perfectly harmless. But if you are concerned about execution speed, consider this: The compiler is passing the addresses of h and w to get_hw, and then later calling generate_func. Let's assume the compiler knows nothing about what's in those functions (which is the general case). The compiler must assume that the call to generate_func could be changing h or w. That's a perfectly legal use of the pointer passed to get_hw - you could store it somewhere and then use it later, as long as the scope containing h,w is still in play, to read or write those variables.

Thus the compiler must store h and w in memory on the stack, and can't determine anything in advance about how long the loop will run. So certain optimizations will be impossible, and the loop could be less efficient as a result (in this example, there's a function call in the inner loop anyway, so it may not make much of a difference, but consider the case where there's a function which is occasionally called in the inner loop, depending on some condition).

Another issue here is that generate_func could change pitch, and so i*pitch needs to done each time, rather than only when i changes.

It can be recoded as:

void
somefunc()
{
    int h0,w0;
    int h,w;
    int i,j;
    extern int pitch;
    int apit = pitch;

    get_hw( &h0,&w0 );  // get shape of array
    h= h0;
    w= w0;

    for( int i = 0; i < h; i++ ){
        for( int j = 0; j < w; j++ ){
            Arr[i*apit + j] = generate_func(i,j);
        }
    }
}

Now the variables apit,h,w are all 'safe' locals in the sense I defined above, and the compiler can be sure they won't be changed by any function calls. Assuming I'm not modifying anything in generate_func, the code will have the same effect as before but could be more efficient.

Jens Gustedt has suggested the use of the 'register' keyword as a way of tagging key variables to inhibit the use of & on them, e.g. by others maintaining the code (It won't affect the generated code, since the compiler can determine the lack of & without it). For my part, I always think carefully before applying & to any local scalar in a time-critical area of the code, and in my view using 'register' to enforce this is a little cryptic, but I can see the point (unfortunately it doesn't work in C++ since the compiler will just ignore the 'register').

Incidentally, in terms of code efficiency, the best way to have a function return two values is with a struct:

struct hw {  // this is what get_hw returns
   int h,w;
};

void
somefunc()
{
    int h,w;
    int i,j;

    struct hw hwval = get_hw();  // get shape of array
    h = hwval.h;
    w = hwval.w;
    ...

This may look cumbersome (and is cumbersome to write), but it will generate cleaner code than the previous examples. The 'struct hw' will actually be returned in two registers (on most modern ABIs anyway). And due to the way the 'hwval' struct is used, the optimizer will effectively break it up into two 'locals' hwval.h and hwval.w, and then determine that these are equivalent to h and w -- so hwval will essentially disappear in the code. No pointers need to be passed, no function is modifying another function's variables via pointer; it's just like having two distinct scalar return values. This is much easier to do now in C++11 - with std::tie and std::tuple, you can use this method with less verbosity (and without having to write a struct definition).

Your second example is invalid in C. So you see well that the register keyword changes something (in C). It is just there for this purpose, to inhibit the taking of an address of a variable. So just don't take its name "register" verbally, it is a misnomer, but stick to its definition.

That C++ seems to ignore register, well they must have their reason for that, but I find it kind of sad to again find one of these subtle difference where valid code for one is invalid for the other.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow