Question

I'd like to know the following. Suppose a structure contains more than one primitive, but its total size is less than or equal to the size of a single CPU register, such as a 4-byte register. Does it ever make sense for a compiler to put it in one of those registers when passing it to a function by value or by reference, instead of copying it onto the callee's stack or passing a pointer to it? More generally, when passing something other than a single primitive to a function, such as an array or a structure, would passing it in a CPU register ever come in handy?

Sample of such a structure:

struct sample{
 public:
  char char1;
  char char2;
};

Sample of passing the structure to a function:

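// Four alternatives; they can't all coexist, because the first two
// declare the same function (top-level const on a parameter is ignored).
void someFunc(const sample input){   // by value (const copy)
 //whatever
}
void someFunc(sample input){         // by value
 //whatever
}
void someFunc(sample & input){       // by non-const reference
 //whatever
}
void someFunc(const sample & input){ // by const reference
 //whatever
}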
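To make the question concrete, here is an illustration (not what any particular compiler emits) that copies a sample into a 32-bit integer with std::memcpy and back again. The helper names packSample and unpackSample are hypothetical; the point is only that the struct's two bytes fit in one 4-byte value, which is what an ABI would have to place in a register.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct sample {
  char char1;
  char char2;
};

// The struct must fit in 4 bytes and be safe to copy bit-for-bit.
static_assert(sizeof(sample) <= sizeof(std::uint32_t), "sample does not fit in 4 bytes");
static_assert(std::is_trivially_copyable<sample>::value, "sample is not trivially copyable");

// Hypothetical helper: copy the struct's bytes into the lowest-addressed bytes
// of a 32-bit integer; the remaining bytes stay zero.
std::uint32_t packSample(const sample& s) {
  std::uint32_t bits = 0;
  std::memcpy(&bits, &s, sizeof s);
  return bits;
}

// Hypothetical helper: recover the struct from the integer.
sample unpackSample(std::uint32_t bits) {
  sample s;
  std::memcpy(&s, &bits, sizeof s);
  return s;
}
```

A round trip such as unpackSample(packSample(sample{'a', 'b'})) gives back the original fields; the exact integer value depends on the target's byte order.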

Solution

Yes. Many compilers have a special keyword or type attribute that you can use to specify that a structure should be passed in registers rather than on the stack. It is more common on processors that have many registers and deep pipelines, like the PowerPC, and can be a tremendous performance improvement in architectures where writing a value to memory and then reading it back again right away causes a pipeline stall.

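Usually you would only use it for a struct that is the same size as a native register. It is particularly useful on processors with wide SIMD registers, which can carry 16 bytes or more at a time. That would let you pass (for example) a four-dimensional vector of four floats in a single register. The System V AMD64 ABI, used on x86-64 Linux and macOS, is an example of an x86 ABI that passes such values in SIMD registers automatically: a 16-byte vector type such as __m128 goes in one XMM register, while a plain struct of four floats is split across two.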
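Here is a minimal sketch of the four-float case, assuming an x86-64 target that follows the System V AMD64 ABI; Vec4 and dot are illustrative names. Under that ABI a plain 16-byte struct of floats is classified as two SSE eightbytes, so each argument travels in two XMM registers rather than through memory. Nothing in the C++ source controls this: the same code is passed on the stack or by hidden pointer on ABIs that don't do it.

```cpp
#include <cassert>

// Illustrative 16-byte aggregate: four floats, no padding on common targets.
struct Vec4 {
  float x, y, z, w;
};

static_assert(sizeof(Vec4) == 16, "expected four packed floats");

// Passed by value. Under the System V AMD64 ABI each Vec4 argument is split into
// two 8-byte halves, each classified SSE, so it arrives in two XMM registers.
float dot(Vec4 a, Vec4 b) {
  return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}
```

Compiling this with -O2 -S on x86-64 Linux and reading the assembly is an easy way to check where the arguments actually end up.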

A different example is GCC's d64_abi type attribute, which tells a PowerPC to pass a structure in registers where possible, rather than on the stack. (This is part of the Darwin ABI.)

typedef struct {
    int          a;
    float        f;
    char         c;
} __attribute__ ((d64_abi)) Thingy;

Thingy foo( Thingy t );

In the case above, a call to foo would pass the Thingy in one floating-point register and two integer registers, rather than writing it to the stack and reading it right back again. The return value comes back in registers in the same way.

Outside of ABIs that require it, such as the System V AMD64 ABI mentioned above, I've never seen a compiler do this automatically without your telling it, but it's possible one exists.

OTHER TIPS

This is defined in the application binary interface (ABI) of your execution environment. The C++ standard says nothing about which processor registers are used when a function is called, so it is legal to create an environment where small structs are packed into a single processor register.

As for references, they are very likely to be passed as pointers anyway: if the called function takes the address of a reference parameter, the result must be the address of the referenced object, so that object has to live in memory.
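The constraint can be seen directly: a reference parameter has the same address as the caller's object, while a by-value parameter is a separate object. The function names aliasesCaller and copyIsDistinct are illustrative.

```cpp
#include <cassert>

struct sample {
  char char1;
  char char2;
};

// Taking the address of a reference parameter yields the address of the object
// the caller passed in, so the callee needs that object's address.
bool aliasesCaller(const sample& input, const sample* original) {
  return &input == original;
}

// A by-value parameter is a distinct object with its own address. Taking its
// address here forces this callee to keep its copy in memory, but the caller's
// object itself never has to be exposed.
bool copyIsDistinct(sample input, const sample* original) {
  return &input != original;
}
```

This is why the reference overloads in the question leave the compiler little room, whereas the by-value ones let the ABI decide.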

On certain architectures (like i386, which I know is ancient, but that's what I grew up with), it certainly makes sense to pass it in a register, since pushing to and popping from the stack takes a lot more CPU cycles (say, 3 to 6 times more) than passing in a register. So a compiler would do well to optimize for that.
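On 32-bit x86, GCC and Clang expose this choice through the regparm function attribute, which passes up to the given number of integral arguments in EAX, EDX and ECX instead of on the stack. The sketch below is only an illustration under stated assumptions: IN_REGS, combineChars and combine are hypothetical names; the attribute covers integral arguments, so the struct's fields are passed as two separate values; and the macro expands to nothing on any other target.

```cpp
#include <cassert>

// Hypothetical macro: request register passing only where regparm exists (x86-32).
#if defined(__i386__) && defined(__GNUC__)
#define IN_REGS __attribute__((regparm(2)))
#else
#define IN_REGS
#endif

struct sample {
  char char1;
  char char2;
};

// With regparm(2) on i386, c1 and c2 arrive in EAX and EDX rather than on the stack.
IN_REGS int combineChars(int c1, int c2) {
  return ((c1 & 0xFF) << 8) | (c2 & 0xFF);
}

// Caller-side wrapper: splits the struct into integral arguments.
int combine(const sample& s) {
  return combineChars(s.char1, s.char2);
}
```

Comparing the i386 assembly (for example with -m32 -O2 -S) with and without the attribute should show the pushes disappearing.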

I can imagine there are other architectures where it doesn't matter. And if the registers are already needed for other optimizations that yield a bigger improvement, it doesn't make sense to spend them on this.

What architecture are you using/targeting, or are you asking in general?

I think there are compilers that will pass PODs in registers, even if they are structs.
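Being a POD matters as much as size. Under the Itanium C++ ABI (used by GCC and Clang on Linux and macOS), a class with a non-trivial copy constructor, move constructor or destructor is passed by hidden reference to a caller-made temporary, never in registers. The sketch below approximates that rule with std::is_trivially_copyable; it is a rough check rather than the ABI's exact definition, and the struct names and maybeInRegisters are illustrative.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct sample {            // POD: two chars
  char char1;
  char char2;
};

struct Tracked {           // one char, but a user-provided copy constructor
  char c;
  Tracked() : c(0) {}
  Tracked(const Tracked& other) : c(other.c) {}
};

struct Named {             // std::string member makes copying non-trivial
  std::string name;
};

// Rough approximation of "trivial for the purposes of calls": such types may go
// in registers if the underlying C ABI allows it; the others are passed by
// hidden reference regardless of their size.
template <typename T>
constexpr bool maybeInRegisters() {
  return std::is_trivially_copyable<T>::value;
}
```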

Licensed under: CC-BY-SA with attribution