Efficiency of passing char by reference in C++

Question 1

If the function is defined in the same translation unit (and the prototype is just a forward declaration), it doesn't matter: the compiler will most likely inline the function and you won't be able to tell the difference.

If the function is defined in another translation unit (and therefore has external linkage), the compiler generates a function call. Most calling conventions pass the first few parameters in registers, and that is the case here for both the character and the reference to the character. If you pass by value, the compiler loads the character into the register for the first parameter. If you pass by reference, the compiler places the address of the character in that register, and the called function then loads the character from that address. Which is more efficient? Probably passing by value, but on today's CPUs, with out-of-order execution and multiple instructions dispatched every cycle, you probably can't tell the difference.

Here's a simple C++ program to see what gcc generates on Linux:

extern char byvalue( char );
extern char byref( const char & );
int main( int argc, char * argv[] )
{
    char c = byvalue( argv[0][0] ) + byref( argv[0][1] );
    return c;
}

I compiled and looked at the generated code:

$ g++ -O3 param.cpp -c -o param.o
$ objdump -D param.o|less

Here's what the generated code for those two calls looks like in function main. %rdi/%edi is the register for the first (and in this case only) parameter:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   53                      push   %rbx
   2:   48 89 f3                mov    %rsi,%rbx
   5:   48 83 ec 08             sub    $0x8,%rsp
   9:   48 8b 06                mov    (%rsi),%rax
   c:   0f be 38                movsbl (%rax),%edi     ; %edi is character
   f:   e8 00 00 00 00          callq  14 <main+0x14>  ; byvalue
  14:   48 8b 3b                mov    (%rbx),%rdi
  17:   89 c5                   mov    %eax,%ebp
  19:   48 83 c7 01             add    $0x1,%rdi       ; %rdi is address of character
  1d:   e8 00 00 00 00          callq  22 <main+0x22>  ; byref
  22:   48 83 c4 08             add    $0x8,%rsp
  26:   01 e8                   add    %ebp,%eax
  28:   5b                      pop    %rbx
  29:   0f be c0                movsbl %al,%eax
  2c:   5d                      pop    %rbp
  2d:   c3                      retq

As you can see, the compiler generated code to either load the character:

   c:   0f be 38                movsbl (%rax),%edi     ; %edi is character
   f:   e8 00 00 00 00          callq  14 <main+0x14>  ; byvalue

or load the address of the character:

  19:   48 83 c7 01             add    $0x1,%rdi       ; %rdi is address of character
  1d:   e8 00 00 00 00          callq  22 <main+0x22>  ; byref

Question 2

The fact is that you can't predict what it will look like after optimizations take place; the only thing that stays "fixed" is the semantics of the code, not how it is actually executed.

Question 3

You're both wrong.

A reference requires a pointer to the original object, not an int. That is probably 64 bits.

A char is pushed onto the stack, not copied elsewhere in memory, and with standard packing it probably takes up a full stack slot, just as an int would: also 64 bits.

The pointer in question has to be dereferenced later to get the value of the reference, which pulls in an entire cache line (64 bytes on most hardware) if it's not already in cache from the call. You would need to pull in the same cache line to push the char on the stack, so there is very little difference there. BUT if the char was stored in a register, it could have been pushed on the stack without a cache line being read in.

And if you're optimizing for speed, it could probably stay in the same register if it were not a reference. The smart compiler guys might see that you're doing something stupid like passing a POD type by const reference and keep it in a register to make you look good, but you shouldn't always rely on the compiler guys making you look good.

Unless you're worried that someone might accidentally change the value of the char inside this function, why are you passing it as a const ref?

Every compiler and platform is different, and sometimes the ref may cost more, but for small POD types like char, passing by value should never cost more than passing by reference.

So yeah, you're both wrong.

Question 4

This is a sketch of what happens at the line fun (x); when you pass by reference:

void fun (char const & c) {use (c);}
...
fun (x);
[next line]

1. Put the return pointer to [next line] onto the stack at, let's say, memory address A. It's 4 or 8 bytes.
2. Put p, which is a pointer to x, onto the stack at, let's say, memory address B. It's 4 or 8 bytes.
3. When using c, its memory location is [dereference [dereference B]].
4. Return to [dereference A].

And here is what happens at the line fun (x); when you pass by value:

void fun (char const c) {use (c);}
...
fun (x);
[next line]

1. Put the return pointer to [next line] onto the stack at, let's say, memory address A. It's 4 or 8 bytes.
2. Put c, which is a copy of x, onto the stack at, let's say, memory address B. It's 1 byte.
3. When using c, its memory location is [dereference B].
4. Return to [dereference A].

The addresses A and B (relative to the top of the stack) are hard-coded into the binary executable produced by the compiler. The differences in steps 2 and 3 are size and single or double dereference, both in favor of passing by value.

That said, a modern compiler with optimizations turned on will probably inline the function in both programs above, if it is simple enough, producing the following:

---
---
When using c as parameter of fun, x is used instead.
---

Question 5

You're both wrong. There are no 1-byte stack slots. But when you pass a char by reference:

You have to compute its address. If it's static, that's constant. If it's in an object, you have to add an offset to the object's address. If it's on the stack, method-local, you have to add its stack frame offset to the current stack frame pointer. On an x86 that's all done with the LEA instruction, but are you on Intel hardware?
You then have to push the address.
Then, every time you use it in the target method you have to dereference it.

All of this takes many more memory references than just pushing the value onto the stack.

Whether it really matters in a non-trivial method is another question. And of course it is open to the compiler to compile it as pass-by-reference anyway in some circumstances.