Going through AVR assembler "hello world" code

Question 1

The dot (period) is shorthand for the address of the current location, so .+8 means "from here plus 8". Exactly what "here" means depends on the nuances of the instruction set and of the assembler. As the extra information in the listing indicates, the .-8 goes to do_clear_bss_loop, which is eight bytes back when you count the two bytes of the branch instruction itself. The original code probably just had the label in there: brne do_clear_bss_loop.

It is likely copying the data segment. .text is basically read-only: it is your code, and on this platform it lives in flash. .data, though, is read/write and is usually initialized to non-zero values. With the power off, those initial values have to be preserved somewhere, in flash for example, so before your real program starts, the bootstrap has to copy the initial .data values from flash to their actual home in RAM. Then, as the program runs, it can read and modify those values as desired.

For example:

int x = 5;

int main ()
{
    x = x + 1;
}

The value 5 has to live in flash, because flash is the only non-volatile storage available at power-up. But before you can read or write the memory location for x, it has to be in RAM, so some startup code copies everything in the .data segment from flash to RAM.
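To make the copy step concrete, here is a minimal host-side sketch in C. It is not the actual startup code: flash_image, ram, copy_data and startup_value_of_x are hypothetical names, and two plain arrays stand in for flash and SRAM. It assumes a 16-bit little-endian int, as avr-gcc uses.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-ins: on a real AVR these would be flash and SRAM. */
static const uint8_t flash_image[] = { 0x05, 0x00 }; /* initial value of x (16-bit int 5, little-endian) */
static uint8_t ram[2];                               /* run-time home of x */

/* Copy the stored initial values to their run-time location, byte by byte. */
static void copy_data(uint8_t *dst, const uint8_t *src, size_t size)
{
    const uint8_t *end = src + size;
    while (src != end)
        *dst++ = *src++;
}

/* Run the startup copy, then read x back from the simulated RAM. */
int startup_value_of_x(void)
{
    copy_data(ram, flash_image, sizeof flash_image);
    return ram[0] | (ram[1] << 8);
}
```

Only after this copy has run can main() see x == 5 and then modify it in RAM.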

Sorry for that long explanation for something that is only a guess looking at your question.

.bss holds the variables in your program that are initialized to zero. With .data, if we had 100 items we would need 100 initial values in flash. With .bss, if we have 100 items we only need to record that there are 100 of them. We don't need 100 zeros in flash; the start address and size are simply compiled/assembled into the startup code, which zeroes that region of RAM.

So

int x = 5;
int y;

int main ()
{
    while(1)
    {
        y = y + x + 1;
    }
}

x is in .data and the 5 needs to be in non-volatile storage. The y is in .bss and only needs to be zeroed before main is called to comply with the C standard.

Granted, you may not be using global variables yourself, but there may be other data that is in some way using the .data and/or .bss segments and as a result the bootstrap code prepares the .data and .bss segments before calling main() so that your C programming experience is as expected.

Question 2

I realize this is a late answer. However, I still think it may be interesting to have a detailed point-by-point answer to all the questions.

What is the .-8 or alike syntax? (address 0x98 or 0xAA for instance.)

It means: "jump back 8 bytes from here". Beware that the program counter has already been incremented by the length of the instruction (2 bytes), thus brne .-8 will move you 6 bytes (not 8) prior to the brne instruction itself. In the same vein, rcall .+0 will push the program counter to the stack without altering the program flow. This is a trick only intended to reserve two bytes of stack space in a single instruction.
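The arithmetic above can be written down as a small C helper. This is only a sketch: branch_target is a hypothetical name, and the addresses used in the usage note are made-up examples, not taken from your listing.

```c
#include <stdint.h>

/* Byte address reached by a relative branch as printed by the disassembler
   (".+N" or ".-N"). The offset is counted from the end of the 2-byte
   instruction, not from its start. */
uint16_t branch_target(uint16_t insn_addr, int16_t dot_offset)
{
    return (uint16_t)(insn_addr + 2 + dot_offset);
}
```

For a brne .-8 sitting at a hypothetical address 0x00A4, this gives 0x009E, which is 6 bytes before the brne. For an rcall .+0 at a hypothetical 0x0070 it gives 0x0072, i.e. simply the next instruction.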

Around lines with address 80 to 88 (end of __do_copy_data) there are some funny things. It seems to me that this loads all the program code into RAM, from address 0xC4. Why?

No, nothing is copied, this is an empty loop. On lines 84 to 88 there is a test that exits the loop when the pointer X (r27:r26) equals 0x0100. Since X is initialized to 0x0100, this will not loop at all.

This loop is intended to copy the data section from flash to RAM. It does basically something like this:

X = DATA_START;  // RAM address
Z = 0x00C4;      // Flash address
while (X != DATA_START + DATA_SIZE)
    ram[X++] = flash[Z++];

but your program happens to have an empty data section (DATA_SIZE == 0 in the above pseudo-code).

Also, you should note that your program ends at address 0x00c3, thus the Z pointer is initialized to point right after the program code. This is where the initial values of the initialized variables are supposed to be.

In __do_clear_bss_start/loop, we clear all the work we have just done by setting bytes in the RAM to 0 (value of r1). Why? All this to finally call main. Any general explanations?

No, nothing will be overwritten. This loop clears the BSS, which normally comes right after the data section, with no overlap. Pseudocode:

X = BSS_START;
while (X != BSS_START + BSS_SIZE)
    ram[X++] = 0;

where BSS_START == DATA_START + DATA_SIZE. This is also an empty loop in your program because you have an empty bss.
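As a quick check that clearing the BSS leaves the data section alone, here is a host-side C sketch. The sizes, the ram array and the function names are all hypothetical, chosen only to illustrate the layout described above with .bss placed directly after .data.

```c
#include <stdint.h>
#include <stddef.h>

#define DATA_SIZE 2u /* hypothetical: one initialized 16-bit int */
#define BSS_SIZE  2u /* hypothetical: one zero-initialized 16-bit int */

/* Simulated SRAM: .data first, .bss immediately after it. */
static uint8_t ram[DATA_SIZE + BSS_SIZE];

/* Zero the range [start, start + size), like the startup loop does. */
static void clear_bss(uint8_t *start, size_t size)
{
    uint8_t *end = start + size;
    while (start != end)
        *start++ = 0;
}

/* Returns 1 if .data survived the clear and .bss is all zero. */
int bss_clear_keeps_data(void)
{
    ram[0] = 0x05; ram[1] = 0x00; /* pretend .data was already copied */
    ram[2] = 0xAA; ram[3] = 0x55; /* arbitrary power-up garbage in .bss */
    clear_bss(ram + DATA_SIZE, BSS_SIZE);
    return ram[0] == 0x05 && ram[1] == 0x00 && ram[2] == 0 && ram[3] == 0;
}
```

Because the clear starts at DATA_START + DATA_SIZE, the bytes written by the copy loop are never touched.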

Why doesn't disassembling show .bss, .rodata or other sections?

Because objdump -d only disassembles the sections expected to hold code.

Line 6a, why is SREG cleared? Isn't it set to what it should be after every instruction?

Most instructions only alter some bits of SREG. Also, this clears the global interrupt enable bit.

Lines 6c and 6e: what do 0xFF and 0x08 correspond to? r28 and r29 are the stack pointer low and high.

The stack pointer is loaded with 0x08ff, which is the last RAM location in the ATmega328P. The stack will grow downwards from there.
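The two constants follow directly from the SRAM layout. A small C sketch of the arithmetic, assuming the ATmega328P's 2 KiB of internal SRAM starting at 0x0100 (the macro and function names are made up for illustration):

```c
#include <stdint.h>

#define SRAM_START 0x0100u /* first SRAM address on the ATmega328P */
#define SRAM_SIZE  2048u   /* 2 KiB of internal SRAM */

/* Last valid SRAM address, i.e. the initial stack pointer. */
uint16_t ram_end(void)
{
    return (uint16_t)(SRAM_START + SRAM_SIZE - 1u);
}

/* The two bytes loaded into the stack pointer's low and high halves. */
uint8_t sp_low(void)  { return (uint8_t)(ram_end() & 0xFFu); }
uint8_t sp_high(void) { return (uint8_t)(ram_end() >> 8); }
```

Here ram_end() is 0x08FF, so the low byte is the 0xFF and the high byte is the 0x08 seen in lines 6c and 6e.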

I played a bit and added a static global variable. Why do we store in RAM starting from 0x0100 and not 0x0000?

RAM is at 0x0100–0x08ff on the 328P. Below this address you have some memory-mapped registers (the CPU registers and the I/O registers). Check the datasheet for details, section "8.3 SRAM Data Memory".
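For reference, the data address space can be summarized as a small C classifier. The boundaries are the ones I remember from the datasheet's data-memory map, so double-check them there; the enum and function names are hypothetical.

```c
#include <stdint.h>

enum region { REG_FILE, IO_REGS, EXT_IO_REGS, SRAM, UNMAPPED };

/* Which part of the ATmega328P data address space does addr fall into? */
enum region classify(uint16_t addr)
{
    if (addr <= 0x001F) return REG_FILE;    /* r0..r31 */
    if (addr <= 0x005F) return IO_REGS;     /* 64 I/O registers */
    if (addr <= 0x00FF) return EXT_IO_REGS; /* 160 extended I/O registers */
    if (addr <= 0x08FF) return SRAM;        /* 2 KiB internal SRAM */
    return UNMAPPED;
}
```

So the first address a variable can occupy is 0x0100, which is why your static global ended up there.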

At line 8a, why ldi r17, 1? We did that before (just a stupid remark). Or can something else alter r17?

Line 8a is redundant. It is there because of the way the linker builds the program by gluing together different pieces: __do_copy_data and __do_clear_bss are independent routines, so neither relies on whatever the other left in the registers.

We start copying the program in flash to the RAM, starting at 0xC4 (.bss and other sections I guess), but the cpi/cpc of X with regard to 1 will make ALL the flash copied into all the RAM. Is it just by laziness of the compiler to not stop copying when .bss sections are done copying?

You misunderstood this part of the code. The cpi, cpc and brne instructions keep looping only as long as X differs from r17:0x00 (i.e. 0x0100, since r17 = 1). Cf. the pseudocode above.
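To see why the loop cannot run away, here is a C sketch of the exit test. It models the cpi/cpc/brne sequence as a single 16-bit comparison of X against r17:end_lo, performed before the loop body; loop_iterations and its parameters are hypothetical names.

```c
#include <stdint.h>

/* Counts how many times the loop body would run. The exit test mirrors
   cpi r26, end_lo / cpc r27, r17 / brne: keep looping while X != r17:end_lo. */
unsigned loop_iterations(uint16_t x_start, uint8_t r17, uint8_t end_lo)
{
    uint16_t x = x_start;
    uint16_t end = (uint16_t)((r17 << 8) | end_lo);
    unsigned n = 0;
    while (x != end) { /* the test happens before the first store */
        x++;           /* stands for the post-incrementing store through X */
        n++;
    }
    return n;
}
```

With X starting at 0x0100, r17 = 1 and a low byte of 0x00, as in your listing, the body runs zero times. With a made-up end address of 0x0104 it would run exactly four times and then stop.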