What's going on in Apple LLVM-gcc x86 assembly?

Question 1

Since the question is really about those odd labels and data and not really about the code itself, I'm only going to shed some light on them.

If an instruction of the program causes an execution error (such as division by 0, access to an inaccessible memory region, or an attempt to execute a privileged instruction), it results in an exception (not the C++ kind of exception, but the interrupt kind) and forces the CPU to execute the appropriate exception handler in the OS kernel. If we were to disallow these exceptions entirely, the story would be very short: the OS would simply terminate the program.

However, there are advantages to letting programs handle their own exceptions, and so the primary exception handler in the OS reflects some of the exceptions back into the program for handling. For example, a program could attempt to recover from the exception, or it could save a meaningful crash report before terminating.

In either case, it is useful to know the following:

the function in which the exception occurred, not just the offending instruction in it
the function that called that function, the function that called that one and so on

and possibly (mainly for debugging):

the line of the source code file, from which this instruction was generated
the lines where these function calls were made
the function parameters

Why do we need to know the call tree?

Well, if the program registers its own exception handlers, it usually does so with something like C++ try and catch blocks:

void fxn()
{
  try
  {
    // do something potentially harmful
  }
  catch (const std::exception&)
  {
    // catch and handle one kind of attempt to do something harmful
  }
  catch (...)
  {
    // catch and handle any other attempt to do something harmful
  }
}

If neither of those catches matches, the exception propagates to the caller of fxn, and potentially to the caller of the caller of fxn, until a catch handles it or until it reaches the default exception handler, which simply terminates the program.
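The propagation described above can be illustrated with ordinary C++ exceptions. This is only a sketch: the names inner, middle, and outer are hypothetical, and it shows language-level exceptions rather than the CPU exceptions discussed earlier, but the search for the nearest enclosing handler that matches follows the same pattern.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical function that fails; it has no try/catch of its own.
void inner()
{
  throw std::runtime_error("something harmful");
}

// Has a try block, but its catch clause does not match the thrown type,
// so the exception keeps propagating to the caller.
void middle()
{
  try
  {
    inner();
  }
  catch (const std::logic_error&)
  {
    // not reached: std::runtime_error is not a std::logic_error
  }
}

// The nearest enclosing try with a matching catch handles the exception.
std::string outer()
{
  try
  {
    middle();
  }
  catch (const std::runtime_error& e)
  {
    return std::string("caught in outer: ") + e.what();
  }
  return "nothing thrown";
}
```

Calling outer() passes the exception through both inner and middle before it is handled, so a check such as assert(outer() == "caught in outer: something harmful") holds.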

So, you need to know the code regions that each try covers and you need to know how to get to the next closest try (in the caller of fxn, for example) if the immediate try/catch doesn't catch the exception and it has to bubble up.

The ranges for try and the locations of catch blocks are easy to encode in a special section of the executable, and they are easy to work with (just do a binary search for the offending instruction address in those ranges). But figuring out the next outer try block is harder, because you may need to find the return address of the function in which the exception occurred.
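A minimal sketch of that binary search, assuming a hypothetical table format (the TryRange struct and find_handler function are made up for illustration and do not reflect the real layout of any executable section):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical table entry: one try region and the address of its handler.
struct TryRange
{
  std::uint64_t begin;    // first address covered by the try
  std::uint64_t end;      // one past the last covered address
  std::uint64_t handler;  // address of the catch code
};

// Returns the handler for pc, or 0 if no try region covers it.
// Assumes the table is sorted by begin and the ranges do not overlap.
std::uint64_t find_handler(const std::vector<TryRange>& table, std::uint64_t pc)
{
  // Find the first range that starts after pc, then step back one entry.
  auto it = std::upper_bound(table.begin(), table.end(), pc,
      [](std::uint64_t addr, const TryRange& r) { return addr < r.begin; });
  if (it == table.begin())
    return 0;
  --it;
  return pc < it->end ? it->handler : 0;
}
```

Nested try blocks would need overlapping ranges and a slightly richer lookup; the sketch leaves that out on purpose.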

And you cannot always rely on rbp+8 pointing to the return address on the stack, because the compiler may optimize the code in such a way that rbp is no longer involved in accessing function parameters and local variables. You can access them through rsp+something as well, which saves a register and a few instructions. But different functions allocate different numbers of bytes on the stack for their locals and for the parameters they pass to other functions, and they adjust rsp differently, so the value of rsp alone isn't enough to find the return address and the calling function. rsp can be an arbitrary number of bytes away from where the return address is on the stack.

For such scenarios the compiler includes additional information about functions and their stack usage in a dedicated section of the executable. The exception-handling code examines this information and properly unwinds the stack when exceptions have to propagate to the calling functions and their try/catch blocks.
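To make the idea concrete, here is a toy model of unwinding driven by per-function records. Everything in it is an assumption made for illustration: FrameInfo, lookup, and walk_stack are hypothetical names, the stack is simulated as an array of 8-byte words, and each function is given a single fixed distance from rsp to its return address, whereas real unwind data describes how that distance changes from instruction to instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-function record, loosely modeled on unwind data.
struct FrameInfo
{
  std::uint64_t begin;       // address of the first instruction
  std::uint64_t size;        // length of the function in bytes
  std::uint64_t ret_offset;  // bytes from rsp to the return-address slot
};

// Finds the record whose [begin, begin + size) range contains pc.
const FrameInfo* lookup(const std::vector<FrameInfo>& infos, std::uint64_t pc)
{
  for (const FrameInfo& f : infos)
    if (pc >= f.begin && pc - f.begin < f.size)
      return &f;
  return nullptr;
}

// Walks a simulated stack (word i lives at "address" i * 8) and collects
// the pc of each frame, innermost first, until pc leaves all known functions.
std::vector<std::uint64_t> walk_stack(const std::vector<FrameInfo>& infos,
                                      const std::vector<std::uint64_t>& stack,
                                      std::uint64_t pc, std::uint64_t rsp)
{
  std::vector<std::uint64_t> pcs;
  while (const FrameInfo* f = lookup(infos, pc))
  {
    pcs.push_back(pc);
    std::uint64_t slot = (rsp + f->ret_offset) / 8;
    if (slot >= stack.size())
      break;
    pc = stack[slot];               // the return address is the caller's pc
    rsp = rsp + f->ret_offset + 8;  // rsp as the caller sees it after ret
  }
  return pcs;
}
```

Note that the walk never consults rbp: knowing which function pc belongs to, together with that function's record, is enough to locate the return address.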

So, the data following _main.eh contains that additional information. Note that it explicitly encodes the beginning and the size of main() by referring to Leh_func_begin1 and Leh_func_end1-Leh_func_begin1. This piece of info allows the exception-handling code to identify main()'s instructions as main()'s.

It also appears that main() isn't unique in this respect: some of its stack/exception info is the same as in other functions, so it makes sense to share it between them. Hence the reference to Leh_frame_common.

I can't comment further on the structure of _main.eh or the exact meaning of constants like 144 and 13, as I don't know the format of this data. But generally one doesn't need to know these details unless one is developing a compiler or a debugger.

I hope this gives you an idea of what those labels and constants are for.

Question 2

OK, let's give it a try.

// First section of code, declaring the main function and aligning it.

UPDATE: I originally wrote that main has to be aligned on a 32-bit boundary, but that explanation of the .align directive may be wrong. See the gas documentation below.

.section    __TEXT,__text,regular,pure_instructions
.globl  _main
.align  4, 0x90
_main:

Store the previous base pointer and allocate stack space for local variables.

Leh_func_begin1:
pushq   %rbp
Ltmp0:
movq    %rsp, %rbp
Ltmp1:
subq    $32, %rsp
Ltmp2:

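Save the incoming arguments (argc in edi, argv in rsi) in local stack slots, load the address of the string into rdi as the argument for puts(), and call puts().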

movl    %edi, %eax
movl    %eax, -4(%rbp)
movq    %rsi, -16(%rbp)
leaq    L_.str(%rip), %rax
movq    %rax, %rdi
callq   _puts

Put the return value on the stack and load it into eax, free local memory, restore the base pointer and return.

movl    $0, -24(%rbp)
movl    -24(%rbp), %eax
movl    %eax, -20(%rbp)
movl    -20(%rbp), %eax
addq    $32, %rsp
popq    %rbp
ret
Leh_func_end1:

Next section, also in the __TEXT segment, containing the string to print.

.section    __TEXT,__cstring,cstring_literals
L_.str:
.asciz   "Hello, World!"

The rest is unknown to me; it could be data used by the C startup code and/or debugging info.

.section    __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support
...

UPDATE: Documentation on the .align directive from: http://sourceware.org/binutils/docs-2.23.1/as/Align.html#Align

"The way the required alignment is specified varies from system to system. For the arc, hppa, i386 using ELF, i860, iq2000, m68k, or32, s390, sparc, tic4x, tic80 and xtensa, the first expression is the alignment request in bytes. For example `.align 8' advances the location counter until it is a multiple of 8. If the location counter is already a multiple of 8, no change is needed. For the tic54x, the first expression is the alignment request in words.

For other systems, including ppc, i386 using a.out format, arm and strongarm, it is the number of low-order zero bits the location counter must have after advancement. For example `.align 3' advances the location counter until it a multiple of 8. If the location counter is already a multiple of 8, no change is needed.

This inconsistency is due to the different behaviors of the various native assemblers for these systems which GAS must emulate. GAS also provides .balign and .p2align directives, described later, which have a consistent behavior across all architectures (but are specific to GAS)."

//jk

Question 3

You can find the answers for pretty much any questions you've got related to the directives here and here.

For example:

.section    __TEXT,__text,regular,pure_instructions

Declares a section named __TEXT,__text with the default section type and specifies that this section will contain only machine code (i.e. no data).

.globl _main
Makes the _main label (symbol) global, so that it will be visible to the linker.

.align 4, 0x90
Aligns the location counter to the next 2^4 (==16) byte boundary. The space in between will be filled with the value 0x90 (==NOP).
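The arithmetic behind that directive can be sketched in a few lines of C++. The function names align_up_pow2 and padding_bytes are hypothetical and only mirror what the assembler computes when the .align argument is interpreted as a power of two, as described above.

```cpp
#include <cassert>
#include <cstdint>

// Rounds addr up to the next multiple of 2^power, as ".align power" does
// on targets where the argument is a power of two.
std::uint64_t align_up_pow2(std::uint64_t addr, unsigned power)
{
  std::uint64_t mask = (std::uint64_t{1} << power) - 1;
  return (addr + mask) & ~mask;
}

// Number of fill bytes (0x90 in the listing) needed to reach that boundary.
std::uint64_t padding_bytes(std::uint64_t addr, unsigned power)
{
  return align_up_pow2(addr, power) - addr;
}
```

For instance, with power 4 a location counter of 0x1009 advances to 0x1010 and needs 7 fill bytes, while 0x1010 is already aligned and needs none.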

As for the code itself, it's obviously doing a lot of redundant intermediate loads and stores. Try compiling with optimizations enabled, as one of the commenters suggested, and you should find that the resulting code makes more sense.