Basic Blocks and Control Flow in x86

https://stackoverflow.com/questions/21960707

15-10-2022
|

Domanda

The Red-Dragon book contains, page 529, algorithm 9.1, and algorithm for partitioning assembly code into basic blocks. The algorithm is essentially:

Input: Assembly, as a list of instructions.
Output: A list of basic blocks of instructions

Phase 1: Mark the leaders. The very first statement (of a procedure) is a leader. Any statement that is the target of a branch is a leader. Any statement that immediately follows a branch is a leader.

Phase 2: Assemble the blocks. To make a basic block, add a leader and all statements to the block until another leader is encountered. Do this until no statements are left.

End and return the blocks.

Then, later, they use this block structure to develop control flow analyses. The basic structure developed is a control flow graph; it is made by treating basic blocks as nodes in a graph and creating a directed edge from Block Bi to Bj if Bi ends in a jump to Bj.

Cooper, Harvey and Waterman (in Building a Control Flow Graph from Scheduled Code) point out that this algorithm for creating a control flow graph is insufficient if there is a jump to a location held in a memory address or register.

There are several questions this brings up. When can assembly contain a branch to a location in memory? What is scheduled code? Are there any other issues one should be aware of when building a control flow graph from x86? What are the best known algorithms/implementations for building control flow graphs directly from x86?

Soluzione

There are several cases where a branch is performed on register's content (or variable content, for that matter)

Jump table for switches, where the address to jump to is calculated using some table of addresses/offsets and the value of a register.
A function pointer, such as callback function, where the function to be called is given as a parameter to the caller, or stored in some variable.
virtual function tables in C++, where the function to be called is not known at compile time, and may change in runtime.

Here is an example for function pointer call (64 bit Mac OS):

C code

int fptr_call( int (*ptr)(int) ) {
    return (*ptr)( 3 ); 
}

Assembly

_fptr_call:
0000000100000e70        pushq   %rbp
0000000100000e71        movq    %rsp, %rbp
0000000100000e74        subq    $0x10, %rsp
0000000100000e78        movl    $0x3, %eax
0000000100000e7d        movq    %rdi, 0xfffffffffffffff8(%rbp)
0000000100000e81        movl    %eax, %edi
; Call based on %rbp, copied from %rdi which is ptr
0000000100000e83        callq   *0xfffffffffffffff8(%rbp) 
0000000100000e86        addq    $0x10, %rsp
0000000100000e8a        popq    %rbp
0000000100000e8b        ret
0000000100000e8c        nopl    (%rax)

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow