Question

I have a PE file that I am trying to disassemble in order to get its instructions. However, I noticed (using IDA) that the .text segment contains not only instructions but also some data. Here is an example:

.text:004037E4                 jmp     ds:__CxxFrameHandler3
.text:004037EA ; [00000006 BYTES: COLLAPSED FUNCTION _CxxThrowException. PRESS KEYPAD "+" TO EXPAND]
.text:004037F0 ;             
.text:004037F0                 mov     ecx, [ebp-10h]
.text:004037F3                 jmp     ds:??1exception@std@@UAE@XZ ; std::exception::~exception(void)
.text:004037F3 ; 
.text:004037F9 byte_4037F9     db 8Bh, 54h, 24h        ; DATA XREF: sub_401440+2o
.text:004037FC                 dd 0F4428D08h, 33F04A8Bh, 0F6B2E8C8h, 0C4B8FFFFh, 0E9004047h
.text:004037FC                 dd 0FFFFFFD0h, 3 dup(0CCCCCCCCh), 0E904458Bh, 0FFFFD9B8h
.text:00403828 dword_403828    dd 824548Bh, 8BFC428Dh, 0C833F84Ah, 0FFF683E8h, 47F0B8FFh
.text:00403828                                         ; DATA XREF: sub_4010D0+2o
.text:00403828                                         ; .text:00401162o
.text:00403828                 dd 0A1E90040h, 0CCFFFFFFh, 3 dup(0CCCCCCCCh), 50E0458Dh
.text:00403828                 dd 0FFD907E8h, 458DC3FFh, 0D97EE9E0h
.text:00403860                 db 2 dup(0FFh)
.text:00403862 word_403862     dw 548Bh

How can I distinguish such data from instructions? My approach was to start at the first instruction (the entry point address) and follow every instruction and every called function. Unfortunately, it turned out that some blocks of code are never called directly; their addresses are stored in the .rdata segment among other data, and I have no idea how to tell valid instruction addresses apart from data.

To sum up: is there any way to decide whether a given address in the .text segment contains data or instructions? Or is there any way to decide which potential addresses in .rdata should be interpreted as instruction addresses and which as data?

Solution

You cannot, in general. The .text section of a PE file can mix code and constants in any way the author likes. Tools like IDA try to make sense of this by starting at the entry points, disassembling from there, and recording which addresses are targets of jumps and which are targets of reads. But devious programs can 'pun' between instructions and data, using the same bytes as both.
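
To make the "start at the entry points and follow the control flow" idea concrete, here is a minimal sketch in Python. It is not how IDA is implemented. It assumes a toy decoder that understands only a handful of real x86 opcodes (nop, ret, call rel32, jmp rel32, jmp rel8), and the names decode, reachable_code and sample are made up for this illustration. A real tool would need a full x86 decoder.

```python
import struct

def decode(code, off):
    """Decode one instruction from a tiny x86 subset (illustration only).

    Returns (length, branch_target_or_None, falls_through), or None when the
    opcode is outside the subset or the instruction is truncated.
    """
    if off >= len(code):
        return None
    op = code[off]
    if op == 0x90:                       # nop
        return 1, None, True
    if op == 0xC3:                       # ret: control does not fall through
        return 1, None, False
    if op in (0xE8, 0xE9):               # call rel32 / jmp rel32
        if off + 5 > len(code):
            return None
        rel = struct.unpack_from("<i", code, off + 1)[0]
        # A call returns to the next instruction; an unconditional jmp does not.
        return 5, off + 5 + rel, op == 0xE8
    if op == 0xEB:                       # jmp rel8
        if off + 2 > len(code):
            return None
        rel = struct.unpack_from("<b", code, off + 1)[0]
        return 2, off + 2 + rel, False
    return None                          # unknown to this toy decoder

def reachable_code(code, entry_points):
    """Return the set of byte offsets covered by instructions reachable
    from the given entry offsets by following fall-through and branches."""
    covered = set()
    starts = set()                       # instruction starts already visited
    work = list(entry_points)
    while work:
        off = work.pop()
        while 0 <= off < len(code) and off not in starts:
            ins = decode(code, off)
            if ins is None:
                break                    # cannot decode: stop this path
            length, target, falls_through = ins
            starts.add(off)
            covered.update(range(off, off + length))
            if target is not None:
                work.append(target)      # visit the branch target later
            if not falls_through:
                break
            off += length
    return covered

# Hypothetical 12-byte ".text": call +5, ret, four unreferenced bytes, nop, ret.
sample = bytes([0xE8, 0x05, 0x00, 0x00, 0x00,   # offset 0: call offset 10
                0xC3,                           # offset 5: ret
                0x8B, 0x54, 0x24, 0x08,         # offsets 6-9: never reached
                0x90,                           # offset 10: nop
                0xC3])                          # offset 11: ret
unreached = sorted(set(range(len(sample))) - reachable_code(sample, [0]))
print(unreached)
```

Bytes left in unreached are only "not proven to be code", not "proven to be data". In the sample, offsets 6-9 happen to decode as a valid mov instruction, which is exactly the ambiguity described above. Values in .rdata that fall inside the .text address range can be fed in as extra entry points, but that is a heuristic and will produce false positives.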

Licensed under: CC-BY-SA with attribution