I've been delving deeper into Linux and C, and I'm curious how functions are stored in memory.
I have the following function:
void test(){
    printf( "test\n" );
}
Simple enough. When I run objdump on the executable that has this function, I get the following:
08048464 <test>:
 8048464:   55                      push   %ebp
 8048465:   89 e5                   mov    %esp,%ebp
 8048467:   83 ec 18                sub    $0x18,%esp
 804846a:   b8 20 86 04 08          mov    $0x8048620,%eax
 804846f:   89 04 24                mov    %eax,(%esp)
 8048472:   e8 11 ff ff ff          call   8048388 <printf@plt>
 8048477:   c9                      leave
 8048478:   c3                      ret
Which all looks right.
The interesting part is when I run the following piece of code:
int main( void ) {
    char data[20];
    int i;
    memset( data, 0, sizeof( data ) );
    memcpy( data, test, 20 * sizeof( char ) );
    for( i = 0; i < 20; ++i ) {
        printf( "%x\n", data[i] );
    }
    return 0;
}
I get the following (which is incorrect):
55
ffffff89
ffffffe5
ffffff83
ffffffec
18
ffffffc7
4
24
10
ffffff86
4
8
ffffffe8
22
ffffffff
ffffffff
ffffffff
ffffffc9
ffffffc3
If I opt to leave out the memset( data, 0, sizeof( data ) ); line, then the right-most byte is correct, but some of the values still have the leading 1s (the ffffff prefix).
Does anyone have any explanation for why
A) using memset to clear my array results in an incorrect (edit: inaccurate) representation of the function, and
SOLUTION: This was due to using memset( data, 0, sizeof( data ) ) rather than memset( data, 0, 20 * sizeof( unsigned char ) ). The memory wasn't being fully set because it was looking only at the size of a pointer rather than the size of the entire array.
B) how is this byte stored in memory? As ints? chars? I don't quite understand what's going on here. (clarification: what type of pointer would I use to traverse such data in memory?)
SOLUTION: I'm dumb. I forgot the unsigned keyword, and that is where the entire problem came from :(
Any help would be greatly appreciated - I couldn't find anything when searching around for this.
Neil
PS: My immediate thought is that this is a result of x86 having instructions that don't end on a byte or half-byte boundary. But that doesn't make a whole lot of sense, and shouldn't cause any problems.
Thank you to Will for pointing out my error with the char type. It should have been unsigned char. I'm still curious as to how to access individual bytes however.