Question

I have an array of 9 bytes and I want to copy these bytes to a structure:

#include<stdio.h>
#include<stdlib.h>
#include<string.h>

typedef struct _structure {
    char one[5];        /* 5 bytes */
    unsigned int two;   /* 4 bytes */
} structure;

int main(int argc, char **argv) {

    structure my_structure;

    char array[]    = {
        0x41, 0x42, 0x43, 0x44, 0x00,   /* ABCD\0 */
        0x00, 0xbc, 0x61, 0x4e          /* 12345678 (base 10) */
    };

    memcpy(&my_structure, array, sizeof(my_structure));

    printf("%s\n", my_structure.one);   /* OK, "ABCD" */
    printf("%d\n", my_structure.two);   /* it prints 1128415566 */

    return(0);
}

The first element of the structure my_structure, one, is copied correctly; however, my_structure.two contains 1128415566 while I expect 12345678. array and my_structure have different sizes, and even if they were equal in size, there would still be a problem with two. How can I fix this issue?


Solution

As Mysticial already explained, what you're seeing is the effect of structure alignment: the compiler aligns each member on a boundary suited to its type, typically the machine word size, i.e. in 32-bit code on 4-byte boundaries. That effectively leaves a gap of 3 bytes between the char[5] and the next element.

If you use gcc or Visual Studio, #pragma pack(1) allows you to override the "preferred" packing the compiler would use by default; in this example you instruct the compiler to align on 1-byte boundaries, i.e. without "holes". This is often useful in embedded systems to map blocks of bytes onto a structure. For other compilers consult your compiler manual.
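As a minimal sketch of the pragma described above (assuming a compiler that supports #pragma pack, such as gcc, clang or Visual Studio), the structure can be packed and then filled with a single memcpy. The names packed_structure and load_packed are hypothetical and used only for illustration; the push/pop form is used so that the packing applies to this one structure only.

```c
#include <stddef.h>
#include <string.h>

#pragma pack(push, 1)           /* remember current packing, then use 1 byte */
typedef struct {
    char one[5];                /* 5 bytes */
    unsigned int two;           /* follows immediately, no padding */
} packed_structure;             /* hypothetical name */
#pragma pack(pop)               /* restore the previous packing */

/* Hypothetical helper: copies sizeof(packed_structure) bytes from src.
   Returns 1 on success, 0 if src holds fewer bytes than the structure needs
   (so the copy never reads past the end of the source buffer). */
int load_packed(packed_structure *dst, const unsigned char *src, size_t src_len)
{
    if (src_len < sizeof *dst)
        return 0;
    memcpy(dst, src, sizeof *dst);
    return 1;
}
```

Packing only removes the gap. The value of two still depends on the byte order of the source bytes.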

OTHER TIPS

There are a few problems:

For efficiency reasons, compilers align variables on boundaries that suit the processor, typically up to its register size. E.g. on 32-bit systems a 4-byte int is placed on a 32-bit (4-byte) boundary. Additionally, structures will have "gaps" (padding) so that the struct members can be aligned on those boundaries. In other words: the struct is not "packed" tightly. Try this:

#include <stdio.h>

typedef struct
{
    char one[5];        /* 5 bytes */
    unsigned int two;   /* 4 bytes */
}
    structure;
structure my_structure;

char array[] = 
{
    0x41, 0x42, 0x43, 0x44, 0x00,   /* ABCD\0 */
    0x00, 0xbc, 0x61, 0x4e          /* 12345678 (base 10) */
};

int main(int argc, char **argv) 
{
    const int sizeStruct = sizeof(structure);
    printf("sizeof(structure) = %d bytes\n", sizeStruct);
    const int sizeArray = sizeof(array);
    printf("sizeof(array) = %d bytes\n", sizeArray);
    return 0;
}

You should see different sizes.
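To see where the extra bytes sit, and not just that the sizes differ, the standard offsetof macro can be used. This is a small sketch; the helper names padding_before_two and padding_after_two are hypothetical.

```c
#include <stddef.h>

typedef struct {
    char one[5];        /* 5 bytes */
    unsigned int two;   /* usually 4 bytes */
} structure;

/* Padding bytes the compiler inserted between one and two. */
size_t padding_before_two(void)
{
    /* sizeof does not evaluate its operand, so no pointer is dereferenced */
    return offsetof(structure, two) - sizeof(((structure *)0)->one);
}

/* Padding bytes after two, at the end of the structure. */
size_t padding_after_two(void)
{
    return sizeof(structure) - (offsetof(structure, two) + sizeof(unsigned int));
}
```

On a typical platform with a 4-byte, 4-byte-aligned unsigned int, one would expect 3 padding bytes before two and none after it, but the exact numbers are up to the compiler.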

You can override this behavior by using #pragma or attribute directives. With gcc you can change the structure definition using attributes. E.g. change above code to add a "packed" attribute (requires gcc):

typedef struct __attribute__((packed))

Then run the program again. Sizes should be the same now. Note: On some processor architectures, e.g. ARMv4, 32-bit variables must be aligned on a 32-bit boundary or your program will fault (raise an exception). Read the compiler documentation of the "aligned" and "packed" pragmas or attributes.

The next problem is byte order. Try this:

printf("0x%08X\n", 12345678);

12345678 in hex is 0x00BC614E. From your example and the output you are getting, I can tell that your platform is "little endian". In "little endian" systems, the number 0x00BC614E is stored as a byte sequence starting with the least significant byte, i.e. 0x4E, 0x61, 0xBC, 0x00. So change your array definition:

char array[] = 
{
    0x41, 0x42, 0x43, 0x44, 0x00,   /* ABCD\0 */
    0x4E, 0x61, 0xBC, 0x00,         /* 12345678 (base 10) */
};

Now your program will print 12345678.
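If you are unsure which byte order your platform uses, a short sketch like the following can check it at run time. The function names host_is_little_endian and bytes_of are hypothetical; memcpy is used so that no pointer casts are needed.

```c
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    const unsigned int probe = 1u;
    unsigned char first;
    memcpy(&first, &probe, 1);      /* look at the byte at the lowest address */
    return first == 1;
}

/* Copies the in-memory representation of value into out,
   which must have room for sizeof(unsigned int) bytes. */
void bytes_of(unsigned int value, unsigned char *out)
{
    memcpy(out, &value, sizeof value);
}
```

Calling bytes_of(12345678u, buf) on a little-endian host with a 4-byte unsigned int fills buf with 0x4E, 0x61, 0xBC, 0x00, the order used in the corrected array.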

Also note that you should use %u to print an unsigned int.

Copying char strings is potentially a can of worms, especially if you have to allow for different encodings (e.g. Unicode). At the very least, you need to ensure that your copy destination buffer is protected from overruns.

Revised code:

#include <stdio.h>
#include <string.h>

typedef struct
{
    char one[5];        /* 5 bytes */
    unsigned int two;   /* 4 bytes */
}
    structure;

structure my_structure;

char array[] = 
{
    0x41, 0x42, 0x43, 0x44, 0x00,   /* ABCD\0 */
    0x4E, 0x61, 0xBC, 0x00,         /* 12345678 (base 10) */
};

int main() 
{
    // copy string as a byte array
    memcpy(&my_structure.one, &array[0], sizeof(my_structure.one));

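    // copy uint byte by byte; dereferencing (unsigned int *)&array[5] would be
    // an unaligned access, which faults on some architectures
    memcpy(&my_structure.two, &array[5], sizeof(my_structure.two));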

    printf("%s\n", my_structure.one);
    printf("%u\n", my_structure.two);

    return 0;
}

Finally, it is usually a bad idea to rely on packed data structures because it makes porting code to a different platform difficult. However, sometimes you need to pack/unpack protocol packets. In those special cases it is usually best and most portable to manually pack / unpack each item using a pair of functions for each data type.
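A pair of such functions for a 32-bit value might look like the sketch below. It assumes the packet stores the number most significant byte first (big-endian), as in the array from the question; the names put_u32_be, get_u32_be, record and unpack_record are hypothetical. Because the value is assembled with shifts, the result does not depend on the host's byte order or on structure padding.

```c
#include <stdint.h>
#include <string.h>

/* Writes value into out[0..3], most significant byte first. */
void put_u32_be(unsigned char *out, uint32_t value)
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Reads four bytes, most significant first, into a host-order value. */
uint32_t get_u32_be(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Hypothetical in-memory form of the 9-byte packet; it may contain padding. */
typedef struct {
    char     one[5];
    uint32_t two;
} record;

/* Unpacks a 9-byte packet field by field; src must hold at least 9 bytes. */
void unpack_record(record *dst, const unsigned char *src)
{
    memcpy(dst->one, src, sizeof dst->one);   /* bytes 0..4 */
    dst->two = get_u32_be(src + 5);           /* bytes 5..8 */
}
```

With the original array from the question (0x00, 0xbc, 0x61, 0x4e in the last four bytes), unpack_record yields two equal to 12345678 on both little-endian and big-endian hosts.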

I will leave endian-ness issues for another topic. :-)

As your other answers have already indicated, you are seeing an alignment issue. Compilers tend to align structure members on longword or quadword boundaries, according to the kind of processor you have. That means if what you have declared in your structure does not align, the compiler inserts padding bytes, and you are not normally supposed to see them.

By the way, once upon a time, the whole world wasn't Intel; there were other processors each with its own unique alignment requirements, so alignment was something we all dealt with quite a bit, especially porting boot ROM code across different processor families.

When running into problems like this, I suggest altering your code to conduct a little experiment, like the following:

1) Add a declaration structure * pStructure; to your code.

2) Add pStructure = (structure *) array; right after array's declaration.

3) Then, at the line where the memcpy is, set a breakpoint.

When you hit the breakpoint, enter the print or display command (gdb uses p):

(gdb) p pStructure->one
$4 = "ABCD"

and then the following

(gdb) p pStructure->two
$7 = 3486515278

As to the 4-byte number, I believe you are not seeing the number you expect because .two, whose type is unsigned int, is not read from bytes 5 to 8 of the array. The compiler places .two after the padding, so only part of the value comes from your array and the rest is whatever lies beyond it.

Aside from the value of the number: if you use a structure pointer to access data in the array, the compiler still applies the structure's layout, padding included. There is nothing to pad in the middle of an array of bytes, so your data is sequential, but the fields of an unpacked structure do not line up with it. The fields line up only when the structure is packed.

memcpy is just copying bytes, and does not interpret the fields of your struct or what the compiler may have done to align your struct.
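The sketch below illustrates that point, using the structure from the question; the helpers copy_raw and bytes_reaching_two are hypothetical. Note that the call in the question copies sizeof(my_structure) bytes, which is more than the 9-byte array holds whenever the structure is padded, so this version limits the copy to the bytes actually available.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char one[5];
    unsigned int two;
} structure;

/* Zero-fills dst, then copies at most sizeof(structure) bytes from src
   without reading past src_len. Returns the number of bytes copied. */
size_t copy_raw(structure *dst, const unsigned char *src, size_t src_len)
{
    size_t n = src_len < sizeof *dst ? src_len : sizeof *dst;
    memset(dst, 0, sizeof *dst);
    memcpy(dst, src, n);    /* raw bytes only; padding is not skipped */
    return n;
}

/* Of the first `copied` bytes of the structure, how many lie inside two. */
size_t bytes_reaching_two(size_t copied)
{
    size_t start = offsetof(structure, two);
    size_t end = start + sizeof(unsigned int);
    if (copied <= start)
        return 0;
    return (copied < end ? copied : end) - start;
}
```

If two sits at offset 8, as is common, bytes_reaching_two(9) is 1: only the last byte of the array ends up in two, and bytes 5 to 7 fall into the padding.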

Doing things like this was the only way I could appreciate pointers, especially working in assembly language.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow