Why can it be dangerous to use this POD struct as a base class?

https://stackoverflow.com/questions/7113422

29-12-2020
Question

I had this conversation with a colleague, and it turned out to be interesting. Say we have the following POD class

struct A { 
  void clear() { memset(this, 0, sizeof(A)); } 

  int age; 
  char type; 
};

clear is intended to clear all members by setting them to 0 (byte-wise). What could go wrong if we use A as a base class? There's a subtle source of bugs here.

Solution

The compiler is likely to add padding bytes to A, so sizeof(A) extends beyond char type (to the end of the padding). However, when A is used as a base class, the compiler might not add those padding bytes and may place members of the derived class there instead. In that case the call to memset will overwrite part of the subclass.
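To make the padding concrete, here is a minimal sketch that computes how many bytes of A lie after its last member. The helper name tail_padding_of_A is made up for this illustration, and the result is implementation-defined (3 would be typical with a 4-byte int, but that is an assumption about the platform).

```cpp
#include <cstddef>  // offsetof, std::size_t

struct A {
  int age;
  char type;
};

// Number of bytes between the end of the last member (type) and the end
// of the object. These are the bytes that memset(this, 0, sizeof(A))
// touches beyond the declared members.
inline std::size_t tail_padding_of_A() {
  return sizeof(A) - (offsetof(A, type) + sizeof(char));
}
```

If a derived class's member ends up inside those bytes, A::clear() reaches it.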

Other tips

In addition to the other notes, sizeof is a compile-time operator, so clear() will not zero out any members added by derived classes (except as noted due to padding weirdness).
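A small sketch of that point, assuming a hypothetical derived class Employee that is not part of the original question. Its member is an int, so on common ABIs it is laid out after all sizeof(A) bytes and the inherited clear() never reaches it.

```cpp
#include <cstring>  // std::memset

struct A {
  // sizeof(A) is fixed at compile time, whatever the dynamic type is.
  void clear() { std::memset(this, 0, sizeof(A)); }

  int age;
  char type;
};

// Hypothetical derived class, used only for this illustration.
struct Employee : A {
  int salary;
};

// Returns the value of salary after calling the inherited clear().
inline int salary_after_clear() {
  Employee e;
  e.age = 30;
  e.type = 'x';
  e.salary = 1000;
  e.clear();        // zeroes the bytes of the A subobject only
  return e.salary;  // still holds the value assigned above
}
```

Calling salary_after_clear() should give back 1000, which shows that a derived class needs its own way to reset its members.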

There's nothing really "subtle" about this; memset is a horrible thing to be using in C++. In the rare cases where you really can just fill memory with zeros and expect sane behaviour, and you really need to fill the memory with zeros, and zero-initializing everything via the initializer list the civilized way is somehow unacceptable, use std::fill instead.
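As a rough sketch of the alternatives mentioned here: a constructor initializer list, a member-wise clear(), and std::fill on a raw buffer. The helper zero_buffer is a made-up name for this example.

```cpp
#include <algorithm>  // std::fill
#include <cstddef>    // std::size_t

struct A {
  A() : age(0), type(0) {}             // zero-initialize via the initializer list
  void clear() { age = 0; type = 0; }  // member-wise: never writes to padding

  int age;
  char type;
};

// Hypothetical helper: zero a buffer with std::fill instead of memset.
inline void zero_buffer(int* data, std::size_t n) {
  std::fill(data, data + n, 0);
}
```

Because clear() assigns each member by name, it cannot touch bytes that belong to a derived class, wherever the compiler places them.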

In theory, the compiler can lay out base classes differently. C++03 §10 paragraph 5 says:

A base class subobject might have a layout (3.7) different from the layout of a most derived object of the same type.

As StackedCrooked mentioned, this might happen by the compiler adding padding to the end of the base class A when it exists as its own object, but the compiler might not add that padding when it's a base class. This would cause A::clear() to overwrite the first few bytes of the members of the subclass.

However in practice, I have not been able to get this to happen with either GCC or Visual Studio 2008. Using this test:

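#include <stdio.h>   // printf
#include <string.h>  // memset

struct A
{
  // Zeroes sizeof(A) bytes starting at this, including any tail padding.
  void clear() { memset(this, 0, sizeof(A)); }

  int age;
  char type;
};

struct B : public A
{
  char x;
};

int main(void)
{
  B b;
  // sizeof yields size_t and the pointer difference is ptrdiff_t,
  // so cast to int to match the %d conversions.
  printf("%d %d %d\n", (int)sizeof(A), (int)sizeof(B), (int)((char*)&b.x - (char*)&b));
  b.x = 3;
  b.clear();
  printf("%d\n", b.x);

  return 0;
}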

And modifying A, B, or both to be 'packed' (with #pragma pack in VS and __attribute__((packed)) in GCC), I couldn't get b.x to be overwritten in any case. Optimizations were enabled. The 3 values printed for the sizes/offsets were always 8/12/8, 8/9/8, or 5/6/5.
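One possible explanation, offered as an assumption rather than a confirmed diagnosis: GCC follows the Itanium C++ ABI, which does not reuse the tail padding of a base class that is a POD in the C++03 sense. The sketch below gives the base a user-provided constructor so that it is no longer a POD; the names NonPodBase and Derived are invented for this example, and the result still depends on the ABI.

```cpp
#include <cstddef>  // std::size_t
#include <cstring>  // std::memset

struct NonPodBase {
  NonPodBase() : age(0), type(0) {}  // user-provided ctor: not a C++03 POD

  // The void* cast only silences compiler warnings about memset on a
  // non-trivial type; the overwrite hazard is unchanged.
  void clear() { std::memset(static_cast<void*>(this), 0, sizeof(NonPodBase)); }

  int age;
  char type;
};

struct Derived : NonPodBase {
  char x;
};

// Offset of x within Derived, measured as in the test above.
inline std::size_t offset_of_x() {
  Derived d;
  return static_cast<std::size_t>(
      reinterpret_cast<char*>(&d.x) - reinterpret_cast<char*>(&d));
}

// True if calling the inherited clear() changed d.x.
inline bool clear_overwrites_x() {
  Derived d;
  d.x = 3;
  d.clear();
  return d.x != 3;
}
```

Where the compiler places x inside the base's tail padding, offset_of_x() is smaller than sizeof(NonPodBase) and clear_overwrites_x() is expected to return true. Where it does not (as in the Visual Studio results above), x should survive.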

The clear method of the base class will only set the values of the class members.

According to alignment rules, the compiler is allowed to insert padding so that the next data member falls on an aligned boundary. Thus there will be padding after the type data member. memset touches only sizeof(base) bytes, and the size of the base class does not include the size of the descendant, so a descendant member placed after that padding is free from the effects of memset. A member that the compiler places inside the padding slot, however, lies within sizeof(base) and would be zeroed. Size of parent != size of child (unless the child has no data members). See slicing.

Packing of structures is not part of the language standard. Hopefully, with a good compiler, the size of a packed structure does not include any extra bytes after the last member. Even so, a packed descendant inheriting from a packed parent should produce the same result: the parent sets only the data members in the parent.

Briefly: it seems to me that the only potential problem is that I cannot find any information about guarantees for "padding bytes" in the C89 or C++2003 standards. Do they have some extraordinary volatile or read-only behavior? I cannot even find what the term "padding bytes" means according to the standards.

Detailed:

For objects of POD types, the C++2003 standard guarantees that:

- when you memcpy the contents of your object into an array of char or unsigned char, and then memcpy the contents back into your object, the object will hold its original value;
- there will be no padding at the beginning of a POD object;
- some C++ rules concerning the goto statement and object lifetime are relaxed for POD types.
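The memcpy guarantee can be sketched as follows. The function name round_trip is made up for this example; it copies the object's bytes out, deliberately disturbs the object, and copies the bytes back.

```cpp
#include <cstring>  // std::memcpy

struct A {
  int age;
  char type;
};

// Saves the bytes of a, scribbles over a, then restores it from the bytes.
inline void round_trip(A& a) {
  unsigned char bytes[sizeof(A)];
  std::memcpy(bytes, &a, sizeof(A));  // object -> byte array
  a.age = -1;                         // disturb the object on purpose
  a.type = '?';
  std::memcpy(&a, bytes, sizeof(A));  // byte array -> same object
}
```

After round_trip(a), a holds the values it had before the call.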

C89 also gives some guarantees about structures:

- When structures are combined in a union and share a common initial sequence, those initial members match exactly.
- sizeof a structure in C equals the amount of memory needed to store all the members, plus the padding between members, plus the padding after the last member (before a following structure).
- In C, the members of a structure have addresses, and these are guaranteed to be in ascending order. The address of the first member coincides with the start address of the structure, regardless of the endianness of the machine the program runs on.
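The address guarantees can be checked with a short sketch like the one below, written in C++ for consistency with the rest of the page. The helper names are made up for this example.

```cpp
#include <cstddef>  // offsetof

struct A {
  int age;
  char type;
};

// The first member sits at offset 0, so there is no leading padding.
inline bool first_member_at_start() {
  return offsetof(A, age) == 0;
}

// Members are laid out at increasing offsets in declaration order.
inline bool members_ascending() {
  return offsetof(A, age) < offsetof(A, type);
}

// The address of the struct and of its first member coincide.
inline bool first_member_shares_address(const A& a) {
  return static_cast<const void*>(&a) == static_cast<const void*>(&a.age);
}
```

None of these checks says anything about the bytes after the last member, which is where the question about padding remains open.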

So it seems to me that these rules apply to C++ as well, and all is fine. I really think that at the hardware level nothing will stop you from writing to the padding bytes of a non-const object.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow