Does it make sense for an object declared on the heap to declare one of its members on the heap?

https://softwareengineering.stackexchange.com/questions/336270

01-01-2021

Question

I recently reviewed some C++ code written by a fellow student, and he did something I wouldn't think is necessary. In main(), he creates a class object on the heap with the new operator. Then he calls a method of the object, which itself calls the new operator to create a large vector.

My gut tells me this is unnecessary--since the class object was allocated to the heap, its member objects should also be on the heap, making the use of new superfluous. Am I right?

Solution

Unless I understood something wrong, no: what he did is correct. Whether an object is on the heap or the stack has nothing to do with where the objects it refers to live. You could, for example, have heap objects referencing stack ones and vice versa.

If he had not used the new operator to create the second object inside the first one's method, that object would have been destroyed when the method returned, like any other stack object.

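For example, suppose your object A has a B* b field, and in A's constructor you do:

{
    B tmpB = B();     // local object, destroyed when the constructor returns
    this->b = &tmpB;  // stores the address of that local object
}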

As soon as the constructor returns, tmpB is destroyed and b is left dangling: it points to memory that no longer holds a valid B object.

However, if you have a B b field, things are different. Here b holds a value rather than a pointer, so it is stored inside A itself: on the stack if A is on the stack, or on the heap if A is allocated via new.
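A minimal sketch of the two cases, assuming hypothetical classes B, AByValue and AOwning (the bodies are invented for illustration). A value member lives inside its owner's bytes, so allocating the owner with new puts the member on the heap too; when a separately allocated B is really needed, a std::unique_ptr member (C++14 for std::make_unique) owns it and deletes it in the owner's destructor, instead of pointing at a local.

```cpp
#include <memory>

// Hypothetical payload type; its contents are invented for illustration.
struct B {
    int data[4] = {0, 1, 2, 3};
};

// b is stored inside every AByValue object, wherever that object lives.
struct AByValue {
    B b;
};

// The B is allocated separately on the heap and owned by the unique_ptr,
// so it outlives the constructor and is deleted when the AOwning dies.
struct AOwning {
    std::unique_ptr<B> b;
    AOwning() : b(std::make_unique<B>()) {}
};

// Returns true if the member b lies within the bytes of the object a.
bool member_is_inside(const AByValue& a) {
    const char* begin = reinterpret_cast<const char*>(&a);
    const char* member = reinterpret_cast<const char*>(&a.b);
    return member >= begin && member < begin + sizeof(AByValue);
}
```

Usage: `AByValue* a = new AByValue();` puts both `*a` and `a->b` on the heap with a single allocation.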

OTHER TIPS

A similar question was asked a few weeks ago.

The most important realization is that there are two kinds of "stack vs. heap" that we talk about:

1. Where the data is actually located. In other words, if you obtain a pointer to one of the array's elements, does that pointer fall within a heap-allocated memory range or within the executing thread's stack memory range?
2. What the scope of the object is: scope-bound (see RAII, scope-based resource management, where the object is destroyed when the end of its scope is reached) or user-managed (the language will not release it automatically; the user has to write code somewhere else to release it).

When we talk about scope-bound, there are two kinds of scopes:

1. A function call, or a block of code surrounded by curly braces.
   - When a function (parent) calls another function (child), everything that is alive in the parent's scope stays alive throughout the child's execution, unless the child function does something to the parent's data.
2. Instance members of an object.
   - A member is constructed before the object's constructor body runs and destroyed after the object's destructor body finishes, so it is alive throughout both bodies.
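The member-lifetime rule can be checked with a small sketch; Member, Owner and lifetime_log are hypothetical names invented here. Each constructor and destructor appends to a log, so the recorded order shows that the member exists before the owner's constructor body and outlives the owner's destructor body.

```cpp
#include <string>
#include <vector>

// Shared log so the order of events can be inspected afterwards.
std::vector<std::string> lifetime_log;

// Hypothetical member type that records when it is created and destroyed.
struct Member {
    Member() { lifetime_log.push_back("member constructed"); }
    ~Member() { lifetime_log.push_back("member destroyed"); }
};

struct Owner {
    Member m;  // constructed before Owner's constructor body runs
    Owner() { lifetime_log.push_back("owner constructor body"); }
    // m is still alive here; it is destroyed after this body finishes.
    ~Owner() { lifetime_log.push_back("owner destructor body"); }
};

void run_lifetime_demo() {
    lifetime_log.clear();
    Owner o;  // scope-bound: destroyed at the closing brace
}
```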

a large vector

This is what makes this question special. A large vector is handled differently from other kinds of primitives or aggregates, because:

In a lot of programming tasks, the size of the vector cannot be determined at compile time.
The amount of memory (or uncommitted virtual address space) reserved for a thread's call stack is limited on both 32-bit and 64-bit systems. On 64-bit, address space is rarely the constraint, but default stack sizes are still typically only a few megabytes.

There are many ways to declare a large array in C++:

#include <array>
#include <cstddef>
#include <memory>
#include <vector>

int main(int argc, char** argv)
{
    // very likely located on stack, scope-bound
    int local_on_stack[1000];

    // very likely located on stack, scope-bound
    std::array<int, 1000uL> local_array;

    // allocated on heap, user-managed, need delete[]
    int* local_ptr_to_heap = new int[1000];

    // on heap, scope bound, thanks to std::vector<T>::~vector()
    std::vector<int> local_vec_of_int((std::size_t)1000, (int)0);

    // on heap, scope bound, thanks to std::unique_ptr<T[]>::~unique_ptr()
    // (the constructor takes a pointer to the array, not a size)
    std::unique_ptr<int[]> local_unique_array_of_int(new int[1000]());

    // the user-managed array has to be released explicitly
    delete[] local_ptr_to_heap;

    return 0;
}

When the array size is not a compile-time constant, arrays tend to be created on the heap. Some compilers offer extensions for creating dynamically-sized arrays on the stack (for example, GCC's variable-length arrays in C++, which are scope-bound by necessity), or alternatively for creating a dynamically-sized, scope-bound array on the heap (where the data is located on the heap, but the memory is released when the end of scope is reached).
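In standard C++, the portable way to get a runtime-sized, scope-bound array is to let a heap-owning object manage it. A minimal sketch (the function name fill_runtime_sized is hypothetical) using both std::vector and std::unique_ptr<int[]> with std::make_unique (C++14):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// n is only known at run time, so a plain `int arr[n];` is not standard C++.
// Both objects below keep their elements on the heap but release them
// automatically when they go out of scope.
std::size_t fill_runtime_sized(std::size_t n) {
    std::vector<int> vec(n, 7);                               // n ints, each 7
    std::unique_ptr<int[]> arr = std::make_unique<int[]>(n);  // n ints, all 0
    std::size_t sevens = 0;
    for (std::size_t i = 0; i < n; ++i) {
        arr[i] = vec[i];
        if (arr[i] == 7) ++sevens;
    }
    return sevens;
}  // vec and arr free their heap memory here
```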

We can make one observation. A compiler can secretly substitute a heap-allocated array for a stack-allocated array, as long as the compiler also secretly inserts code for releasing that array at appropriate exit points of the scope.

However, a compiler cannot substitute in the other direction, because a stack-allocated array becomes invalid when the function that created it exits. Its memory range is reclaimed and reused for the call frames of any further function calls the parent function makes.
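A sketch of the consequence, with a hypothetical function make_squares: returning a pointer to a local array would leave the caller with a dangling pointer, whereas a std::vector returned by value hands its heap buffer to the caller (the return is either elided or a move, not an element-by-element copy).

```cpp
#include <vector>

// Returning a pointer to a local array would be wrong:
//   int* broken() { int local[3] = {1, 2, 3}; return local; }  // dangling
// A std::vector keeps its elements on the heap, so they survive the return.
std::vector<int> make_squares(int n) {
    std::vector<int> result;
    for (int i = 0; i < n; ++i) result.push_back(i * i);
    return result;  // elided or moved, never a deep copy here
}
```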

Typically, the following is unnecessary, because (1) it adds the burden of calling delete on the std::vector<int>* when it is no longer needed, and (2) C++11 and later provide several ways of "transferring" ownership of the data from one std::vector instance to another (such as move construction and move assignment).

std::vector<int>* pointer_to_vec_of_int = nullptr;
pointer_to_vec_of_int = new std::vector<int>(1000uL);
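Instead of a std::vector<int>*, a class can take the vector by value and move it into a member. A minimal sketch, assuming a hypothetical Holder class: the large buffer allocated by the caller ends up owned by the member without any element being copied, and no delete is ever written.

```cpp
#include <utility>
#include <vector>

// Hypothetical holder that takes over a vector's buffer instead of storing
// a std::vector<int>* that someone must remember to delete.
class Holder {
public:
    explicit Holder(std::vector<int> values) : values_(std::move(values)) {}
    const std::vector<int>& values() const { return values_; }
private:
    std::vector<int> values_;
};

// Builds a large vector and transfers its heap buffer into a Holder.
// Returns true if the buffer was moved rather than copied.
bool buffer_was_transferred() {
    std::vector<int> big(1000, 42);
    const int* before = big.data();
    Holder h(std::move(big));  // moved into the parameter, then into the member
    return h.values().data() == before && h.values().size() == 1000;
}
```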

I thought I'd bring this back up because none of the answers have touched on exactly what your friend has done wrong. I'm obviously guessing at what your instincts picked up on, but I'll give you credit for having excellent instincts and being exactly right (Bravo!).

It is a bit pointless allocating a small class on the heap when all that class does is allocate some memory on the heap, but what's really wrong with it is the duplicated memory management. Your friend's class is responsible for allocating the memory and deleting it. By creating the object with new, your friend also takes responsibility for deleting the object, and hence for deallocating the memory, so in effect the memory management is coded twice. If the object uses a standard library vector, it doesn't need to do any of this at all (std::vector is itself a small class that allocates memory on the heap).
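A sketch of the single-owner alternative, using a hypothetical Simulation class standing in for the friend's class: the large buffer is a std::vector member, and the object itself is declared on the stack, so neither the object nor the buffer needs new or delete.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the friend's class: it needs a large buffer,
// but the std::vector member already owns that heap memory.
class Simulation {
public:
    explicit Simulation(std::size_t n) : samples_(n, 0.0) {}
    void fill(double value) {
        for (double& s : samples_) s = value;
    }
    double sum() const {
        double total = 0.0;
        for (double s : samples_) total += s;
        return total;
    }
private:
    std::vector<double> samples_;  // elements live on the heap
};

double run_simulation() {
    Simulation sim(1000000);  // the object is on the stack; no new/delete
    sim.fill(0.5);
    return sim.sum();
}  // ~Simulation runs, then ~vector frees the million doubles
```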

In Java and similar languages, any object that isn't a built-in type has to be created with new. A lot of people carry this habit over into C++, but, as you've spotted, it's unnecessary. It's also dangerous.

Using new directly should be considered "advanced" C++. Don't do it unless you really need to. Whenever you use new, you are responsible for making sure there is a matching delete and that it is always called exactly once. This is difficult! I'm not saying never do it, but you should almost never need to, and if you do, you should be aware that you're doing something advanced and need to be extra careful. There's probably something in the standard library that does it for you.
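One such standard library tool is std::make_unique (C++14). A minimal sketch with a hypothetical Widget type: the object still lives on the heap, but the std::unique_ptr calls delete exactly once when it goes out of scope, including when an exception propagates.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type used only for this illustration.
struct Widget {
    std::string name;
    explicit Widget(std::string n) : name(std::move(n)) {}
};

// With raw new, every exit path would need a matching delete:
//   Widget* w = new Widget("raw"); ... delete w;
// std::make_unique pairs the allocation with an owner that deletes it.
std::size_t widget_name_length() {
    auto w = std::make_unique<Widget>("managed");
    return w->name.size();
}  // the Widget is deleted here automatically
```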

There are two ways of dealing with this, you can either say "C++ is scary, I'm going back to Java" or you can just stop using the new command when you don't need to.

You may have heard people say that C++ doesn't have garbage collection. In fact, C++ has a very advanced "garbage collection" mechanism; it's just possible to switch it off.

Here's a short program that simulates an early episode of South Park.

#include<string>
#include<iostream>

using namespace std;

class Character{
  string my_name;
public:
  Character(string name) : my_name(name){
    cout << "Hi I'm " << my_name << " and I've just been born" << endl;
  }
  ~Character(){
    cout << "Oh my god, they killed " << my_name << "!" << endl;
  }
};
void south_park(){
  Character kyle("Kyle");
  Character stan("Stan");
  Character cartman("Cartman");

  cout << "Starting Episode" << endl;

  Character ike("Ike");

  {
    Character kenny("Kenny");
    cout << "Some Stuff Happens" << endl;
  }
  cout << "Some more stuff happens" << endl;
  cout << "Kyle learns something today" << endl;
  cout << "End of episode" << endl;

}

int main(int, char**){
  try{
    south_park();
    cout << "What a great episode of south park" << endl;
  }
  catch(...){
    cout << "Aww, we didn't finish the episode" << endl;
  }
}

When you run the program, you'll notice that each character is born when its Character variable is declared. Kyle, Stan and Cartman are all born before the start of the episode, and Ike and Kenny are born after it, but Kenny is unfortunate enough to be born within a set of curly brackets.

This means that the variable kenny is only in scope within the curly brackets. If you try to refer to it outside the curly brackets, you get a compile error. In garbage-collected languages, an unreachable object is simply left alone until its memory is needed again, but in C++ the object is destroyed as soon as it goes out of scope. I don't have to do anything to tell you that the character died; the class takes care of that itself in its destructor. Have a play with the code and see what happens to the characters when you initialise them in different places.

Notice that every character that gets born always dies, and that characters always die in the reverse order in which they were created. This is automatic! Try putting a return statement or a throw somewhere in the episode. It doesn't matter where you put it: some characters might not get born, but every character that does get born will die before the function exits (for a throw, this relies on the exception being caught somewhere, as the catch(...) in main does here). This is the key to "garbage collection" (which we call resource management) in C++.
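The throw experiment can be sketched with a logging variant of the Character class; LoggedCharacter, episode_log and the two functions are hypothetical names. The log shows that Cartman is never born and that Stan and Kyle die, in reverse order, before the catch block runs.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Records events instead of printing them, so the order can be checked.
std::vector<std::string> episode_log;

class LoggedCharacter {
    std::string my_name;
public:
    explicit LoggedCharacter(std::string name) : my_name(std::move(name)) {
        episode_log.push_back("born " + my_name);
    }
    ~LoggedCharacter() { episode_log.push_back("died " + my_name); }
};

void episode_with_throw() {
    LoggedCharacter kyle("Kyle");
    LoggedCharacter stan("Stan");
    throw std::runtime_error("episode cut short");
    LoggedCharacter cartman("Cartman");  // never reached, never born
}

void run_episode() {
    episode_log.clear();
    try {
        episode_with_throw();
    } catch (const std::runtime_error&) {
        // stack unwinding has already destroyed stan and kyle
        episode_log.push_back("caught");
    }
}
```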

If somewhere in the function I declared

Character* butters = new Character("Butters");

then, unless I explicitly call delete butters, Butters will still be alive at the end of the episode, which in C++ is a bad thing (a leak).
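If Butters really must be created on the heap, handing the pointer to a std::unique_ptr restores the automatic cleanup. A sketch with a hypothetical logging TrackedCharacter class (mirroring Character above): Butters still lives on the heap, but dies at the end of the scope that owns him.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> butters_log;

// Same idea as Character, logging into a vector instead of printing.
struct TrackedCharacter {
    std::string name;
    explicit TrackedCharacter(std::string n) : name(std::move(n)) {
        butters_log.push_back("born " + name);
    }
    ~TrackedCharacter() { butters_log.push_back("died " + name); }
};

void episode_with_butters() {
    // Owned by a unique_ptr instead of a raw pointer: still on the heap,
    // but deleted when `butters` goes out of scope.
    std::unique_ptr<TrackedCharacter> butters =
        std::make_unique<TrackedCharacter>("Butters");
    butters_log.push_back("end of episode");
}
```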

In C++, when you use new you're effectively saying to the compiler "Please turn garbage collection off because I know what I'm doing", but in a lot of cases you're actually saying "I've just turned garbage collection off because I don't know what I'm doing".

When you allocate memory in C++, open a file, or do anything else that needs to be cleaned up afterwards, you should always wrap it in a class that does all the necessary clean-up in its destructor, and that object should always be either on the stack or a member variable of a class (member variables are destroyed properly when the object containing them is destroyed). If some idiot declares your class with new and doesn't delete it, that's his problem, not yours.
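A minimal sketch of such a wrapper for a C file handle, assuming a hypothetical TempFile class built on std::tmpfile (which creates a temporary file that is removed when closed). The destructor calls std::fclose exactly once; copying is deleted because two copies would close the same handle twice.

```cpp
#include <cstdio>
#include <string>

// Owns a FILE* and closes it in the destructor.
class TempFile {
public:
    TempFile() : f_(std::tmpfile()) {}
    ~TempFile() { if (f_) std::fclose(f_); }
    TempFile(const TempFile&) = delete;             // copies would close
    TempFile& operator=(const TempFile&) = delete;  // the handle twice
    bool ok() const { return f_ != nullptr; }
    std::FILE* get() const { return f_; }
private:
    std::FILE* f_;
};

// Writes text to a temporary file and reads it back.
std::string round_trip(const std::string& text) {
    TempFile file;
    if (!file.ok()) return "";
    std::fputs(text.c_str(), file.get());
    std::rewind(file.get());
    std::string result;
    int c;
    while ((c = std::fgetc(file.get())) != EOF) {
        result.push_back(static_cast<char>(c));
    }
    return result;
}  // file's destructor closes (and thereby deletes) the temporary file
```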

So in my South Park program, if one of my characters needs to allocate memory, it can do so knowing full well that the destructor will be called exactly once, so it can safely call delete there as long as the class keeps track of any resources it has allocated. In fact, it's doing that already without you even noticing.

The string class itself is 32 bytes in my setup (clang on 64-bit Linux; the exact size is implementation-specific). If the string is short enough, it is stored inside the object itself; if it is longer, it is allocated on the heap. If the string grows beyond its current capacity, the class allocates new memory, copies the string across and deletes the old memory. In the destructor it deletes whatever memory it has allocated. All of this is taken care of without you even knowing that it has happened. As long as your string is on the stack, that memory will always be deallocated correctly. As soon as you use new, you're on your own.

That whole principle is called "Resource Acquisition Is Initialisation", or RAII, which everyone agrees is a terrible name, which is why I didn't call it that until the end. There are lots and lots of posts about it on Stack Overflow, plus articles and talks, if you want to find out more (which you should). Getting this stuff right is the difference between good C++ and bad C++.

Even an object on the heap can't grow once allocated

You can only allocate more space (aggregated objects/nodes/bytes) and link it to the existing object.

The sizeof of any given object is a constant, determined by its class definition, where the composition is defined.
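A sketch of that point with std::vector (the function name grow_and_report_capacity is hypothetical): sizeof(std::vector<int>) is a compile-time constant, so pushing elements never makes the vector object bigger; only the separately allocated heap buffer grows. The exact sizeof value is implementation-specific.

```cpp
#include <cstddef>
#include <vector>

// Fixed by the class definition; it cannot change as the vector grows.
constexpr std::size_t vector_object_size = sizeof(std::vector<int>);

// Grows a vector and reports how many ints its heap buffer can now hold.
// The vector object itself stays vector_object_size bytes throughout.
std::size_t grow_and_report_capacity(std::size_t n) {
    std::vector<int> v;
    for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<int>(i));
    static_assert(sizeof(v) == vector_object_size, "sizeof never changes");
    return v.capacity();
}
```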

As the premise of the question is generic, rather than being specific to a snippet (and how to optimise it), there's nothing wrong with your peer's implementation approach.

Consider any of the STL containers (vectors, lists, sets, maps and deques). They all use dynamic memory, i.e. the heap, internally. So creating an instance of any of these on the heap is exactly the situation in your question, and it is definitely correct to do so if the requirements dictate that the container itself be on the heap.

For instance, let's consider a class MailBox, which internally has a std::list of MailFolder objects, and each MailFolder will have a std::list of Mail objects.

    class Mail
    {
    ...
    };

    class MailFolder
    {
    private:
        std::list<Mail> mReceivedMail;
    ...
    };

    class MailBox
    {
    private:
        std::list<MailFolder> mFolders;
    ...
    };

    MailBox mb;
    //'mb' object is on the stack and not the heap.

Now, in this case we don't seem to be putting anything on the heap... or are we?

The std::list uses dynamic memory, thereby placing the contents of MailBox::mFolders on the heap. So all the MailFolder objects are now on the heap. Then the contents of each MailFolder::mReceivedMail are placed on the heap separately, outside the memory of the MailFolder objects. What necessitates this design is that we can predetermine neither the number of folders a user will create in the mailbox nor the number of mails received in each folder.
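A runnable sketch of the MailBox example; the members and methods filling in the "..." (mSubject, subject, add_mail, mail_count, add_folder, total_mail, build_mailbox) are hypothetical. The MailBox object sits on the stack, each MailFolder node is allocated on the heap by mFolders, and each Mail node is a further heap allocation owned by its folder's list.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>

class Mail {
public:
    explicit Mail(std::string subject) : mSubject(std::move(subject)) {}
    const std::string& subject() const { return mSubject; }
private:
    std::string mSubject;
};

class MailFolder {
public:
    void add_mail(const std::string& subject) { mReceivedMail.emplace_back(subject); }
    std::size_t mail_count() const { return mReceivedMail.size(); }
private:
    std::list<Mail> mReceivedMail;  // each Mail node is heap-allocated
};

class MailBox {
public:
    MailFolder& add_folder() {
        mFolders.emplace_back();
        return mFolders.back();  // list references stay valid as it grows
    }
    std::size_t total_mail() const {
        std::size_t total = 0;
        for (const MailFolder& f : mFolders) total += f.mail_count();
        return total;
    }
private:
    std::list<MailFolder> mFolders;  // each MailFolder node is heap-allocated
};

std::size_t build_mailbox() {
    MailBox mb;  // on the stack; its folders and mails live on the heap
    MailFolder& inbox = mb.add_folder();
    inbox.add_mail("Hello");
    inbox.add_mail("Meeting");
    mb.add_folder().add_mail("Receipt");
    return mb.total_mail();
}
```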

In summary, this is a perfectly valid scenario: allocating heap memory from inside an object that is itself on the heap.

I hope this clarifies the doubt rather than clouding the understanding. :-)

Licensed under: CC-BY-SA with attribution
