Question

Garbage collection happens at runtime, in real time, while managed code is running. In C++ however, we need to write destruct statements into the code. So we could say that GC is built into the code (by us). So why can't managed languages behave like this? The compiler would analyse the code and insert the destruct methods into the appropriate places in the object code. If the answer to this is that eligibility for GC for some objects becomes clear only at runtime, it might still make sense to build destruct statements into the compiled code in some cases, no? Maybe in the majority of cases?

Solution

The compiler would analyse the code and insert the destruct methods into the appropriate places in the object code.

The answer to this is the same as the answer to the second paragraph of this question, which makes a very similar statement: C++ delete vs Java GC.

Figuring out the lifetime of objects is equivalent to solving the Halting Problem, so, no, the compiler cannot insert the destruct methods into the appropriate places in the object code. This is especially true for languages with closures.

But wait, C++ has closures and no GC, how does that work? Well, it doesn't: you can crash your program (or more precisely: run into undefined behavior) using closures, exactly because the compiler cannot figure out the lifetimes of your closed-over variables. Aliasing is another thing that makes such memory analysis really hard.
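To make the closure pitfall concrete, here is a minimal C++ sketch (the names are illustrative) of a lambda that outlives the local variable it captures by reference:

    #include <functional>
    #include <iostream>

    // Returns a closure that captures a local variable by reference.
    // The compiler does not (and cannot, in general) extend the lifetime
    // of `counter` to match the closure, so the reference dangles.
    std::function<int()> make_counter() {
        int counter = 0;
        return [&counter] { return ++counter; };
    }   // `counter` is destroyed here, but the closure still refers to it

    int main() {
        auto next = make_counter();
        std::cout << next() << '\n';   // undefined behavior: reads a dead variable
    }

Capturing by value (or holding the state in a shared_ptr) avoids the dangling reference, but the point stands: the language leaves the lifetime decision to the programmer instead of proving it in the compiler.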

In general, unless your language is specifically designed (and specifically restricted) for such analysis, it is impossible.

Some JVMs perform Escape Analysis at compile-time to figure out whether a reference will escape the local scope or not, and if it doesn't, the object can be allocated on the stack instead of the heap. But, you guessed it, EA is equivalent to solving the Halting Problem.

The Azul JVM does Escape Detection: it allocates objects on the stack, and when it sees a reference escaping the local scope, it will re-allocate on the heap and patch up all existing references. This happens at runtime, and thus is not subject to the Halting Problem … but you were talking about doing it at compile-time, so this doesn't count.

OTHER TIPS

In C++ however, we need to write destruct statements into the code.

That is not correct. In C++, scope-bound object creation releases the programmer from that burden. The cases where one needs explicit "new" and "delete" statements are mostly the cases where the lifetime of the object can only be determined at run time (so the compiler cannot figure out where the correct "delete" statement needs to be placed or when it has to be called).
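As a minimal sketch of that distinction (using only the standard library):

    #include <fstream>
    #include <memory>
    #include <string>

    void scope_bound() {
        std::ofstream log("log.txt");   // resource acquired here
        log << "no delete needed\n";
    }   // destructor runs at scope exit; the cleanup call is generated
        // by the compiler, not written by the programmer

    // Here the lifetime depends on a runtime decision, so the object has to
    // live on the heap; a smart pointer still ties its release to whichever
    // scope up the call chain drops the last handle.
    std::unique_ptr<std::string> maybe_make(bool keep) {
        if (!keep) return nullptr;
        return std::make_unique<std::string>("lives until the caller lets go");
    }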

So why can't managed languages behave like this?

So, putting your wrong assumption aside and interpreting this part of the question as "why can't managed languages also provide scope-bound memory management like C++" - the answer is: they can. C++/CLI does provide such a mechanism, even for "managed" objects. And if scope-bound management does not fit, the situation becomes the same as the one above - in most cases, the information about when the object has to be destructed is simply not available at compile time.

Well, I changed my statement above from "always" to "mostly", since after rethinking your question, I guess you had something like this in mind:

  • an object is allocated by "new" somewhere at the beginning of the local scope
  • the object is used only within that scope, and nowhere else
  • when the scope is left, the object could immediately be destructed and deallocated, without the need to pass it over to the GC to free it later

And your question is: "can the compiler determine, in certain cases, that the object does not leave the local scope, and so generate an automatic destruct call at the end of the scope?" If that is your question, I think @JörgWMittag's answer is what you are looking for.
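For concreteness, here is roughly the transformation the question seems to have in mind, sketched in C++-like terms (Widget and doWork are made-up names):

    struct Widget { void doWork() {} };

    void process() {
        Widget* w = new Widget();   // allocated at the beginning of the scope
        w->doWork();                // used only within this scope
        delete w;                   // the "automatic destruct call" the compiler
    }                               // would insert at the end of the scope

    // The catch: the compiler may only insert that `delete` if it can prove the
    // pointer never escapes process() -- never stored, returned, or captured --
    // and proving that in general runs into the Halting Problem.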

Why do memory leaks exist in the first place in languages with no garbage collection?

Because it's difficult for developers to determine exactly when to deallocate specific objects (or free the previously allocated memory). Even if you take a short-lived variable with an easily detectable lifespan:

  1. Declare and assign a value
  2. Use the variable
  3. Destroy the variable

the code may misbehave, for instance by throwing an exception at the second line, leading to the variable not being destroyed.
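A minimal C++ illustration of that failure mode, and of how scope-bound cleanup sidesteps it (Buffer and use are placeholders):

    #include <memory>
    #include <stdexcept>

    struct Buffer { };
    void use(Buffer&) { throw std::runtime_error("oops"); }   // may throw

    void leaky() {
        Buffer* b = new Buffer();   // 1. declare and assign
        use(*b);                    // 2. use -- throws, so the next line never runs
        delete b;                   // 3. destroy -- skipped: the memory leaks
    }

    void safe() {
        auto b = std::make_unique<Buffer>();   // 1. declare and assign
        use(*b);                               // 2. use -- may still throw...
    }                                          // 3. ...but the destructor runs during
                                               //    stack unwinding, so nothing leaks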

It's similar for an imaginary GC-aware compiler. While some patterns could be recognized during the build to identify when a specific variable should be destroyed, handling anything but the most elementary cases would require too much time for static checking.

Since you seem to use .NET, you have probably tried Code Contracts and the corresponding compiler. How long does the build take? For a Hello World, a few seconds. For a small app, several minutes. For a medium-scale app, hours, days, or weeks. It's not that the technology itself is useless: it is very interesting and brings its benefits. But waiting for hours every time you need to recompile the app? Not a good idea.

The problem of releasing memory (or objects) is to know when exactly they can be released. An object can be referenced from multiple places at the same time. The object or its memory can be released when nobody references it anymore.

It is easy for a compiler to determine that one particular reference to an object goes away. However, one reference going away doesn't mean the object can be released, because there may be any number of other references to the object. So it's quite impossible to determine at compile time that an object can be released, at least for most objects.
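A small sketch of why a single compile-time release point often does not exist (Session and the containers are illustrative):

    #include <memory>
    #include <vector>

    struct Session { };

    std::vector<std::shared_ptr<Session>> active;
    std::vector<std::shared_ptr<Session>> archived;

    // Which container (if any) keeps the session alive depends on runtime input,
    // so there is no fixed place in the code where it is safe to release it.
    void route(std::shared_ptr<Session> s, bool keepActive, bool archive) {
        if (keepActive) active.push_back(s);
        if (archive)    archived.push_back(s);
    }   // If neither flag was set, the last reference disappears here and the
        // Session is freed; otherwise it lives until removed from the container(s).
        // Reference counting (or a GC) resolves this at run time, not compile time.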

There is no benefit to your approach. GC is performed by the framework at convenient times or when it cannot be postponed. You suggest having it done at pre-determined moments. Those are likely to be more inconvenient moments.

Performance-wise, the runtime option will usually be the better one. If you need that control for a real-time or small embedded system, you may want to or need to clean up after yourself and use a traditionally compiled language. But the in-between option is pointless, given what is available today.

Licensed under: CC-BY-SA with attribution