Garbage Collection in Java

https://stackoverflow.com/questions/2942804

05-10-2019
|

Question

On the slides I am revising from it says the following:

Live objects can be identified either by maintaining a count of the number of references to each object, or by tracing chains of references from the roots.

Reference counting is expensive – it needs action every time a reference changes and it doesn’t spot cyclical structures, but it can reclaim space incrementally.

Tracing involves identifying live objects only when you need to reclaim space – moving the cost from general access to the time at which the GC runs, typically only when you are out of memory.

I understand the principles of why reference counting is expensive but do not understand what "doesn’t spot cyclical structures, but it can reclaim space incrementally." means. Could anyone help me out a little bit please?

Thanks

Solution

Reference counting doesn’t spot cyclical structures...

Let's say you have two objects O1 and O2. They reference each other: O1 -> O2 and O2 -> O1, and no other objects references them. They will both have reference count 1 (one referrer).

If neither O1 or O2 is reachable from a GC root, they can be safely garbage collected. This is not detected by counting references though, since they both have reference count > 0.

0 references is a sufficient but not necessary requirement for an object to be eligible for garbage collection.

...but it can reclaim space incrementally.

The incremental part refers to the fact that you can garbage collect some of the 0-referenced objects quickly, get interrupted and continue at another time without problems.

If a tracing-algorithm gets interrupted it will need to start over from scratch the next time it's scheduled. (The tree of references may have changed since it started!)

OTHER TIPS

Simple reference counting cannot resolve cases, when A refers to B and B refers to A. In this case, both A and B will have reference count 1 and will not be collected even if there are no other references.
Reference counting can reclaim space immediately when some object's reference counter becomes 0. There is no need to wait for GC cycle, scan other objects, etc. So, in a sense, it works incrementally as it reclaims space from objects one by one.

"doesn't spot cyclical structures"

Let's say I've got an object A. A needs another object called B to do its work. But B needs another object called C to do its work. But C needs a pointer to A for some reason or other. So the dependency graph looks like:

A -> B -> C -> A

The reference count for an object should be the number of arrows pointing at it. In this example, every object has a reference count of one.

Let's say our main program created a structure like this during its execution, and the main program had a pointer to A, making A's count equal to two. What happens when this structure falls out of scope? A's reference count is decremented to one.

But notice! Now A, B, and C all have reference counts of one even though they're not reachable from the main program. So this is a memory leak. Wikipedia has details on how to solve this problem.

"it can reclaim space incrementally"

Most garbage collectors have a collection period during which they pause execution and free up objects no longer in use. In a mark-and-sweep system, this is the sweep step. The downside is that during the periods between sweeps, memory keeps growing and growing. An object may stop being used almost immediately after its creation, but it will never be reclaimed until the next sweep.

In a reference-counting system, objects are freed as soon as their reference count hits zero. There is no big pause or any big sweep step, and objects no longer in use do not just sit around waiting for collection, they are freed immediately. Hence the collection is incremental in that it incrementally collects any object no longer in use rather than bulk collecting all the objects that have fallen out of use since the last collection.

Of course, this incrementalism may come with its own pitfalls, namely that it might be less expensive to do a bulk GC rather than a lot of little ones, but that is determined by the exact implementation.

Imagine you have three objects (A, B, C): A has a reference on B, B has a reference on C and C has a reference on A. But no other object has any reference on one of these. It's an independent cyclical structure. Using traditional reference counting would prevent the garbage collector from remove the cycle because every object is still referenced. But as long as no one has a reference on one of the three they could/should be removed. I guess reclaiming space incrementally means the way reference counting works when finding unreferenced instances, no cycles etc.

An object can be released for garbage collecting if the reference count reaches 0.

For circular reference this will never happen as each object in the circle keeps a reference to another so they are all at least 1.

For that some graph theory is needed to detect references which are no longer attached to anything, like little islands in the Heap sea. In order to keep these in memory they must have some 'attachment' to a static variable somewhere.

That is what tracing does. It determines which parts of the heap are islands and can be freed and which are still attached to the mainland i.e. static variables smewhere.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow