What are the advantages and disadvantages of having mark bits together and separate for Garbage Collection

StackOverflow https://stackoverflow.com//questions/23057531

Question

I was watching video Google IO 2008 - Dalvik Virtual Machine Internals to understand how Dalvik VM works and why those people has preferred Dalvik VM over JVM for android. I found that android uses separate memory for Garbage information about the objects , opposed to the JVM where we have mark bits(bits telling whether object is able for garbagfe collection or not) together with objects.

Can anybody tell me in detail what are the advantages and disadvantages of having separate memory for marks bits and not having separate memory for mark bits ?

I was unable to get this difference by watching video.

Was it helpful?

Solution

Some advantages of a separate bitmap:

  • Much denser. A typical GC needs maybe eight bits of GC metadata, but due to alignment an in-object header might round this memory up to 32 bits.
  • Some operations, in particular around sweeping, become faster. This is partly because the denser (see above) bitmap means less memory traffic and better cache use, but also because some operations (e.g. zeroing all mark bits) can be vectorized when in this format. (Other parts of the GC needs to be designed to make use of that ability.)
  • If you fork() on a Unix system, a separate bitmark makes better use of copy-on-write: Pages containing objects might remain shared.

Some advantages of in-object mark bits:

  • Depending on the scheme used to associate objects with bitmaps, getting the mark bit for an object and vice versa can be quite complicated and/or slow. An in-object header, on the other hand, is trivial to access.
  • Easier memory management: No need to create a separate allocation of the right size and keep it in sync.
  • Many fast schemes for finding bitmaps for objects and vice versa are quite restrictive in other regards. For example, if you create a bitmap for every page and store the bitmap pointer at the start of the page, you have a problem storing objects larger than a page.

OTHER TIPS

Separate mark bits work by having an array of bits where each bit represents an address in the heap that can start an object. For example, suppose the heap is 65536 bytes and all objects are aligned at 16 byte boundaries, then there are 4096 addresses in the heap that can be the start of an object. This means the array needs to contain 4096 bits, which can be efficiently stored as 512 bytes or 64 64bit sized unsigned integers.

In-object mark bits works by having one bit of each header of each object be set to 1 if the object is marked and 0 otherwise. Note that this requires each object to have a dedicated header area. Runtimes such as the JVM and .NET all add headers to objects so you essentially get the space for the mark bit for free.

But it doesn't work for conservative collectors which don't have full control of the environment they are running in, such as the Boehm GC. They can misidentify integers as pointers, so for them modifying anything in the mutators data heap is risky.

Mark & sweep garbage collection is divided into two phases: marking and sweeping. Marking using in-object mark bits is straight-forward (pseudo-code):

if not obj.is_marked():
    obj.mark()
    mark_stack.append(obj)

Using a separate array for storing mark bits, we have to convert the objects address and size to indices in the bit array and set the corresponding bits to 1:

obj_bits = obj.size_in_bytes() / 16
bit_idx = (obj - heap.start_address()) / 16
if not bitarr.bit_set(bit_idx):
    bitarr.set_range(bit_idx, obj_bits)
    mark_stack.append(obj)

So in our example, if an object is 128 bytes long, 8 bits will be set in the bit array. Clearly, using in-object mark bits is much simpler.

But separate mark bits gain some momentum when sweeping. Sweeping involves scanning through the whole heap and finding continuous regions of memory which is unmarked and therefore can be reclaimed. Using in-object mark bits, it would roughly look like this:

iter = heap.start_address()
while iter < heap.end_address():
    # Scan til the next unmarked object
    while iter.is_marked():
        iter.unmark()
        iter += iter.size()
        if iter == heap.end_address():
            return
    # At an unmarked block
    start = iter
    # Scan til the next marked object
    while iter < heap.end_address() and not iter.is_marked():
        iter += iter.size()
    size = iter - start
    # Reclaim the block
    heap.reclaim(start, size)

Note how the iteration jumps from object to object in the iter += iter.size() lines. This means that the sweep phase running time is proportional to the total number of live and garbage objects.

Using separate mark bits, you would do roughly the same loop except that large swathes of garbage objects would be flown over without "stopping" on each of them.

Consider the 65536 heap again. Suppose it contains 4096 objects that are all garbage. Iterating the 64 64bit integers in the mark bits array and seeing that they are all 0 is obviously very fast. Therefore the sweeping phase can potentially be much faster with separate mark bits.

But there is another wrinkle! In any mark and sweep collector, the running time is dominated by the mark phase and not the sweep phase which is usually very quick. So the verdict is still out. Some prefer separate mark bits, other prefer in-object ones. To the best of my knowledge, no one has yet been able to show which approach is superior to the other.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top