Wednesday, June 9, 2010

Why conservative garbage collection should be avoided

Sometimes one still sees around conservative garbage collection. I do not know how common that is, but Mono does use it, as did many JS implementation in the pre-V8 era (according to V8 website). And that's really a pity!
A quick intro first: conservative GC means, by definition, that since you don't know which words are pointers and which are just random other stuff, you are conservative and just assume they might all be pointers.
There are two reasons why conservative GC is bad:
  • First, and most obvious: you might have false positives, so you keep in memory more objects than you should. I'm not really sure how common this is, and how relevant. But it's not the really bad thing. Two other problems come into play!
  • You may not write a compacting collector, i.e. one that moves objects to be consecutive in memory at garbage collection time. Such a collector allows memory allocation to be much faster than with malloc()/free(): if all allocated objects form a sequence, also free memory does. Then you know where you can get free memory.
    In contrast, malloc()/free() have to manage free lists of some kinds, which list the available blocks of free memory, and perform a search through them for a block of the appropriate size; the actual details of this vary, depending on the used algorithm, but it's anyway slower. And the fact that memory allocation is much faster with GC is one of the coolest things about it. Depending on the scenario, a program might even be faster with GC, and there are papers about that.
  • Another problem is that if you don't compact memory, you get memory fragmentation: free memory is fragmented in more small blocks that may be of little use when allocating a bigger object, and still may not be used by any other program on the system. It's theoretically possible that a substantial fraction of the used memory of your app is just wasted by fragmentation, even if it's not so easy for me to find a realistic scenario where this happens.

2 comments:

  1. These criticisms are true for naive conservative GC's. But real world CGC's have solutions for these problems.

    1) Blacklisting and other techniques reduce the issue of false positives to irrelevance. In most real world programs it's statistically more likely that a cosmic rays will flip bits than a significant amount of memory will leak due to a false positive.

    2) Using a "small block" allocator deals with the issue of allocation speed. A small block allocator divides a big block into many fixed sized blocks. All small blocks are the same size, so it's a couple of pointer swaps and bit flips to allocate or reclaim a block. Not as fast as a compacting collector, but only negligibly slower.

    3) Small block allocator deals with both internal and external fragmentation.

    ReplyDelete
  2. I have not studied papers about the internals of Boehm's GC (which I expect should incorporate all solutions you mention). Still, I just found a real report of memory leaks:
    http://flyingfrogblog.blogspot.com/2009/01/mono-22-still-leaks-memory.html
    I have read the comments to that post (not the email discussion). First, they point to a paper which shows brokenness of conservative GCs: http://www.hpl.hp.com/personal/Hans_Boehm/gc/bounds.html
    They provide an example that breaks a conservative GC, and explain how to fix it. It is fine for them, but for a .NET VM, you can't require valid C# code to be fixed, you have to support it.

    What annoyed me the most, in that discussion, was this comment from a Mono developer:
    "Interoping with C or C++ requires at least a partially conservative collector. It *cannot* be done with a precise garbage collector."
    This is ignorance (the JVM and .NET disprove it). OK, you have to require C code to cooperate with the GC, but you should.

    It was also interesting to see this older issue, even if it has been fixed, and the ignorance of the commentor from the Mono front about tail call elimination was annoying.
    However, at least this seems to have been fixed.
    http://flyingfrogblog.blogspot.com/2009/01/mono-does-not-support-tail-calls.html

    ReplyDelete