Wednesday, 8 October 2008

Garbage collection flavours

In my previous post, I described the basic garbage collection algorithms. Within these basic categories, there are many variants, or flavours.

Generational collectors

Generational collectors exploit two observed properties of typical programs: most objects tend to die young, and young objects are more likely to reference old objects than the reverse. The first of these observations is known as the weak generational hypothesis.

Generational collectors divide the heap up into multiple generations. Young generations can be collected without collecting old generations. These partial collections are quicker than full heap collections, and are likely to produce a good return of free space relative to the area collected, since most objects die young. The younger generations, at least, tend to employ copying collectors, since copying is very efficient in areas of the heap with high attrition rates.

Objects which survive a given number of collections are tenured and move up to an older generation. In order to avoid collecting objects in younger generations which are referenced by older generations, a remembered set is maintained of references from old generations to young generations. A write barrier between the generations is used to catch changes to these references.
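
To make the mechanism concrete, here is a minimal sketch of how a write barrier might maintain a remembered set. This isn't the code of any real collector - the names (GcObject, RememberedSet and so on) are invented, and real barriers work on raw memory with something much cheaper, such as a card table - but it shows the shape of the idea:

    import java.util.HashSet;
    import java.util.Set;

    // Illustrative model only: real collectors work on raw memory, not Java objects.
    class GcObject {
        boolean inOldGeneration;                 // which generation the object lives in
        GcObject[] fields = new GcObject[8];     // this object's reference fields
    }

    class RememberedSet {
        // Old-generation objects that may hold references into a young generation.
        private final Set<GcObject> entries = new HashSet<GcObject>();

        // Write barrier: every reference store of the form source.fields[i] = target
        // goes through here instead of being a plain assignment.
        void writeBarrier(GcObject source, int fieldIndex, GcObject target) {
            source.fields[fieldIndex] = target;
            // Only the interesting case is recorded: an old object now points at a young one.
            if (target != null && source.inOldGeneration && !target.inOldGeneration) {
                entries.add(source);
            }
        }

        // A young collection treats the remembered set as additional roots, so it never
        // has to trace the old generation to find old-to-young references.
        Iterable<GcObject> additionalRootsForYoungCollection() {
            return entries;
        }
    }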

Incremental collection



Incremental collectors allow a single collection to be divided into a series of smaller increments. This allows pause times to be limited. Incremental techniques almost always require either a write barrier or a read barrier, to prevent changes to the object connectivity graph made part-way through a collection from invalidating the reachability results.
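
As an illustration, here is a sketch of incremental marking in which each increment traces at most a fixed number of objects, and a write barrier catches references stored while marking is in progress. The names are invented and the barrier shown is one style (incremental update) among several:

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    // Illustrative sketch only: real collectors trace raw memory, not Java objects.
    class IncrementalMarker {
        static class GcObject {
            boolean marked;
            List<GcObject> references = new ArrayList<GcObject>();
        }

        private final Deque<GcObject> workList = new ArrayDeque<GcObject>();

        void startCollection(List<GcObject> roots) {
            for (GcObject root : roots) {
                shade(root);
            }
        }

        // One increment: trace at most 'budget' objects, then hand control back to
        // the application. Returns true when marking is complete.
        boolean markStep(int budget) {
            while (budget-- > 0) {
                GcObject obj = workList.poll();
                if (obj == null) {
                    return true;
                }
                for (GcObject child : obj.references) {
                    shade(child);
                }
            }
            return false;
        }

        // Write barrier (incremental-update style): when the application stores a
        // reference mid-collection, the target is shaded so the trace cannot miss it.
        void writeBarrier(GcObject target) {
            shade(target);
        }

        private void shade(GcObject obj) {
            if (obj != null && !obj.marked) {
                obj.marked = true;
                workList.add(obj);
            }
        }
    }

The collector runs markStep with a small budget at convenient points, so no single pause has to trace the whole heap.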

Concurrent collection

A concurrent collector is one which can execute concurrently with a mutator thread. Even on systems with several processors, it usually does not make sense to dedicate an entire processor to running garbage collection. Therefore the typical concurrent collection is perhaps more accurately described as a highly incremental collection: each thread is assigned small units of garbage collection work to do along with its application work, so the garbage collection work is finely interleaved with application work. One way of thinking of this garbage collection work is as a 'tax' - threads have to do some garbage collection work in exchange for being able to allocate. As with incremental techniques, concurrent collections need a write barrier or a read barrier.
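
Here is a sketch of the tax idea. The names and the tax rate are invented; a real collector works out the rate from the allocation rate and the amount of collection work outstanding:

    // Illustrative sketch of the allocation 'tax'; names and the tax rate are invented.
    class TaxingAllocator {
        // For every byte a thread allocates, it must also perform some tracing work,
        // measured here in abstract work units.
        private static final double TAX_RATE = 0.5;

        private long outstandingGcWork;           // work left in the current collection

        void startConcurrentCollection(long estimatedTotalWork) {
            outstandingGcWork = estimatedTotalWork;
        }

        byte[] allocate(int size) {
            if (outstandingGcWork > 0) {
                long tax = (long) (size * TAX_RATE);
                doTracingWork(tax);               // pay the tax before allocating
                outstandingGcWork -= tax;
            }
            return new byte[size];                // stand-in for a real heap allocation
        }

        private void doTracingWork(long units) {
            // In a real collector this would take objects from a shared mark stack and
            // trace their references, much as in the incremental sketch above.
        }
    }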

Parallel collection

A parallel collector is one which divides the collection work so that multiple collector threads are collecting concurrently. This is counter-productive on single-processor systems, but increasingly necessary as the number of processors in a system increases. On systems with many processors a single-threaded collector is unable to keep up with the amount of garbage the processors can produce.
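
A rough sketch of parallel marking, with invented names. Real parallel collectors add atomic mark bits, work stealing, and careful termination detection to keep the collector threads busy; this only shows the basic structure of several threads draining a shared work queue:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Illustrative sketch only.
    class ParallelMarker {
        static class GcObject {
            volatile boolean marked;
            List<GcObject> references = new ArrayList<GcObject>();
        }

        private final ConcurrentLinkedQueue<GcObject> workQueue =
                new ConcurrentLinkedQueue<GcObject>();

        void markInParallel(List<GcObject> roots, int collectorThreads)
                throws InterruptedException {
            for (GcObject root : roots) {
                push(root);
            }
            Thread[] workers = new Thread[collectorThreads];
            for (int i = 0; i < collectorThreads; i++) {
                workers[i] = new Thread("gc-marker-" + i) {
                    public void run() {
                        drainQueue();
                    }
                };
                workers[i].start();
            }
            for (Thread worker : workers) {
                worker.join();                    // wait for all marking threads
            }
        }

        // Each collector thread repeatedly takes an object from the shared queue
        // and pushes that object's unmarked children back onto it.
        private void drainQueue() {
            GcObject obj;
            while ((obj = workQueue.poll()) != null) {
                for (GcObject child : obj.references) {
                    push(child);
                }
            }
        }

        private void push(GcObject obj) {
            if (obj != null && !obj.marked) {
                obj.marked = true;                // a real collector marks atomically
                workQueue.add(obj);
            }
        }
    }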

Similar techniques are used to achieve incrementality, concurrency, and parallelism, so many collectors which have one of these properties also have some of the others.

Tuesday, 7 October 2008

Garbage collection algorithms


Garbage collection has been the subject of much academic research. In particular, the volume of new techniques with various claimed properties is high enough that it is tempting to coin the phrase YAGA (Yet Another GC Algorithm). For a pointer to many of these articles, see Richard Jones' garbage collection bibliography.


Despite the large number of variants, most garbage collection algorithms fall into a few simple categories.


Reference counting


Yes, reference counting is a form of garbage collection. It's sometimes seen as an alternative to garbage collection, and sometimes even as a morally superior alternative to garbage collection, but really, it's just a type of garbage collection algorithm.


Reference counting garbage collectors track the number of references to each object. Often the count is maintained in and by the object itself. When a new reference to an object is added, the reference count is incremented. When a reference is removed, the count is decremented. When the count reaches zero, the object is destroyed and the memory released. Reference counting fails when objects reference one another in a cycle. In these cases each object will be seen to be referenced and will never be freed, even if nothing references the cyclic structure.
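
A minimal sketch of the mechanism, with invented names (retain and release); in a real system these operations are applied on every reference store, for example by a smart pointer or by compiler-inserted code:

    // Illustrative sketch of reference counting; names are invented.
    class RefCounted {
        private int referenceCount = 1;   // the reference held by the creator

        // Called whenever a new reference to this object is stored.
        void retain() {
            referenceCount++;
        }

        // Called whenever a reference to this object is removed or overwritten.
        void release() {
            referenceCount--;
            if (referenceCount == 0) {
                destroy();                // no references remain; reclaim the memory
            }
        }

        void destroy() {
            // Release this object's own references here, which may in turn free other
            // objects. Note that objects forming a cycle never reach a count of zero.
        }
    }

When a reference field is overwritten, the old referent is released and the new referent retained; forgetting either step is a classic source of bugs in hand-rolled reference counting.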


Reference counting is unable to deal with cycles unless augmented with occasional invocations of another form of garbage collection. Reference counting has a number of other disadvantages, mostly to do with performance and bounding of work, and so it is rarely used in modern garbage collection systems. However, smart pointers in C++, while not traditionally considered a form of garbage collection, do make use of reference counting. Ad hoc garbage collectors introduced to manage object reachability in complex environments also tend to use reference counting since it is easily implemented in a hostile environment.


Tracing collectors

Most garbage collectors perform some sort of tracing to determine object liveness. Tracing does not have the same problem with collecting cycles as reference counting does. However, tracing collectors are more sensitive to changes in the object graph and usually require the suspension of the application threads or close monitoring of application activity to ensure correctness.


Mark-sweep


Mark-sweep collectors collect in two phases. In the first phase, all reachable objects are traced, beginning from a set of roots. The roots are references which are known to be reachable, such as references held on thread stacks and in static variables. Every object directly or indirectly reachable from a root is marked. Objects which do not end up marked are unreachable and therefore cannot be used again by the application, so they may safely be freed.


In the second phase of a mark-sweep collection, the memory occupied by all unmarked objects is added to a free list. Contiguous areas of free memory are merged on the free list.
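
Here is a small sketch of the two phases, with invented names. It models the heap as a list of objects rather than raw memory, so the 'free list' is just a list of reclaimed objects, but the structure of the algorithm is the same:

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    // Illustrative sketch only.
    class MarkSweepCollector {
        static class GcObject {
            boolean marked;
            List<GcObject> references = new ArrayList<GcObject>();
        }

        private final List<GcObject> heap = new ArrayList<GcObject>();      // every allocated object
        private final List<GcObject> freeList = new ArrayList<GcObject>();

        void collect(List<GcObject> roots) {
            // Phase 1: mark everything reachable from the roots.
            Deque<GcObject> workList = new ArrayDeque<GcObject>(roots);
            while (!workList.isEmpty()) {
                GcObject obj = workList.pop();
                if (!obj.marked) {
                    obj.marked = true;
                    workList.addAll(obj.references);
                }
            }

            // Phase 2: sweep the whole heap, reclaiming anything left unmarked.
            List<GcObject> garbage = new ArrayList<GcObject>();
            for (GcObject obj : heap) {
                if (obj.marked) {
                    obj.marked = false;       // clear the mark for the next collection
                } else {
                    garbage.add(obj);
                }
            }
            heap.removeAll(garbage);
            freeList.addAll(garbage);         // their space becomes available for allocation;
                                              // a real collector would also coalesce it
        }
    }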


Mark-compact collectors


Mark-compact collectors mark live objects using identical techniques to those used by mark-sweep collectors. However, mark-compact collectors do not use a free list. Instead, in the second phase all reachable objects are moved so that they are stored as compactly as possible. New allocation is then performed from the empty area of the heap.
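
A sketch of one common compaction scheme, sliding compaction, with invented names. It models objects with explicit addresses and sizes rather than raw memory:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch only.
    class MarkCompactCollector {
        static class GcObject {
            boolean marked;          // set by a marking phase identical to mark-sweep's
            long address;            // where the object currently lives
            long forwardingAddress;  // where it will live after compaction
            int size;
        }

        // 'heap' holds every allocated object, in address order.
        void compact(List<GcObject> heap) {
            // Pass 1: compute a new, densely packed address for each live object.
            long nextFree = 0;
            for (GcObject obj : heap) {
                if (obj.marked) {
                    obj.forwardingAddress = nextFree;
                    nextFree += obj.size;
                }
            }

            // Pass 2 (not shown): rewrite every reference field so that it refers to
            // the referent's forwardingAddress rather than its old address.

            // Pass 3: slide each live object down to its new address and drop the dead ones.
            List<GcObject> survivors = new ArrayList<GcObject>();
            for (GcObject obj : heap) {
                if (obj.marked) {
                    obj.address = obj.forwardingAddress;   // a real collector moves the bytes
                    obj.marked = false;
                    survivors.add(obj);
                }
            }
            heap.clear();
            heap.addAll(survivors);
            // Everything from 'nextFree' to the end of the heap is now one contiguous
            // free area, so allocation can proceed by simply bumping a pointer.
        }
    }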


Many collectors hybridise these two approaches, combining frequent sweeping with occasional compaction.


Copying collectors



Copying collectors divide the heap into two areas, or semispaces, called the to-space and the from-space. All new objects are allocated in the to-space. When the to-space is full, a collection is performed and the two spaces swap roles: the space that was being allocated into becomes the from-space, and every object still reachable in it is copied into the other space, which is declared the new to-space. New allocation is then performed in the new to-space, after the survivors.
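
Here is a sketch in the style of Cheney's algorithm, a classic way of implementing a copying collector. The names are invented and the semispaces are modelled as lists of objects rather than raw memory:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch only.
    class CopyingCollector {
        static class GcObject {
            List<GcObject> references = new ArrayList<GcObject>();
            GcObject forwardedTo;                     // set once the object has been copied
        }

        private List<GcObject> toSpace = new ArrayList<GcObject>();    // allocation happens here
        private List<GcObject> fromSpace = new ArrayList<GcObject>();  // the empty reserve

        GcObject allocate() {
            GcObject obj = new GcObject();
            toSpace.add(obj);
            return obj;
        }

        void collect(List<GcObject> roots) {
            // Flip: the space we have been allocating into becomes the from-space, and
            // the empty reserve becomes the new to-space, which receives the survivors.
            List<GcObject> reserve = fromSpace;
            fromSpace = toSpace;
            toSpace = reserve;

            // Copy the roots' referents, then scan the copies breadth-first; everything
            // before 'scan' has already had its reference fields updated.
            for (int i = 0; i < roots.size(); i++) {
                roots.set(i, copy(roots.get(i)));
            }
            for (int scan = 0; scan < toSpace.size(); scan++) {
                List<GcObject> refs = toSpace.get(scan).references;
                for (int i = 0; i < refs.size(); i++) {
                    refs.set(i, copy(refs.get(i)));
                }
            }

            fromSpace.clear();   // everything left behind is garbage; the space is now empty
        }

        // Copy an object into the to-space exactly once, leaving a forwarding pointer
        // behind so that later references to it find the same copy.
        private GcObject copy(GcObject obj) {
            if (obj == null) {
                return null;
            }
            if (obj.forwardedTo == null) {
                GcObject copy = new GcObject();
                copy.references = obj.references;     // a real collector copies the raw bytes
                obj.forwardedTo = copy;
                toSpace.add(copy);
            }
            return obj.forwardedTo;
        }
    }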


Copying collectors have a number of advantages over marking collectors. Because copying is done
in the same phase as tracing, it is not necessary to maintain a space-consuming list of which objects have been marked. Because no sweep is performed, the cost of the collection is proportional only to the amount of live data.


If most objects are unreachable, a copying collection can be very efficient. The corollary is that if most objects remain reachable, or if some very large objects remain reachable, the collection will be woefully inefficient, because a large amount of memory will need to be copied at every collection.


Copying collectors also keep the heap very compact, and this can boost allocation and application performance. However, because a completely empty from-space must be maintained at all times, copying collectors are not space-efficient. Modern collectors tend to estimate an object death-rate and maintain less than half the heap for the from-space accordingly. Copying collectors are also known as semispace collectors.


For a good overview of how this relates to the IBM JVM's garbage collection, see Mattias Persson's developerWorks article.


Sunday, 5 October 2008

Garbage collection myths

I've been lucky enough in my work to learn a fair bit about garbage collection. One of the things I've discovered is how many myths and half-truths exist about garbage collection. Part of the reason these myths have sprung up is that garbage collection has some pretty paradoxical properties. The first myth is that garbage collection is only suitable for the incompetent, unskilled, or lazy. In fact garbage collection offers many architectural and software engineering advantages, even to the skilled developer. The second myth is that garbage collection is all about collecting garbage. It seems obvious from the name, but it's not true! Garbage collectors also include an allocation component, which, along with their powers of object rearrangement, can make a significant difference to application performance. The third myth is that small pause times guarantee good application response times; criticisms of garbage collection often focus on pause times, and responses to those criticisms often focus exclusively on reducing them. The fourth myth is that pause times are a good metric of general application performance; an increase in pause times is often taken as an indicator of worsened performance, when in fact the opposite is often true. The fifth myth is that the total amount of time spent paused for garbage collection is a good predictor of the impact of garbage collection on application performance; paradoxically, it is not. Finally, the sixth myth is that garbage collection has a disastrous performance impact. While garbage collection can hurt application performance, it can also help application performance to the point where it exceeds the performance with manual memory management. I'll go through each of these in detail in later posts.

But, to start off with, what is garbage collection? Garbage collection is a system of automatic memory management. Memory which has been dynamically allocated but which is no longer in use is reclaimed for future re-use without intervention by the application. Garbage collection solves the otherwise difficult problem of determining object liveness by freeing memory only when it becomes unreachable.

Garbage collection is pretty ubiquitous in modern languages. Garbage collected languages include Java, the .Net languages, Lisp, Python, Perl, PHP, Ruby, Smalltalk, ML, Self, Modula-3, and Eiffel. Some languages which are not traditionally garbage collected offer garbage collection as a pluggable or configurable extension. For example, collectors are available for C++, and Objective-C was recently extended to allow garbage collection. Understanding the garbage collector is an important part of performance tuning in these languages.

Wednesday, 24 September 2008

IBM Monitoring and Diagnostic Tools for Java™ - Health Center™ is now available

I'm part of a team in IBM's Java Technology Centre which develops diagnostic tools, and we've just released a brand new tool. It's called IBM Monitoring and Diagnostic Tools for Java™ - Health Center™ and it's available as an early access beta download from developerWorks. We're really excited about the diagnostic insights it can offer. It allows running JVMs to be observed, health-checked, and tuned. At the moment it gives insight into general system health, application activity, and garbage collection activity. There are visualizations and also recommendations, so you don't have to be an expert to improve the performance of your application.

One of the nice things about the Health Center is that it presents different kinds of information in one place. There's a status page which shows at a glance what's going well and what might need attention:


Garbage collection is one of the first and easiest things to check when tuning an application. Sometimes garbage collection can benefit from a bit of tuning, and sometimes looking at what the garbage collection is doing can highlight application problems. Is your application using more memory than seems reasonable? Is it leaking memory? The Health Center will suggest a command line, diagnose when garbage collection is causing performance problems, and point out memory leaks:



Synchronization can be a big performance bottleneck for applications on multi-CPU systems, and it can be difficult to identify hot locks or even assess the impact of locking. The Health Center has always-on lock monitoring which identifies the objects which are most contended and provides guidance about whether synchronization is hurting performance:


Last, but not least, the Health Center includes a nice sampling method profiler. This can be really valuable in diagnosing applications which are using high CPU. Because it's low overhead, there's no need to specify in advance which parts of the application should be profiled. It works without recompilation or byte code instrumentation. It shows where the application is spending its time and gives full call stacks for all sampled methods. Because of the call stacks, this kind of profiling isn't just for performance problems - sometimes it can give surprising functional insight:


The way the Health Center is able to get so much information without much overhead is that it's closely linked to the JVM, which means there are minimum system requirements. The absolute minimum requirement is an IBM JVM, version 5.0 or higher. This will give locking information and some GC information. To get profiling, Java 5 SR8 or Java 6 SR1 is required. The fullest GC information is available in Java 6 service refresh 2. IBM JVMs can be downloaded from developerWorks.