Sunday, 5 October 2008

Garbage collection myths

I've been lucky enough in my work to learn a fair bit about garbage collection. One of the things I've discovered is how many myths and half-truths exist about it. Part of the reason these myths have sprung up is that garbage collection has some pretty paradoxical properties. The first myth is that garbage collection is only suitable for the incompetent, unskilled, or lazy. In fact, garbage collection offers many architectural and software engineering advantages, even to the skilled developer. The second myth is that garbage collection is all about collecting garbage. It seems obvious from the name, but it's not true! Garbage collectors also include an allocation component, which, along with their ability to rearrange objects, can make a significant difference to application performance. The third myth concerns pause times: criticisms of garbage collection often focus on them, and responses to these criticisms often focus exclusively on reducing them, in the mistaken belief that small pause times guarantee good application response times. The fourth myth is that pause times are a useful metric of general application performance. An increase in pause times is often taken as an indicator of worsened performance, when in fact the opposite is often true. The fifth myth is that total pause time tells the story instead: paradoxically, even the total amount of time spent paused for garbage collection is not a good predictor of the impact of garbage collection on application performance. Finally, the sixth myth is that garbage collection has a disastrous performance impact. While garbage collection can hurt application performance, it can also help it, to the point where performance exceeds what manual memory management achieves. I'll go through each of these in detail in later posts.

But, to start off with, what is garbage collection? Garbage collection is a system of automatic memory management. Memory which has been dynamically allocated but which is no longer in use is reclaimed for future re-use without intervention by the application. Garbage collection solves the otherwise difficult problem of determining object liveness by using reachability as a stand-in: memory is freed only once the application can no longer reach it.
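To make the reachability idea concrete, here is a minimal sketch of a tracing pass over a toy heap. It is an illustration only, not how any particular collector is implemented: the heap is modelled as a dictionary from object names to the names they reference, and the names `reachable`, `collect`, `heap`, and `roots` are all made up for this example.

```python
def reachable(heap, roots):
    """Return the set of object names reachable from the given roots."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue  # already visited, so cycles terminate
        marked.add(obj)
        stack.extend(heap[obj])  # follow every outgoing reference
    return marked


def collect(heap, roots):
    """Return a new heap containing only the objects still reachable."""
    live = reachable(heap, roots)
    return {obj: refs for obj, refs in heap.items() if obj in live}


# Toy heap (hypothetical): "c" and "d" reference each other but nothing
# reachable points at them; "e" points at "a" but is itself unreachable.
heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"], "e": ["a"]}
roots = ["a"]

survivors = collect(heap, roots)
print(sorted(survivors))
```

Only the objects that can be reached from the roots survive. The cycle between `c` and `d` is reclaimed even though each is still referenced by the other, and `e` is reclaimed even though it holds a reference to a live object.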

Garbage collection is pretty ubiquitous in modern languages. Garbage-collected languages include Java, the .NET languages, Lisp, Python, Perl, PHP, Ruby, Smalltalk, ML, Self, Modula-3, and Eiffel. Some languages which are not traditionally garbage collected offer garbage collection as a pluggable or configurable extension. For example, collectors are available for C++, and Objective-C was recently extended to allow garbage collection. Understanding the garbage collector is an important part of performance tuning in these languages.
