Friday, November 22, 2013

Talkin’ ’bout my Generation, or How I Learned More Than I Ever Wanted About JVM Memory

Profiling a Java application is an experience many developers may never encounter. Identifying the source of a memory leak is probably even more rare. Those kinds of investigations are typically handled by teams dedicated to the subject, or just deferred by throwing more memory at the problem. Until recently, I’ve been one of those lucky developers, blissfully ignorant of what goes on within a JVM. In this article, I seek to share my journey in tracking down a performance issue and what I learned along the way.
The journey begins with an open source scala project built on the play framework running on a 64 bit SL6 box with Java hotspot 1.6. The code base has a small footprint and is not overly complex. Its external resources consist of a mysql database, an internal solr index for searching, and minimal file system interaction. The critical feature of the app is a REST endpoint that handles very large HTTP PUT requests.
We’ve been running the app for some time in production under light load with no complaints. Just recently we started importing a large amount of records into the system via the aforementioned REST endpoint. That’s when we started to observe problems. Periodically the service would crash. The error was consistently “java.lang.OutOfMemoryError: PermGen space”. We observed that given enough time, this error was guaranteed to occur. Critically, it wasn’t simply time: it was after enough requests. I determined that the issue wasn’t going to go away and we needed to face it head on. And thus I embarked on my journey.
The first step was understanding what “PermGen” even means. Research revealed that PermGen stands for “Permanent Generation,” but before that could make sense, I needed to know a little bit more about JVM garbage collection (GC), which is whence the term “generation” comes. Put simply, garbage collection is the process by which old objects get removed from memory (aka “the heap”). Old objects consist of things like class instances that have been created within a function or class scope. When those scopes go away then so should those instances. That makes sense. So in more detail, the actual process of garbage collection consists of sweeping through the heap and determining each object’s classification. The GC has four major classifications for objects named Eden, Survivor 1, Survivor 2, and Old. These classifications are known as Generations. Objects are born into Eden and progressively promoted into the Old Generation at which point they are referred to as tenured [Footnote 1].
OK, so where does the Permanent Generation fit in? The Permanent Generation is a space outside the heap reserved for storing information about the Java classes your app uses. The PermGen has a fixed size that can be set with a JVM option. Class information in the PermGen is managed by ClassLoader instances, the most common of which is the Java system class loader or sun.misc.launcher.AppClassLoader. Generally, the loaded class information takes up a small, fixed memory footprint and you don’t need to put much thought into it.
Classes get loaded into the PermGen on demand, such that only the classes used by your app will take up space. The twist is when you start using Java Reflection. The Java Reflection package is an extremely powerful meta programming tool that gives you the ability to query class information and even create new classes at runtime. Reflection works in Java by calling through the Java Native Interface to get back the loaded class information from the JVM. By default, if information about the same class is requested more than 15 times [Footnote 2], then a new class will be created to hold that information. This process is called inflation and the new class information is stored in a DelegatingClassLoader.
Application frameworks that make heavy use of reflection will observe several instances of DelegatingClassLoaders in their PermGen. One such framework is the scala play framework.
Now that we have a pretty good idea of the PermGen’s role, it’s important to understand how garbage collection works in the PermGen space. I read in a few places that “classes are forever,” but in fact that is not the case. When garbage collection runs in the PermGen, it is true that it will not collect classes. However, what it will collect are ClassLoader instances that no longer hold references to any active classes. So, if all the classes in a given ClassLoader instance have expired, then that instance and all classes it references will go away. This is pretty much never going to happen with the Java system class loader [Footnote 3], but it could very well happen with a DelegatingClassLoader.
With all this knowledge in hand, I finally felt ready to start investigating what might be going on. The JDK comes with a tool called jmapjmap is an excellent tool for analyzing what’s happening with the memory in a JVM. The first command I tried out was jmap -permstat[Footnote 4]. Wow! Pretty cool; this shows me every class loader in the PermGen, the number of classes for which its responsible, the space it’s occupying, and its health. The first thing to jump out at me was that I observed a good deal of dead DelegatingClassLoader instances.
Weird. So, why aren’t those getting garbage collected if they are dead? After some more researching into JVM options, I discovered the following two options: -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC. In the absence of these two flags, garbage collection will not remove unused class loader instances from the PermGen space [Footnote 5].
Alright! Now we’re getting somewhere. These flags might be just what I’m missing. In order to find out, I set up a test system on which I could simulate the initial problem and then apply the new JVM options, rerun the simulation, and see how it holds up.
For the simulation, I wanted to start up my app with a small PermGen and keep my eye on the usage. The PermGen size is configurable at JVM startup with the option: -XX:MaxPermSize=68M. I choose 68M due to my observation that typically on startup the app immediately used up just under 60MB and quickly grew to near 68MB after a few requests. To keep my eye on the PermGen usage, I used jmap -heap, which shows both the total PermGen allocated as well as the current PermGen usage. Both of these values are important because although the PermGen has a max size, it will not use it all right away. Next up, jmap displays a whole load of useful information, but I was only interested in PermGen, so a little grep filtering brought me to jmap -heap $pid | grep -A 4 'Perm Gen'. Finally, I want to watch this as it changes, so throwing watch into the mix produced 
declare pid=$(pidof java); 
watch -n 2 "sudo /usr/java/jdk1.6.0_26/bin/jmap -heap $pid 2>&1 | grep -A 4 'Perm' "
With my watch in place, I started simulating load with a while loop in bash to hit the web endpoint with curl requests. The PermGen quickly rose and eventually peaked at 100% and then crashed after about 500 total curl requests. I set 500 as my base line for comparison and restarted the JVM with the GC flags -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC. Re-running my load simulation script, I watched the PermGen rise just like before. This time however, at around 500 requests, the PermGen usage dropped. The load simulation continued and I reached around 2000 requests before the PermGen finally ran out.
My conclusion from the experiment is that the GC flags successfully configured the JVM to clean up the PermGen space and consequently improved performance over time. The new parameters have been deployed into production and we haven’t seen a crash!
My journey isn’t over yet. I’m not 100% satisfied with this solution because I still see dead DelegatingClassLoader instances in my jmap -permstat output and I’m mildly suspicious there may be a memory leak with the framework. I will continue to monitor PermGen usage and see how things progress. As an ultimate fall back option, I can disable inflation.
…and that’s it! Hopefully this can serve a helpful starting point for anyone facing similar situations.
Some More Notes
I’m also including some more notes here that I learned along the way, but weren’t really pertinent to the discussion about heap dump and analysis. 
  • The JVM can dump your heap to a file when it crashes. To do this enable +XX:HeapDumpOnOutOfMemoryError and set a path for the dump file, -XX:HeapDumpPath=/path/for/dumps/java-.hprof [Footnote 6]. 
  • I used the Eclipse MAT tool to analyze my dump file and had a great experience. It’s worth reading the manual on its basic functions before using it. The default reports available gave me an immense amount of actionable information. If the OOM errors recur, I will likely be turning to MAT to dig to the bottom of the (potential) leak.
  • This article was substantially valuable in wrapping my head around how the JVM interacts with native memory,

Footnote 1: To see details of tenure calculations, enable -XX:+PrintTenuringDistribution
Footnote 2: See a quick description of how to control the process with system properties (sun.reflect.noInflation and inflationThreshold = 15),
Footnote 3: This actually not completely out of the question. Application frameworks that employ the “hot deploy” method of swapping in new jars without stopping the JVM will ideally recycle the Java system class loader. See for a good description how this can get you into undesirable situations. To read about more people experiencing this problem,
Footnote 4: The permstat information is expensive to calculate, and I’d suggest redirecting the output to a file so you can manipulate it.
Footnote 6: JVM Options WARNING You must ensure there is no file with the same name at the dump path or the JVM will refuse to overwrite it. To increase entropy in the dump file name, use the placeholder in the path you pass.