Bob Balaban's Blog

     
    alt

    Bob Balaban

     

    Geek-o-Terica 5: Taking out the Garbage (Java)

    Bob Balaban  May 4 2009 02:00:00 PM
    Greetings, Geeks!

    I posted a short article last week on garbage collection in LotusScript. This post is about a whole different set of issues you need to be aware of if you're writring Java code for Notes or Domino.

    Like LotusScript, the Java language creates objects with a built-in "new" operator, and, like LotusScript sweeping up the no-longer-used memory ("garbage collection",or gc) is supposed to be automatic. But. The gc mechanism in Java is nothing like the one in LotusScript -- you have to do more work to avoid memory leaks, and you have to know more about how it all works. The basic saying people like to quote about gc in Java is: "You never have to worry about memory." I generally add to that: "Until you run out."

    There are two major differences in the way LotusScript and Java each handle allocated memory: the first has to do with how many kinds of memory there are, and the second has to do with when each language does gc.

    How many kinds of memory are involved with a Notes/Domino Java program? At least 2. First there's all the memory your java program allocates directly in the Java Virtual Machine (JVM): space for your code, space for your objects, space for the JVM itself to use. This all comes from the Java "heap": a big pool of memory that the JVM gets from the operating system, and parcels out to your program, and any other Java programs that might be running at the time.

    The second set of memory that your Java program is going to consume comes from a completely different place: It comes from the "Notes runtime heap" -- a different big pool of memory that the Notes/Domino core (which, after all, has nothing to do with any JVMs) gets from the operating system. This pool is what the Notes back-end classes use, as well as the Notes core itself. So, for example, when you instantiate a lotus.domino.Session object (or Notes does it for you, if you're running an Agent), a number of things happen:

      - If the JVM isn't running yet, it starts up. A bunch of .jar files are pre-loaded, using up some JVM heap.
      - Notes (or Domino) creates an instance of the Session class for the agent to use. This uses a bit more memory from the JVM heap. The new Java Session object calls into the back-end classes DLL ("lsxbe") to initialize the corresponding C++ Session object. Every Notes Java object is linked to a corresponding lsxbe back-end C++ object.
      - The C++ object that gets created to go with the new Java object uses up some memory of its own. This memory, however, comes from the "other" heap -- what I called the "Notes runtime heap" above. ALL of the lsxbe objects that are linked to the Java objects you create and use in your Agent come from this other heap. Some of them (Docuemnts, Databases, and others) consume Notes CAPI resources, such as NOTEHANDLEs and DBHANDLEs, which might represent lots and lots of other allocated memory in the Notes core (just as one example, an "open" Notes document that consumes 10mb of disk space might also consume 10mb of memory).

    So here's where it gets interesting: The JVM has a background thread that runs all the time (at lower priority than "normal" application threads), looking for objects which have been allocated out of the JVM heap and which are no longer used anywhere. When it finds such, it frees the memory used by those objects, and that memory in the JVM heap is then available for re-use. HOWEVER, there is no automatic mechanism by which the C++ objects associated with those Java objects can be notified to free up the memory THEY are consuming (which is often far larger). If nothing is done, all of that memory taken from the Notes runtime heap is "leaked": it never gets released.

    Thus, because there's no way for Notes to know for sure when a given Java object is being garbage-collected, we need another way to tell the back-end classes to clean up and take out the garbage. Unfortunately, the only way is to enlist the aid of the developer: you have to TELL lsxbe to clean up with the notorious recycle() call. What makes this unfortunate, from a product point of view, is that developers have to know to do this, and have to know how to do it correctly, so that they don't mess theselves up accidentally.

    So what, exactly, does recycle() do? First, it finds the link stored in the Java object to the corresponding C++ object. Then it invokes some code in lsxbe to destroy that C++ object. The "destructor" code in the back-end classes does a couple of things: it first finds and destroys any "owned" objects that it knows about. When you invoke recycle() on a lotus.domino.Database object, for example, any Document objects that were instantiated out of that database are also destroyed. The object's destructor also knows how to lilnk back to the JVM and tell it to invalidate (and garbage-collect) the corresponding Java object. It also tells the Notes CAPI to release any temporary memory it has consumed, and then the memory taken up by the C++ object itself is reclaimed.

    Try this experiment: FInd (or create) a Notes database that contains a view with 50,000 or so documents in it (the number of fields in the document is relatively unimportant for this purpose). Then write a Java agent that walks the view, and accesses each document (just use View.getFirstDocument/getNextDocument). Don't use recycle(). If your view is big enough, you might actually crash Notes this way, because every time you assign a new Document object to your local variable inside the iteration loop, the previous object referred to by that variable will be gc'ed, but the corresponding C++ document object will not.

    Of course, even when you leak megabytes of memory in this way, Notes (and the operating system) get it all back when the Agent terminates. Why? Because the CurrentDatabase and the Session objects that Notes created for your Agent to use get recycled automatically when the Agent is done, and therefore all other objects "owned" by that Session and Database (i.e., all of them) get recycled automatically. Of course, if you run out of memory before that point, you're screwed.

    So: recycle early, recycle often!

    Believe it or not, there's actually a bunch more to say on this topic. Look for my next blog post in a couple of days: "So Now it Gets Complicated: Java, Garbage-Collection, Notes, and Threads".

    Geek ya later.

    (Need expert application development architecture/coding help? Contact me at: bbalaban, gmail.com)
    Follow me on Twitter @LooseleafLLC
    This article ┬ęCopyright 2009 by Looseleaf Software LLC, all rights reserved. You may link to this page, but may not copy without prior approval.