Wednesday, September 19, 2007

Class.forName caches defined class in the initiating class loader

In my previous entry, I wrote about the difference in behavior between Class.forName and ClassLoader.loadClass. Since then I have written a simple (for class loaders :-) test case to demonstrate the difference.

When ClassLoader.loadClass is called to load a class, the initiating class loader delegates to another class loader, which actually defines the class. The defined class is added only to the defining class loader's cache; the initiating class loader's cache is not altered. So if the initiating class loader delegates to a different defining class loader on a future request for the class, the class returned is always the one from whichever defining class loader received the delegation. This is of course how the ContextFinder from Eclipse is intended to work. ContextFinder is the initiating class loader, and it uses context from the call stack to select the right bundle class loader to which it delegates the actual class load request.
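To make the delegation concrete, here is a minimal sketch of an initiating class loader that only forwards requests. It is not the Eclipse ContextFinder and not my actual test case; the names LoadClassDemo, SwitchingLoader, switchTo and cached are made up for illustration. Both delegates here end up returning the same JDK class, so a real test would need two loaders that each define their own copy of a class.

```java
class LoadClassDemo {

    /** Hypothetical ContextFinder-like loader: forwards requests, defines nothing. */
    static class SwitchingLoader extends ClassLoader {
        private volatile ClassLoader delegate;

        SwitchingLoader(ClassLoader initialDelegate) {
            super(null); // no parent; every request goes to the current delegate
            this.delegate = initialDelegate;
        }

        /** Changes the loader that future requests are forwarded to. */
        void switchTo(ClassLoader newDelegate) {
            this.delegate = newDelegate;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            // Deliberately no findLoadedClass(name) and no defineClass(...):
            // the delegate (or a loader it delegates to) defines the class.
            return delegate.loadClass(name);
        }

        /** Exposes the protected findLoadedClass so this loader's cache can be inspected. */
        Class<?> cached(String name) {
            return findLoadedClass(name);
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        ClassLoader system = ClassLoader.getSystemClassLoader();
        SwitchingLoader initiating = new SwitchingLoader(system);
        String name = "java.util.ArrayList";

        Class<?> first = initiating.loadClass(name);
        System.out.println("first request defined by: " + first.getClassLoader());

        // Switch delegates; fall back to the system loader if it has no parent.
        ClassLoader other = system.getParent() != null ? system.getParent() : system;
        initiating.switchTo(other);
        Class<?> second = initiating.loadClass(name);
        System.out.println("second request defined by: " + second.getClassLoader());

        // Inspect what the initiating loader itself has recorded for the name.
        System.out.println("initiating loader cache entry: " + initiating.cached(name));
    }
}
```

Each loadClass call goes straight to whichever delegate is current, so the result depends only on that delegate.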

However, if Class.forName is used to call the initiating class loader, the behavior with respect to caching and the returned class is quite different. In this case, when the class is first defined, it is cached by the defining class loader as expected. But it is also cached by the initiating class loader, which is not expected. Even more unusual and unexpected is that Class.forName, through its native implementation, seems to consult the initiating class loader's cache directly before calling loadClass on the initiating class loader, which is the normal place where the class loader's cache is consulted (via ClassLoader.findLoadedClass). As a result, all calls to Class.forName on an initiating class loader always return the same class object (the first one loaded), even if the implementation of the initiating class loader does not define classes or directly consult its own cache.
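One way to probe this is to count how often loadClass is actually reached. The following is a minimal sketch, not my original test case; ForNameDemo and CountingLoader are hypothetical names. If the count does not grow on the second Class.forName call, the request was answered before loadClass was invoked. What gets printed depends on the JVM, so I make no claim about the numbers here.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ForNameDemo {

    /** Hypothetical initiating loader that counts how often loadClass is reached. */
    static class CountingLoader extends ClassLoader {
        final AtomicInteger calls = new AtomicInteger();

        CountingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            calls.incrementAndGet();
            // Delegate only; this loader never defines a class or checks its own cache.
            return getParent().loadClass(name);
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        CountingLoader initiating = new CountingLoader(ClassLoader.getSystemClassLoader());
        String name = "java.util.ArrayList";

        Class<?> first = Class.forName(name, false, initiating);
        int afterFirst = initiating.calls.get();

        Class<?> second = Class.forName(name, false, initiating);
        int afterSecond = initiating.calls.get();

        System.out.println("same class object: " + (first == second));
        System.out.println("loadClass calls after first forName: " + afterFirst);
        System.out.println("loadClass calls after second forName: " + afterSecond);
    }
}
```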

The test case also showed that ClassLoader.loadClass always works as expected even when interleaved with calls to Class.forName.

If everyone always used ClassLoader.loadClass to consult the Thread Context Class Loader (TCCL), then a ContextFinder style TCCL choice would work very well in OSGi (or any similar module system). However, a lot of code uses Class.forName to consult the TCCL, which means that a ContextFinder style TCCL is not going to help those callers.
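The two ways of consulting the TCCL look like this in caller code. This is a minimal sketch with hypothetical names (TcclLookup, contextLoader, viaLoadClass, viaForName); the fallback to the system class loader when no TCCL is set is my assumption, not something every library does.

```java
class TcclLookup {

    /** Returns the TCCL, falling back to the system class loader if none is set. */
    static ClassLoader contextLoader() {
        ClassLoader tccl = Thread.currentThread().getContextClassLoader();
        return tccl != null ? tccl : ClassLoader.getSystemClassLoader();
    }

    /** Calls loadClass on the TCCL directly, so the TCCL decides on every request. */
    static Class<?> viaLoadClass(String name) throws ClassNotFoundException {
        return contextLoader().loadClass(name);
    }

    /** Asks the JVM to resolve the name with the TCCL as the initiating loader. */
    static Class<?> viaForName(String name) throws ClassNotFoundException {
        return Class.forName(name, true, contextLoader());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        String name = "java.util.HashMap";
        System.out.println("loadClass: " + viaLoadClass(name));
        System.out.println("forName:   " + viaForName(name));
    }
}
```

With an ordinary TCCL both methods return the same class. The difference only matters when the TCCL is a delegating loader whose target can change between calls.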

The test case also includes a test to see whether having Class.forName add the class object to the initiating class loader's cache would result in pinning the class in the heap after the class and its defining class loader became eligible for garbage collection. This would also be a problem for OSGi, since it would cause a ContextFinder style TCCL (which would have the lifetime of the framework) to potentially pin a bundle's class loader and all of its loaded classes in the heap. Fortunately, this was not an issue. The class object was removed from the initiating class loader's cache once the class and its defining class loader were garbage collected. So, interestingly enough, the reference to the class from the initiating class loader's cache must be a sort of weak reference which allows the class to be garbage collected.

This unexpected behavior of Class.forName does not seem to be documented or explained anywhere that I have located. If you know of any such documentation, please let me know! In any case, there is a problem in designing a useful TCCL solution for OSGi.

8 comments:

Glyn said...

GC isn't impeded by the loaded class cache since the pointers in the cache are part of the VM's native state and aren't traversed when marking live objects during GC.

Have you investigated the issues associated with overriding Class.forName to make it call loadClass on the relevant class loader? Just a thought...

BJ Hargrave said...

While I agree that sounds probable for the initiating class loader in this case, the class references in the class loader cache must be part of the GC object traversal in general. I say this because a class can only be GC'd iff the class' class loader is unreachable. Since each class has a strong reference to its class loader, this means both the class loader and all classes loaded by the class loader must be unreachable for any (and all) of them to be GC'd.

I have not looked at overriding Class.forName. I did toss around the idea of having the bundle class loader rewrite bytecodes to change Class.forName calls to ClassLoader.loadClass calls. But that would only help for code loaded from bundles. What about the (probably large number of) calls to Class.forName in the bootclasspath code?

Rajini said...

I came across this post and the other one about the difference between Class.forName and ClassLoader.loadClass while looking at the classloader architecture for Tuscany. In my opinion, classloader constraints should be verified as soon as it is possible to verify them rather than as late as possible, since the closer you are to the source of the problem, the easier it would be to fix it. Most classloader bugs are very difficult to fix, and allowing classes to be used without verification and leaving the error reporting until an instance is created and typecast doesn't seem very reasonable.

I would also think that Class.forName performs much better than ClassLoader.loadClass once the class has been loaded even if the very first call in ClassLoader.loadClass is a findLoadedClass method which returns a cached class. Particularly in the context of OSGi, where a class space does not correspond to standard Java class delegation hierarchy, I would expect Class.forName to be a better method to use than ClassLoader.loadClass to load an imported class (after the first call where the class has been loaded). Class.forName can be inlined to call the native method directly which in turn can return a cached method with only a read lock. On the other hand, I imagine ClassLoader.loadClass for OSGi presumably has to go through a list of classloaders, requiring more code and more locking.

Back to the reason I was reading through the post - I wanted to understand how other OSGi applications solved problems with thread context classloaders (used by libraries Tuscany uses like Axis2) and also by Tuscany itself at the moment, and also the extension model architecture used by Tuscany where a core module loads extension modules (at the moment relying on a single classloader). I was looking for some way to fix these in Tuscany in such a way that Tuscany can be run as multiple bundles in OSGi as well as outside of OSGi with a sensible classloader architecture. Are there any best practices which I should be aware of?

Thank you...

Rajini

Peter said...

I filed a bug at Sun. I'll come back here when I've got an answer.

I simplified the test case with a colleague.

> Have you investigated the issues
> associated with overriding
> Class.forName to make it call
> loadClass on the relevant class
> loader? Just a thought...

My colleague has tried this; there is no other workaround than not using Class.forName(..) :(

frenchyan said...

Hi BJ.

I know that this post is 'old' now, but I have been trying to follow the advice and bumping into some issues. I replaced in my code all calls to Class.forName with ClassLoader.loadClass (determining the correct ClassLoader) and have the following problem:

Caused by: java.lang.ClassNotFoundException: [Lcom.linkedin.repdb.pub.leo.domain.MemberPosition2;
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)

which puzzled me for a while. After contacting Sun, I got the following answer:


The code sample is trying to obtain a Class instance for an array type ("[L") but Classloader.loadClass() only understands simple class and interface types, not arrays.

Array types do not have a direct representation in the virtual machine class file format and are internally generated by the JVM. So, Classloader.loadClass() doesn't understand array class name syntax since it is meant to be a low-level mechanism used by the JVM for loading classes and interfaces represented in the class file format.

Meanwhile, Class.forName() asks the JVM for the Class so any array class name syntax can be recognized by the JVM and it can look up or generate the requested array type and return it.

Applications should basically never call Classloader.loadClass(). It may appear to work but it is often subtly wrong, can be a source of latent bugs, and is almost never the best choice. They should instead call Class.forName() using the 3 parameter version that takes a specific Classloader instance.

On the other hand, Classloaders are also used to load non-class resources such as images, sound samples, and textual data. The JVM doesn't know anything about these resources so this aspect of Classloaders is used only by applications and Java library code and not by the JVM itself (Hotspot).

Is this confusing? Yes! In retrospect, this separation of responsibility should have been reflected more clearly in the API.


So I am now quite confused about what to use. Sun does not recommend using ClassLoader.loadClass, and in some cases replacing Class.forName fails (as with my example and arrays). Is using Class.forName(name, true, classLoader) a good alternative?

Any advice ?
Thanks
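
The array limitation described in this comment can be checked with a short sketch. ArrayNameDemo and loadClassFindsArray are hypothetical names, and the system class loader stands in for whatever loader an application would pick. On Java 6 and later, as far as I know, Class.forName accepts the array name while ClassLoader.loadClass throws ClassNotFoundException for it.

```java
class ArrayNameDemo {

    /** Returns true if loadClass on the system class loader accepts the given name. */
    static boolean loadClassFindsArray(String name) {
        try {
            ClassLoader.getSystemClassLoader().loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        String name = "[Ljava.lang.String;"; // JVM name of String[]

        // Three-parameter forName: the JVM understands array syntax.
        Class<?> viaForName = Class.forName(name, false, ClassLoader.getSystemClassLoader());
        System.out.println("forName gives String[]: " + (viaForName == String[].class));

        // loadClass only deals with class and interface names.
        System.out.println("loadClass accepts array name: " + loadClassFindsArray(name));
    }
}
```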

guddu said...

Thanks for this great thought.
I got some of my doubts cleared after reading this blog.

http://www.interview-questions-tips-forum.net

Dimitri Alexeev said...

However a lot of code uses Class.forName to consult the TCCL which means that a ContextFinder style TCCL

Would you provide some examples of popular libraries that use the Class.forName + TCCL pattern?