Showing posts with label classloading. Show all posts

Monday, October 14, 2013

API Design Practices That Work Well With OSGi

Introduction

This post describes some API design practices that should be applied when designing Java API to ensure the API can be used properly in an OSGi environment. Some of the practices are prescriptive and some are proscriptive. And, of course, other good API design practices also apply.

The OSGi environment provides a modular runtime using the Java class loader concept to enforce type visibility encapsulation. Each module will have its own class loader which will be wired to the class loaders of other modules to share exported packages and consume imported packages.

A package can contain an API. There are two roles of clients for these API packages: API consumers and API providers. API consumers use the API which is implemented by an API provider.

In the following design practices, we are discussing the public portions of a package. The members and types of a package which are not public or protected (that is, private or default accessible) are not visible outside of the package and are implementation details of the package. 

Packages must be a cohesive, stable unit

A Java package must be designed to ensure that it is a cohesive and stable unit. In OSGi, the package is the shared entity between modules. One module may export a package that another module can import. Because the package is the unit of sharing between modules, a package must be cohesive in that all the types in the package must be related to the specific purpose of the package. Grab bag packages like java.util are discouraged because the types in such a package often have no relation to each other. Such non-cohesive packages can result in many dependencies, because the unrelated parts of the package reference other unrelated packages. Changes to one aspect of the package then impact all modules that depend on the package, even modules that do not use the part which was modified.

Since the package is the unit of sharing, its contents must be well known and the contained API must only change in compatible ways as the package evolves in future versions. This means a package must not support API supersets or subsets; for example, see javax.transaction as a package whose contents are very unstable. The user of a package must be able to know what types are available in the package. This also means that packages should be delivered by a single entity (for example, a jar file) and not split across multiple entities, since the user of the package must know that the entire package is present.

Finally, the package must evolve in a compatible way over future versions. So a package should be versioned and its version number must evolve according to the rules for semantic versioning.

Minimize package coupling

The types in a package can refer to the types in other packages. For example, a method's parameter types and return type, or the type of a field, can come from another package. This inter-package coupling creates what are called uses constraints on the package. This means that an API consumer must use the same referenced packages as the API provider in order for them to both understand the referenced types.

In general, we want to minimize this package coupling to minimize the uses constraints on a package. This simplifies wiring resolution in the OSGi environment and minimizes dependency fan-out, which in turn simplifies deployment.

Interfaces preferred over classes

For an API, interfaces are preferred over classes. This is a fairly common API design practice that is also important for OSGi. The use of interfaces allows implementation freedom as well as multiple implementations. Interfaces are important to decouple the API consumer from the API provider. They allow a package containing the API interfaces to be used by both the API provider, who implements the interfaces, and the API consumer, who calls methods on the interfaces. In this way, API consumers have no direct dependencies on an API provider. They both only depend upon the API package.
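As a rough illustration of this arrangement, here is a minimal sketch in plain Java. The names Greeter, EnglishGreeter and GreetingClient are hypothetical, and in a real OSGi deployment the three types would normally live in three separate bundles rather than one file.

```java
final class InterfaceApiDemo {
    // Hypothetical API package content: the only type both sides depend on.
    interface Greeter {
        String greet(String name);
    }

    // Hypothetical API provider: implements the API interface.
    static final class EnglishGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    // Hypothetical API consumer: written against Greeter only,
    // with no reference to any provider class.
    static final class GreetingClient {
        private final Greeter greeter;

        GreetingClient(Greeter greeter) {
            this.greeter = greeter;
        }

        String welcome(String name) {
            return greeter.greet(name) + "!";
        }
    }

    public static void main(String[] args) {
        // Only this wiring code knows which provider is in use.
        GreetingClient client = new GreetingClient(new EnglishGreeter());
        System.out.println(client.welcome("OSGi"));
    }
}
```

Swapping in another Greeter implementation requires no change to GreetingClient, which is the decoupling the paragraph above describes.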

Abstract classes are sometimes a valid design choice instead of interfaces, but generally interfaces are the first choice.

Finally, an API will often need a number of small concrete classes such as event types and exception types. This is fine, but the types should generally be immutable and not intended for subclassing by API consumers.
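One possible shape for such a concrete type is sketched below. ConfigChangedEvent is a hypothetical name chosen for illustration; the point is only the final class, the final fields and the absence of setters.

```java
import java.util.Objects;

final class ImmutableEventDemo {
    // Hypothetical event type: final, so consumers cannot subclass it,
    // and all state is fixed at construction time.
    static final class ConfigChangedEvent {
        private final String key;
        private final String newValue;

        ConfigChangedEvent(String key, String newValue) {
            this.key = Objects.requireNonNull(key, "key");
            this.newValue = newValue;
        }

        String getKey() {
            return key;
        }

        String getNewValue() {
            return newValue;
        }
    }

    public static void main(String[] args) {
        ConfigChangedEvent event = new ConfigChangedEvent("timeout", "30");
        System.out.println(event.getKey() + " -> " + event.getNewValue());
    }
}
```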

Avoid statics

Statics should be avoided in an API. Types should not have static members. Static factories should be avoided. Instance creation should be decoupled from the API. For example, API consumers should receive object instances of API types through dependency injection or an object registry like the OSGi service registry.

The avoidance of statics is also good practice for making testable API since statics cannot be easily mocked.
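The following sketch shows one way this can look in plain Java, using constructor injection. Clock, SessionTracker and FakeClock are hypothetical names; the example assumes nothing about any particular injection framework or registry.

```java
final class InjectionDemo {
    // Hypothetical API type. Note there is no static getInstance() or factory.
    interface Clock {
        long now();
    }

    // The consumer is handed its Clock instead of fetching one statically.
    static final class SessionTracker {
        private final Clock clock;
        private final long startedAt;

        SessionTracker(Clock clock) {
            this.clock = clock;
            this.startedAt = clock.now();
        }

        long elapsed() {
            return clock.now() - startedAt;
        }
    }

    // A hand-written test double. This is easy precisely because
    // SessionTracker does not reach for a static.
    static final class FakeClock implements Clock {
        long time;

        @Override
        public long now() {
            return time;
        }
    }

    public static void main(String[] args) {
        FakeClock clock = new FakeClock();
        clock.time = 100;
        SessionTracker tracker = new SessionTracker(clock);
        clock.time = 250;
        System.out.println(tracker.elapsed()); // prints 150
    }
}
```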

Singletons

Sometimes there are singleton objects in an API design. However, access to the singleton object should not be through statics like a static getInstance method or static field. When a singleton object is necessary, the object should be defined by the API as a singleton and provided to API consumers through dependency injection or an object registry as mentioned above.

Avoid class loader assumptions

APIs often have extensibility mechanisms where the API consumer can supply the name of a class the API provider must load. The API provider must then use Class.forName (possibly using the thread context class loader) to load the class. This sort of mechanism assumes class visibility from the API provider (or thread context class loader) to the API consumer. API designs must avoid class loader assumptions. One of the main points of modularity is type encapsulation. One module (for example, API provider) must not have visibility to the implementation details of another module (for example, API consumer).

API designs must avoid passing class names between the API consumer and API provider and must avoid assumptions regarding the class loader hierarchy and type visibility. To provide an extensibility model, an API design should have the API consumer pass class objects, or better yet, instance objects to the API provider. This can be done through a method in the API or through an object registry such as the OSGi service registry. See the whiteboard pattern.
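To make the idea of passing instances concrete, here is a minimal sketch in plain Java. TextFilter and FilterRegistry are hypothetical names, and the registry is a simple stand-in for an API method or a service registry; it is not the OSGi service registry API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class ExtensionDemo {
    // Hypothetical extension point defined in the API package.
    interface TextFilter {
        String apply(String text);
    }

    // Hypothetical provider side. It is handed instances, so it never
    // calls Class.forName and needs no visibility into consumer classes.
    static final class FilterRegistry {
        private final List<TextFilter> filters = new CopyOnWriteArrayList<>();

        void add(TextFilter filter) {
            filters.add(filter);
        }

        String applyAll(String text) {
            String result = text;
            for (TextFilter filter : filters) {
                result = filter.apply(result);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        FilterRegistry registry = new FilterRegistry();
        // The consumer supplies an object, not the name of a class to load.
        registry.add(text -> text.trim());
        registry.add(text -> text.toUpperCase());
        System.out.println(registry.applyAll("  hello  ")); // prints HELLO
    }
}
```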

The java.util.ServiceLoader class also suffers from class loader assumptions in that it assumes all the providers are visible from the thread context class loader or the supplied class loader. This assumption is generally not true in a modular environment.

Don't assume permanence

Many API designs assume only a construction phase where objects are instantiated and added to the API but ignore the destruction phase which can happen in a dynamic system. API designs should consider that objects can come and go. For example, most listener APIs allow for listeners to be added and removed. But many API designs only assume objects are added and never removed. For example, many dependency injection systems have no means to withdraw an injected object.

In a modular system, modules can be added and removed, so an API design that can accommodate such dynamics is important. The OSGi Declarative Services specification defines a dependency injection model for OSGi which supports these dynamics including the withdrawal of injected objects.
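The sketch below shows, in plain Java, what accommodating withdrawal can look like. It only imitates the bind/unbind shape of a dynamic dependency; the names LogSink, Reporter, bindSink and unbindSink are hypothetical, and no Declarative Services annotations or APIs are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

final class DynamicDependencyDemo {
    // Hypothetical API type that may appear and disappear at runtime.
    interface LogSink {
        void log(String message);
    }

    // A component that tolerates its dependency being withdrawn.
    static final class Reporter {
        private final AtomicReference<LogSink> sink = new AtomicReference<>();

        void bindSink(LogSink candidate) {
            sink.set(candidate);
        }

        void unbindSink(LogSink departing) {
            // Clear the reference only if it still points at the departing object.
            sink.compareAndSet(departing, null);
        }

        // Returns false when no sink is currently available.
        boolean report(String message) {
            LogSink current = sink.get();
            if (current == null) {
                return false;
            }
            current.log(message);
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        LogSink sink = seen::add;
        Reporter reporter = new Reporter();

        reporter.bindSink(sink);
        System.out.println(reporter.report("started"));   // prints true
        reporter.unbindSink(sink);
        System.out.println(reporter.report("after stop")); // prints false
    }
}
```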

Clearly document type roles for API consumers and API providers

As mentioned in the introduction, there are two roles for clients of an API package: API consumers and API providers. API consumers use the API and API providers implement the API. For the interface (and abstract class) types in an API, it is important that the API design clearly document which of those types are only to be implemented by API providers vs. those types which can be implemented by API consumers. For example, listener interfaces are generally implemented by API consumers and instances passed to API providers.

API providers are sensitive to changes in types implemented by both API consumers and API providers. The provider must implement any new changes in API provider types and must understand and likely invoke any new changes in API consumer types. An API consumer can generally ignore (compatible) changes in API provider types unless it wants to change to invoke the new function. But an API consumer is sensitive to changes in API consumer types and will probably need modification to implement the new function. For example, in the javax.servlet package, the ServletContext type is implemented by API providers such as a servlet container. Adding a new method to ServletContext will require all API providers to be updated to implement the new method but API consumers do not have to change unless they wish to call the new method. However, the Servlet type is implemented by API consumers and adding a new method to Servlet will require all API consumers to be modified to implement the new method and will also require all API providers to be modified to utilize the new method. Thus the ServletContext type has an API provider role and the Servlet type has an API consumer role.

Since there are generally many API consumers and few API providers, API evolution must be very careful when considering changes to API consumer types while being more relaxed about changing API provider types. This is because you will need to change the few API providers to support an updated API but you do not want to require the many existing API consumers to change when an API is updated. API consumers should only need to change when the API consumer wants to take advantage of new API. OSGi is now defining documentary annotations, @ProviderType and @ConsumerType, to mark the roles of types in an API package.
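A small sketch of how such role markers might be applied follows. To keep it compilable without OSGi on the class path, the two annotations are declared locally as stand-ins; they are an assumption of this example and not the OSGi-defined annotation types. Scheduler, Task and ImmediateScheduler are likewise hypothetical.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class TypeRolesDemo {
    // Local stand-in for the documentary annotation (assumption of this sketch).
    @Documented
    @Retention(RetentionPolicy.CLASS)
    @Target(ElementType.TYPE)
    @interface ProviderType {
    }

    // Local stand-in for the documentary annotation (assumption of this sketch).
    @Documented
    @Retention(RetentionPolicy.CLASS)
    @Target(ElementType.TYPE)
    @interface ConsumerType {
    }

    // Implemented by the few providers; consumers only call it.
    // Adding a method here affects providers, not callers.
    @ProviderType
    interface Scheduler {
        void schedule(Task task);
    }

    // Implemented by the many consumers and invoked by providers.
    // Adding a method here would affect every consumer.
    @ConsumerType
    interface Task {
        void run();
    }

    // Hypothetical provider implementation.
    static final class ImmediateScheduler implements Scheduler {
        @Override
        public void schedule(Task task) {
            task.run();
        }
    }

    public static void main(String[] args) {
        int[] runs = {0};
        new ImmediateScheduler().schedule(() -> runs[0]++);
        System.out.println(runs[0]); // prints 1
    }
}
```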

Conclusion

When next designing an API, please consider these API design practices. Your API will then be usable in both OSGi and non-OSGi environments.

Wednesday, September 19, 2007

Class.forName caches defined class in the initiating class loader

In my previous entry, I wrote about how there is a difference in behavior between Class.forName and ClassLoader.loadClass. Since then I wrote a simple (for class loaders :-) test case to demonstrate the difference.

When calling ClassLoader.loadClass to load a class, the initiating class loader delegates to another class loader which actually defines the class. The defined class is only added to the defining class loader's cache. The initiating class loader's cache is not altered. So if the initiating class loader delegates to a different defining class loader on a future request for the class, the class is always returned from the defining class loader to which the delegation occurred. This is of course how the ContextFinder from Eclipse is intended to work. ContextFinder is the initiating class loader which uses context from the call stack to select the right bundle class loader to which to delegate the actual class load request.

However, if Class.forName is used to call the initiating class loader, the behavior with respect to caching and the returned class is quite different. In this case, when the class is first defined, it is cached by the defining class loader as expected. But it is also cached by the initiating class loader, which is not expected. Even more unusual and unexpected is that Class.forName, through its native implementation, seems to consult the initiating class loader's cache directly before calling loadClass on the initiating class loader, which is the normal place where the class loader's cache is consulted (via ClassLoader.findLoadedClass). As a result, all calls to Class.forName to an initiating class loader always return the same class object (the first one loaded), even if the implementation of the initiating class loader does not define classes or directly consult its own cache.

The test case also showed that ClassLoader.loadClass always works as expected even when interleaved with calls to Class.forName.
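The test case itself is not reproduced here. As a rough way to observe the same effect, the sketch below counts how often a purely delegating loader is asked for a class through each entry point. CountingLoader and the helper methods are hypothetical names, and this is a simplified probe rather than the original test: it only shows whether loadClass is reached, not which class object comes back.

```java
final class InitiatingLoaderDemo {
    // A loader that defines nothing itself; it only counts and delegates.
    static final class CountingLoader extends ClassLoader {
        int requests;

        CountingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            requests++;
            return super.loadClass(name, resolve);
        }
    }

    // Calls loader.loadClass(name) the given number of times on a fresh loader.
    static int requestsViaLoadClass(String name, int calls) {
        CountingLoader loader =
                new CountingLoader(InitiatingLoaderDemo.class.getClassLoader());
        try {
            for (int i = 0; i < calls; i++) {
                loader.loadClass(name);
            }
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return loader.requests;
    }

    // Calls Class.forName(name, false, loader) the given number of times
    // on a fresh loader.
    static int requestsViaForName(String name, int calls) {
        CountingLoader loader =
                new CountingLoader(InitiatingLoaderDemo.class.getClassLoader());
        try {
            for (int i = 0; i < calls; i++) {
                Class.forName(name, false, loader);
            }
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return loader.requests;
    }

    public static void main(String[] args) {
        String name = "java.util.ArrayList";
        // Every direct loadClass call reaches the loader.
        System.out.println("loadClass requests: " + requestsViaLoadClass(name, 2));
        // If the VM caches the class against the initiating loader, as the
        // post describes, fewer requests reach the loader. The exact number
        // depends on the JVM in use.
        System.out.println("forName requests:   " + requestsViaForName(name, 2));
    }
}
```

A lower count on the Class.forName line would be consistent with the VM answering repeat requests from the initiating loader's cache without calling loadClass again.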

If everyone always used ClassLoader.loadClass to consult the Thread Context Class Loader (TCCL), then a ContextFinder style TCCL choice would work very well in OSGi (or any similar module system). However, a lot of code uses Class.forName to consult the TCCL, which means that a ContextFinder style TCCL is not going to help those callers.

The test case also includes a test to see whether having Class.forName add the class object to the initiating class loader's cache would result in pinning the class in the heap after the class and its defining class loader became garbage collected. This would also be a problem for OSGi since it would cause a ContextFinder style TCCL (which would have a lifetime of the framework) to potentially pin a bundle's class loader and all loaded classes in the heap. Fortunately, this was not an issue. The class object was removed from the initiating class loader's cache once the class and its defining class loader were garbage collected. So, interestingly enough, the reference to the class from the initiating class loader's cache must be a sort of weak reference which allows the class to be garbage collected.

This unexpected behavior of Class.forName does not seem to be documented or explained anywhere that I have located. If you know of any such documentation, please let me know! In any case, there is a problem in designing a useful TCCL solution for OSGi.

Thursday, July 19, 2007

Why do Class.forName and ClassLoader.loadClass behave differently?

ClassLoader.loadClass or Class.forName seem to be synonyms for the same basic operation: request a dynamic class load. Yet calling Class.forName does additional "checking" which is not very useful (certainly in OSGi).

When doing a dynamic class load, the returned class has no type implied by the code. The code must use reflection to access static members or to create an instance. A created instance can either be reflected upon or cast to a type which must already be implicitly known by the code. This cast will result in a runtime type check which will ensure type safety.

This is very different than an implicit class load done by the VM to resolve a class constant pool entry. Since it is the goal of these implicit class loads to avoid runtime type checks, the loader constraints are used to ensure type safety.

It does not seem necessary or reasonable to impose loader constraint checks on some dynamic class load requests. That is, code calling Class.forName (or ClassLoader.loadClass) and the VM resolving a class constant pool entry have different type safety needs. The former does not require loader constraint checks since that will be done at runtime by a type cast if needed. The latter does require loader constraint checks to avoid the need for runtime type checks.
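The runtime check on a dynamically loaded class can be seen in a few lines of plain Java. The helper name instantiate is hypothetical, and java.util.ArrayList is used only because it is always available.

```java
import java.util.List;

final class DynamicLoadDemo {
    // Loads a class by name and creates an instance reflectively.
    // The compiler knows nothing about the resulting type beyond Object.
    static Object instantiate(String className) {
        try {
            ClassLoader loader = DynamicLoadDemo.class.getClassLoader();
            Class<?> type = loader.loadClass(className);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Object instance = instantiate("java.util.ArrayList");

        // The cast is checked at runtime; this is where type safety is enforced.
        List<?> list = (List<?>) instance;
        System.out.println("cast to List succeeded, size " + list.size());

        try {
            Runnable wrong = (Runnable) instance;
            wrong.run();
        } catch (ClassCastException expected) {
            System.out.println("cast to Runnable rejected at runtime");
        }
    }
}
```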

So it does seem reasonable to have Class.forName behavior altered to avoid loader constraint checks. Only internal VM class load requests need to check loader constraints.

[2013-10-14 - Updated links to fix link rot from Oracle acquisition of Sun.]

Wednesday, July 18, 2007

ContextFinder in Eclipse is broken

[Updated (19 July 2007): Further discussion with Glyn Normington and Tom Watson indicates it is the Class.forName(...,TCCL) form of loading a class from the TCCL that presents the issues since it triggers the class loading constraints while the ClassLoader.loadClass form does not.]

I was doing some reading and thinking about the ThreadContextClassLoader (TCCL) issue in OSGi environments. Many libraries use the TCCL to load classes. The real problem comes when the library ONLY uses Class.forName(...,TCCL) to load a class and doesn't try its own class loader first or TCCL.loadClass.
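For contrast, a library lookup that avoids the problematic form might look like the sketch below: it tries the library's own class loader first and only then asks the TCCL, using loadClass in both cases. The names find and canFind are hypothetical, and this is one possible arrangement rather than a recommended standard.

```java
final class LenientLookupDemo {
    // Tries the library's own loader first, then falls back to TCCL.loadClass.
    static Class<?> find(String name) throws ClassNotFoundException {
        ClassLoader own = LenientLookupDemo.class.getClassLoader();
        try {
            return own.loadClass(name);
        } catch (ClassNotFoundException notVisibleToLibrary) {
            ClassLoader tccl = Thread.currentThread().getContextClassLoader();
            if (tccl == null || tccl == own) {
                // Nothing else to ask; report the original failure.
                throw notVisibleToLibrary;
            }
            return tccl.loadClass(name);
        }
    }

    // Convenience wrapper without checked exceptions.
    static boolean canFind(String name) {
        try {
            find(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canFind("java.lang.String")); // prints true
        System.out.println(canFind("no.such.Type"));     // prints false
    }
}
```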

Eclipse created the ContextFinder to try and address the TCCL issue in OSGi. Its goal is to find a bundle's class loader on the call stack and then delegate to that class loader to handle the load request. The normal OSGi bundle class loader rules as well as a further Buddy Policy extension are then used to find the requested class.

However, it turns out there are problems with this approach when Class.forName(...,TCCL) is used. One is class loader constraint violation. The other is inadvertent pinning of classes in memory as they are added to the constraint table. After reading a classloader paper (from 1998!), I think the ContextFinder model (that is, a single, framework-wide shared TCCL) used by Eclipse is fatally flawed. It is virtually guaranteed to violate class loading constraints in the face of multiple package versions. Not to mention the fact that it can pin classes and class loaders in memory for as long as the ContextFinder is reachable.

Specifying a way to handle TCCL in OSGi may not be possible. Given 3 bundles: A, B, C having two versions of each: A1, A2, B1, B2, C1, C2. Imagine C1 imports a package from B1 and B1 imports a package from A1. Further C2 imports a package from B2 and B2 imports a package from A2. A type from A1 does not appear in the signature of B1 and a type from A2 does not appear in the signature of B2. Thus via B1, C1 cannot "see" A1 and via B2, C2 cannot "see" A2. Also imagine that C1 imports the package from A2 used by B2 and C2 imports the package from A1 used by B1. We now have the wiring depicted below:


C1      C2
^^      ^^
| \    / |
|  \  /  |
B1  \/  B2
^   /\   ^
|  /  \  |
| /    \ |
A1      A2


This is entirely possible and works fine in normal (non-TCCL) class loading. C1 is not exposed to A1 via B1, so its use of the package from A2 results in no conflict. If C1 loads a class from B1 which results in the load of a class from A1, then B1 is the initiating class loader of the request to A1. When C1 loads a class from A2, C1 is the initiating class loader. So C1 is never the initiating class loader for loads from both A1 and A2.

However, when we have a single, shared TCCL and bundle D calls Class.forName(...,TCCL) to load a class, the shared TCCL will be the initiating class loader for loads from both A1 and A2 (or B1 and B2, or C1 and C2) depending upon whatever context is used to select the defining class loader. C1 or C2 may have caused D to request the class load. Since the TCCL is the initiating class loader, any class load initiated by it for some class P must always return the same class object. In the presence of multiple versions (and even without multiple versions if two bundles are unlucky enough to choose the same fully qualified class name), loader constraints will eventually be violated.

Thus the Eclipse ContextFinder model is broken and we obviously should not spec it in OSGi. However, I am currently at a loss for a reliable solution to the TCCL problem for OSGi. We can certainly recommend that TCCL.loadClass be used instead of Class.forName(...,TCCL), but there is a large body of code out there which already uses the latter form.