Friday, September 30, 2011

The Needs of the Many Outweigh the Needs of the Few

There is a discussion on the Aries dev mailing list about a tool for checking semantic versioning. One issue misunderstood in that discussion is the asymmetry in how versions are treated for the roles of API consumer and API provider, as discussed in the whitepaper.

An API package p can contain several types. Some types must be implemented by the API provider and some are intended to be implemented by the API consumer. p.S may be a service provided by the API provider and used by the API consumer. p.L may be a listener implemented by the API consumer and used by the API provider. A syntactic analysis tool for versioning needs to understand the orientation of a type to decide whether a change to the type constitutes a major version change for the package or a minor version change.

The whitepaper does not discuss how the orientation of the type should be marked. It is beyond the scope of the whitepaper. OSGi now uses the @noimplement Javadoc tag (from Eclipse) to indicate the type is not to be implemented by the API consumer (e.g. p.S). Types not marked @noimplement may be implemented by the API consumers (e.g. p.L). Adding a new method to p.S represents a minor version increment to package p, as API consumers are not broken (API providers are broken, but they have tighter version constraints, e.g. [1.0,1.1)). Adding a new method to p.L represents a major version increment to package p, as API consumers are broken. Furthermore, adding a new type to package p will not break API consumers. They are free to ignore it. But API providers must change to support the new type.
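The rules in the paragraph above can be sketched as a small decision function. This is only an illustration of the reasoning, not how any existing tool works: the class name VersionBump and the enums Orientation, Change and Bump are hypothetical names made up for this example.

```java
public class VersionBump {
    // Who is expected to implement the type (hypothetical names).
    enum Orientation { PROVIDER_IMPLEMENTED, CONSUMER_IMPLEMENTED }

    // The two kinds of change discussed in the post.
    enum Change { ADD_METHOD, ADD_TYPE }

    enum Bump { MINOR, MAJOR }

    // Decide the package version increment from the point of view of
    // API consumers, which is what the package version is oriented towards.
    static Bump requiredBump(Orientation orientation, Change change) {
        if (change == Change.ADD_TYPE) {
            // Consumers are free to ignore a new type; only providers must react.
            return Bump.MINOR;
        }
        // A new method breaks whoever implements the type.
        return orientation == Orientation.CONSUMER_IMPLEMENTED ? Bump.MAJOR : Bump.MINOR;
    }

    public static void main(String[] args) {
        System.out.println("add method to p.S: "
                + requiredBump(Orientation.PROVIDER_IMPLEMENTED, Change.ADD_METHOD));
        System.out.println("add method to p.L: "
                + requiredBump(Orientation.CONSUMER_IMPLEMENTED, Change.ADD_METHOD));
        System.out.println("add new type to p: "
                + requiredBump(Orientation.PROVIDER_IMPLEMENTED, Change.ADD_TYPE));
    }
}
```

Note that the function needs the orientation as an input: without it, the "add method" case cannot be decided.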

These are examples of the asymmetry between API providers and API consumers: an API provider must provide all of the API while an API consumer is free to use any subset of the API.
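The same asymmetry shows up in the import ranges each role would use. The sketch below assumes a provider imports package p with the range [1.0,1.1) mentioned above and a consumer imports it with [1.0,2.0); the consumer range is my assumption of the usual "up to the next major" policy, and the class ImportRanges with its helper Range is hypothetical, handling only major.minor for brevity.

```java
public class ImportRanges {
    // Half-open version range [floor, ceiling) over major.minor versions.
    static final class Range {
        final int floorMajor, floorMinor, ceilMajor, ceilMinor;

        Range(int floorMajor, int floorMinor, int ceilMajor, int ceilMinor) {
            this.floorMajor = floorMajor;
            this.floorMinor = floorMinor;
            this.ceilMajor = ceilMajor;
            this.ceilMinor = ceilMinor;
        }

        // True if floor <= major.minor < ceiling.
        boolean includes(int major, int minor) {
            return compare(major, minor, floorMajor, floorMinor) >= 0
                    && compare(major, minor, ceilMajor, ceilMinor) < 0;
        }
    }

    // Compare two major.minor pairs, major first.
    static int compare(int aMajor, int aMinor, int bMajor, int bMinor) {
        return aMajor != bMajor
                ? Integer.compare(aMajor, bMajor)
                : Integer.compare(aMinor, bMinor);
    }

    // Assumed policies: consumers accept any 1.x, providers only 1.0.x.
    static final Range CONSUMER = new Range(1, 0, 2, 0); // [1.0,2.0)
    static final Range PROVIDER = new Range(1, 0, 1, 1); // [1.0,1.1)

    public static void main(String[] args) {
        // Package p goes from 1.0 to 1.1 after a method is added to p.S.
        System.out.println("consumer accepts 1.1: " + CONSUMER.includes(1, 1));
        System.out.println("provider accepts 1.1: " + PROVIDER.includes(1, 1));
    }
}
```

With these ranges a minor increment keeps existing consumers resolving while forcing providers to be updated, which is the intended effect.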

So in order for any syntactic analysis tool to properly work, the orientation of the types in the package must be properly marked so the tool can decide whether a change warrants incrementing the major or minor version. What is missing is an agreement on how to mark the types. I think this is something bnd and bndtools should provide. Perhaps a set of standard annotations.
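As a rough idea of what such annotations could look like, here is a minimal sketch. The annotation names ProviderImplemented and ConsumerImplemented, and the helper consumerMayImplement, are invented for this example and are not an existing bnd or OSGi API. Unmarked types are treated as consumer-implementable, mirroring the @noimplement convention described earlier.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class OrientationMarkers {
    // Hypothetical marker: only the API provider implements this type.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ProviderImplemented {}

    // Hypothetical marker: API consumers may implement this type.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ConsumerImplemented {}

    // Stand-in for p.S: a service provided by the API provider.
    @ProviderImplemented
    interface S { void doWork(); }

    // Stand-in for p.L: a listener implemented by the API consumer.
    @ConsumerImplemented
    interface L { void onEvent(String event); }

    // A type without the provider marker is assumed to be open to consumers.
    static boolean consumerMayImplement(Class<?> type) {
        return !type.isAnnotationPresent(ProviderImplemented.class);
    }

    public static void main(String[] args) {
        System.out.println("consumers may implement S: " + consumerMayImplement(S.class));
        System.out.println("consumers may implement L: " + consumerMayImplement(L.class));
    }
}
```

Runtime retention and reflection are used here only to keep the sketch runnable; an analysis tool would more likely read the markers from class files at build time.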

For any API, we see there is an asymmetry in the relationships it has with API providers and API consumers. In general, there will (hopefully) be many more consumers than providers. So the semantic versioning scheme is oriented towards the many: the API consumers.

"... the needs of the many outweigh the needs of the few." - Spock.

PS. It has also been suggested that segregating consumer oriented types from provider oriented types by placing them in different packages is useful. But this does not remove the need for syntactic analysis tools to understand the orientation of each type (or each package) to provide proper advice about necessary versioning changes. And now you have 2 packages which each provider and consumer must import...

12 comments:

Chris Aniszczyk (zx) said...

Why reinvent the wheel when PDE API Tools has all this code and more?

Emily Jiang said...

The PDE API tool only works on bundle versioning not package versioning. Besides, Import-Package is favoured over Require-Bundle (should avoid Require-Bundle if possible as it makes bundles tightly coupled). The package version is fine-grained but bundle version is coarse-grained.

Chris Aniszczyk (zx) said...

Why not enhance PDE API Tools to do Import-Package level versioning? Most of the code is there, doesn't make sense to constantly reinvent the wheel across communities (e.g., Apache vs. Eclipse)

Eike Stepper said...

Please enhance PDE, don't try to duplicate it. Two different tools that both have their advantages and disadvantages in different areas is not what users will like.

BTW. have you ever asked yourself why so many bundles have no versioned package exports at all and can not be imported with a versioned dependency as a consequence? In almost all cases when I talked to the respective developers it turned out that they were completely unaware that the bundle's version does not serve as a default for those packages that are not exported with an explicit package version. More convention over configuration would help a lot to get rid of nasty Require-Bundle headers.

Neil Bartlett said...

Chris, this is tiresome... I might as well ask you why PDE API Tooling is trying to reinvent the wheel (but I won't).

Bnd already has package level version analysis, and is usable in more environments than PDE API Tooling, which only works on PDE projects. All bnd is missing is the "orientation" annotation described by BJ, and it seems obvious to me that implementing them in Bnd will be both easier and more useful for more people than trying to do it in the PDE API Tooling.

That's not to say that PDE developers shouldn't try to enhance their tools as well.

Simon Chemouil said...

Eike, semantic versioning is precisely useful so that people don't have to manually specify the version of the packages they export, based on static analysis. So in this case, users who don't know that packages have their own versions, independent of the version of the bundle containing them, would still have proper versions managed for them (and the bundle version would in turn be deduced by the versions of the packages it contains).

However, in your anecdote, it seems to me the problem is not "convention over configuration", but rather the ignorance of those developers (I don't mean it pejoratively). For convention over configuration to work, there needs to be a convention :). There is no convention other than semantic versioning for package versions, because a package might move to another bundle at any time, and that bundle may have a different version. The package is the module in OSGi, and the bundle is just the deployment unit -- problems arise when these are mixed together.

The developers you talked to are using OSGi and manage to get things to "work" but don't want to dig further, maybe because they have no time or because those technical considerations are disconnected from their primary interest: writing business code. The same kind of developer often writes code that runs fine in OSGi for their needs but doesn't support dynamism (e.g., not nulling references to services that have gone away, passing service objects around to other bundles, stateful static methods in API, state not linked to the bundle's lifecycle, etc.). The ergonomics of PDE and the manifest-first approach it proposes by default let those developers get their code running in an OSGi container very fast, but often poorly packaged.

Considering this, any improvement in PDE that helps better packaging would be welcome. Yet I see no problem with multiple Free implementations, in Apache and Eclipse, as long as everyone agrees on the same set of annotations to interoperate: that's what this entry is about, so I don't really get the attitude: - "I wrote new code and want to donate it to Apache" - "Why didn't you improve Eclipse instead?", especially when the post author is not directly involved with that piece of code. There are plenty of reasons one would want to write fresh code rather than improve an existing codebase (fun, dependencies of the said code base, ability to embed the code easily, ..., ???, profit!)

So kudos to Emily and let's hope those annotations get standardized and applied to APIs soon, that PDE gets its version, based or not on Emily's work -- and that we get a Maven plugin that does all this version handling for us!

Eike Stepper said...

Hi Simon,

First I'd like to withdraw my second half sentence "don't duplicate PDE". Actually I don't care about duplication as long as the tool I'm (and my team is) using does all the things we want to have done, and does them right. We've been using API Tools and API baselines for years and I think they do (in most cases) a great job. What bugs us most is the inability to distinguish between different groups of consumers (bugzillas 230279, 191292).

Then, although personally I appreciate API baselines, tools, etc. very much, I'm pretty sure that the majority of developers haven't even heard about these fancy things, don't care about it, fear the additional installation and configuration, hate the build time overhead and so on. For these developers, their products and, probably the most important, their consumers I wish there was the convention to use the bundle version as the default for all unversioned packages. I'm aware that there's no such convention and that's exactly what I'm complaining about. I also understand that using a tool to maintain version on package level would solve the issue of missing package versions in a much better way but, as a consumer of many products, it's just an illusion that it will happen any time soon. The current effect is simply that I can not consume these products through (versioned!) package imports. Is that what you're trying to advocate?

Then you're saying "The package is the module in OSGi, and the bundle is just the deployment unit". I doubt that. Maybe the term module has been reinterpreted in the more recent past, the About page (see L1 Module Layer) of OSGi is kind of vague about it. But the sentence "The OSGi Modules layer adds private classes for a module as well as controlled linking between modules" implies to me that a module is a bundle (bundles have private packages but packages don't have private classes, linking to packages is not supported because there is no package manifest).

Simon Chemouil said...

Hi Eike,

Thanks for your reply :).

Actually I think we agree on (1): we both want tools that have sensible defaults so that developers can focus on writing their business code that works rather than spending their precious time learning the subtleties of the OSGi framework. My point is that until we're there, we developers -- or at least one developer per project -- have to step up to learn and do those things, so that those versioned packages can be consumed. Even with PDE, and even if it's a hassle, it's possible to manually manage the versions of exported packages. It's unfortunately something that is rarely done in Eclipse projects and very difficult to change now for backwards compatibility reasons. That takes more time than the Require-Bundle equivalent but it plays much nicer in other OSGi containers.

In any case, changing old habits is hard, and even with semantic versioning tools, it will take time to update APIs with proper tags and for enough people to know about it. Probably years to propagate. So I guess what I was trying to advocate clumsily is that in the meantime PDE users should do it manually even if it's painful. And since most people don't know about it, and often don't _want_ to know about it, we should talk about it ;)

About (2), I was not specific enough, I said package but I'm talking about public packages. After all, in private packages anything goes, they should never be imported.

Then, it all boils down to what a "module" is. Of course, a bundle is a "kind" of module: it's the module in OSGi at runtime (i.e, deployment unit). However, at a static level (compile time) -- and the most relevant when thinking of manifests --, it's the package.

Packages do have some metadata associated with them, even if it's not *attached* to them. The first piece of metadata is their name, but tools like Bnd that manage exports for you will also look for a packageinfo file in public packages to find their version. That's not standardized since it's used in the end to specify the version of the export, in the manifest. Bnd will also look for a @Version annotation in package-info.java. But truly it doesn't matter where the metadata is as long as the framework can find it. I'm saying that the package is the module in OSGi because in an environment that only uses Import-Package rather than Require-Bundle, I can take any *public* package and move it to another bundle, regenerate the previous and new containing bundles' manifests, and it will still run perfectly without breaking clients (this is very useful when refactoring). If we take the definition that the module is an autonomous unit, then the public package is -- even if the associated metadata is in the containing bundle for practical reasons.

I'm a bit feverish these days so I hope I make sense, it's much better explained on Peter Kriens' blog : http://www.osgi.org/blog/2011/05/unbearable-lightness-of-jigsaw.html and related posts

Philippe said...

Please say no to JavaDoc tags. Java 5 was released 7 years ago, please embrace it and use annotations. That's one of the most annoying things about the OSGi APIs that you're stuck with Java 1.4. That feels IBM every time you have to use them.

The only API that's old school like this (and feels equally dusty) is the Maven API.

Neil Bartlett said...

@Philippe you don't seem to have noticed that OSGi has adopted annotations in the core API now.

BTW I'm really curious why you associated the non-generic API with IBM. The real reason is embedded developers... Java 5 may be seven years old on desktops and servers, but it barely exists even today in the embedded and mobile worlds. OSGi is caught between the rock of enterprise developers who want to use the latest language features, and the hard place of embedded developers who have no choice but to use Java 1.4.

It's certainly not because OSGi developers are lazy or that we really love using Dictionaries, Enumerations and raw types!

Peter Kriens said...

bnd has ConsumerType and ProviderType annotations for the discussed purpose. Maybe it is time to see if we can provide an annotation from the OSGi as well.

About module. Module does not have a single definition, it is a process. A function is a module, a Modula module is a module, a class is a module. A module is an encapsulation with imports/exports.

In OSGi the JAR is the deployment unit and it exports/imports packages.

You might want to look at TSS: http://www.theserverside.com/feature/Successful-modularity-depends-on-your-dependency-model