Friday, March 4, 2011

Modularity is Hard, let's do a Jigsaw?

This is a reply to an interesting blog post on DZone by Martijn Verburg about modularity, OSGi, and Jigsaw. The post was about the eternal discussion of whether the package or the JAR should be the atom of dependency. So one more time.

After putting up with the argument of package versus JAR as the unit of dependency, I've noticed a pattern in people. Initially, the proponents of the Jigsaw/Maven model of JAR dependencies can only indignantly come up with one argument: it is simpler. We actually added Require-Bundle under protest for Eclipse, and it models most of the Jigsaw/Maven view of the world. However, once people start to use OSGi they discover that this model is not simple, it is simplistic. Yes, if you write a "hello world", Require-Bundle might be simpler, but when you have to maintain a 400,000-line code base over many years you quickly discover the difference. I've not met an experienced OSGi user who argues for Require-Bundle, and I am pretty sure that Eclipse would not have pressed for Require-Bundle if they had known then what they know now.
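The post does not spell out the mechanics, so here is one illustration of the difference between the two styles. This is a toy model, not the OSGi API: ToyBundle, ToyResolver, and every bundle and package name are hypothetical. A bundle-level dependency names one specific provider, while a package-level dependency (Import-Package in OSGi) is satisfied by any bundle that exports the package.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: a "bundle" is a name plus the packages it exports.
final class ToyBundle {
    final String name;
    final Set<String> exportedPackages;

    ToyBundle(String name, Set<String> exportedPackages) {
        this.name = name;
        this.exportedPackages = exportedPackages;
    }
}

final class ToyResolver {
    private final List<ToyBundle> installed = new ArrayList<>();

    void install(ToyBundle bundle) {
        installed.add(bundle);
    }

    // Require-Bundle style: the dependency names one specific provider.
    // Returns null when that exact bundle is not installed.
    ToyBundle resolveByBundle(String bundleName) {
        for (ToyBundle b : installed) {
            if (b.name.equals(bundleName)) {
                return b;
            }
        }
        return null;
    }

    // Import-Package style: any installed bundle exporting the package will do.
    // Returns null when no installed bundle exports it.
    ToyBundle resolveByPackage(String packageName) {
        for (ToyBundle b : installed) {
            if (b.exportedPackages.contains(packageName)) {
                return b;
            }
        }
        return null;
    }
}
```

In this sketch, if a client was written against a bundle called "vendor.a.logging" and only "vendor.b.logging" is installed, resolveByBundle fails even though the package the client actually uses is available; resolveByPackage succeeds. Real OSGi resolution also considers versions and other constraints, which this model ignores.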

Having been on the OSGi side for more than ten years, I have come to compare it to plumbing. On the OSGi side the plumbing is done well and we got rid of the smell, but when I visit the other side I cannot help but be amazed at how messy and smelly life is there, like plumbing in medieval times. Though there is a lot of amazing functionality out there, invariably when you analyze these wonderful applications and libraries you find hundreds to thousands of type references that are impossible to satisfy at run time. Ticking bombs, waiting to go off when they cause the most harm. Versioning is absent, a mess, or not very useful at all. Beautiful castles, but built on quicksand.

The tragedy of OSGi is that the model puts a lot of these hidden problems up front, in your face, instead of letting them explode at the customer's site (I have always wondered if this is the reason our industry invests so much in logging). Most open source libraries never worry about their dependencies and transitively drag in many megabytes of unneeded code. Hey, the only time you notice this is when Maven starts downloading the Internet, and that happens only once due to caching. In OSGi you have to handle the dependencies up front; without OSGi you can put your head in the sand and "prove" that it works by running it for some time without a ClassNotFoundException. If people built bridges like the average WAR, I would probably buy a boat.
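The "runs fine until it doesn't" problem can be shown in a few lines. The following is a minimal sketch, not a real analysis tool: ReferenceProbe is a hypothetical helper, and it checks a list of class names by hand, whereas a real tool would extract the references from the class files. It reports which names the current class loader cannot resolve, which is exactly the failure that otherwise surfaces only when the code path is first executed.

```java
import java.util.ArrayList;
import java.util.List;

final class ReferenceProbe {
    // Returns the class names that cannot be loaded by this class's loader.
    static List<String> findUnresolvable(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ReferenceProbe.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only check that the type can be found,
                // do not run its static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Called with a name such as "org.example.optional.MissingDriver" (a made-up class that is not on the class path), the method returns that name, while a name such as "java.util.ArrayList" resolves and is not reported.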

So yes, OSGi has a threshold that is not easy to cross, but there is nothing tooling cannot do. After all, we have a compiler that handles the byte codes for us; in the same vein, tools can take most of the additional OSGi metadata off our hands. Fortunately, the tooling is improving. Take a look at bndtools, which is supposed to integrate with Apache Sigil soon. bndtools is an Eclipse plugin that works very well inside Eclipse but also provides a small Ant plugin that can build workspaces in a headless build, providing identical results inside and outside Eclipse. I also know IBM is investing heavily in OSGi tooling for WebSphere. There is of course Eclipse PDE, and a derivative of PDE from SpringSource, but these tools are not very oriented toward package dependencies; I hope they will change too.

So I challenge anyone to argue why the Jigsaw/Maven model is technically superior to OSGi's package model. The question is: do we want to move the Java industry forward by picking the technically superior model, or do we keep building on quicksand?

There is one issue with Require-Package. Because of "package" visibility in Java, packages already have a meaning in the language. Adding another meaning to "package" overloads the term. There are times when a bundle is forced to put its classes in the same package for visibility, and even to export those classes, yet that doesn't imply that bundles which require the original package also require the package fragment in the latter.

@Thomas: You must not have read the JLS. In the JLS it is unequivocally stated that a package is a module. That is, the clear intention of Java was to treat a package as an atomic unit. This is exactly the meaning OSGi has adopted, and it is clearly not only in the spirit of the Java language but also in the formal definition of the language.

The fact that (too) many developers did not read the JLS and decided that the package as a unit was something they could freely ignore (despite the class loader problems this causes, since visibility is enforced on a class loader basis) is a sad illustration of the messy use of Java by too many people ...

Another thing we added under protest was fragments, the OSGi tool that allows you to break through the package module boundaries :-(

Well, Alex, the word module only appears once so it should not be too hard to find: "Chapter 7 describes the structure of a program, which is organized into packages similar to the modules of Modula. The members of a package are classes, interfaces, and subpackages."

The formal aspect is the fact that packages are first class modules because of package private access. Split packages are an artifact of the class loading model but are clearly not the intention of the language.
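For readers who want to see the package-private rule itself, here is a small sketch. Ledger and PackageAccessDemo are hypothetical names, and the reflection check is only a way to show the rule inside a single file: a member declared without an access modifier is visible to code in the same package and to nothing outside it.

```java
import java.lang.reflect.Modifier;

// Hypothetical class: 'balance' has no access modifier, so it is package-private.
class Ledger {
    int balance = 42;        // package-private: visible only within this package
    public int id = 7;       // visible everywhere
    private int secret = 1;  // visible only inside Ledger
}

final class PackageAccessDemo {
    // True when the named field carries none of public, protected, or private,
    // i.e. when it has the default (package) access level.
    static boolean isPackagePrivate(Class<?> type, String fieldName) {
        try {
            int mods = type.getDeclaredField(fieldName).getModifiers();
            return !Modifier.isPublic(mods)
                    && !Modifier.isProtected(mods)
                    && !Modifier.isPrivate(mods);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such field: " + fieldName, e);
        }
    }
}
```

Here isPackagePrivate(Ledger.class, "balance") is true, while the same check on "id" or "secret" is false. At run time the JVM additionally requires the accessing class to be defined by the same class loader, which is why a package split across loaders breaks this kind of access.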

That text ("Chapter 7 describes...") appears in a non-normative introduction to the features of the language. It intends to suggest that Java has functionality for information hiding. It does not state that a package is a Modula module, nor does it relate features of Modula modules to Java packages. Use of the term "module" is purely illustrative.

Any attempt to read a specific meaning of "module" into the definition of a Java package is misguided. The term "module" is not part of the formal definition of the Java language.

I agree that "the clear intention of Java was to treat a package as an atomic unit", but this intention comes from the introduction to chapter 7, where modules are not mentioned.

Ah! It talks like a module (see JLS), it walks like a module (it provides a unique name scope like modules do), and it quacks like a module (it has its own accessibility protection), but it shall not be called a module ???

Any reason why we're not allowed to call it a module but must call it an "atomic unit"?

You can call a package whatever you like, but to berate people for not seeing that "In the JLS it is unequivocally stated that a package is a module" is a bit much - because the JLS doesn't state that.

All the JLS suggests, informally, is that a Java package might, somehow, be like a Modula module. Of course there are numerous features of Modula modules which do not carry over to Java packages, e.g. the difference between definition and implementation modules; the existence of module-level initialization and termination code; the fact that a compilation unit in Modula can declare multiple modules while a Java compilation unit can declare only one package.