Thursday, November 22, 2007

How Many JSRs does it Take to Reinvent the OSGi Framework?

How many JSRs does it take to implement the OSGi Framework? Well, we are currently at 6 and counting with JSR 320. I guess we are in a spot everybody wants to make his own, and the JCP does not seem to have any built-in controls to keep the process straight.

In an ideal world, the JCP would have a single architecture board that would look at the interest of Java and its users from an overall perspective. In the real world, the structure of JCP is geared to the creation of unrelated, not always well specified, standards. What we currently have is a hodgepodge driven by private interests that is eroding the value of Java.

The argument for increasing the mess with JSR 320 is to make it "light weight". This is an easy argument to make, and in the JCP you easily get away with it because there is no requirements process. For a JSR you fill in a rather simple questionnaire, and that is about it.

What does light weight really mean? Concierge is an 80 KB implementation of the OSGi R3 Service Platform; is that light weight? I do not know. It depends on where it is going to be used and, more importantly, what functions need to be added in the future to handle the real-world use cases. Over the last 30 years I too often fell into the trap of developing something that already existed because I thought I could do it simpler. That works perfectly until it gets used by others, and then you usually quickly find out some reasons for the original complexity. For this reason, the OSGi Alliance divided the specification process into 3 steps, of which the first step is a requirements document: the Request For Proposal or RFP.

The RFP template consists of a section that describes the application domain, a concise description of the problem, a use case section, and a requirements section. Interestingly, it is always surprisingly hard to keep people honest in this document. Most authors already have a solution in their mind and find it terribly hard to write down these sections without talking about their solution. And I can assure you, it is hard. It is so much easier to just require a technical solution than to explain the underlying forces.

However, it turns out that it is a lot easier to discuss requirements with the other stakeholders than to discuss ad-hoc technical solutions. During these discussions, all parties learn a lot and most interestingly, tend to converge on the key requirements quite quickly. Interestingly, often the initiator learns a lot about his own solution as well.

Once the requirements are better understood, the technical solutions can be compared much more easily, which prevents many political discussions of the "our solution is better than yours" kind. It is hard to overestimate what this does for the mood and efficiency of the expert groups.

The result of this more careful standardization process is a more cohesive and complete architecture than one finds in the JCP.

If the JCP worked more from requirements, so many problems could be prevented. The JSR 277 expert group would likely have found out that the OSGi R4 Service Platform satisfied most of their requirements before they invested in developing their own solution. JSR 320 is a typical case. Where is the requirements document that I could look at and write a proposal for, based on existing OSGi technology? Such a document does not exist. In a year's time, at the public review, the solution will be so ingrained that fundamental changes are no longer possible.

JCP is a sad variation on the tragedy of the commons. Java is a common area that we share in a large community. The better we take care of the commons, the more people will join this community and the more we can prosper. However, the land grab process of the JCP is slowly destroying the value of the commons because it creates a hodgepodge that is harder and harder to navigate for its users, diminishing the value for all of us. How can we change this process before it is too late?

source code has to be the primary artifact for a project, binaries the secondary

Everything else follows pretty obviously from that. So far, none of the systems I've seen manages to get that simple thing, as they are all designed to accommodate proprietary software vendors that want to push around binary blobs over the wire, and prefer to have legal mine fields around source code access.

The Free Software C/C++ world has done so well on GNU/Linux with respect to modularity because they understood that simple fact, that source code trumps binary for distributors, developers, etc., so they use build systems like Automake that make it child's play to generate the rebuildable source code for an artifact.

From a technical perspective there is no difference between source code and bytecodes; they contain the same information. However, bytecodes have an enormous advantage over source code because they are a lot easier to process and the VM is extremely well specified. By having a well defined standard in the middle we allow VM implementors to carry our code to new devices and give CPU manufacturers unprecedented freedom, while we write new functionality that runs on all these environments. That this model sometimes fails is no reason not to strive for it.

With the Java bytecode model we can handle systems with hundreds of thousands of devices, in many different incarnations, that are centrally managed. As any C/C++ programmer can testify, trying to do this with source code quickly becomes a nightmare of conditional compilations. Actually, I was at a customer recently that told me some interesting horror stories about a very large code base where ifdefs have turned into a living nightmare.

I have very extensive experience, from microprocessor assembly, PL/M, C, and C++ up to Scala, but I thank the dear lord every day that I left the hell of undefined integer lengths and core dumps :-)

Though I often also do not like licensing issues I do think there is a case to be made for commercial software. I fail to see how some companies can prosper in the long run. For example, when I look at Eclipse I see a commercial model that is turning out very high quality software quite consistently. In contrast, many open software projects are gems but there is a whole load of bad, unfinished, rubbish out there as well.

Anyway, my philosophy is to allow as many people to scratch their itch as possible. If there is one thing I learned in the past 30 years, it is that one should be very careful about making assumptions about other people's problems. I see the cases where the GNU model works; I also see cases where open source fell apart.

I wonder if you have tried OSGi? Most programmers fall in love once they see how OSGi provides a component model that is still present at run time. It is always so much fun to look at the eyes of people doing a tutorial when they use the shell to see the components and realize that they can dynamically update the components. Try it!

The fundamental problem of most C software is shared with most Java software: its developers are writing to an implementation, rather than to the specification. As a quick run with FindBugs over any major code base written in Java shows, they are all pretty buggy, just in a different way than most C code is (splint is great for convincing oneself of that). The sad fact of software development is that most software is not perfect, and will never be, as that's not economically feasible. Regardless of the programming language, even. :)

Requiring source code as the primary artifact does not imply that the user needs to rebuild the module in order to use it. She can happily use the corresponding binary artifact(s) and never have to touch the source code if she's lucky. But if she needs to rebuild the binary artifact, for example because she's running an earlier version of the Java platform than the person who built the artifact, she's out of luck with a module system that does not let her do that.

Let me smash the "bytecode is all you'll ever need" myth. :) The Java class file format is by design not upwards compatible, so one needs to build artifacts to the most general API/class file format version to make them generally usable. No Java source code compiler I've seen so far actually supports saying --compile-against-lowest-java-API-release-satisfying-the-binary-compatibility-constraints-and-the-corresponding-class-file-format-release, which is necessary to make a binary-only module system work for that scenario without potentially requiring users to recompile for their own platform.
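To make the class file version point concrete, here is a minimal sketch of how the version stamp in a class file can be inspected. Every class file starts with the magic number 0xCAFEBABE followed by a minor and a major version, and a JVM refuses to load a class whose major version is newer than the one it supports. The names ClassFileVersion, majorVersion, and loadableBy are made up for this illustration and are not part of any module system discussed here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ClassFileVersion {

    /**
     * Returns the major_version field of a class file, given its bytes.
     * Layout of the first 8 bytes: u4 magic, u2 minor_version, u2 major_version.
     */
    public static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                 // skip minor_version
        return buf.getShort() & 0xFFFF; // major_version as an unsigned value
    }

    /**
     * Hypothetical helper: a VM can only load classes whose major version
     * does not exceed the highest major version that VM understands.
     */
    public static boolean loadableBy(int classMajor, int vmMaxMajor) {
        return classMajor <= vmMaxMajor;
    }

    public static void main(String[] args) throws IOException {
        // Inspect this very class as it was compiled by the local javac.
        try (InputStream in =
                 ClassFileVersion.class.getResourceAsStream("ClassFileVersion.class")) {
            if (in == null) {
                System.out.println("class file not found on the class path");
                return;
            }
            int major = majorVersion(in.readAllBytes());
            System.out.println("major version: " + major);
            // 49 is the class file major version of Java 5, 50 that of Java 6.
            System.out.println("loadable by a Java 5 VM: " + loadableBy(major, 49));
        }
    }
}
```

A module built on a newer platform with default compiler settings carries the newer major version, so a user on an older VM cannot load it at all, regardless of whether the code only uses APIs that exist on the older release.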

That does not even begin to address different deployment needs. For example, I'd like to pack my modules using Pack200 to reduce their size, as I know that my deployment platform supports Pack200, as well as use JAR indexing to speed up class loading, and of course use the stack map attributes in my bytecode to speed up verification. If a module is compiled for the lowest common denominator, I won't get any of the benefits of using a more capable version of the JVM than the one the module was compiled for. If I am denied access to the source code, and the opportunity to rebuild the module easily by the module system, I am denied the significant benefits provided by my existing technology investment for no good reason.

I.e. a binary only system is wasting my money, because it assumes that upstream developers can get it 100% right the first time, and all the time, while the reality is quite different.

I like OSGi, the idea is cool, and the implementation I played with (knopflerfish) seemed nice. It works well for what it's designed to do, afaict.

But there is a lot to learn for OSGi as well as JSR 277 from the experiences of packagers who deliver modular applications, libraries and stacks to millions of users across platforms ranging from embedded devices, to enterprise systems running my favorite search engine. And their experience is that source code is fundamentally important.

What I miss most in the Java modularity discussion is an honest, open look at the limitations of one's own technology, and a perspective on how to deal with it, in light of the failure of a single technology in the past 10 years to take over the Java world by storm. I've looked at JARs, Maven, OSGi, 277, etc. and they all miss the fundamental ingredient of successful, one-stop-shop module systems for other languages, like Ruby Gems, Perl CPAN, or Python packages: those all use the source code as the fundamental unit in the repositories. Never mind that that's also true of all the successful GNU/Linux distributions, the BSDs, etc.

For some interesting material on the topic, see http://www.edos-project.org/xwiki/bin/download/Main/Deliverables/edos%2Dwp2d2.pdf

1. For perspective. The number of binary deployments dwarfs the number of source deployments by many magnitudes.
2. Maybe binary distribution misses opportunities in certain cases but that trade off is a proprietary trade off. I think companies should have that choice. Forcing people to hand over source code will also limit opportunities for some companies. Source code distribution is always a possibility but I still think the holy grail is binary distribution.
3. Java bytecodes contain the identical information as the source code. It is not that hard to write up and down converters.
4. All Java compilers support target version, source version, and can be linked to the appropriate runtime libraries. Current situation is a mess due to the profiles/configurations but the OSGi EE tries to correct it.
5. Interesting you address Ruby, Gems, Perl, and Python. A friend of mine is a Python fanatic but is thinking of moving to Java because he sees that the number of professional commercial libraries in Java is much larger than in Python.

I'm curious what numbers you have to back up the claim about deployment of binary-only module systems dwarfing source-too ones by many magnitudes. Which particular systems do you have in mind, and where can I get the numbers?

Did you take your time to read the paper linked above?

I'm also curious which Java source code construct you believe to be identical to the goto & goto_w bytecodes. No Java language specification has a goto statement, actually.

It is pretty easy to generate legal bytecode that has no legal corresponding Java source code (see old Java security attacks from 1996 or any good paper on bytecode obfuscators).

Given that an increasing chunk of bytecode running on a JVM does not come from javac, but has been processed through AOP tools, or compilers for other languages, the idea that there is a 1:1 correspondence between bytecode and source code does not really work, unless one's model is limited to only ever dealing with Java bytecode that has corresponding Java source code. The Java platform is a bit bigger than that, though. (groovy, aspectj, jruby, nestedvm, ...)

Well, mobile phones dwarf anything else and they clearly receive binary Java distributions. Any embedded device downloads binaries. Maven, despite its problems, is quite successful because it allows people to work with the binaries. I never build from source for my PC applications. Need I go on?

You do not have to use Java as the source for a java program. Why do I need the source? The bytecode allows me to weave whatever I want to change (which I prefer not to, and rarely if ever need).

Time will tell. Despite its imperfections, Java did an amazing job in standardizing the runtime and I feel that is the way to go so we can evolve hardware and software more independently. If you feel source is the way to go, let's agree to differ.