Software Simplexity

Tuesday, February 12, 2013

Yesterday Neil Bartlett released bndtools 2.0, a major piece of work by Neil and others. He worked so hard on it that he even stopped harassing me in the morning! Kidding aside, even though I am part of the team I am often surprised to find very useful functions that I was totally unaware of. Now, if you looked at bndtools some time ago, take another look since it has really matured. I am convinced that it is by far the best tool on the market for developing OSGi bundles. This new release adds a lot of new features, some of which I'd like to point out in this blog. Today I am starting with release management.

For me, by far the most interesting new feature is the release tool. I hate versions with a vengeance; they are not very user friendly. In a large build it is so easy to get versions wrong, and the resulting problems are not easy to detect, to say the least. Automation is of course the solution; it was the original reason behind bnd. However, bnd works on the bundle and class path level and therefore has no concept of change, and versions are all about change. Semantic versioning provides a language to express the nature of a change and thereby gives us a mechanism to describe a conditional dependency that accepts certain changes but rejects others. As the white paper argues, this dependency differs depending on the role the requirer plays. Consumers are loosely coupled to an API since we go out of our way to be backward compatible, but providers of that API have a much stronger connection: almost any change in the API is the responsibility of the provider. These different roles and their influence on imports are reflected in their import version policies.
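
To make the two policies concrete, here is a small sketch in plain Java. The `ImportPolicy` class and its helpers are mine, purely for illustration; real OSGi metadata expresses this with version ranges such as `[1.2,2.0)` in the Import-Package header. A consumer of API 1.2 can accept anything below the next major, while a provider must stay below the next minor.

```java
// Illustrative sketch of consumer vs provider import policies.
public class ImportPolicy {
    // Numeric comparison of major.minor.micro versions
    static int cmp(int[] a, int[] b) {
        for (int i = 0; i < 3; i++)
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        return 0;
    }

    // Half-open range [floor, ceiling)
    static boolean in(int[] v, int[] floor, int[] ceiling) {
        return cmp(v, floor) >= 0 && cmp(v, ceiling) < 0;
    }

    // A consumer compiled against API 1.2 survives minor updates: [1.2.0, 2.0.0)
    static boolean consumerAccepts(int[] v) {
        return in(v, new int[] { 1, 2, 0 }, new int[] { 2, 0, 0 });
    }

    // A provider of API 1.2 is already broken by the next minor: [1.2.0, 1.3.0)
    static boolean providerAccepts(int[] v) {
        return in(v, new int[] { 1, 2, 0 }, new int[] { 1, 3, 0 });
    }

    public static void main(String[] args) {
        int[] v = { 1, 4, 0 }; // the API evolved with a backward compatible change
        System.out.println(consumerAccepts(v)); // true
        System.out.println(providerAccepts(v)); // false
    }
}
```

When the API moves to 1.4.0, a backward compatible change for callers, the consumer range still matches while the provider range correctly rejects it.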

The theory is fine but it requires very minute maintenance of versions. For a lot of developers the idea of maintaining package versions sounds horrific because it is already so difficult to manage their artifact versions; just imagine having to do that 20x more! In reality, bnd has always provided support for minimizing that work. You rarely ever have to specify import versions since they are picked up from the classpath.

The story for exports is different. You have to specify the export versions by hand, and this is trickier because it requires a judgement of the changes that were made. For example, adding a method to an interface breaks all implementers of that interface. Depending on whether that interface was implemented by a consumer or a provider, you have a major or a minor change. Judging this change requires a comparison against a previous version and, as I stated earlier, bnd had no concept of time. It had, however, a concept of (pluggable) repositories, and repositories can provide history. Though all the parts were present, before release 2.0 there was only rudimentary functionality binding it all together.

There was a release tool in the previous release but it had some shortcomings. PK Søreide from CommActivity in Stockholm and I set out to improve this function for this release. I extended bndlib with an extensive API and resource diff tool and a baseline tool. The diff tool understands Java semantics and is capable of representing the changes between two JARs all the way down to the modifiers of a field or method. It also judges whether a change is MAJOR, MINOR, or MICRO: the parts of our version. Since this judgement depends on the role that an interface or class plays, the diff tool supports the @ConsumerType and @ProviderType annotations. The diff tool also supports many other, sometimes subtle, rules about compatibility, like adding methods to an interface or making a field protected.
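
As a hedged sketch of the kind of verdict involved (the names below are illustrative, not bndlib's actual API): adding a method to an interface that only providers implement is a minor change, while the same addition to a consumer-implemented interface is major, and the suggested next version follows from that verdict.

```java
// Sketch of baseline-style verdicts; names are illustrative, not bndlib's API.
public class Baseline {
    enum Delta { MICRO, MINOR, MAJOR }

    // Adding a method to an interface is harmless for consumers of the API
    // (they only call it) but breaks providers (they must implement it).
    // Hence: @ProviderType -> MINOR bump, @ConsumerType -> MAJOR bump.
    static Delta methodAddedToInterface(boolean providerType) {
        return providerType ? Delta.MINOR : Delta.MAJOR;
    }

    // Suggest the next package version from the verdict.
    static int[] bump(int[] old, Delta d) {
        switch (d) {
            case MAJOR: return new int[] { old[0] + 1, 0, 0 };
            case MINOR: return new int[] { old[0], old[1] + 1, 0 };
            default:    return new int[] { old[0], old[1], old[2] + 1 };
        }
    }

    public static void main(String[] args) {
        int[] next = bump(new int[] { 1, 2, 3 }, methodAddedToInterface(true));
        System.out.println(next[0] + "." + next[1] + "." + next[2]); // 1.3.0
    }
}
```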

Now, command lines are my favorite tool but, although they seem to be gaining in popularity, most developers want buttons and lists. So PK developed a GUI tool that displays the aggregates of the changes with the possibility to drill down to the most minute details. It also provides a suggestion for the bundle version and each modified package version. With one press of a button, the build is updated and the artifacts are pushed to the release repository.

There is also a global command for releasing all the bundles in a workspace in one go. With one press, you can now release all your bundles to the Release repository. Since we also added a new File Repository that is easy to connect to ftp, git, or some other deploy tool, you can automatically make the release available to team members or the world. However, that is for a future blog.

Thursday, February 7, 2013

Today a local software developer and I enjoyed one of those wonderful French lunches. We had met a few times over the phone because we have a shared interest in the local Internet fibre, paid for by the county, but that seems hardly used. Before the lunch I visited his office for a chat, where it immediately became clear that Java was a four-letter word in the office, despite it being an IBM shop. The story they told me was quite tragic and I am afraid that thousands, maybe even hundreds of thousands of shops can retell this same story, to the detriment of the Java community.

At the beginning of this millennium Java was heavily promoted by IBM since this language could unify their amazing breadth of incompatible platforms. So they sent their (COBOL and RPG) developers to IBM for training. After 4 weeks they returned, no longer able to walk with their heads up, and not having a clue where to start with a WebSphere application. Then, when they finally got something to run, it actually crawled. It did not take much time before they got so disgusted that they threw it all out and happily moved to PHP ...

Yes, Java is big at big corporations (a not unimportant market) but what does it say when developers run for the door when it is their own money? Actually, my lunch partner confirmed what I always felt: Enterprise Java makes simple things really hard and as a result we lose a large number of potential Java developers. Having worked on an enterprise-like application for the last year, I actually find Java quite perfect for the task. After working with Javascript and Coffeescript over the past year I know using Java is a lot less frustrating. Not that the language is superior, there are many nicer languages; it is the combination of Eclipse JDT with Java that is far superior to anything else I know. Scala might be nicer as a language, but last time I looked at its plugin it was not even close. I loved Xtend as a language, but using it sucked without the brilliance of JDT. I am pretty sure Ceylon has superior language features, but confident, without even having to look, that its plugin does not come close to JDT. The lack of support for refactoring makes using any other language a pain in comparison, and Java is quite lonely at the top due to JDT.

My lunch partner had never even heard of refactoring; he had disengaged long before those incredibly powerful tools became available. So where is the gap? Let us take a look at some of the issues.

Object/Relational mismatch. Much of what enterprise developers do is shuffling information around in databases, so you would think this should be easy to do in an enterprise language. However, since all we had was objects, Java crashed right into relational database technology. It is almost awesome to see what lengths our industry is willing to go to (think Hibernate, JPA) to cover this fundamental impedance mismatch. What you can do with a few lines of PHP takes XML, Java, annotations, and more in our world. Looking at many frameworks one can only be awed by the amount of complexity we are willing to add for ease of use (and fail at it).

Development Cycle. If there is one thing I learned in the last year, it is that I spend most of my time figuring out how things really work out there. Documentation is often lacking and the only way to figure out what works and what doesn't is trying, looking at the result, and making corrections. Turnaround time in PHP and other dynamic languages is negligible since you just change the file on disk. We have a fundamentally more lengthy edit/compile/debug cycle. Every second between saving your code and being able to view the result is just plain unpleasant. Even though we have in-process code replacement nowadays, it is still not as fluid (although bndtools is nowadays as close as it gets).

Custodians mismatch. Over the years I've worked with many system developers. Many of them are really good people: bright, hard working, and sincere. However, few of them have actually developed real-world applications; they have invariably worked on the software vendor's side, more often than not hired straight from university. And still, these people are the trend setters, write the specifications, and are more or less the custodians of Java. Just something to ponder: how come there is no money type in Java? Enterprise software is very much about shuffling money; readers who have ever had to write this kind of code know that money calculations are not straightforward and map well to neither int, long, double, nor float, while calculating with BigDecimal is clumsy. The absence of such a type illustrates the discrepancy between the custodians of the language and their users.
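
The pain is easy to demonstrate; the amounts and the 21% VAT rate below are made up for illustration:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class Money {
    public static void main(String[] args) {
        // double silently loses cents: 0.1 + 0.2 is not 0.3
        System.out.println(0.1 + 0.2);              // 0.30000000000000004

        // BigDecimal keeps exact decimal semantics, at the price of verbosity
        BigDecimal price = new BigDecimal("19.99");
        BigDecimal rate  = new BigDecimal("0.21");  // 21% VAT, illustrative
        BigDecimal vat   = price.multiply(rate).setScale(2, RoundingMode.HALF_UP);
        System.out.println(vat);                    // 4.20
    }
}
```

Correct, but compare the ceremony of `multiply`, `setScale`, and `RoundingMode` to what a built-in money type with operators could look like.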

Modules. All popular web languages (Perl, PHP, Python, Ruby, etc.) have a module system, CPAN being the father of them all. We utterly lack something comparable in Java. Yes, Maven Central is a great collection, but using a Maven artifact is for many reasons a lot more complicated than using PHP Pear or NPM. Not only should it be a lot easier to reuse components, it should also be really easy to modify them to suit our needs. Though there is a multitude of open source projects, cloning is expensive since a project is often not cohesive and usually quite large. There are very few highly cohesive, uncoupled enterprise components ready for reuse out there.

Concluding, the reason my lunch partner had not discovered the joys of refactoring is that he could not start simple. I also learned this when I wanted to teach Java to my children or when I wanted to build a simple website. There is no Java alternative to Ruby on Rails or just plain old PHP. And actually this frustrates me to no end since I do believe we have by far the best technology. Despite these advantages, we seem to be unable to provide a simple entry point to our temple for the rest of the world. Sadly, in our quest for backward compatibility we actually tend to make it harder and harder for newcomers, something that does not bode well for our industry.

Tuesday, January 8, 2013

We all know and cherish the "Hello World" examples, a tradition like cookies and apple pie, well, metaphorically, for our industry. Though I usually love a good "Hello World" example, and have made many myself, I am starting to wonder if in today's complex world a 5-minute example can not only cause confusion, it can also add extra baggage to frameworks. I recently had such an experience.

For my system I decided to build the user interface 100% in Javascript and HTML 5. Browsers have grown up, machines today are fast, and a local UI can be much more responsive than a remote one. Since this is a relatively new area it was time to evaluate. I tried many frameworks, but when I saw the Angular Todo example I was stunned. So little code, so clear. And it did work very well for the first examples. Over time my code got more and more tangled, and one day I spent some time looking at bigger applications and found that the Todo list example had set me on the wrong course. Angular actually has terrific support for modularizing applications and a very good URI routing model for single-page apps. However, this approach is significantly different from the Todo and other hello-world-like examples. It feels like framework developers tend to add features to their code to make it look simpler than it really is.

I am not blaming the Angular developers; on the contrary, it is a solid product and after I figured out how to build my app it scaled well. They are not the only ones; I've seen this pattern in more places now, and looking back at my OSGi days I guess I've done similar things.

I guess human nature is such that honest framework developers are bypassed for the ones that lure the developer with simple examples. I guess we just get what we deserve.

Monday, December 31, 2012

For my project, which I am hoping to share more about soon, I have a full copy of Maven Central and some other repositories. Since the work I do is related to dependencies, I have a list of artifacts in ranking order. I based this ranking on popularity (the number of transitive inbound dependencies) and weight. Dependencies are calculated per program, using its latest version. There were almost 40.000 programs in the database. This is not an exact science, some heuristics were used. However, having a top ten to close 2012 sounds interesting.

#1 Hamcrest Core — Never heard of it before. It turns out that this is a library that adds matchers to JUnit, making test assertions more readable. Its (for me unexpected) popularity is likely caused by JUnit, which depends on it (actually embeds it). The numbers of inbound dependencies are almost equal (27772 for JUnit versus 27842 for Hamcrest).

#2 JUnit — A regression testing framework written by Erich Gamma and Kent Beck. It is used by developers who implement unit tests in Java. It has more than 10000 direct dependent projects and is likely the most depended-upon project.

#3 JavaBeans(TM) Activation Framework — The JavaBeans(TM) Activation Framework is used by the JavaMail(TM) API to manage MIME data. It is, for me, a perfect example of a library that was over-designed during the initial excitement of Java. It has a complete command framework, but I doubt it is used anywhere. However, the JavaMail library did provide a useful abstraction, and it depended on the activation framework.

#4 JavaMail API — The illustrious JavaMail library, developed before there even was a Java Community Process. Provides functionality to mail text from Java (which few people seem to know can also be done with the URL class, but that is another story). Still actively maintained, since the artifact was updated less than 10 months ago.

#5 Genesis Configuration :: Logging — Provides the common logging configuration used by the build process, primarily used to collect test output into 'target/test.log'. Surprisingly, it has over 20.000 transitive inbound dependencies, likely because it looks like every Geronimo project depends on it.

#6 oro — I remember using Oro somewhere south of 1999; it was a regular expression library from the days before Java 1.4 added regular expression support. It turns out that Oro was retired 7 years ago and should not be used anymore. Still, it has over 20.000 dependencies. At first sight, many Apache projects still seem to depend on it, even though the project recommends using the built-in Java regular expressions.

#7 XML Commons External Components XML APIs — xml-commons provides an Apache-hosted set of DOM, SAX, and JAXP interfaces for use in other xml-based projects. The stated hope is to standardize on both a common version and packaging scheme for these XML standards interfaces, to make the lives of both developers and users easier. The External Components portion of xml-commons contains interfaces that are defined by external standards organizations. Has not been updated for 7 years (I guess XML's heydays are over by now).

#8 OpenEJB :: Dependencies :: JavaEE API — An open source, modular, configurable and extendable EJB Container System and EJB Server. The popularity of this library is likely caused by the fact that log4j depends on it.

#9 & #10 mockobjects:mockobjects-core — A library to make mock objects. It was last updated over 8 years ago, but it still has more than 20.000 inbound dependencies.

#11 org.apache.geronimo.specs:geronimo-jms_1.1_spec — Provides a clean-room version of the JMS specification. Since this ended up so surprisingly high, I looked at where its popularity came from. It turns out that, here again, log4j is the culprit.

#12 Apache Log4j — Which brings us to the artifact that is pushing all these previous artifacts to greater heights than they deserve. log4j is directly referenced by a very large number of projects. The following image shows its dependency tree:

Why a log library should depend on the Java EE API is a bit of a puzzle. Anyway, happy 2013!
Peter Kriens

Monday, December 17, 2012

A meteorite likely caused the demise of the dinosaurs; since that time we tend to use the term dinosaur for people who are too set in their ways to see what is coming. Though an awful lot of practitioners still feel Java is the new kid on the block, we must realize that the language is in its mid-life after 20 years of heavy use. The young and angry spirits that fought the battle to use Java over C++ have long since ended up in the manager's seat. Java today has become the incumbent, so can we keep on grazing the green and lush fields without having to worry about any meteorites coming our direction?

In 1996 Applets were the driving force behind Java in the browser. They were supposed to bring programmability to the browser in an attempt to kill off Microsoft's dominance on the desktop. While applets got totally messed up by Sun due to a complete lack of understanding of the use case (they did it again with Web Start), Java's silly little brother Javascript grew up and has recently become an exciting platform for UI applications. With the advent of the Web Hypertext Application Technology Working Group (WHATWG) that specified HTML 5, we finally have a desktop environment that achieves the dream of very portable code with an unbelievable graphic environment for a large range of devices.

"Great", you think, "we support HTML5 and Javascript from our web frameworks. So what's the problem?" Well, the problem (for Java at least) is that AJAX has now grown up and calls itself JSON. Basically, all those fancy Java web frameworks lost their reason for existence. The consequence of a grown-up programming environment in the browser is that the server architecture must adapt or go extinct. Adapt in a very fundamental way.

One of the primary tenets of our industry is encapsulation. Best practice is to hide your internal data and provide access through get/set methods. On top of these objects we design elaborate APIs to modify those objects. As long as we remain in a single process, things actually work amazingly well, as the success of object oriented technology demonstrates. However, once the objects escape to other processes, the advantages are less clear. Anybody who has worked with object relational mapping (JPA, Hibernate, etc.) or communication architectures knows the pain of ensuring that the receiver properly understands these "private" instance fields. You might have a chance in a homogeneous system under central control, but in an Internet world such systems are rare and will become rarer. Unfortunately, clinging to object oriented technologies has given us APIs that work very badly in large-scale distributed systems.

The first time I became aware of this problem was with Java security in 1997. The security model of Java is very object oriented, hiding the semantics of a security grant behind a user-defined method call (implies). Though very powerful, its cost is very high. Not only is it impossible to optimize (the method call is not required to return the same answer under the same conditions), it is also virtually impossible to provide the user interface with this authorization information. Though a browser-based program cannot be trusted to enforce security, the authorization information is crucial for making good user interfaces. Few things are more annoying than being able to push a button and then being told you're not allowed to push that button. Such an unauthorized button should obviously not have been visible in the first place. Remote procedure calls for such fine-grained authorization checks are neither feasible nor desirable from a scalability point of view.
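
A concrete illustration with the standard java.io.FilePermission class (the paths are made up): the meaning of the grant is locked inside the implies method call, so a remote user interface has no way to inspect what is allowed without calling back into the server for every widget.

```java
import java.io.FilePermission;

public class Implies {
    public static void main(String[] args) {
        // The grant: read access to everything under /data, recursively
        FilePermission granted = new FilePermission("/data/-", "read");
        // The question a UI would need answered before showing a button
        FilePermission asked = new FilePermission("/data/reports/q3.txt", "read");

        // Only executing implies() can tell; the semantics are code, not data.
        System.out.println(granted.implies(asked)); // true
    }
}
```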

Another, more recent example is the JSR 303 data validation API. This specification uses a very clever technique to create elaborate validation schemes. It is incredibly powerful, but it relies on inheritance and annotations. When the UI is built on the server this provides a neat tool, but when the UI is executed remotely you are stuck with a lot of obtuse information that is impossible to transfer to the browser, where the user could be guided in providing the right input. Simple regular expressions may not be nearly as powerful but are trivial to share between browser and server.
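
By contrast, a plain regular expression rule (the postal code pattern below is just an illustrative example) is a single string the server can validate with and ship to the browser unchanged, since Javascript understands nearly the same regex dialect:

```java
import java.util.regex.Pattern;

public class Validate {
    // One declarative rule, usable verbatim on both server and browser.
    // Pattern is illustrative: four digits, optional space, two capitals.
    static final String ZIP = "\\d{4}\\s?[A-Z]{2}";

    static boolean valid(String s) {
        return Pattern.matches(ZIP, s); // requires a full match
    }

    public static void main(String[] args) {
        System.out.println(valid("1234 AB")); // true
        System.out.println(valid("12345"));   // false
    }
}
```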

The last example is just plain API design. Most of the APIs I've designed rely heavily on object references. A reference works fine in the same VM but has no meaning outside that VM. Once you go to a distributed model you need object identities that can travel between processes. Anybody who has had to provide an API to MBeans knows how painful it is to create a distributed API on top of a pure object oriented API. It requires a lot of mapping and caching code for no obvious purpose. A few weeks ago I tried to use the OSGi User Admin but found myself having to do this kind of busy-work over and over again. In the end I designed a completely new API (and implementation) that assumes that today many Java APIs must be usable in a distributed environment.

To prevent Java from becoming obsolete we must therefore rethink the way we design APIs. For many applications today the norm is being a service in a network of peers, where even the browser is becoming one of the peers. Every access to such a service is a remote procedure call. Despite the unbelievable increase in network speed, a remote procedure call will always be slower than a local call, not to mention the difference in reliability. APIs must therefore be designed to minimize roundtrips and data transfers. Instead of optimizing for local programs I think it is time to start thinking globally so we can avoid this upcoming meteorite called HTML5.

Friday, August 24, 2012

A version (like 1.2.3) is a remarkably ill-defined concept; Wikipedia does not even have an entry for it, and several readers will remember the extensive discussions about version syntax between OSGi and Sun. For an industry where versioning (a word the spelling checker flags) is at the root of its business, this comes as a bit of a shock. This article tries to define the concept of a version and to propose a versioning model for software artifacts, with the idea of starting a discussion.

A software version is a promise that a program will change in the future; the version is a discriminator between the different revisions of the same program. A program is conceptual: it represents source code, the design documents, the people, ideas, etc. A program is intended to be used as a library by other software developers or as an application. If we talk about the Apache Commons project, then it maintains multiple programs, like for example commons-lang. A revision is a reified (made concrete) representation of a program, for example a JAR file. The version of a revision discriminates the revision from all other revisions that exist and promises that this will also be true for future revisions.

This last requirement would be easy to fulfill with a unique identifier, for example a sufficiently large digest (e.g. SHA-1) of some identifying file in/of the revision. This was actually the approach taken in .NET. However, the clients of the program will receive multiple revisions over time; they will not only need to discriminate between the revisions (digests work fine for that), they will in general need to make decisions about compatibility.

The most common model for deciding compatibility is to make the version identifier comparable. The assumption behind this is that if version a is higher than version b, then a can substitute for b; higher versions are backward compatible with earlier versions. In this model an integer would suffice. However, a version is a message to the future; it is a promise about how the program will evolve over time with multiple revisions.
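
Note that comparable has to mean numerically comparable per part; here is a small sketch of why plain string comparison is not good enough:

```java
public class VersionOrder {
    public static void main(String[] args) {
        // Lexicographic comparison gets substitution wrong: "1.10.0" sorts
        // before "1.9.0" as a string, although it is the later revision.
        System.out.println("1.10.0".compareTo("1.9.0") < 0); // true

        // Comparing part by part as numbers restores the intended order.
        System.out.println(numericCompare("1.10.0", "1.9.0") > 0); // true
    }

    // Compare dotted versions part by part, numerically.
    static int numericCompare(String a, String b) {
        String[] as = a.split("\\."), bs = b.split("\\.");
        for (int i = 0; i < Math.min(as.length, bs.length); i++) {
            int c = Integer.compare(Integer.parseInt(as[i]), Integer.parseInt(bs[i]));
            if (c != 0) return c;
        }
        return Integer.compare(as.length, bs.length);
    }
}
```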

Since the version is such a handy little place to describe an artifact, versions were over time heavily abused to carry more information than just this single integer. Tiny Domain Specific Languages (DSLs) were developed to convey commitments to future users. As usual, the domain specific language ended up as a developer specific language. Versions are especially abused when they convey variations of the same revision; for example, an artifact for Java 1.4 and the same source code compiled for Java 7. These are not versions but variations, another dimension.

The lack of a de-jure or de-facto standard for versioning made relying on the implicit promises in versions hard and haphazard. Worst of all, it makes it impossible to develop tools that take the chores of maintaining versions out of our hands.

A few years ago a movement in the industry coined semantic versioning. At about the same time the OSGi Alliance came out with the Semantic Versioning whitepaper that was based on some identical and some very similar ideas. Basically, these are attempts to standardize the version DSL so tools can take over the versioning chores. Tools are important because versioning is hard and humans are really bad at it. And with the exponentially increasing number of dependencies we are going to lose without tools.

major - Signals a change that is backward incompatible for all users of an API.

minor - Signals backward compatibility for clients of an API; however, it breaks providers of this API.

micro - Bug fix, fully compatible, also called patch.

qualifier - Build identifier.

In general the industry has largely reached consensus on the first three parts; the contention is in the qualifier. Since the qualifier has only one requirement, being comparable, and, in contrast with the first three parts, can hold more than digits, it is the outlet for developer creativity: long date strings, appended git SHA digests, internal numbers, etc.

The qualifier's flexibility made it a perfect candidate to signal the phase changes any revision has to go through in its existence. The phase of a revision indicates where it fits in the development life cycle: developers sharing revisions because they work closely together, release to quality assurance for testing, approval from management to make it public, retiring of a revision because it is superseded by a newer revision, and in rare cases withdrawal when it contains serious bugs. The qualifier became the discriminator to signal some of these phases, with qualifiers like BETA1, RC8, RELEASE, FINAL, etc.

Using the qualifier to signal a phase implies a change after the revision has been quality assured, which implies a complete retest, since changing a version can affect resolution processes, which can affect the test results. It also suffers from the qualifiers that invariably pop up in this model: REALLYFINAL, REALLYREALLYFINAL, and PLEASEGODLETHISBETHEFINALONE. Also, this model does not allow a revision to retire or be withdrawn since the revision is out there, digested, and unmodifiable.

It should therefore be clear that the phase of a revision logically cannot be part of that revision. The phase should instead be maintained in the repository. The process I see is as follows.

It all starts with a plan to build a new revision, let's say Foo-1.2.3, which is a new version of Foo-1.2.2. Since Foo-1.2.3 is a new version, the repository allows the developers to (logically) overwrite the previous revisions. That is, requesting Foo-1.2.3 from the repository returns the latest build, e.g. Foo-1.2.3-201208231010. As soon as possible the revisions should be built as if they are the final released revision.

At a certain point in time the revisions need to be approved by Quality Assurance (QA). The development group then changes the phase of the revisions to be tested to testing or approval. This effectively locks the revision in the repository; it is no longer possible to put up new revisions with the same major, minor, and micro parts. If QA approves the actual revisions then the phase is set to master; otherwise it is set back to staging so the development group can continue to build. After a revision is in the master phase it becomes available to "outsiders"; before this moment, only a selected group had visibility of the revision, depending on repository policies.

If the revision is valid then existing projects that depend on that revision should never have to change unless they decide to use new functionality that is not available in their current dependency. However, repositories grow over time. Currently Maven Central is more than 500 GB and contains over 4 million files, more than 40.000 programs, and a staggering 350.000 revisions. Most of these revisions have been replaced by later revisions, yet a new user is confronted with all this information. It is clear that we need to archive revisions when they are superseded. Archiving must hide the revision from new users while it remains available for existing users of the artifact.

Last but not least, it is also necessary to expire a revision in exceptional cases, when the revision causes more harm (for example a significant security bug) than the resulting build failure.

Summary of the phases:

staging - Available as major.minor.micro without qualifier, not visible in searches

candidate - Can no longer be overwritten but is not searchable. Can potentially move back to staging but in general will move to master.

master - Becomes available for searches and is ok to rely on.

retired - Should no longer be used for new projects but is available for existing references
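
The life cycle above can be sketched as a small state machine. The transition table below is my reading of the proposal, not a specification, and I added an expired state for the exceptional withdrawal case mentioned earlier:

```java
import java.util.EnumSet;
import java.util.Set;

public class Phase {
    enum State { STAGING, CANDIDATE, MASTER, RETIRED, EXPIRED }

    // Allowed transitions per phase, as described in the text.
    static Set<State> next(State s) {
        switch (s) {
            case STAGING:   return EnumSet.of(State.CANDIDATE);
            // QA verdict: back to staging on rejection, master on approval
            case CANDIDATE: return EnumSet.of(State.STAGING, State.MASTER);
            // superseded -> retired; serious bug -> expired
            case MASTER:    return EnumSet.of(State.RETIRED, State.EXPIRED);
            case RETIRED:   return EnumSet.of(State.EXPIRED);
            default:        return EnumSet.noneOf(State.class); // terminal
        }
    }

    public static void main(String[] args) {
        System.out.println(next(State.CANDIDATE)); // [STAGING, MASTER]
    }
}
```

The key property the model buys you is that a revision itself never changes; only its repository-maintained phase does.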

Thursday, July 19, 2012

I am having so much fun developing a system from scratch that it is hard to take time away for writing blogs. However, they say I need to do this otherwise I'd be forgotten ... Anyway, the reason for this blog is XRay. XRay is a plugin for the Apache Webconsole that provides you with a quick overview of the health of your system. Since I first wrote about it I've added some features because I am using it all the time; this is the best OSGi tool I've ever made. I have the XRay window on my screen all the time. When I make changes in bndtools I see the screen move in the corner of my eye. When things go wrong, the colors change and you know you have to take a deeper look. It is kind of amazing how alive the inner bowels of the framework are. The tool has saved me countless hours because the number of wild goose chases is far fewer.

One of the major new features is that it now shows when a bundle has not been refreshed. For some reason, one of my bundles sometimes escapes the refresh cycle after a bundle is updated (I still have to figure out why). This caused some services that should have been connected not to be wired up (showing up as white services, dashed when requested but not found, see picture).

When I figured out this was caused by the missing refresh, I just created a red dotted border around the bundle. Now the symptom is highly visible. Those poor lazily activated Eclipse bundles that linger in the starting state (now, that was a bad idea ...) are now also color coded, with a slightly lighter touch of orange.

The Javascript libs are now included so it can be used in environments where there is no internet. If you use service based programming, check it out! It saves an amazing amount of time.

Some people reported layout problems because they had hundreds of bundles and few services. Though I do not have a lot of time for this, I improved the layout algorithm; it could still use some improvements. The window also became bigger and better scrollable. It probably still won't work for hundreds of bundles, but I have about 30 bundles and it works surprisingly well.