Well, Hal is exactly right that the process model has huge robustness benefits over threads due to the memory and scheduler isolation enforced between processes by the OS and MMU. He is also correct that much robustness is lost when running many applications in a multi-threaded environment (like the JVM), creating operational behavioral dependency among applications that didn’t exist at design time.

But rather than bowing to the false architectural dichotomy between the Process Model and the Thread Model, I propose an architecture that uses each of them to its best advantage: processes to separate applications, and threads to separate transactions within applications. I also propose a number of improvements to the JVM that would make this hybrid model as cheap to run as the monolithic JVM model we currently have.

Before continuing, let’s imagine for a minute a JVM that has some special properties:

it has no transactional state, which most people would read as “share nothing”

the transactional state is cached locally, and the ACI (but not D) cache is shared across all processes via shared memory

a fully operational JVM can be launched from scratch, ready to process transactions, in, say, under a second

With such a JVM, the proposed solution is not hard to imagine: the runtime architecture consists of one jar file (i.e., one application) per JVM, with each JVM handling multiple simultaneous transactions in separate threads. An errant application could then be recycled by recycling the entire JVM around it. For greater robustness within a single application, multiple JVMs could run the same jar.

What you now have is a mini-cluster that provides the robustness of the Process Model with the multiprocessing and “forking” speed of the Thread Model.
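As a rough sketch of the recycling idea, assuming a hypothetical `JvmSupervisor` (my own illustrative name, not a real product) that launches one child JVM per application jar and restarts it whenever it dies:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical supervisor for the mini-cluster: one JVM per application jar,
// recycled whenever the application misbehaves or exits.
public class JvmSupervisor {

    // Build the launch command for an isolated per-application JVM.
    static List<String> launchCommand(String jarPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        return cmd;
    }

    // Run the jar in its own process; if the process dies, recycle the
    // entire JVM around the application rather than patching it in place.
    static void superviseForever(String jarPath) throws Exception {
        while (true) {
            Process p = new ProcessBuilder(launchCommand(jarPath))
                    .inheritIO()
                    .start();
            int exit = p.waitFor();
            System.err.println(jarPath + " exited with " + exit + "; recycling JVM");
        }
    }
}
```

Running several supervisors over the same jar gives the multiple-JVMs-per-application case.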

In order to achieve this, the JVM would have to:

be “pickle-able,” by which I mean that you can run a VM, get it to a steady state ready to process transactions, and then pickle it to disk, just like a VMware image

support sharable read-only data, for all code and constant data

be able to pass TCP endpoints around among processes, so that a dispatcher process can funnel transactions to the right JVM without a copy
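The last item has a narrow precedent in standard Java already: a JVM launched inetd-style can pick up the already-connected socket its parent handed it via `System.inheritedChannel()`, though general endpoint passing among arbitrary running JVMs would still need new JVM support. A minimal sketch:

```java
import java.io.IOException;
import java.nio.channels.Channel;
import java.nio.channels.SocketChannel;

// A JVM launched by an inetd-style dispatcher inherits the connected socket,
// so the transaction reaches the worker without copying through the parent.
public class InheritedEndpoint {

    // True only when a dispatcher handed this JVM a connected endpoint.
    static boolean hasInheritedEndpoint() {
        try {
            return System.inheritedChannel() != null;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Channel ch = System.inheritedChannel(); // null in a normal launch
        if (ch instanceof SocketChannel) {
            SocketChannel sc = (SocketChannel) ch;
            // handle the transaction on the inherited connection...
            sc.close();
        }
    }
}
```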

I know some of these things have been worked on in the past, but I wonder if some of the specialized JVM vendors who have a stake in enterprise software (Oracle/WebLogic/JRockit, Azul Systems, Sun, IBM) shouldn’t start looking at this problem again.

What I would like to see is an automatic solution to the problem: one that detects such a deadly embrace and chooses a victim to kill. Detecting the deadlock would have to be heuristic, in the sense of watching the locks to see how long they are usually held, and considering only those locks that exceed their normal holding time.
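For the non-heuristic part, the JVM already exposes exact monitor-deadlock detection through `ThreadMXBean`; a watchdog built on it might start like this sketch (the hold-time heuristic would need extra lock instrumentation on top):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// A deadlock watchdog built on the JVM's own cycle detection.
// Victim selection and the heuristic hold-time filter are left out.
public class DeadlockWatchdog {

    // Returns the thread IDs in a deadlock cycle, or null if there is none.
    static long[] findDeadlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        return mx.findDeadlockedThreads();
    }

    public static void main(String[] args) {
        long[] ids = findDeadlocked();
        if (ids != null) {
            System.err.println(ids.length + " threads deadlocked; choosing a victim");
            // In the distributed case, the victim would be a whole VM: System.exit().
        }
    }
}
```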

In a distributed locking case, such as with Terracotta, the deadlock detector could System.exit(), let the other VM continue along, and the management system would automagically restart the victim VM. It wouldn’t prevent the problem from happening again in 10 seconds, but it might at least ring lots of bells so someone can come look at the problem, rather than having the whole cluster deadlocked. In the single VM case, we’d have to wait for a proper solution to Thread.stop(), which I also talked about yesterday.

The other question I have about deadlock detection is whether some of it can be done via static analysis, but this is not my area of expertise. An alternative would be to use AOP to instrument the locks. I’m sure someone has already done this.

There are, however, still a number of robustness issues with Java as it exists today:

hot code reloading – this is an age-old robustness problem, especially for operational issues like rolling upgrades, and I think the Zero Turnaround guys may have a splendid general-purpose solution in their JavaRebel product.

runaway thread healing – These are threads that go into an endless loop or a permanent I/O block. We used to have the ability to set an ExecuteThread timeout in WLS, whereby a watchdog timer would kill any thread that didn’t complete within a configurable time period. But then Sun deprecated Thread.stop() and suggested cooperative thread death using a state variable instead. This abdication of responsibility for robustness from the VM to user code is similar to the cooperative transaction manager timeout, about which my colleague Pete Holditch says:

There is no easy answer – there isn’t really a facility in J2SE or J2EE as they stand today to allow a thread to be safely and asynchronously terminated.

I’d like to see a permanent solution to this problem, even if it means implementing transactional memory in the JVM.
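For comparison, here is what the cooperative approach Sun recommends looks like in practice: a watchdog that cancels the task after a timeout, which only helps if the task itself honors the interrupt. The class name and timeout values are illustrative:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Cooperative version of the old WLS ExecuteThread timeout: a watchdog
// cancels the task, but a thread stuck in a loop that never checks its
// interrupt flag will run on regardless -- exactly the gap in question.
public class ExecuteThreadTimeout {

    static String runWithTimeout(Callable<String> task, long millis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> f = pool.submit(task);
        try {
            return f.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true); // delivers an interrupt -- only works if the task cooperates
            return "timed out";
        } finally {
            pool.shutdownNow();
        }
    }
}
```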

memory quotas – Another great way to test an application server’s robustness is to leak memory. Providing a quota system that limits (hopefully, heuristically) a component’s ability to allocate memory would prevent bad code from killing the whole server with an OOME.
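To make the idea concrete, here is a minimal user-level sketch of a quota gate; the class and limit are purely illustrative, and a real solution would need JVM support, since ordinary `new` cannot be intercepted from user code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative per-component memory quota: large allocations are routed
// through a bookkeeping gate that refuses the request instead of letting
// one leaky component OOME the whole server. A real implementation would
// have to live inside the JVM, not in user code like this.
public class MemoryQuota {
    private final long limitBytes;
    private final AtomicLong used = new AtomicLong();

    MemoryQuota(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    byte[] allocate(int size) {
        // Reserve first, so concurrent allocators cannot race past the limit.
        long now = used.addAndGet(size);
        if (now > limitBytes) {
            used.addAndGet(-size); // roll back the reservation
            throw new IllegalStateException("component quota exceeded");
        }
        return new byte[size];
    }

    void release(int size) {
        used.addAndGet(-size);
    }
}
```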

deadlock management – Before you go hitting the “comment” button, note that I went through Distributed Lock Manager hell in VMS 25 years ago, so I know the pitfalls here. Nevertheless, Azul has done some great stuff (pdf) in this area, and I think it’s ripe for attention.

So rather than just complaining, here are some real-life problems that Java Platform, Infrastructure Edition could solve.


A colleague recently detailed the trials and tribulations of a piece of infrastructure he is building in Java, and it seems that after beating his head against the wall, he won, but not without a few scars.

The thing that I have always loved about Java is its productiveness and expressiveness. I could write scads of code that worked well and correctly very quickly.

But Hal has indirectly pointed out a failure of Java that I think has been true since its earliest days, when I started using it in mid-1995:

Java is a lousy language for building infrastructure.

All of the techniques that Java exposes for operating on software itself are complex, slow, or bloaty, including reflection, proxies, class loading, weak references, bytecode manipulation, and compiler APIs. Since many of the pieces of WebLogic software I wrote had to deal with other users’ code (Servlets, plug-ins, etc.), I spent a fair bit of time arm-wrestling with Java, either trying to find the right technique, implementing a workaround, doing code generation, or chasing performance problems. When I transitioned to the chief architect role, the WebLogic team members experienced many of the same problems while writing EJB and RMI in Java (Sun’s RMI was in C). I also think much of the excitement over AOP was a result of people being fed up with using Java for system programming; AOP cut across these problems by bypassing or augmenting the language itself. Hal’s post tells me that the problem is still not solved.

This is not to say that I don’t think Java is a good language for server-side programming – after all, WebLogic pioneered Java on the server – but rather that the focus on “enterprise edition” has been on server-side application code, rather than infrastructure code.

IBM had a variant of PL/I called PL/S that they used internally for writing system software. We have editions of Java for the client, for enterprise apps, for mobile, and for real-time. Perhaps we need an edition of Java for infrastructure software.


I’ve been arguing – again – that we need a SaaS universal messaging hub that can send and receive messages among a variety of different synchronous and asynchronous message services, such as email, instant messaging, SMS, RSS, and Comet.

I say “again,” because the company I co-founded in 1999, Kenamea, was originally conceived as a “big message switch in the cloud,” which, among other things, solved a problem that is finally being recognized: the enormous and unproductive polling that takes place in HTTP and Ajax.

At the time, nay-sayers claimed that HTTP servers would have no problem scaling with all the zillions of Ajax requests, and of course, they didn’t, at least not without tweaking. No less a publication than WebLogic Developer’s Journal [WEG] featured a “how-to” article on the subject. My friend, colleague, and Scala/Lift maven extraordinaire David Pollak even showed us how to do it “right,” using actors. So, yes, maybe the existing httpd’s scale, but not without serious architectural work. I suspect even more work will be forthcoming as the amount of Ajax traffic continues to increase.

The answer to these kinds of problems is to reverse the protocol: instead of web services, applications, and RIAs polling web sites and web services for changed information, there needs to be a big message hub in the cloud that takes in all the updates from around the web and republishes them to all the other services that are interested in those updates.
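The reversed protocol is essentially publish/subscribe: sources push each update to the hub once, and the hub fans it out to every interested subscriber, instead of every subscriber polling every source. A toy in-process sketch (the `MessageHub` class and topic names are illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Toy message hub: sources publish an update once, and the hub fans it out
// to every subscriber, replacing N subscribers polling M sources.
public class MessageHub {
    private final Map<String, List<Consumer<String>>> subs = new ConcurrentHashMap<>();

    // Register an endpoint (email, IM, SMS, Comet push...) for a topic.
    void subscribe(String topic, Consumer<String> endpoint) {
        subs.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(endpoint);
    }

    // Fan the update out to every subscriber on the topic.
    void publish(String topic, String update) {
        for (Consumer<String> endpoint : subs.getOrDefault(topic, List.of())) {
            endpoint.accept(update);
        }
    }
}
```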

Kenamea certainly had the capability to do this. Using the Kenamea message switch, server applications, such as Twitter, could send messages to client applications (e.g., IM, email, web browser) without the client having to continuously ask Twitter if there were any new Tweets (which is how it works today).

If I were going to do this right now, I would base the technology on the Tervela TMX Message Switch, which, out of the box, has the ability to switch 1.5 million messages per second, with the predictability and reliability necessary to run the trading systems of the largest banking institutions out there. [Note: I am an Advisor to Tervela.]

In conjunction with complex-event processing systems, message switching will have a revolutionary impact on the efficiency of distribution and delivery of real-time data around the web. You heard it here first. Again.


I’m following up to my post of earlier today because I want to flesh out another level of detail on what GAE, AWS, and competitors will eventually require. I’m tentatively calling these kinds of systems Widget Application Servers, because they will first and foremost be used as the back-end for widget-style applications that can be embedded in a page or on your desktop, built for various platforms such as Facebook and OpenSocial apps; Google Gadgets; Yahoo! Widgets; and AOL’s Your Minis.

There are three key pieces to the Widget Application Server puzzle: application development, application management, and business management. Right now, one could cobble together all of the pieces from a variety of different SaaS platforms, but I think the major players will emerge with an implementation of the whole thing.

Application Development support includes all of the tools necessary for building and testing the application itself, including:

With all of these pieces in place, the application developer would be free to build the application and load it up into the cloud, and not have to worry about many of the time-consuming and tedious management and operational issues.


A month ago, I put together the following slides for a new service, which I called “Fountain.” It has similarities to Google App Engine in that it is a scalable, hosted system for running back-end apps, but Fountain has a number of unique features, which perhaps we’ll eventually see in GAE.

Among these additional features are:

support for languages beyond Python, including server-side JavaScript using Rhino (which we first saw circa 1995 in Netscape SuiteSpot), PHP, and Ruby.

A Business Management Workbench, for monetizing the apps. I am keeping this under wraps, as I may still choose to develop it.

Social Networking support, which includes all the libraries necessary to write social networking and widget apps, such as for OpenSocial, Facebook, PageFlakes, NetVibes, iGoogle, etc.

a community, so that people can share code and pre-built libraries with others. One could argue that Google Code provides this functionality, but it is not yet tightly integrated.

A perfect album is an indivisible singularity in the space-time continuum. The mind doesn’t separate one track from the other, and each phrase is a synecdoche not for the track, but for the entire album. As each song fades, the emotional response to the next song begins, even before the first note is heard. You cannot listen to a perfect album on shuffle play. Even Chuck Norris cannot separate Stolen Moments from Hoe-Down.

And for some unfathomable reason, the San Francisco Jazz Festival held most of its headline shows at the Masonic Hall, which has the absolute worst acoustics in all of San Francisco. The last year I went to the Jazz Fest, I walked out of an Abbey Lincoln concert because the acoustics made it impossible to enjoy.

To hear the best jazz in San Francisco, one had to go to Yoshi’s in Oakland. I’ve been there at least 10 times. But having to travel to Oakland, in my mind, meant that San Francisco had ceded to Oakland not just its waterfront, but its jazz scene as well. So much for San Francisco as a world-class city.

My hope tonite is that Yoshi’s will live up to my expectations! The intimate Oakland venue has headline artists, great acoustics, no columns to block the view of the stage, comfortable seats, and reasonable food in the attached Japanese restaurant. If the SF venue even comes close to matching Oakland, you’ll find me there catching the best of the best.