I work for Red Hat, where I lead JBoss technical direction and research/development. Prior to this I was SOA Technical Development Manager and Director of Standards. I was Chief Architect and co-founder at Arjuna Technologies, an HP spin-off (where I was a Distinguished Engineer). I've been working in the area of reliable distributed systems since the mid-80's. My PhD was on fault-tolerant distributed systems, replication and transactions. I'm also a Professor at Newcastle University and Lyon.

Sunday, June 26, 2011

I've been involved with distributed systems since joining the Arjuna Project back in the mid 1980's. But distributed systems date back way before then, to at least the 1970's with the advent of the first RPC. There are a few definitions of what constitutes a distributed system, including Tanenbaum's "A collection of independent computers that appears to its users as a single coherent system" and this one: "A distributed system consists of a collection of autonomous computers, connected through a network and distribution middleware, which enables computers to coordinate their activities and to share the resources of the system, so that users perceive the system as a single, integrated computing facility", though my favourite is still Lamport's "You know you have a distributed system when the crash of a computer you’ve never heard of stops you from getting any work done". Back when I started in the area, someone at the University once said that a precise definition was difficult, but you should recognise a distributed system when you saw it.

What all of the definitions have in common is the notion that a distributed system consists of nodes (machines) that are connected by some distributed fabric, e.g., an ethernet. Distributed systems therefore pose problems that are not present in centralised, single machine systems, such as faults (independent failures can now occur) and hence failure detection. Various techniques, such as distributed transactions and replication, have grown up to help deal with some of these issues. Other approaches, such as Voltan, also help to build fail-silent processes. And of course techniques such as message passing, RPC and shared tuple spaces were developed to help make developing distributed applications easier. Though of course we learned that complete distribution opacity is not always a good idea. (We did a lot of the early work in this area.)

However, time has moved on and whilst distributed systems continue to be important in our everyday lives, the fact is that many of the problems they present, and many of the solutions that we have developed, are present within a local environment these days. Think of this like inner space versus outer space, if you like, but the way in which multi-core machines have become the norm means that we have failure-independent nodes (cores this time) within a single machine. Alright, they're not connected by an ethernet, but there's a bus involved and they may, or may not, have shared memory too.

Of course none of this is new by any means. If you've ever experienced any of the parallel computing technologies that were the rage back in the 1980's and 1990's, such as the Transputer (excellent architecture and Occam was brilliant!) then you'll understand. There's always been a duality between them and distributed systems. But back then parallel computing was even rarer than distributed computing, simply because the costs of the former were prohibitive (which is why a lot of parallel computing research was often done by running COTS hardware on a fast network, because it was cheaper!).

Times have certainly changed and we can no longer consign parallel computing to the realm of high performance computing, or niche areas. It's mainstream and is only going to increase. And I believe that this means distributed systems research and parallel computing efforts must converge. Many of the problems posed by both overlap, and solutions for one may be relevant to the other. For instance, we've put a lot of effort into scalability in distributed systems, and failure detection and independence at the hardware (core) level is very advanced.

So when is a distributed system not a distributed system? When it's a centralised multi-core system!

Saturday, June 18, 2011

I've been with JBoss since 2005 and in that time I like to think I've experienced quite a bit about how open source works. It's been a wonderful learning experience for me and definitely turned me from someone who thought open source code and developers were somehow not as good as closed source equivalents into a person who knows that the opposite is most definitely the case!

Case in point: over the last 18 months or so we've been on an aggressive schedule for JBossAS 7, which has some pretty fundamental architectural changes in it. This would've been a challenge for any team (I remember how long it took us to implement HP-AS, for instance, and that team was also extremely skilled). But our teams are small and are responsible not only for development, but also for their communities too: open source means a lot more than just having your code in a public repository!

So these teams are putting in a lot of effort! They're pulling long hours too. Now of course that's not unique to open source or JBoss, but the developers are not doing this for their wages; they're doing it because they have a passion for open source and also for the history behind JBoss and our communities. They (we) believe that this is a game changer and not just another battle in the ongoing war. It's worth noting that I saw this in my Bluestone days too, both when we were independent and then part of HP. I think that the reasons behind that are very similar, only the protagonists have changed.

This release is also so fundamental to everything we are doing that even teams who wouldn't normally have much to do with AS are willing to pitch in, both during work hours and personal time. And what's more interesting is that I rarely have to ask them to help: it's a natural thing for them to do because they're as much a part of the AS community as others. From a personal perspective I've found the AS7 effort very enlightening. I've learned a lot, and much of it not just technical in nature. It is most definitely a good time to be in this role!

Sunday, June 12, 2011

My wife is an addictions counsellor with a strong background in psychology. It's amazed me the sorts of things that people can become addicted to and I won't go into them here! However, she's always said that the first step to addressing any addiction is for the addict to admit it. Now according to her I am addicted to work (her definition of work includes anything that involves a computer, including books and papers, so it's quite broad). I have to admit that I do spend an inordinate amount of time doing things that fall into that definition; even when I'm watching TV I'll usually have a book or laptop on my knee (now it's often my iPad). But I'm not sure I'd say I was an addict. And even if I am, I'm not sure I'd want to be "cured". I think a lot of my friends and colleagues would also fall into that category. But then that last statement does apply to many other forms of addiction. Hmmmm.

Saturday, June 11, 2011

I'm getting really tired (aka fed up) of hearing about "new age" developers and applications when certain people talk about the cloud. Look, there are only developers and applications! There's nothing "new age" about this. Some things change, as with each new wave of technology, but many things remain the same. Sure the problem space has changed and we are seeing new applications and approaches being developed, but let's not imbue mythical attributes to those applications or developers! They're no better or worse than developers or apps of the past. Though if you listen to some, "new age" means thinking and working so far outside the box that you're in the next reality! This is starting to get ridiculous and in the local vernacular it's getting on my nerves! Evolution, people, not revolution!

Sunday, June 05, 2011

Back in 2002 when I was still with HP and our transaction system was still called Arjuna, I wrote a paper with Santosh on the transition of what had started out purely as an academic vehicle for getting a few of us PhDs, into a rather successful product. Back then we conjectured what might happen in the next few years, but the reality has turned out to be even more interesting.

One thing that updating the paper clearly showed was that something that started life as an academic project has not only had an impact on many products over many years, but also an impact on the people who have worked on it. Individuals have come and gone from the team over the years and they've all left their mark on the system and vice versa. And like the system, they've been a great group of people! So whether it's called Arjuna, JBossTS or something else, it and this paper remain a tribute to them all.

For many years I've been working on extended transaction protocols. The CORBA Activity Service, WS-TX and now REST-TX are efforts on that road. There are many similarities between the problems of long running transactions and large scale replication, so the fact that I did my PhD on both gave me some insights into helping resolve both.

One of the early pieces of research I did was on combining replication and transactions to create consistency domains, where a large number of replicas are split into domains and each domain (replica group) has a relationship with the others in terms of their state and level of consistency. Rather than try to maintain strong consistency between all of the replicas, which incurs overhead proportional to the number of replicas as well as their physical locality, we keep the number of replicas per domain small (and hopefully related) and grow the number of domains if necessary. Then each domain has a degree of inconsistency with others in the environment.

The basic idea behind the model is that of eventual consistency: in a quiescent period all of the domains would have the same state, but during active periods there is no notion of global/strong consistency. The protocol ensures that state changes flow between domains at a predefined rate (using transactions). A client of the inconsistent replica group can enquire of a domain the state at any time, but may not get the global state, since not all updates will have propagated. Alternatively a client can request the global state but may not know the time it will take to be returned.
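To make the shape of the model a little more concrete, here is a minimal Python sketch. It is not the original protocol: the class names, the single global version counter, the last-writer-wins merge and the `batch_size` knob standing in for the "predefined rate" are all assumptions made for illustration, and the transactional propagation is reduced to a plain method call. What it does show is the two kinds of read described above: a local read that may miss updates, and a global read that has to wait for a quiescent state.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """A small replica group, treated here as one strongly consistent unit."""
    name: str
    state: dict = field(default_factory=dict)    # key -> (value, version)
    pending: list = field(default_factory=list)  # updates not yet sent to other domains

    def apply(self, key, value, version):
        # Assumption: last-writer-wins on a version number (illustrative only).
        current = self.state.get(key)
        if current is None or version > current[1]:
            self.state[key] = (value, version)

    def read(self, key):
        entry = self.state.get(key)
        return entry[0] if entry else None


class ConsistencyDomains:
    """Hypothetical coordinator for a set of domains."""

    def __init__(self, names, batch_size=1):
        self.domains = {name: Domain(name) for name in names}
        self.batch_size = batch_size  # stands in for the "predefined rate"
        self._version = 0             # assumption: one global version counter

    def write(self, domain, key, value):
        self._version += 1
        target = self.domains[domain]
        target.apply(key, value, self._version)
        target.pending.append((key, value, self._version))

    def read_local(self, domain, key):
        # Fast, but may not reflect updates made in other domains yet.
        return self.domains[domain].read(key)

    def propagate(self):
        # One round: each domain pushes at most batch_size updates to the others.
        for source in self.domains.values():
            batch = source.pending[:self.batch_size]
            source.pending = source.pending[self.batch_size:]
            for target in self.domains.values():
                if target is not source:
                    for key, value, version in batch:
                        target.apply(key, value, version)

    def quiescent(self):
        return all(not d.pending for d in self.domains.values())

    def read_global(self, key):
        # Waits for quiescence; the caller cannot know in advance how many
        # rounds this takes. Returns the value and the rounds that were needed.
        rounds = 0
        while not self.quiescent():
            self.propagate()
            rounds += 1
        any_domain = next(iter(self.domains.values()))
        return any_domain.read(key), rounds
```

With two domains and a batch size of one, a write made in one domain is visible there immediately through `read_local`, while the other domain only sees it after enough calls to `propagate`. `read_global` hides that wait from the caller at the cost of an unpredictable delay, which is the trade-off the model is built around.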

Now of course the original work was before the CAP theorem was formalised. So today we see people referring to that whenever they need to talk about relaxing consistency. And of course that is the right thing to do; if I were reviewing a paper today that was about relaxing consistency and the authors didn't reference CAP then I'd either reject it or have a few stern words to say to them. But I still think Heisenberg is a way cooler analogy to make. However, I do admit to being slightly biased!

I'm off to the Red Hat Partner Summit in Dublin in a few hours' time. I'm giving a couple of presentations, one on the future of JBoss and one on how JBoss and the Cloud come together. I'm looking forward to them because the audience will be slightly different to those I've presented these topics to over the past few months, so it'll give me a chance to get much broader feedback. Plus it's been a while since I was in Dublin last, so hopefully there'll be a chance to get out and enjoy the place too.

Friday, June 03, 2011

There's a special event being organised as part of Middleware 2011 called the Future of Middleware Event (FOME). I've been asked to contribute to the event and associated paper/book, along with my friend, co-creator of Arjuna and long time mentor Professor Shrivastava. I'm looking forward to it, because it's related to quite a few things that I'm doing elsewhere too!

Thursday, June 02, 2011

I haven't had a chance to blog here much recently because my attention has been elsewhere, particularly around JBoss specific activities. One of the ones I'm really enjoying at the moment is thinking about the future of JBoss. Since some of this is really independent of JBoss implementations and more to do with where I think middleware in general could/should go, I may try and keep some of the discussions here and cross link in both directions.