Summary
In my inaugural weblog entry here on Artima, I question whether the logic behind the First Law of Distributed Computing is really that relevant now.

Ha! I thought Id throw out a really leftfield, contentious issue for my first weblog.

I googled First law of distributed and came up with a few hits, as I suspected, for Martin Fowlers definition. If I have to be the only person, to argue around this (not necessarily disagree, well, not in all cases), well so be it.

The first issue that springs to mind is: if you call something the "First", you tend to expect it to be followed by a "Second", "Next" or "Last" - I couldn't find anything on those, so perhaps it should be renamed the "Only Law of Distributed Computing".

Aside from being pedantic, my main issue is whether distribution is always that bad. You understand where the law comes from when the qualification goes along the lines of "if you don't have to distribute, don't, because it costs".

I really do understand this when you're writing a system that has to do hundreds of transactions per second, but if you're not, then is distribution such a big problem?

Facts of Life

My point comes from the fact that mankind has had to deal with distribution and its associated costs for millennia - we've never had everything that we need, during the entirety of our lives, right next to us.

If your car is running low on petrol, you go and get some more - it's highly unlikely that every person who has a car has a pump to fill up their car, just sitting there in their garden. When we need food, we might walk to the shop and buy some - this implies distribution and an associated cost: the energy consumed to walk to the shop and back. In ages past, if you didn't hunt, you didn't eat, yet the act of hunting consumed resources.

Most businesses have suppliers that are many miles away, perhaps even on other continents, and may serve customers that are also miles away - distribution doesn't stop them.

In other words, I believe distribution, and its cost, to be a natural part of life, so why in computing do we tirelessly fight against it? As with many things in life, it all comes down to a matter of perspective.

The general consensus about why distributed objects are a Bad Thing is that it's more efficient to do things locally. Why take 20ms to do something between machines when it will only take 20ns to do it locally in a single VM?

Well, to go back to my previous analogies, would you want your house to be filled with every possible amenity you need in your life? Aisles of food from your local supermarket, coupled with the obligatory petrol pump in your back garden, and your desk and PC from the office? You wouldn't have enough space to have the contents of your entire life sat at the side of you.

Now back to the computing world - I might be able to run my entire software architecture out of one VM, and hey, it might be slow as hell, but... at least it's local.

Another point that always makes me wonder is that most of the people who quote the First Law, quote it on the Internet, the biggest public distribution network in the world. Why? Because they want their opinion heard; thus distribution. But voicing this comment is not without cost - they have to log on to a site, load in the pages, write their comment in an HTML form, submit it, etc.

So the point of distribution should not be "if it's more efficient to do it locally, then don't distribute it"; it should be "if it is more beneficial to distribute, then do it". Or, if the cost of distributing is less than the benefit gained, what's the problem?

You see, from my standpoint of doing systems integration for the majority of my work, I just can't move my database, workflow and mainframe onto one box and happily let them co-exist.

If it takes the human eye between 300 and 400 milliseconds to blink, do you think it really matters to me that a Jini service call via RMI may only take 20ms to complete?
If a user is willing to wait 500ms for a response, I could do 25 remote calls before that user is going to become even slightly aware of the cost of the function being performed.
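As a back-of-the-envelope check on that arithmetic, here is a tiny sketch; the 500ms budget and 20ms-per-call figures are the ones from the text, not measured values:

```java
// Back-of-the-envelope latency budget: how many remote calls fit
// before the user notices? Figures taken from the discussion above.
public class LatencyBudget {
    public static void main(String[] args) {
        int userToleranceMs = 500; // how long a user will happily wait
        int remoteCallMs = 20;     // rough cost of one Jini/RMI call
        int callsWithinBudget = userToleranceMs / remoteCallMs;
        System.out.println("Remote calls within budget: " + callsWithinBudget); // prints 25
    }
}
```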

Facts of Reality

Distribution, both in distributed systems and distributed objects, is a fact of system reality, whether we like it or not. And far more people work in a world where processes and systems have to integrate than can build green-field systems where everything is in one box.

Ambiguity is an undercurrent of linguistics. For example, "Doing something remotely is orders of magnitude slower than doing something in-process." Oh, that tells me it's slower. It's like the old argument, "C++ is x% faster than Java". Give me the real-time numbers for my given situation, because only that will tell me whether it really matters.

Performance is of course an issue, but only if it is in fact an issue in the scenario - don't just use the Performance card to dismiss the benefits of distribution offhandedly.

What is happening now?

I never really got the idea that distribution is only okay if you have a web browser that talks to an application server (running all your middle tier) that connects to a database. I see talk of Service Oriented Architectures mainly with Web Services, which is okay for others - I've been doing SOA with Jini since 2000, and it hasn't seen me wrong yet, but your mileage may vary.

But a key differentiator with SOA, and grids, is that they are distributed (they don't have to be, but not distributing kind of defeats the purpose): each application that provides a service can be on a different box, and the registry that allows connections to these services is also, in general, distributed.

So if the next generation of applications and architectures being touted are distributed - what's made people change their minds? And seeing that they are changing their minds, don't you think that people need to think about distribution in a rational manner, rather than just dismissing it out of hand?

Indeed, but there is a difference between "Don't distribute if you don't have to" and "Don't distribute unless it's beneficial".

If I can do everything locally, why would I ever need to distribute? Moreover, why is everybody talking about SOA if everything can be local anyway?

I may not have to distribute, but if the end benefit of distributing outweighs the cost, then why should I blindly ignore distribution just because "I don't have to"? I.e. if the end justifies the means, why does it matter?

As I understand the law "Don't distribute your objects!" it is not about accessing other machines over the 'net, it is about how you design your own application. For some weird reason, people these days seem to loooove splitting a single application into several nodes with lotsa communication between them. That, of course, is more often than not quite daft. Performance is not the only problem, or, IMHO, even the worst. Fault tolerance, added solution complexity, and extra administration of the servers are other costly effects of needlessly splitting up your application.

Like you say, accessing other applications in a distributed fashion is often a good thing. (But we should use message-based distribution more and RPC-based distribution less, IMO)

> As I understand the law "Don't distribute your objects!" it is not about accessing other machines over the 'net, it is about how you design your own application. For some weird reason, people these days seem to loooove splitting a single application into several nodes with lotsa communication between them. That, of course, is more often than not quite daft.

Unless, as I have said, a benefit can be realised, that outweighs the cost.

Splitting is good when you actually have a proper think about what a 'service' should be. My definition is:

A service is an application or an implementation that cleanly encapsulates a responsibility of either the business domain or technical infrastructure, exposing key interfaces that can be re-used by various other applications over a network.

Using Jini we have services that handle centralised configuration for all our services and applications, error-handling and logging, data source connectivity, EDM.

But because we're using Jini for systems integration and we have 14 years' worth of systems to integrate, trying to fight distribution is just futile. So we went the opposite way and embraced it, and we're seeing some good side effects, such as quicker times during Disaster Recovery tests, ease of migration from one server to another, intrinsic load balancing, etc.

Unfortunately, fault tolerance is an issue that affects more than just SOA, yet I believe that things like Jini don't particularly add to the complexity for FT - they actually make it easier. Plus serviceUI allows you to administer your services from a single place, which is always good.

> Like you say, accessing other applications in a distributed fashion is often a good thing. (But we should use message-based distribution more and RPC-based distribution less, IMO)

I wholeheartedly agree, but the development difficulties regarding the callback mechanism (ease of development when compared with synchronous method calls) will always make that change difficult to achieve. A potential middle ground is to develop to a synchronous model that is immediately turned into a message-based protocol, with a library handling the async stuff for you.
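One possible shape for that middle ground, sketched with nothing but java.util.concurrent; the in-process queue here is a hypothetical stand-in for real messaging middleware, and all names are invented for illustration. The caller sees an ordinary blocking method while, underneath, a message is posted and the reply awaited on a future:

```java
import java.util.concurrent.*;

// Sketch: a synchronous-looking call implemented over an asynchronous,
// message-based transport. The "bus" is just an in-process queue standing
// in for real messaging middleware.
public class SyncOverAsync {
    // A request message carrying its own reply slot.
    static class Request {
        final String payload;
        final CompletableFuture<String> reply = new CompletableFuture<>();
        Request(String payload) { this.payload = payload; }
    }

    private final BlockingQueue<Request> bus = new LinkedBlockingQueue<>();

    // Server side: consume messages asynchronously and complete each reply.
    void startServer() {
        Thread server = new Thread(() -> {
            try {
                while (true) {
                    Request r = bus.take();
                    r.reply.complete("echo:" + r.payload);
                }
            } catch (InterruptedException e) { /* shutdown */ }
        });
        server.setDaemon(true);
        server.start();
    }

    // Client side: looks like a plain synchronous method call.
    String call(String payload) throws Exception {
        Request r = new Request(payload);
        bus.put(r);                              // asynchronous send
        return r.reply.get(1, TimeUnit.SECONDS); // block for the reply
    }

    public static void main(String[] args) throws Exception {
        SyncOverAsync s = new SyncOverAsync();
        s.startServer();
        System.out.println(s.call("hello")); // prints "echo:hello"
    }
}
```

The point of the shape is that swapping the queue for a real message broker changes only the transport, not the calling code.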

> Like you say, accessing other applications in a distributed fashion is often a good thing. (But we should use message-based distribution more and RPC-based distribution less, IMO)

The right answer is in fact mobile code. This allows the implementation to go with the data, so that everyone doesn't have to implement some mapping from the data to programmatic data. This is where Java mobile code (with its associated security), coupled with Jini FT, makes for a very powerful mechanism.

When we send XML all over the place, everyone has to write code, or have the correct code, to manage decoding the documents. This makes it very difficult to change the programming model.

With mobile code, if you suddenly need to put half of an object on one machine and the other half on another to optimize some local operation, you can create a smart proxy that wraps two remote references to each of the appropriate objects. The receiving VM gets the same view it has always had of the programmatic interface, but under the covers, you can do something completely different. If you add an interface (or two) to the object when you do this, then new code can use the object with a new API while the old code can keep using the old interface.
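A minimal sketch of that smart-proxy idea using java.lang.reflect.Proxy; the Account interface and its two "halves" are invented for illustration, and plain local objects stand in for remote references. The caller keeps the single interface it has always seen, while the proxy routes each method to whichever backing object now owns it:

```java
import java.lang.reflect.*;

// Sketch: the caller sees one interface, but a smart proxy routes each
// method to one of two backing objects (stand-ins for remote references).
public class SmartProxyDemo {
    interface Account {  // the interface old code has always used
        String owner();
        long balance();
    }

    public static void main(String[] args) {
        // Pretend these two halves live on different machines.
        Object ownerHalf = new Object() {
            public String toString() { return "alice"; }
        };
        long[] balanceHalf = { 42L };

        Account proxy = (Account) Proxy.newProxyInstance(
            Account.class.getClassLoader(),
            new Class<?>[] { Account.class },
            (p, method, methodArgs) -> {
                // Route each call to the appropriate "remote" half.
                switch (method.getName()) {
                    case "owner":   return ownerHalf.toString();
                    case "balance": return balanceHalf[0];
                    default: throw new UnsupportedOperationException(method.getName());
                }
            });

        System.out.println(proxy.owner() + ":" + proxy.balance()); // prints "alice:42"
    }
}
```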

There's an important concept with building distributed systems that is present in the UN*X way of writing applications.

If you understand how pipes provide the power to the user, you learn to write your applications so that they consume their input from stdin and generate their results on stdout, for batch mode operations.

If all you provide is a GUI that lets the user load a file, perform some work, and then save that file, then you haven't provided the ability to let the user distribute (there's that word again) the power of your software to the place that they need it.
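In Java terms, that stdin-to-stdout discipline might look like the following; the upper-casing transformation is just an arbitrary example:

```java
import java.io.*;

// Sketch: a pipe-friendly filter in the UN*X style - read stdin,
// transform, write stdout - so users can compose it with other tools.
public class UpcaseFilter {
    // Kept separate from main so the transformation is testable.
    static void filter(BufferedReader in, PrintWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            out.println(line.toUpperCase());
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        filter(new BufferedReader(new InputStreamReader(System.in)),
               new PrintWriter(System.out));
    }
}
```

Used as `echo hello | java UpcaseFilter`, it behaves like any other stage in a pipeline, which is precisely the power the GUI-only design gives up.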

The PBM graphics tools are a great example of how very convenient image manipulation can be designed as a pipe based tool set. Think how neat the XV tool is for image manipulation, but how painful it really was to use for lots of work.

In the SOA world, you can either believe that you know every possible way that your software will add value to the system, and you can make the interfaces work only that way. Or, you can do it the Jini service way, and let the users have the power to distribute the system however they need.

With Java, we have the ability to put lots of things into a single VM. The J2EE model does this, but reduces the power of the platform by removing access to parts of it.

Jini adds lots of things to the platform, and removes nothing from the platform. Jini is not meant to make it easy to query a database. It's not directly meant to make it possible to take the data from the query and put it into a JMS based system queue. Jini is not designed to provide a drag and drop interface for deploying software components into a container that understands those components.

Jini's architecture is based on the practice of distributing services first. If you are using Jini on a regular basis, and have recognized the power that its model provides, you will likely feel that distributing is a good and easy thing to do when solving a problem that deals with multiple systems and/or technologies.

This is exactly my point. Distribution has its benefits, and because of those benefits, which may not be present in an 'all-local' system, we must decide about distribution on the merits of the problem at hand, not just allude to a rule of thumb that says 'if you can do it all locally, then don't even bother considering distribution'.

Distributed systems do come with their own set of concerns, but they also provide a great deal of benefit, and these benefits must be articulated in the general discussion of your problem - not just swept under the rug.

If I wanted to do everything locally, I could have done it in VB6 and then not only would I not have to care about distribution, I wouldn't have to worry about multithreading, because VB6 doesn't support multithreading as part of its core constructs!

"In the SOA world, you can either believe that you know every possible way that your software will add value to the system, and you can make the interfaces work only that way. Or, you can do it the Jini service way, and let the users have the power to distribute the system however they need."

And this for me is the real issue.

Many people don't seem to understand that just because an interface shows RemoteExceptions etc. doesn't mean that the implementation underneath *is* remote. And, even if the implementation is remote we have infrastructure available today which can automatically cut out all the serialization etc. in cases where two communicating remote objects are located on the same physical machine.

When one comes across remote interfaces, one should code as if the implementation is remote. Then, at deployment time, I can "localize" everything or distribute it as I see fit, safe in the knowledge that the code should work either way.
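A minimal sketch of that discipline, with all interface and class names invented for illustration: declare the interface in remote terms (java.rmi), and let deployment decide whether a purely local implementation or a genuinely remote one sits behind it. The caller is written to survive remoteness either way:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Sketch: code to a remote-looking interface; local vs. remote becomes
// a deployment decision, not a design one.
public class DeployTimeChoice {
    interface Pricer extends Remote {
        // Declared to throw RemoteException even though an
        // implementation may be purely in-process.
        long priceInPence(String sku) throws RemoteException;
    }

    // A local implementation: no network, but the same contract.
    static class LocalPricer implements Pricer {
        public long priceInPence(String sku) throws RemoteException {
            return sku.length() * 100L; // invented pricing rule
        }
    }

    static long quote(Pricer pricer, String sku) {
        try {
            return pricer.priceInPence(sku);
        } catch (RemoteException e) {
            // The caller already copes with remote failure.
            return -1L;
        }
    }

    public static void main(String[] args) {
        System.out.println(quote(new LocalPricer(), "ABC")); // prints 300
    }
}
```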

If you start with the opposite assumption and assume in design and code that everything is local, you've just put a barrier to growth in place. You've put yourself in a position where, if the system gets too big for one machine, you either have to resort to hardware virtualization techniques or re-write your code.

[As an aside, that hardware virtualization ends up attempting to handle all the remoteness issues which should have been handled by the application code. The side-effects of this are all the sorts of problems mentioned in "Note on Distributed Computing" where the code believes itself to be local and cannot do anything intelligent when the lower layers barf due to network partition or whatever.]

Building distributed systems is certainly harder than building local systems but developers and vendors are making life more difficult than it needs to be. Why? Because they all start from the single-machine assumption and then try and make things work across multiple machines seamlessly. You can see this in the tools, infrastructure and systems they build.

> "In the SOA world, you can either believe that you know every possible way that your software will add value to the system, and you can make the interfaces work only that way. Or, you can do it the Jini service way, and let the users have the power to distribute the system however they need."
>
> And this for me is the real issue.
>
> Many people don't seem to understand that just because an interface shows RemoteExceptions etc. doesn't mean that the implementation underneath *is* remote.

> When one comes across remote interfaces, one should code like the implementation is remote. Then, at deployment time, I can "localize" everything or distribute it as I see fit, safe in the knowledge that the code should work either way.

If at least you design for distribution, in a local system there will be very little performance hit (if any at all), but a small increase in development cost due to catching RemoteExceptions.

> If you start with the opposite assumption and assume in design and code that everything is local you've just put a barrier to growth in place. You've put yourself in a position where, if the system gets too big for one machine, you either have to resort to hardware virtualization techniques or re-write your code.

Yes - you've basically boxed yourself in. It's like the hunter analogy - I can sit outside and wait for the food to come past, which might be alright for a certain amount of time, but sooner or later I've got to go further afield.

> [As an aside, that hardware virtualization ends up attempting to handle all the remoteness issues which should have been handled by the application code. The side-effects of this are all the sorts of problems mentioned in "Note on Distributed Computing" where the code believes itself to be local and cannot do anything intelligent when the lower layers barf due to network partition or whatever.]

> Building distributed systems is certainly harder than building local systems but developers and vendors are making life more difficult than it needs to be. Why?

Because they believe distribution to be inherently bad, because of 'magnitudes slower than'. Also, distributed systems face problems that local systems either discount or ignore. Some client-server frameworks won't start unless the back-end is available - i.e. they discount failure in the network and assume that the network and its resources are always available - and I would say that an application like this is a bad example of distribution; I should at least be able to do some things (reduced functionality) in the application that can then be rolled in when the back-end is restarted.

Also, people think that if you say 'we have a networked app, but remember, the network is unreliable, though we can deal with that', your product is flaky, because they like to discount that their network may fail, so it has to be 'this' app.

> Because they all start from the single-machine assumption and then try and make things work across multiple machines seamlessly. You can see this in the tools, infrastructure and systems they build.

One of the great things about Jini is that you're coding to a network-oriented (or non-location-bound) interface, not a machine/address-bound application, and you can't really afford to think in the local sphere any more - i.e. you know it's remote, and you know you have to deal with that, aiming for 'reliability out of unreliable components'. But that ability to connect to anything (or in the case of local proxies, nothing at all) can give you huge scope in things like integration etc.

One thing that I find is a half-step between understanding the local model and properly understanding the remote model is precisely what you say above in 'then try and make things work across multiple machines seamlessly'. The point is that local defines location, so multiple machines define *multiple* locations, or multiple localities. So you have a number of localities that someone is trying to bend into a 'distributed' system, and unless you're either very lucky or very smart, it won't happen.

When you think about proper distributed systems, location is (or should be) an irrelevance in development and in how your pieces fit together.

Quoting from one slide:

Why is COP Nice?
- The world is parallel
- The world is distributed
- Things fail

Ignoring concurrency/distribution issues is like ignoring the world around us. In most languages distribution is a PITA because they ignored distribution until the last second available. We can deal with it by using a library/framework (e.g. Jini, Javaspaces, java.util.concurrent) or using a better suited language (e.g. Erlang, Oz, E), but if we refuse to acknowledge this aspect of programming in our code it'll bite us later.

"We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. These differences are required because distributed systems require that the programmer be aware of latency, have a different model of memory access, and take into account issues of concurrency and partial failure.

We look at a number of distributed systems that have attempted to paper over the distinction between local and remote objects, and show that such systems fail to support basic requirements of robustness and reliability. These failures have been masked in the past by the small size of the distributed systems that have been built. In the enterprise-wide distributed systems foreseen in the near future, however, such a masking will be impossible.

We conclude by discussing what is required of both systems-level and application-level programmers and designers if one is to take distribution seriously."

> "According to MartinFowler in PatternsOfEnterpriseApplicationArchitecture, the FirstLawOfDistributedObjectDesign is: Don't distribute your objects!"
> http://c2.com/cgi/wiki?FirstLawOfDistributedObjectDesign
>
> Yes, it is that bad. The object is never the right unit of decomposition for distributed systems. A detailed explanation of why is here.

I don't think anyone suggested that objects *were* the right unit of decomposition. Most "distributed object" systems are actually "remote interface" systems, which leads to confusion.

This confusion is made worse by the fact that one implements the remote interface, in many languages, as an object, though it quite often delegates to other objects.

Yes, you have to be aware of all the things mentioned in "A Note", and yes, you have to be aware of the above, but that doesn't seem like a lot of extra stuff to carry around. So, being purely selfish and having an enquiring mind, can you say a little more on the topic of "it is that bad/more complex"?

The reason I ask is that I find I have to make myself aware of new/complex things every time I use something I've not used before, so I'm forced to absorb more information and make more choices. To me, that's just a rule of the game of programming, be it distributed or local.

> "We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. These differences are required because distributed systems require that the programmer be aware of latency, have a different model of memory access, and take into account issues of concurrency and partial failure.

Hence the fact that you have to program differently for local interactions than you do for remote interactions.

One point is: do your local systems have latency, and do they have to deal with concurrency and partial failure? Realistically they should, but we tend to either ignore or discount these very important issues in the local space, and make too big a deal of them in the remote space.

Latency - is only an issue if it's an issue for the people using the system. A local system's performance ceiling is bound by the machine it is on, i.e. (pX), whereas a distributed system's performance is bound by (n*Avg(pX) - ((latency + serialization) * hops)), and pX is always decreasing as load increases. However, the relative decrease of a single pX under a given load in a remote system should be less than the decrease of pX in a fully local system. Plus you can generally do something to improve 'vanilla' latency and serialization.
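Plugging some illustrative numbers into that rough ceiling model (all figures invented; this is the back-of-the-envelope formula above, not a measured law):

```java
// Sketch: invented numbers plugged into the rough ceiling model above:
// distributed ~ n * avg(pX) - (latency + serialization) * hops.
public class CeilingModel {
    public static void main(String[] args) {
        double pX = 1000.0;              // ops/sec one box can do (invented)
        int n = 4;                       // boxes in the distributed system
        double latencyCost = 50.0;       // ops/sec lost to latency (invented)
        double serializationCost = 30.0; // ops/sec lost to serialization (invented)
        int hops = 2;

        double localCeiling = pX;
        double distributedCeiling = n * pX - (latencyCost + serializationCost) * hops;

        System.out.println("local ceiling: " + localCeiling);             // 1000.0
        System.out.println("distributed ceiling: " + distributedCeiling); // 3840.0
    }
}
```

Even with the per-hop overheads charged against it, the distributed ceiling under this toy model comes out well above the single-box one, which is the point being argued.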

Concurrency - Regardless of whether it is remote or non-remote, you will have issues with multithreading and locks, so concurrency is a moot point as it can be considered an equal factor (the principle of all other factors being equal).

Partial Failure - Mitigating the risk of a single point of failure is definitely a Good Thing. If I know that part of my processes can continue in the absence of a single component, then that is also a Good Thing.

My point about distribution is that distribution is a fact of reality; having everything you've ever wanted at the side of you is not.

In the case of the car, distributing the car and fuel source means that you have to have a fuel tank and a fuel gauge, and I have to fill up, etc. If the car and service station were co-located on the same hardware, I could just attach it to a fuel pipe and do away with all that complexity.

As you point out, distribution is sometimes worthwhile, and in those cases you do the work to address the issues.

In your example of Web Server/App Server/DB, I think it is hard to get all users to sit at the same desk, so the Web Server and App Server have to be distributed. However, the closer the database can get to the app server the better, and I would like to see architectures where they are in the same process. That would remove the need for discussions like this: