OOP is several different ideas put together, the most important of which is Fine-Grained Information Hiding.

One can think of information hiding as the principle and encapsulation as the technique. A program hides information by encapsulating it in a module or other construct that presents an interface.

The basic principle of all OO languages is that relatively small things—such as individual accounts in a business program—each encapsulate both their data (in the form of members) and their algorithms (in the form of methods). Our notions of members and polymorphism both work to this goal of hiding information. There’s a lot more to most OO languages, such as whether they include a notion of types and what mechanisms they use for sharing common behaviour. But let’s look at this one principle: objects are responsible for their data and for their algorithms.

should objects be responsible for all of their own behaviour?

There’s a general idea that in a well-constructed program, each object “knows” how it ought to behave. That’s what its methods are for. Quite obviously, objects cannot be responsible for everything involving them in a program. If each object completely encapsulated all of the things it could do or be involved in, you would never pass one object as a parameter in a message to another object.

For every complex problem, there is a solution that is simple, neat, and wrong.

—H. L. Mencken

For example, you would never have collections. If every object “knew” how to organize itself into collections, you wouldn’t need an Array or Hash, would you? In practice, each object in a system can be involved in many different actions. It has to be responsible for some of them, and it has to play a secondary, passive role in others. Most OO programs do not have every object implement its own collections methods. They may include some form of specialization so you can have an array of accounts, but an array of accounts is still not an account.

subject.verb(object)

In the English language, we have the idea of a Subject and an Object in a sentence. For example, when we say “Jack loves Jill,” Jack is a subject and Jill is an object. Jack loves. Jill is loved. It’s the same in OO programs. Sometimes objects are actively doing things through their methods. Sometimes other objects’ methods are doing things with them.

Good OO design is, in part, doing a good job of choosing the right bifurcations: given a list of nouns and verbs, making the right decisions about which nouns ought to be the active nouns, the subjects, the ones that “own” the verb in the form of a method. And thus consciously making decisions about which objects ought to be the passive nouns, the objects of the verbs, the ones that don’t implement the methods.

Unfortunately, there are lots of places where we can err on the side of giving too much responsibility to individual objects. It’s understandable, given that OO is theoretically all about objects being responsible for themselves. But as in many other things, in practice good OO is about objects being responsible for as little as possible (but no less!), not as much as possible.

Object Design: Roles, Responsibilities, and Collaborations focuses on the practice of designing objects as integral members of a community where each object has specific roles and responsibilities. The authors present the latest practices and techniques of Responsibility-Driven Design and show how you can apply them as you develop modern object-based applications.

Not all “verbs” divide cleanly into a single active entity, the subject that ought to own the verb’s definition, and secondary, passive entities that should not own it. The easiest examples of this are operations that are intended to be commutative.

For example, many languages define addition as a method belonging to numbers or magnitudes. In Smalltalk, the expression 1 + 2 actually means “send the message + 2 to the object 1.” At first glance, this seems elegant: the number 1 handles the message + 2 as integer addition, while 1.0 would handle the same message with floating point arithmetic. What more could you want?

Well, there is a huge problem with this arrangement: Addition is commutative. 1.0 + 2 must give the same result as 2 + 1.0. Using a simple message to implement addition means that you must be excruciatingly careful to handle all of the possible cases so that you do not accidentally violate this property. Now of course, the designers of system classes like Integer and Float went to this trouble. But if you want to add another magnitude class—say CurrencytwoPlaceDecimal—you have to open up all of the system classes and modify them so that 1 + ThirtyCents gives the same result as ThirtyCents + 1.

beware of breaking symmetry

Of course, you may not need to implement a new magnitude class. Fine. But what about symmetric relations like comparison? This is a major pitfall for OO developers: in many cases you need to write a test of equivalence or equality (operations like ==, equal?, eql?, eqv? and all of the other variations on the same theme). In every one of these cases, horrible things will happen if your operation is not symmetric: x.eql?(y) must hold if-and-only-if y.eql?(x) does.

This is obviously easy when x and y are both the same kind of object. What happens when they’re different, but still logically equivalent? It turns out that implementing commutative operations and symmetric relations as methods doesn’t work very well. It forces you to smear duplicate logic over many different classes (or prototypes, if your language swings that way).
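Java’s own standard library contains a well-known instance of this pitfall: java.util.Date compares by millisecond value, but java.sql.Timestamp.equals(Object) rejects anything that is not itself a Timestamp, so the relation is asymmetric:

```java
import java.util.Date;
import java.sql.Timestamp;

public class AsymmetricEquals {
    public static void main(String[] args) {
        Date date = new Date(0L);
        Timestamp stamp = new Timestamp(0L);

        // Date.equals compares millisecond values, so this is true...
        System.out.println(date.equals(stamp));  // true

        // ...but Timestamp.equals(Object) refuses anything that is not
        // itself a Timestamp, so symmetry is broken.
        System.out.println(stamp.equals(date));  // false
    }
}
```

Put a Date and an equal-valued Timestamp into a set, and whether you get one element or two depends on which object is asked the question.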

Here’s a practical example. Let’s say you want to implement a form of equivalence for collections. For ordered collections like lists, what you want is that if two ordered collections have the same members, in the same order, they are equivalent. It’s easy to imagine writing such a method as a mixin for all of your ordered collections. It obviously knows about iterating over ordered collections (recursively, if you grew up with Gödel, Escher, Bach on your night stand). Note that you may not have an indexed collection: you might have a list where you simply retrieve values in order.

And likewise, you can write a collection equivalence method for dictionaries like hash tables: if two objects have the same values at the same keys, they are equivalent. Again, a simple mixin will handle things for dictionaries.

Now comes the wrinkle: you decide that an ordered collection ought to be equivalent to a dictionary where the keys are the integers ascending from zero. In other words, ('foo' 'bar' 'blitz') ought to be equivalent to { 0 => 'foo', 1 => 'bar', 2 => 'blitz' }. How are you going to code this? Well, the dictionary mixin could obviously handle equivalence to an ordered list. But we need symmetry, so we have to “open up” the ordered collection mixin and add code for equivalence to dictionaries.
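In Java terms, the smell looks something like this sketch (the class names are hypothetical): the cross-type rule gets written twice, once in each “mixin,” and each one now knows about the other collection type:

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: each equivalence "mixin" must know about the
// other collection type, and the cross-type rule appears twice.
final class OrderedEquivalence {
    static boolean equivalent(List<?> self, Map<Integer, ?> other) {
        if (self.size() != other.size()) return false;
        for (int i = 0; i < self.size(); i++)
            if (!Objects.equals(self.get(i), other.get(i))) return false;
        return true;
    }
}

final class DictionaryEquivalence {
    // the same logic again, duplicated so the relation stays symmetric
    static boolean equivalent(Map<Integer, ?> self, List<?> other) {
        if (other.size() != self.size()) return false;
        for (int i = 0; i < other.size(); i++)
            if (!Objects.equals(other.get(i), self.get(i))) return false;
        return true;
    }
}
```

Change the rule in one place and forget the other, and symmetry quietly breaks.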

Actually I made up the term object-oriented and I can tell you I did not have C++ in mind. The important thing here is that I have many of the same feelings about Smalltalk.

—Alan Kay

I’m holding my nose: we have not one but two different code smells. 1) Why is one piece of logic in two different places? 2) Why do ordered collections know anything at all about dictionaries, and why do dictionaries know anything at all about ordered collections? The latter is especially disturbing: the whole point of OO is information hiding. How does having ordered collections and dictionaries know about each other help us hide information?

The obvious answer to me is that the knowledge of how to compare an ordered collection to a dictionary does not belong in ordered collections or in dictionaries. The requirement that relations like equivalence be symmetrical across heterogeneous types implies that the types themselves cannot be responsible for implementing equivalence for themselves.

Similar problems of code duplication and information leakage apply to modelling relations (why do we declare both has_one and belongs_to in Rails?) and to implementing the <=> operator in Ruby. It looks like having verbs “belong to” the subject noun is often a good idea, but not always.

commuting the sentence of execution

Maybe some verbs belong to objects, but some are best on their own? Maybe + and <=> and equivalent? really ought to be emancipated from their subservience to objects and ought to have their own definitions.

There are two real approaches to object-orientation. The first is known as message-passing. You send an object a message and ask it to deal with it. (This would not work with many people in this newsgroup.) The meaning of the message is local to the object, which inherits it from the class of which it is an instance, which may inherit it from superclasses of that class…

The second approach is generic functions. A generic function has one definition of its semantics, its argument list, and is only specialized on particular types of arguments.

What we ought to do is take some of the verbs and give them their own place in our programs, instead of hanging them off nouns. This isn’t such a revolutionary idea: the Common Lisp Object System does this exact thing, providing generic functions. A generic function is, in effect, a verb raised to the same level of abstraction as a noun.

This isn’t some revolutionary idea limited to “powerful” languages either: the Java collections framework uses a Comparable interface for ordering collections. The compareTo(...) method belongs to an object. By way of—ahem—comparison, the Comparator interface extracts comparison out of the subject object and puts it in a separate function object. You can perform sorts in Java either way.
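The two placements look like this in Java (a small sketch; the word list and the length-based ordering are arbitrary choices for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class TwoWaysToSort {
    public static void main(String[] args) {
        List<String> words = new ArrayList<String>(
            Arrays.asList("pear", "fig", "apple"));

        // Way 1: the verb belongs to the noun. String implements
        // Comparable, so each element knows how to order itself.
        Collections.sort(words);
        System.out.println(words); // [apple, fig, pear]

        // Way 2: the verb is extracted into its own object. The
        // Comparator owns the comparison; String knows nothing about it.
        Collections.sort(words, new Comparator<String>() {
            public int compare(String a, String b) {
                return a.length() - b.length();
            }
        });
        System.out.println(words); // [fig, pear, apple]
    }
}
```

Comparable puts the verb on the noun; Comparator emancipates it.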

If we aren’t using Common Lisp, can we build the verbs we want out of the tools at our disposal? In other words, can we Greenspun generic functions in languages like Java and Ruby?

generic functions in java, plus a detailed look at method dispatching

Let’s start by thinking about generic functions in a Java-like language.

Returning to our example of writing equivalent?, we might make an Equivalent class with a single method, perhaps called eval. So we end up with something like Equivalent.eval(foo, bar). Java-like languages allow us to write different versions of the eval method with different type signatures, so we can write:
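For instance, a sketch of the overloads we would like to write (the cross-type bodies are illustrative, following the list-to-dictionary rule from earlier):

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class Equivalent {
    public static boolean eval(Object foo, Object bar) {
        return Objects.equals(foo, bar);            // most general fallback
    }
    public static boolean eval(List<?> foo, List<?> bar) {
        return foo.equals(bar);                     // same members, same order
    }
    public static boolean eval(Map<Integer, ?> foo, List<?> bar) {
        if (foo.size() != bar.size()) return false;
        for (int i = 0; i < bar.size(); i++)        // keys 0..n-1 match positions
            if (!Objects.equals(foo.get(i), bar.get(i))) return false;
        return true;
    }
    public static boolean eval(List<?> foo, Map<Integer, ?> bar) {
        return eval(bar, foo);                      // symmetric by construction
    }
}
```

With these signatures in hand, Equivalent.eval(aList, aMap) and Equivalent.eval(aMap, aList) both promise the same answer. Or so it seems.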

This is hideously broken in languages like Java. You’re almost all nodding in agreement, but please be patient while I explain it anyway: you probably want to pass this along to someone who really needs to be told why it is broken, so why don’t I go ahead and explain it for them?

What you want is that if two objects are of the more specific types, List and Map, Java will call the more specific version of eval. But if it can’t match one of the more specific eval methods, it should fall back to eval(Object foo, Object bar). Too bad: that’s not how Java works. Java uses two completely different ways to figure out which method to call when you overload methods!

Way number one is for figuring out, when you call noun.verb(...), where to find the definition of verb. This lookup is effectively done at run time, so that even if your code looks like this:
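Something like this minimal sketch (the choice of ArrayList is arbitrary):

```java
import java.util.ArrayList;

public class SingleDispatch {
    public static void main(String[] args) {
        Object foo = new ArrayList<String>();
        // foo is declared as Object, but at run time Java looks up
        // toString on the actual class, ArrayList, and prints "[]"
        // rather than Object's default hash-code gibberish.
        System.out.println(foo.toString());
    }
}
```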

Java will look up the method toString based on foo’s actual type when the method is called, even though you declared it to be an Object. That’s polymorphism at work, and it’s information hiding working for us. Each object can do its own thing where toString is concerned, and we don’t have to worry about it. This is called single dispatch, because it figures out which method to call based on just one of the nouns, the subject noun a/k/a the receiver of the method invocation.
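Way number two is overload resolution: choosing among several methods that share a name. Here the declared types rule, and they rule at compile time. A minimal sketch (hypothetical demo class) of the trap:

```java
import java.util.ArrayList;
import java.util.List;

public class StaticDispatchDemo {
    // two overloads, distinguished only by declared parameter types
    public static String eval(Object foo, Object bar) { return "Object version"; }
    public static String eval(List<?> foo, List<?> bar) { return "List version"; }

    public static void main(String[] args) {
        Object foo = new ArrayList<String>();
        Object bar = new ArrayList<String>();
        // Both values really are Lists at run time, but the overload is
        // chosen from the declared type, Object, at compile time:
        System.out.println(eval(foo, bar)); // prints "Object version"
    }
}
```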

It will always call eval(Object foo, Object bar). It will not call eval(List foo, List bar) even if you pass it two lists. That’s because although each of our methods has the same name, eval, Java treats them as different methods, and it figures out which one to call based on the declared types of the parameters at compile time, not on the actual types of the parameters’ values at run time.

Besides writing a Lisp interpreter in Java, your next best bet for building a generic function the way we want it is to find a way to turn Java’s single dispatch into a multi-dispatch, to dispatch on two nouns, foo and bar.

The good news is this: dispatching at run time on two different types is a well-known problem, and the solution is called double dispatch. The problem with double dispatch is that it moves our equivalence code back into our nouns, and we don’t want that.

The Visitor pattern might be handy: it’s a way to add methods to an object at run time in a language like Java that supposedly doesn’t do that. If we decide that everything to be compared using Equivalent.eval implements an interface called Visitable, we can build a double dispatch system that doesn’t require putting an equivalent? method in the entities being compared:
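Here is a sketch of what that might look like, with hypothetical wrapper entities for lists and dictionaries. The first accept learns foo’s concrete type, and a nested visitor learns bar’s:

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical entity wrappers: entities only know how to accept a
// visitor and announce their own concrete type. All equivalence
// logic lives in Equivalent, in one place.
interface Visitable {
    <R> R accept(Visitor<R> v);
}

interface Visitor<R> {
    R visitList(List<?> list);
    R visitMap(Map<Integer, ?> map);
}

final class ListEntity implements Visitable {
    final List<?> value;
    ListEntity(List<?> value) { this.value = value; }
    public <R> R accept(Visitor<R> v) { return v.visitList(value); }
}

final class MapEntity implements Visitable {
    final Map<Integer, ?> value;
    MapEntity(Map<Integer, ?> value) { this.value = value; }
    public <R> R accept(Visitor<R> v) { return v.visitMap(value); }
}

final class Equivalent {
    // Dispatch one: learn foo's concrete type...
    static boolean eval(Visitable foo, final Visitable bar) {
        return foo.accept(new Visitor<Boolean>() {
            public Boolean visitList(final List<?> a) {
                // ...dispatch two: learn bar's concrete type.
                return bar.accept(new Visitor<Boolean>() {
                    public Boolean visitList(List<?> b) { return a.equals(b); }
                    public Boolean visitMap(Map<Integer, ?> b) { return listEqualsMap(a, b); }
                });
            }
            public Boolean visitMap(final Map<Integer, ?> a) {
                return bar.accept(new Visitor<Boolean>() {
                    public Boolean visitList(List<?> b) { return listEqualsMap(b, a); }
                    public Boolean visitMap(Map<Integer, ?> b) { return a.equals(b); }
                });
            }
        });
    }

    // The cross-type rule is written exactly once, so it cannot fall
    // out of sync with itself and break symmetry.
    static boolean listEqualsMap(List<?> list, Map<Integer, ?> map) {
        if (list.size() != map.size()) return false;
        for (int i = 0; i < list.size(); i++)
            if (!Objects.equals(list.get(i), map.get(i))) return false;
        return true;
    }
}
```

Equivalent.eval(new ListEntity(list), new MapEntity(map)) and its mirror image both route to the same listEqualsMap, so symmetry holds by construction.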

If that looks like a lot of work to you, I agree. You’re basically replicating Java’s run time dispatching on two types, so you need a bit of a matrix. Is it worth the effort? Let’s consider what this wins you:

Your entities or objects no longer need to know all about other types of entities;

It’s easier to make sure that commutation and symmetry are preserved when the code for a relationship is in its own class and not smeared over multiple entities.

And best of all, you have a nice place for your verbs, and they are no longer second-class citizens behind the nouns.

Update: A few people have suggested alternate approaches to implementing multiple dispatch in Java. I think there are various trade-offs to be made, and several different implementations ought to be considered before you write production code.

However, the point of the article is to suggest that not all functions should be implemented as methods of subject objects. I think it makes that point regardless of what you think of using a Visitor and a double dispatch.

What's more convenient to have in general? Strict namespacing or strict noun/verb/noun structure?

In other words, do I type "Equivalence.test(listA, mapB)", or do I type "listA.equals(mapB)" with the understanding that somewhere along the line, either the compiler or the runtime ends up calling Equivalence.test anyway? You can't currently pull that last stunt in Java at all as far as I know, but you could in more dynamic languages like Python, simply by adding onto a list class.

The flipside is that in java you always 'know' what a certain type does and doesn't have and where those definitions are located (leading to IDE conveniences like being able to command+click your way to the definition of just about anything you see in your code, something which is halting-problem impossible when you rewrite class definitions at runtime).

Which one is more important? I honestly don't know. There are good cases for either.

In theory we can have our cake and eat it too. This would require a stripped-down (non-Turing-complete) 'programming language' of sorts, more a definition language really, which (probably on a per-project or per-'package' basis) lets you declare exactly which verbs should be imported into which nouns. In other words, some file where I can write:
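A purely hypothetical sketch of such a file (the syntax is invented to make the idea concrete):

```
# hypothetical verb-import declarations, fixed at compile time
import Collections.sort(List)      into List.sort()
import Equivalence.test(List, Map) into List.equals(Map)
import Equivalence.test(Map, List) into Map.equals(List)
```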

The rule here is that this list is immutable; it cannot be modified at runtime. From the perspective of the runtime, it always was, is, and always will be. That way the tooling knows that if I cmd+click on the 'sort' identifier in the following hypothetical Java code snippet: "someList.sort();", it should jump straight to Collections.sort(List).

NB: Collections.sort is an actual method in the core Java libraries, and in order to sort lists in Java, you cannot write list.sort(); you have to write Collections.sort(list). So far I've always found this a blight, because sorting clearly is something a list ought to know how to do, unlike equivalence testing, which Reg has already established doesn't really belong to lists at all. I find it a happy coincidence that you also basically get mixins for free when you 'fix' the ability to externalize methods for abstraction purposes.

Relevant Java-like languages to look at are Scala and Nice. In Nice, all functions are generic and don't live on objects. (There's a "let's pretend we're object oriented" syntax, though, allowing you to write foo(bar, baz) as bar.foo(baz).) The semantics aren't actually all that Java-like, but the syntax is close enough that it conveys the right idea.

Scala on the other hand doesn't have generic functions. All functions live on objects... but in the same way that all functions in ML or Haskell live in modules. The language is structured so that objects behave much more like dynamically composable modules.

And, you know, both are lacking. In Nice I really want the first-class module system from Scala; in Scala I really want multiple dispatch. All methods really should live on objects, because all namespaces should be first-class objects and all methods should live in namespaces, but this shouldn't be the be-all and end-all of what they can do.

"If every object “knew” how to organize itself into collections, you wouldn’t need an Array or Hash, would you?"

Maybe I miss the point here. Let's step back to the classic example of a class Animal with subclasses of, say, Lion and Zebra. How a lion reacts has more to do with how lions react with each other, or more likely how they react within their group (array/hash), which is called a pride. A collection of things is a different entity because it is a different entity. How a pride (of lions) reacts is different from how a herd (of zebras) reacts.

I don't think you miss the point, you embrace the idea that some objects have a one to one mapping with physical entities in the real world, and some map to more abstract ideas.

A pride of lions may seem like a real-world entity, but researchers have shown that herd behaviour can often be modeled by autonomous entities following their own internal rules. The behaviour of the "herd" emerges from the aggregate behaviour of the animals within it.

Sorry, getting off topic. My point is that you could code for individual lions and not for prides, but it is cleaner and simpler for us to code for a pride.

Which leads us back to verbs... you can stick them all in nouns, but my suggestion is that it is cleaner and simpler in some cases to keep them separate.

Wow, I really like the commutativity example. I ran into a problem a while ago in a shopping-type application: how do you compute the tax? Different stuff can have different tax rates, and sometimes a category of stuff is tax-free below a certain purchase amount. (Not only "can", but "did", and "could have again".) And all that is for just one state; the tax depends on what state the recipient is in, as defined by their shipping address, which you typically collect /after/ the screen that shows their shopping cart and total price.

Trying to build a sane OO design around that was a small nightmare. And although it is a real problem, it isn't nearly as small as commutativity.

I've got some posts over at EnfranchisedMind where I got bit by #equals: given two objects, A and B, sometimes they would both get into a set, sometimes they wouldn't. It depended on whether the set implementation called A.equals(B) or B.equals(A).

Which led to this series of posts:
http://enfranchisedmind.com/blog/archive/2005/10/28/36
http://enfranchisedmind.com/blog/archive/2005/10/31/37
http://enfranchisedmind.com/blog/archive/2006/01/30/62

"It forces you to smear duplicate logic over many different classes (or prototypes, if your language swings that way)."

I think that in Ruby, this problem is really not that bad, because you can put all of this logic, spread over many different classes, inside a single module. After all, I think the biggest problem is keeping things that logically belong together in the same place, and things that are logically different apart.

In the equivalence example, you could have a mixin for collections and another for maps, but put these two in the same module, same file too! In Java or Smalltalk, you are screwed, though.

Your entities or objects no longer need to know all about other types of entities.

But they never really needed to -- they just needed to know about which types they can be equal to. But your Evaluator now needs to know about all classes which potentially might be equivalent. It seems a small gain for a decent price.

Equality is a very difficult relationship to define in OO systems. But as for equivalence, remember that we have a matrix of types to consider.

So if you have types (A, B, C), perhaps you can handle the complexity of making sure that A has an equivalence method that recognizes B and C. But you only need one of them to be broken to ruin the whole thing, because it breaks symmetry. For example, if you add type D and update A and B for it, but not C, you have a situation where c.equivalent?(d) != d.equivalent?(c).

Or worse, if you add subtypes and don't get the logic right across all of the relevant classes, you get really weird things happening because equivalence "defaults" to being handled by their superclass some of the time but not the rest of the time.

One of the reasons that this is especially more difficult in Java than in more dynamic OO languages like Python and Ruby (and especially functional languages like Haskell, where this is really a non-issue) is that functions and methods are not objects, first class or otherwise. So there really is no separation of any given verb from at least some noun's context; i.e., you cannot express the notion of "addability" separately from how to add some specific type.

Just a quick note. Squeak Smalltalk does use double dispatching for commutative operations on numbers. So your CurrencytwoPlaceDecimal would implement a method like addToSmallint: i, and SmallInt's + implementation would be something like:

+ n
    ^ n addToSmallint: self

[It's been a while since I've done any smalltalk, so my syntax and method names are probably wrong, but I think the idea is right]

I've always thought of the C++ STL as a nice example of why not all verbs should "hang" off objects. In the STL the algorithms stand on their own and can be used on any type that supports the required functionality. In a lot of languages the difficulty is how a type indicates the functionality it supports. Often in OO languages the answer is to implement interfaces, which is explicit but can be cumbersome. Duck-typing removes the encumbrance at the expense of requiring another way to communicate the requirements (which is often less precise). C++ templates fall somewhere in between: telling you at compile time, but not necessarily in a friendly way.

The single-dispatch topic is another interesting limitation of "pure OO" languages. I've always seen visitor as an ugly pattern, a best effort to try and patch a language deficiency. These languages have blinkered us to view polymorphism as only acting on one type, when polymorphism over multiple types is an elegant way to solve many problems. A better answer is for the dispatch mechanism to be capable of dispatching on as many types as you like with syntactic sugar for the common case (i.e. single-dispatch).

I wrote something of a follow up about one aspect of the comparison you're discussing on my web site which focuses on the value vs. object semantics of the comparison.

What I didn't talk about, but was thinking of as well, was the requirement for the double dispatch - do you really need or want it for value comparison?

I think that the normal Java overloading rules would be sufficient if you could define sensible overloaded free-standing functions to handle it, and if you left essentially uninformative types like Object out of the overloading; i.e., you only overloaded concrete types, not abstract ones.

The question becomes one of whether or not the dynamic types of the values matter. I'm not certain that they do for most use cases. In this case, static-type overloading of normal functions is all that is required. I haven't been able to think of a good counter-argument that wasn't also a flawed design in some other way. (I don't know if Java can overload static members or not, so maybe this only applies to a language like C++ where you can have free functions which can be overloaded.)

I think that anywhere a more complex test was needed, it would be better served implemented locally to the use of the test, since something odd was already going on there.

I don't like to be a chauvinist about anything, but there is one thing about object methods that makes them incredibly more beneficial than free functions: the fact that you can easily mock them for testing.

There's nothing worse than walking into a pile of C or C++ and discovering that you don't have a seam to fake out things that you don't want to have happen in a test. Often I introduce some rudimentary object orientation, just to get past the problem.