Bad Metaphors

The standard way to teach beginner OO programmers about classes is to make a metaphor to the real world. And indeed, I do this all the time in this blog, usually to the animal kingdom. A "class" in real life codifies a commonality amongst a certain set of objects: mammals, for example, have many things in common; they have backbones, can grow hair, can make their own heat, and so on. A class in a programming language does the same thing: codifies a commonality amongst a certain set of objects via the mechanism of inheritance. Inheritance ensures commonalities because, as we've already discussed, "inheritance" by definition means "all (*) the members of the base type are also members of the derived type".

Inheritance relationships amongst classes (**) are usually designed to model "is a special kind of" relationships. A giraffe is a special kind of mammal, so the class Giraffe inherits from the class Mammal, which in turn inherits from Animal, which inherits from Object. And that's great; this clearly represents "is a special kind of" relationships. I have always, however, had a problem with the fundamental metaphor of "inheritance". Why "inheritance"? You inherit genetic information, property, and if you're a titular lord, your peerage, from your parents. And if you make a diagram of a class hierarchy, it looks a bit like a "family tree" in which the derived class is the "child" of the base "parent" class. And indeed, people often speak of the base class as the "parent" class of a "child" derived class, particularly when speaking to beginners.

But the "parent-to-child inheritance" metaphor is awful. A giraffe is not "a child of mammal"; a giraffe is a child of Mr. and Mrs. Giraffe. A "child" is not "a special kind of parent". In reality, you only inherit half your genetic makeup from each parent, and you can inherit real property from any relation, or for that matter, from any non-relation. In programming languages you only "inherit" from related types, and you inherit all their members (*). In reality, everyone has two parents (***), but in programming languages some languages allow inheritance from arbitrarily many "parents", some allow exactly one. In reality, a single, specific person inherits specific property from a single, specific parent, and two different children can have entirely different inheritances from their parent; in programming languages, the "inheritance" relationship does not apply to individual objects, and every child inherits exactly the same thing from the parents. And in reality you only inherit real property when the decedent is dead!

But wait, it gets worse. The parent-child metaphor is ambiguous in any language that supports both lexical nesting and nominal subtyping of classes:

class B<T> { class D<U> : B<U> { } }

Quick, what type is the "parent" of type B<string>.D<int>? Is it B<T> or B<string> or B<int>? That type is lexically inside B<T>, logically inside B<string>, and derived from B<int>; which of those three is its parent? If you drew a graph showing either lexical or logical containment relationships, it would look every bit as much like a "family tree" as the graph showing inheritance relationships. And lexical containment allows access to all the properties of the container from the contained type, even including not-inherited and normally inaccessible members like private constructors! It is not at all clear that one kind of "parentage" is actually more "parent-like" than any other.
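To make the competing "parents" concrete, here is a minimal sketch built on the declaration above. The private constructor and the MakeContainer helper are hypothetical additions for illustration only; they are not part of the original example.

```csharp
using System;

// The declaration from the post, plus two hypothetical additions:
// a private constructor on B<T> and a MakeContainer helper on D<U>.
public class B<T>
{
    private B() { }

    public class D<U> : B<U>
    {
        // Compiles only because D<U> is lexically inside B<T>:
        // a nested type can reach its container's private members.
        public static B<T> MakeContainer() => new B<T>();
    }
}

public static class Program
{
    public static void Main()
    {
        var d = new B<string>.D<int>();

        // The inheritance relationship: the base type is B<int>.
        B<int> asBase = d;
        Console.WriteLine(typeof(B<string>.D<int>).BaseType == typeof(B<int>)); // True

        // B<string> wrong = d; // would not compile: the type does not derive from B<string>

        // The containment relationship: inside B<string>, T is string.
        B<string> container = B<string>.D<int>.MakeContainer();
        Console.WriteLine(container.GetType() == typeof(B<string>)); // True
    }
}
```

The same type is assignable to B<int> through derivation yet can construct a B<string> through containment, so "the parent" has no single answer here.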

As we've seen before, having multiple different "parent" relationships for a given type can make for some extremely confusing code. We have to be extraordinarily careful when writing the specification and the compiler to ensure that we unambiguously describe precisely the relationship we wish to describe. I therefore try hard to avoid "parent-child" metaphors entirely; it is much more clear when writing an example to describe the type relationship as "base type and derived type", rather than "parent type and child type".

(*) Excepting constructors and destructors.

(**) I'm going to stick to talking about class-based inheritance here; my criticisms apply equally well to interface-based inheritance but I don't want to open the can of worms that is all the subtle differences between class and interface inheritance. And I've never much liked the inheritance metaphor on interfaces anyways; a "contractual obligation" metaphor is better.

(***) Assuming that we're talking about members of a sexually reproducing species.

Re: your (**) comment on inheritance from interfaces, this is a pet hate of mine: a class implements the interface, it does not inherit from it. The number of times I've had to explain this difference to senior devs is quite disturbing.

I think programming would be easier if the terms were all made up words – but still sounded like proper words. Having learned programming very early, before I knew most of the more advanced English words, to me it seems inheritance is the perfect word for base/derived class relationships, but not so fitting for real life usage for exactly these reasons. Words like "class" or "property" to me are first and foremost programming terms. Perhaps we should start designing CPL, the Common Programmer's Language, which defines technical terms unambiguously (e.g. doesn't have the word 'dynamic'), and has a term for every single concept used by any programmer ever, so that we would be able to actually speak a language without confusing metaphors or ambiguities? Nah… Nobody would use it.

Actually the term 'inheritance' makes good sense, only it applies to types and not to objects! The derived type B inherits all members (*) from its base type(s) A_i just as children inherit their properties from their parents, and each inherited member is inherited from exactly one parent (except weird cases like virtual inheritance, where the member is inherited from one grandparent through one or more parents). Thus IMO the ultimate cause of problems in this particular case is confusion about the type-object distinction.

I don't know. Isn't that a good reason not to use this kind of design? Even with really good names for B and D, I think it would be difficult for me to grasp what this relationship defines. Sort of like your "Smart Pointers Are Too Smart" example.

Perhaps "parent" and "child" don't refer directly to family relationships, but are themselves computer science jargon for nodes in a tree structure. So the parent class is adjacent to the child class in the class hierarchy (tree) but it is the one closer to the root of the tree. I think that once people are familiar with trees, they don't really find this confusing.

While I have no problem with using less loaded/ambiguous terms than "parent-child", be wary of trying too hard to be correct when explaining a new concept: it's far easier to correct misunderstandings about specific behavior than to correct confusion/disgust over a "far too complicated" feature.

When first introducing OO (really class-based, but let's not go *there*!), the essential detail that you *need* to get across, ignoring everything about members, behavior, etc…, is the "is-a" relation. Using real-world examples that the student is immediately going to map to such relations, despite the fact that they don't actually match Liskov or would be data-driven in a real implementation, is far more helpful than dryly describing the properties of "is-a". When the smart student then says later "A-ha! But penguins can't fly!", the correct response is "the real world is complicated", or "So it's probably a bad idea to actually have a 'Bird' base class", not to try to re-explain OO from scratch again!

But this mammal and giraffe metaphor is worse than most for another reason.

Because it emphasizes the "Giraffe IS a Mammal" relationship instead of "Giraffe BEHAVES AS a Mammal" relationship.

This is important when you reach the classical corner cases. While in classic taxonomy it makes sense for whales to be mammals, in an OOP world there is no advantage in having whales be mammals instead of fish (besides, mammal is a completely meaningless category in OOP – unless you consider Sweat() a method you can call :)).

Also, classifying objects by behaviour has the advantage of making structural patterns (adapter, composite and above all decorator) much easier to introduce.

Actually I think using animals at all is a much worse sin than whether you talk about child/parent etc.

Newb programmers have most likely learnt (and understood) some basics about what programs do, such as painting on screens and reading from files.

Then "trainers" and OO intro tutorials start telling them about animals and shapes, and their confidence is destroyed because they haven't a clue what on earth the point of an animal class is in the context of code!

Worse still, the same trainers use large unhelpful headings such as Polymorphism and Encapsulation, and yes, I agree, Eric, even the term inheritance isn't intuitive enough; at least, though, it's a term they have heard of before and it isn't that scary!

Time and time again new programmers at my company who have done a computer science degree just don't really get OO because everything they were taught was so abstract.

I use a file importer program to demo OO and it makes sense to them.

It has a concrete real life feel to it. It's a complex enough problem to justify OO but not so complex that it blows their mind.

I agree with Mike G. – I don't think the animal metaphor is very good for beginners: I know it confused me in the beginning because it just did not seem to relate to programming.

I don't mind the parent/child terminology, because to me it reflects the tree hierarchy well and the direction of the hierarchy. For some reason, the terms "less derived" and "more derived" are just not intuitive to me and I end up translating them to "is a parent" and "is a child" in my mind.

I don't agree with the rant about interfaces being implemented instead of inherited. I like the word inherited for both the base class and the interfaces. The child class is inheriting the method signatures for both base classes and interfaces and inherits the type of both the base classes and interfaces.

The implementation to me is a separate concept. If the base is completely abstract and virtual then the child class must implement all the methods. If a class already has all the methods in an interface then it does not have to implement anything.
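A minimal sketch of that last case, with hypothetical type names (ISpeaker, Animal, Giraffe) chosen purely for illustration. In C# the class still has to name the interface in its base list, but it need not write any new method if a public member it already inherits matches the contract.

```csharp
using System;

public interface ISpeaker
{
    string Speak();
}

// Knows nothing about ISpeaker.
public class Animal
{
    public string Speak() => "generic animal noise";
}

// Declares the interface but adds no code of its own: the public
// Speak method inherited from Animal satisfies the contract.
public class Giraffe : Animal, ISpeaker
{
}

public static class Program
{
    public static void Main()
    {
        ISpeaker speaker = new Giraffe();
        Console.WriteLine(speaker.Speak()); // generic animal noise
    }
}
```

Here Giraffe gets the method body from its base class and only the obligation from the interface, which is one way to see why some people prefer to keep "inherit" and "implement" as separate words.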

I don't think it's the case that you can't map mammal as the parent of giraffe. The real problem is that newbs can't map the abstract things present in a programming environment onto relations like those in the animal kingdom. OK, mammal is quite abstract, but we have some intuition about those "animal" things. For programming things we have no intuition at the beginning, because there are much more abstract things to grasp, so we should learn from the abstract concepts that are actually present in real software.

What I like about the animal metaphor is that it is a hierarchy that everybody can instantly understand. If one is trying to come up with an example to relate to a situation, it is trivial to come up with more examples (two classes derived from the same base — giraffe and zebra are both mammals; two classes derived from the same base but with different intermediate derivations — cat and shark are both animals, but cat is a mammal and shark is a fish).

If animals aren't a good metaphor, I'd like to know what to use instead.

@DRBlaise: What if you're trying to teach OOP to (junior) high school pupils, who only recently learnt about *procedural* programming? I am the only guy in my class who knows something about programming, about how it works, etc. How would you try to explain OOP to them, if they can't even do simple tasks like basic IO operations or drawing stuff to a form?

This will never happen though, we're learning VBA and there's nothing about classes (or modules) in our book, sigh…

@Les – two examples of OOP I would use to teach students, would be games and GUI.

Designing or theorizing about an OOP game would be something students could really get into and there are all sorts of hierarchies and "is a" and "has a" relationships: characters, weapons, inventory, etc. There could even be animals in the game and you could talk about a hierarchy of animal behaviors in the game versus how real animals are categorized.

GUI design is something they can see right in front of them and you could talk about "objects" on the screen: forms, menus, borders, scroll bars, text boxes, drop downs, etc.

@JK The article says "Or put another way, if we go back to most recent common ancestor of everything we now call fish (including the incredibly primitive lungfish and hagfish), we find that they also were the ancestor of all four-legged land vertebrates, which obviously aren’t fish at all." This is news? Hasn't this been true of reptiles and birds for ages? Normal people don't reject paraphyletic groups.

IMHO, implementation inheritance should almost never be used by the programmer to create a complex set of related classes – and implementation/is-a inheritance is all we get in C# and most other languages, as they lack any convenient constructs for delegation, mixins, etc. Inheritance should be (and is) mainly used as a way to make a complex API more approachable by segmenting it, and enables such niceties as Intellisense – so basically it's what people use to leverage *frameworks* that are large and (hopefully) well-thought-out designs that the user accesses through the OO paradigm. Designing OOP (other than flat classes which organize an API or extend a framework class to gain behavior) should not be an everyday task for the programmer. As OOP is just a shim (v-tables, etc.) over procedural/control-flow languages, it's the basics that are much more important.

I've always felt that OO was a "solution" in search of a problem. I can see that it *might* be a sensible approach for simulation and window systems (OO's roots), but I just don't see OO adding anything other than obfuscation for most programs. Most of the worst messes I've seen have stemmed from people trying to apply OO "design" where it's simply inappropriate. And don't get me started on patterns…

Declarative languages get by just fine (let's be honest here, they get by better) with just algebraic data types and type classes (interfaces done properly).

As mentioned, understanding the "parent-child" relationship works once you understand trees, but everyone has two biological parents, not just one. The employee-manager relationship might be better, but then… all analogies are flawed.

As my Dad says, "Don't tell me what something is LIKE, tell me what it IS".

My saying is "Analogies are like feathers on a snake. (Useless, unhelpful…)" It's an anti-analogy analogy.

I realize that analogies and metaphors are useful teaching tools, but they only go so far.

And even "is a special kind of" doesn't always work. I think everyone who tried to declare class Square : Rectangle { /* … */ } sooner or later discovered rather grave problems with such an approach. In fact, I saw in one edition of Stroustrup's "The C++ Programming Language" his statement that "You should neither derive Square from Rectangle nor Rectangle from Square though the latter has some benefits".
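For readers who haven't hit the problem yet, here is one common way it shows up, as a sketch under assumptions: the property names, the Resize helper, and the choice to keep a square's sides equal inside the setters are all hypothetical, not taken from any particular library or from Stroustrup's text.

```csharp
using System;

public class Rectangle
{
    public virtual int Width { get; set; }
    public virtual int Height { get; set; }
    public int Area => Width * Height;
}

// One (problematic) way to preserve the "all sides equal" invariant:
// setting either dimension silently changes the other.
public class Square : Rectangle
{
    public override int Width
    {
        get => base.Width;
        set { base.Width = value; base.Height = value; }
    }

    public override int Height
    {
        get => base.Height;
        set { base.Width = value; base.Height = value; }
    }
}

public static class Program
{
    // Written against Rectangle: assumes the two sides vary independently.
    public static int Resize(Rectangle r)
    {
        r.Width = 4;
        r.Height = 5;
        return r.Area;
    }

    public static void Main()
    {
        Console.WriteLine(Resize(new Rectangle())); // 20
        Console.WriteLine(Resize(new Square()));    // 25
    }
}
```

Code that is correct for every Rectangle gives a surprising answer when handed a Square, even though a square "is a special kind of" rectangle in geometry.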

There is a concept of "Specialization by Constraint" which makes it easy to express Square as a descendant of Rectangle, and in fact can even be made compatible with Liskov's Substitution Principle, but only if you stop using pointers/references and work with value types only. Which is, of course, not very great; after all, pointers/references are one of the greatest inventions in CS.