Object encapsulation and properties

published: Sat, 4-Mar-2006 | updated: Sat, 4-Mar-2006

I'm sure we can all rattle off the three tenets of object-orientation
without even thinking about it: encapsulation, inheritance and
polymorphism. Easy-peasy. I'm sure we could, without even the
slightest sheen of sweat appearing on our brow, knock off a quick
description of what they mean as well. So why don't we pay attention
to them when we write some object-oriented code?

Let's just take one: encapsulation. Good grief, this is the simplest
of the lot. It means? Well, dear reader, think about what it means
before you read on. We'll wait.

If you're like me, you will have made up something about enclosing a
bunch of data and behavior with some code that hides the internals of
that data and behavior. A black box in other words. You know how to
create one, you know how to poke at it so that it does some work, and
that's about it. Without looking at the code that implements it,
you're pretty much in the dark.

And therein lies its benefit: because the black box has a well-defined
interface for its use and hides the internal details of how it's
implemented, we can replace the black box with another whenever we
want, so long as the new black box has the same interface, of course.
That's a very powerful and useful concept.

There is an extremely valuable construct in modern OO languages that
is a pure realization of encapsulation, but in my experience it tends
to be little used or badly used: the interface type. With an interface
you can define pretty strictly what the behavior of an instance of
that interface should be. Furthermore, there are absolutely no hints
as to how the behavior has been implemented. It's encapsulation at its
most virginal, in a sense. Breaking encapsulation that has been defined
by an interface tends to be quite hard and generally not worth it. It's
easy to create classes that satisfy the interface (known as
"implementing the interface"), but it tends to be hard to design and
define the interface in the first place. Instances of the implementing
classes can be used wherever an interface instance is expected without
the calling code being any the wiser. You can swap them in and out ad
infinitum: the ultimate black box.
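To make the swapping concrete, here is a minimal sketch in Java. Everything in it is hypothetical and invented for illustration (the IntStack interface, the two implementing classes and the StackClient caller); none of it comes from the code discussed in this article. The point is only that the calling code is written against the interface and cannot tell which implementation it was handed.

```java
// Hypothetical interface: the only thing calling code gets to see.
interface IntStack {
    void push(int value);
    int pop();
    boolean isEmpty();
}

// One implementation: a growable array.
class ArrayStack implements IntStack {
    private int[] items = new int[4];
    private int count = 0;

    public void push(int value) {
        if (count == items.length) {
            // Double the storage when it fills up.
            int[] bigger = new int[items.length * 2];
            System.arraycopy(items, 0, bigger, 0, count);
            items = bigger;
        }
        items[count++] = value;
    }

    public int pop() {
        if (count == 0) throw new IllegalStateException("stack is empty");
        return items[--count];
    }

    public boolean isEmpty() { return count == 0; }
}

// Another implementation: a singly linked chain of cells.
class LinkedStack implements IntStack {
    private static final class Cell {
        final int value;
        final Cell next;
        Cell(int value, Cell next) { this.value = value; this.next = next; }
    }

    private Cell top = null;

    public void push(int value) { top = new Cell(value, top); }

    public int pop() {
        if (top == null) throw new IllegalStateException("stack is empty");
        int value = top.value;
        top = top.next;
        return value;
    }

    public boolean isEmpty() { return top == null; }
}

class StackClient {
    // Written purely against the interface; either implementation will do.
    static int sumAndDrain(IntStack stack) {
        int total = 0;
        while (!stack.isEmpty()) {
            total += stack.pop();
        }
        return total;
    }
}
```

Passing an ArrayStack or a LinkedStack to StackClient.sumAndDrain gives the same answer, and nothing in StackClient would need to change if a third implementation turned up.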

The visibility keywords like public and private in object-oriented
programming languages are expressly designed for encapsulation of data
and behavior in classes. There are whole newsgroup threads out there
about whether the protected keyword should be allowed, and whether
inheritance (another of the pillars of OOP, remember) or friend classes
break encapsulation. Delphi, for example, has suffered until recently
from the implicit friend relationship for classes in the same unit (I
remember jabbering on about this at TurboPower with the developers
once).

There is another concept that breaks encapsulation, but we tend to
ignore it or we don't realize that it does break encapsulation. It's
another of those things that comes from a procedural programming
mindset, and is used by a programmer who has migrated to OO
programming from old-style procedural programming. Like me.

The concept even has a law that frowns on it: the Law of Demeter.
There's also a principle we should apply: the "Tell, Don't Ask" rule.
Yes, I'm talking about properties (and accessors and mutators, or
getters and setters).

The problem is that properties break encapsulation. They are a window
into the internals of your black box. Worse than that, they encourage
the developer to inadvertently move behavior that should be internal
to the class outside into another. Since I got such a lot of
opprobrium last time for discussing someone else's code, this time I'm
going to use some of mine.

Quick: what's wrong with it? I can come up with a couple of things
straight away. It's not a class, it's just a record (or a struct, in
C-language parlance). Any encapsulation here is mostly about gathering
a bunch of data in one packet; there's no data hiding here at all.
This is typical Julian code from 5 years ago, essentially.

Here's some code (from a method in the splay tree class) that uses
this node definition, together with a call to it.

Ugly, eh? Look at all those bloody carets for a start that muck up
your scanning and reading ability. Look at how the code delves deep
into the node record to get at items of information. Gasp at how I
chain from the node to its parent to get at one of the parent's
children (a violation of the Law of Demeter). Notice how from this
class (the TtdSplayTree class) I need to know intimate
details of how the node is constructed: in effect I'm asking the node
for internal data so that I can manipulate it outside the node.
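For readers who want something concrete to look at, here is a rough Java sketch of the style being criticized. It is not the Delphi listing from the book: RawNode, RawTree and rotateRight are names made up for this illustration, and Java has no carets to complain about. What does carry over is the shape of the problem: the node is a bag of public fields, and the tree reaches through a node to its parent and on to the parent's children.

```java
// Data-only node: every field is public, much like a record or struct.
class RawNode {
    int key;
    RawNode parent;
    RawNode left;
    RawNode right;

    RawNode(int key) { this.key = key; }
}

class RawTree {
    RawNode root;

    // Promote `node` (which must be a left child) above its parent.
    // All the pointer surgery happens here, outside the node itself.
    void rotateRight(RawNode node) {
        RawNode parent = node.parent;
        RawNode grand = parent.parent;

        // The node's right subtree becomes the parent's left subtree.
        parent.left = node.right;
        if (node.right != null) {
            node.right.parent = parent;
        }

        // The old parent becomes the node's right child.
        node.right = parent;
        parent.parent = node;

        // Hook the node in where the parent used to hang.
        node.parent = grand;
        if (grand == null) {
            root = node;
        } else if (grand.left == parent) {
            grand.left = node;
        } else {
            grand.right = node;
        }
    }
}
```

Every line of rotateRight depends on how a node is laid out, so the tree cannot be separated from this particular node representation.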

Man, looking at that lot, I'm sorry for all those people who bought
the book. Anyway...

Here's some code I'm writing for an article that will derive balancing
algorithms for binary search trees, together with how it's called
(warning: this code is still being developed; don't even use at your
own risk):

This code covers the same functionality, but I would venture is much
easier to read in the calling code. The calling code needs to know
nothing about how the node is constructed internally. It just knows
that, conceptually, there's a left and right child in a node and that
nodes have parent nodes so that a binary tree could be constructed.
Those nodes could be individual objects (so that somewhere there's
some code that news up (creates) a node), or it could be that the
entire tree is stored as an array of nodes and they're pointed to by
indexes and not references (much as a heap is built).

In essence, knowledge about the internals of the node is stored within
the TSimpleNode class, and further hidden behind the INode interface.
The binary tree class knows nothing about all this (indeed, the way
I've designed it, the splay algorithm is written using the Strategy
Pattern and not by inheriting from a base tree class as I did it in
olden times).

In writing this later code I was extremely attentive to not exposing
the left and right child references, either as public fields or as
properties. If I had, I'd have produced the same code as before, where
the tree class in essence becomes a controller and manipulator of dumb
nodes.

What I'm trying to get at here is that the first code example exposes
way too much and because I wrote it that way, in a procedural fashion,
it meant that I then violated encapsulation all over the place. The
tree is intimately linked to the node definition with a very high
coupling; they are in essence a single class that happens to have been
written as a class and a struct. Replacing the node record with a node
class with Left and Right properties would only get rid of the carets
in the code. Nothing much
else would change; the node is a dumb class, a data container, with no
behavior worth speaking of.
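A small Java sketch may help show why swapping fields for properties changes so little. PropNode, PropTree and replaceChild are hypothetical names for this illustration only. The fields are now private, but every one of them is handed straight back out through a getter and setter, so the tree still asks the node for its internals and rearranges them from the outside.

```java
// A "dumb" node: private fields, but each one is exposed by a property.
class PropNode {
    private PropNode parent;
    private PropNode left;
    private PropNode right;

    PropNode getParent() { return parent; }
    void setParent(PropNode parent) { this.parent = parent; }
    PropNode getLeft() { return left; }
    void setLeft(PropNode left) { this.left = left; }
    PropNode getRight() { return right; }
    void setRight(PropNode right) { this.right = right; }
}

class PropTree {
    private PropNode root;

    PropNode getRoot() { return root; }
    void setRoot(PropNode root) { this.root = root; }

    // Put `replacement` where `old` used to hang. The tree still has to
    // know that nodes have a parent, a left child and a right child.
    void replaceChild(PropNode old, PropNode replacement) {
        PropNode parent = old.getParent();
        if (parent == null) {
            root = replacement;
        } else if (parent.getLeft() == old) {
            parent.setLeft(replacement);
        } else {
            parent.setRight(replacement);
        }
        if (replacement != null) {
            replacement.setParent(parent);
        }
    }
}
```

The syntax is tidier than poking at public fields, but the coupling between tree and node is exactly what it was.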

Now, it may be argued that this is OK ("how can you define a binary
tree without referring to nodes, eh?") and that the tree having
intimate knowledge of its nodes is perfectly acceptable ("that's how
all the algorithm implementations do it, anyhow"). And in this case,
yes, maybe it is. But I've seen lots of code in my time where
properties of an object are deliberately used outside the object in
order to manipulate the object in some way, manipulation that can and
should be part of the object's behavior.

To me this one line of code raises several flags. There's the code
duplication for a start, there's the violation of the Law of Demeter
(which leads to the code duplication), there's the built-in assumption
that the Name string (of the State object, of the business object)
cannot be null, and there's that niggling thought wondering what
originalNodeName is going to be used for. If you could ignore that
last point, you really would like to write this:

So, yes, indeed, there are some decisions about an object being made
outside the object. You should be writing something like this instead:

return busObj.IsSameName(newName);

Notice that the actual code doing the checking for equality hasn't gone
away: it's just been refactored into the business object class (wherever
that may be).
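As a sketch of where the check might end up, here is one possible shape in Java. The BusinessObject and State classes, the hasName method and the rule that a missing name matches nothing are all assumptions made for this example; only the idea of an IsSameName method (spelled isSameName here, following Java naming) comes from the line above.

```java
// Hypothetical state object that owns the name.
class State {
    private final String name; // may be null in this sketch

    State(String name) { this.name = name; }

    // Assumption: a state with no name matches nothing.
    boolean hasName(String candidate) {
        return name != null && name.equals(candidate);
    }
}

// Hypothetical business object: callers tell it what to check
// instead of asking for its State and that State's Name.
class BusinessObject {
    private final State state;

    BusinessObject(State state) { this.state = state; }

    boolean isSameName(String candidate) {
        return state.hasName(candidate);
    }
}
```

The caller no longer chains through the State object, and the question of what to do about a null name is answered once, inside the class that owns the name.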

Anyway, in my "improved" second code sample the tree code (actually an
algorithm class) just knows about the INode interface. The INode
interface defines some basic behavior ("given a key, return the child
node where the key might be found", "given a key and an item, attach a
new node in the correct child position and return it"). The TSimpleNode
class that implements the INode interface could be replaced at a
moment's notice with something that has the same interface and the
tree class would still work just fine. (Note that the INode interface
does define a couple of getters: the GetKey and GetData functions. This
goes along with the idea that a node carries the data and the key, and
has behavior that defines how it acts in a tree.)
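Here is a cut-down Java sketch of that arrangement, under some loud assumptions. The names INode, GetKey and GetData come from the text (as getKey and getData here); the method names childFor and attach, the int keys, the String data and the TreeAlgorithm class are invented for illustration, and parent links are left out to keep it short. It is not the Delphi code under development.

```java
// The only view of a node that the tree algorithm gets.
interface INode {
    int getKey();
    String getData();

    // Given a key, return the child where that key might be found
    // (null if there is no child on that side).
    INode childFor(int key);

    // Given a key and an item, attach a new node in the correct
    // child position and return it.
    INode attach(int key, String data);
}

// One possible implementation: individual objects with two references.
class SimpleNode implements INode {
    private final int key;
    private final String data;
    private SimpleNode left;
    private SimpleNode right;

    SimpleNode(int key, String data) {
        this.key = key;
        this.data = data;
    }

    public int getKey() { return key; }
    public String getData() { return data; }

    public INode childFor(int key) {
        return key < this.key ? left : right;
    }

    public INode attach(int key, String data) {
        SimpleNode child = new SimpleNode(key, data);
        if (key < this.key) {
            left = child;
        } else {
            right = child;
        }
        return child;
    }
}

// The algorithm never mentions left or right; it only tells nodes what to do.
class TreeAlgorithm {
    // Insert under `root`; returns the new node, or the existing one
    // if the key is already present.
    static INode insert(INode root, int key, String data) {
        INode node = root;
        while (node.getKey() != key) {
            INode next = node.childFor(key);
            if (next == null) {
                return node.attach(key, data);
            }
            node = next;
        }
        return node;
    }

    // Returns the data stored under `key`, or null if the key is absent.
    static String find(INode root, int key) {
        INode node = root;
        while (node != null && node.getKey() != key) {
            node = node.childFor(key);
        }
        return node == null ? null : node.getData();
    }
}
```

Because TreeAlgorithm is written only against INode, a node implementation that keeps its children as indexes into an array could be dropped in without touching insert or find.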

I think I'll stop here for now (this article is already pretty long),
but I'll continue with this thought in some future postings.