This article is in response to The three greatest paragraphs ever written on encapsulation which I feel is a total failure. As far as I am concerned when you are supplying a description for one of the fundamental principles of Object Oriented Programming (OOP) you should do so in a way that can be used by a newcomer to OOP so that they may obtain a proper understanding of what it means and how it can be implemented. None of the paragraphs in that article meet those goals, which is why I regard them as "not fit for purpose".

I have been in software development for over 30 years, and in OOP for the last 10, and I have read volumes and volumes of articles which are supposed to be teaching aids written by gurus, but which, in my humble opinion, are nothing more than misguided ideas written by those who have lost the plot. I have debunked some of these erroneous definitions in What OOP is Not.

The main problem, as I see it, stems from the richness of the English language:

A single word can sometimes have several meanings.

Several different words can sometimes have the same meaning.

The original idea is expressed using a particular combination of words, but a second person comes along and tries to express the same idea using different words, perhaps to prove how clever or intellectually superior he is. However, this second definition uses different words which themselves may have alternative meanings, and this causes the original idea to begin mutating into something else. Then a third person comes along and tries to express the second definition using different words which hang off one of these alternate meanings. And so it goes on. Different people try to express the same idea using different words, but as these words may have alternative meanings which are different from the one used by the preceding author, the original meaning can get corrupted or even lost altogether. This is exactly what happens in the children's game called Chinese Whispers where the original message "Send reinforcements, we're going to advance" gets changed into "Send three and four pence, we're going to a dance".

Encapsulation: The act of placing an entity's data and the operations that perform on that data in the same class. The class then becomes the 'capsule' or container for the data and operations.

Note that this requires ALL the properties and ALL the methods to be placed in the SAME class. Breaking a single class into smaller classes so that the count of methods in any one class does not exceed an arbitrary number is therefore a bad idea as it violates encapsulation and makes the system harder to read and understand.

Note that data may include meta-data (type, size, etc) as well as entity data.
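As a minimal sketch of this definition in PHP, the entity's data, its meta-data and the operations on that data can all live in one class. The `Product` class, its field names and the `$fieldspec` array are hypothetical names chosen for illustration; they are not taken from any particular framework.

```php
<?php
// Hypothetical entity class: every name here is an illustrative assumption.
class Product
{
    // Entity data for the current record.
    public array $fieldarray = [];

    // Meta-data describing each field (type, size, etc).
    public array $fieldspec = [
        'product_id' => ['type' => 'string', 'size' => 10, 'required' => true],
        'price'      => ['type' => 'float', 'required' => true],
    ];

    // Operation: validate incoming data against the meta-data.
    public function validate(array $data): array
    {
        $errors = [];
        foreach ($this->fieldspec as $name => $spec) {
            $value = $data[$name] ?? null;
            if (!empty($spec['required']) && ($value === null || $value === '')) {
                $errors[$name] = 'This field is required';
            } elseif (isset($spec['size']) && strlen((string) $value) > $spec['size']) {
                $errors[$name] = 'Value is too long';
            }
        }
        return $errors;
    }

    // Operation: keep the data in the object only if it is valid.
    public function load(array $data): array
    {
        $errors = $this->validate($data);
        if (!$errors) {
            $this->fieldarray = $data;
        }
        return $errors;
    }
}

$product = new Product();
$errors  = $product->load(['product_id' => 'ABC123', 'price' => 9.99]);
```

Nothing about the entity lives anywhere else: the properties, the description of those properties and the methods which act on them are all inside the same 'capsule'.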

Polymorphism: Same interface, different implementation. The ability to substitute one class for another. This means that different classes may contain the same method names, but the result which is returned by a particular method will be different as the code behind that method (the implementation) is different in each class.

Note that this does NOT require the use of the keywords "interface" and "implements" as these are totally optional in PHP. All that is required is that different classes implement the same method name with the same signature.
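A small sketch may help here. The two classes below share no parent and never use the keywords "interface" or "implements", yet either one can be substituted for the other because they expose the same method name with the same signature. The class names `CsvOutput` and `HtmlOutput` and the helper function `output()` are assumptions made up for this illustration.

```php
<?php
// Two hypothetical classes with no shared parent and no "interface" keyword.
class CsvOutput
{
    public function render(array $row): string
    {
        return implode(',', $row);
    }
}

class HtmlOutput
{
    public function render(array $row): string
    {
        $cells = array_map('htmlspecialchars', $row);
        return '<tr><td>' . implode('</td><td>', $cells) . '</td></tr>';
    }
}

// The caller relies only on the method name and its signature.
function output(object $renderer, array $row): string
{
    return $renderer->render($row);
}

$row  = ['Smith', 'London'];
$csv  = output(new CsvOutput(), $row);
$html = output(new HtmlOutput(), $row);
```

The same call produces a different result depending on which object is supplied, which is all that "same interface, different implementation" requires.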

A lot of my critics tell me that these definitions are too simple, that they have evolved into something more complicated, more "pure". Rather than my definitions being too simple I regard their definitions as being too complex.

Classes which can be instantiated into objects are the mainstay of Object Oriented Programming. The process by which external entities are converted into software classes is called encapsulation, but some people like to use the term abstraction instead. This can cause a problem for the simple reason that there are two meanings for the term "abstract":

A statement summarizing the important points of a text. To reduce to the essential details. Summary, synopsis, précis, résumé, outline, abridgment, condensation, digest.

Thought of or stated without reference to a specific instance. An ideal or theoretical way of regarding things. Separated from matter, practice, or particular examples; not concrete; insufficiently factual; unreal; hypothetical; abstruse; difficult to understand; incomprehensible.

So which is the correct meaning? In his book, Object-Oriented Design with Applications, Grady Booch (one of the authors of the Unified Modeling Language) defines abstraction in the following way:

An abstraction denotes the essential characteristics of an object that distinguish it from all other kinds of object and thus provide crisply defined conceptual boundaries, relative to the perspective of the viewer.

In other words, abstractions are concerned with the simplification of reality and the removal of inessential details that may be associated with that reality. The end result of this process called "abstraction" is supposed to be one or more class definitions, where each class defines a different type of entity, with its own properties and methods, which can be instantiated into objects. Yet some people seem to think that the product of abstraction should be something that is unreal, something that does not reflect the reality which it is supposed to represent. They are the ones whose definition of abstraction veers towards "difficult to understand, incomprehensible", which is why they produce results which also match this definition.

The art of abstraction is supposed to reduce an idea or concept to its bare essentials and leave out any inessential details. As you can see from my Basic Definitions above I have reduced the concepts of OOP to a series of simple statements:

Object Oriented Programming is programming which is oriented around objects.

For each real world object (entity) there is supposed to be a software representation (class).

Objects are instantiated from classes.

A class is a blueprint that defines the variables (data) and the methods (operations) common to all objects (entities) of a certain kind.

Encapsulation can therefore be defined as follows:

The act of placing an entity's data and the operations that perform on that data in the same class.

Note that this simple statement also implies the following:

A class cannot contain the properties or methods for more than one entity. Some properties or methods may be shared, but each entity must have its own class.

A class must contain all the properties and all the methods for its entity. I have seen some people invent a rule which states that a class cannot contain more than 'N' properties or 'N' methods, or a method cannot contain more than 'N' lines of code, but this artificial rule breaks the whole idea of encapsulation so does not exist in my world.

In a typical software application not all classes may actually represent real-world objects or entities. In some cases they may be used to encapsulate or isolate certain functionality or processing which is performed on application data. For example, in the common Model-View-Controller design pattern you have the following:

The Model represents a real-world entity. This contains the data validation and business rules for the entity.

The Controller translates requests from the user into method calls on the Model, then gives the resulting data to the View.

The View takes application data and presents it in some way, such as an HTML web page, a PDF document, or a CSV file.

This can be combined with the 3-Tier Architecture, which splits all data access into a separate Data Access Object (DAO), to provide the structure shown in Figure 1:

With this structure it should be possible to have a large number of Model components, but a smaller collection of reusable Controllers, Views and DAOs which are capable of working with any Model in the application.

One thing you should not attempt to do is to define a single monolithic class to do everything within an entire application. This is sometimes referred to as a "God" class as it tries to be omnipotent, all-seeing and all-knowing. Just as you cannot put all the data you need in a single database table you cannot put all the code to process that data into a single software component. Just as data in a database is broken down into logical units according to the rules of normalisation, the program code should also be broken down into logical units or modules. In OOP these modules are usually constructed as classes. In this way you should be able to modify existing modules or add new modules without having a ripple effect on any existing modules. This also allows different programmers to work on different modules at the same time.

As well as trying not to create classes which are too big one should also avoid going too far in the opposite direction and creating a huge number of small classes. A code base which has been abstracted into oblivion is as difficult to follow as a colony of ants. If you don't believe me then try working with a library that uses 100 classes to send a single email! That is what's known as Object Oriented Overkill.

Encapsulation: the property that the information contained in an object is accessible only through interactions at the interfaces supported by the object.

Encapsulation is not a property, it is a process which results in the creation of a class. For each real world object which needs to be modelled in your software you create a separate class. That class will contain all the object's data in the form of class properties, and all the operations which can be performed on that data in the form of class methods. These simple facts are missing from that definition, which makes that definition incomplete and therefore invalid.

That definition also tries to enforce an idea which was not covered by the original definition, and that is the concept of information hiding which is discussed below.

In his article the author does not actually supply a definition of separation of concerns (also known as the Single Responsibility Principle). Instead he points to E. W. Dijkstra's article On the role of scientific thought:

We know that a program must be correct and we can study it from that viewpoint only; we also know that it should be efficient and we can study its efficiency on another day, so to speak. In another mood we may ask ourselves whether, and if so: why, the program is desirable. But nothing is gained - on the contrary! - by tackling these various aspects simultaneously. It is what I sometimes have called "the separation of concerns", which, even if not perfectly possible, is yet the only available technique for effective ordering of one's thoughts, that I know of.

This is not much of a definition as it only identifies the concept, but does not describe how to implement it. If you bother to read that article you will see that the author is stating that a program's correctness, efficiency and desirability should be examined separately and not together. This has absolutely nothing to do with encapsulation! Encapsulation is the act of creating a class while Separation of Concerns involves taking a monolithic "God" class (a single class that does everything) and splitting it into smaller yet coherent units. But how do you determine which parts of this "God" class can be split into smaller units? How do you know when to stop this splitting process? Robert C. Martin (Uncle Bob) provides this description in his article Test Induced Design Damage?:

How do you separate concerns? You separate behaviors that change at different times for different reasons. Things that change together you keep together. Things that change apart you keep apart.

GUIs change at a very different rate, and for very different reasons, than business rules. Database schemas change for very different reasons, and at very different rates than business rules. Keeping these concerns (GUI, business rules, database) separate is good design.

This description matches the separation provided by the 3-Tier Architecture which has the following layers or tiers:

User Interface logic - accepts input from the user and returns results to the user. This could be via HTML forms, or it could be something else.

Business logic - validates the data and applies any business rules.

Data Access logic - used to communicate with a database, usually relational.

With this architecture it is possible to change the components in one layer without having to make any corresponding changes to the other layers.

There is a separate View class for each of the different formats (HTML, PDF or CSV).

There is a separate DAO class for each supported DBMS engine (MySQL, PostgreSQL, Oracle and SQL Server).

Only the Model classes need to be generated by the application developer. All the others are supplied in the framework.

So now instead of a single object which tries to do everything we have a collection of objects, each of which is responsible for, or concerned with, a different part of the processing. The user sends in a request which is received by the Controller. The controller calls one or more methods on the Model in order to process that request. The Model may or may not communicate with a database or some other data source. The Model's response to that request is then given to the View so that it can be converted to the desired format before being returned to the user.
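The flow described above can be sketched in a few lines of PHP. This is only an illustration under stated assumptions: the class names, the `insertRecord()` method and the in-memory stand-in for a database are all hypothetical, and a real DAO would talk to an actual DBMS.

```php
<?php
// All class and method names below are illustrative assumptions.

// Data Access layer: the only place that would know about a DBMS.
class InMemoryDao
{
    private array $rows = [];

    public function insert(string $table, array $data): array
    {
        $this->rows[$table][] = $data;
        return $data;
    }
}

// Model (business layer): validation and business rules for one entity.
class Customer
{
    public array $errors = [];
    private object $dao;

    public function __construct(object $dao)
    {
        $this->dao = $dao;
    }

    public function insertRecord(array $data): array
    {
        if (empty($data['name'])) {
            $this->errors['name'] = 'Name is required';
            return $data;
        }
        return $this->dao->insert('customer', $data);
    }
}

// View: converts the Model's response into the desired format.
class CsvView
{
    public function render(array $data): string
    {
        return implode(',', $data);
    }
}

// Controller: receives the request, calls the Model, hands the result to the View.
function addController(object $model, object $view, array $request): string
{
    $result = $model->insertRecord($request);
    if ($model->errors) {
        return 'ERROR: ' . implode('; ', $model->errors);
    }
    return $view->render($result);
}

$dao    = new InMemoryDao();
$output = addController(new Customer($dao), new CsvView(), ['name' => 'Smith', 'city' => 'London']);
$failed = addController(new Customer($dao), new CsvView(), ['city' => 'London']);
```

Each component is concerned with one part of the processing, so the View or the DAO could be swapped for another without touching the Model.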

When implemented correctly this "separation of concerns" or "separation of responsibilities" not only reduces the amount of duplicated code but also allows for new Controllers, Views and DAOs to be created with little or no impact on the Model components. It should also allow for new Model classes to be created without the need to amend any Controllers, Views or DAOs.

Too much or too little separation

The problem with this concept called "separation of concerns" is deciding how far you should go. If you don't go far enough you end up with a compound object which deals with several entities, each of which should actually be handled by a separate class. Take the structure shown in Figure 2 which identifies the separate tables used to hold the data for a sales order:

Far too many OO programmers seem to think that it is acceptable to create a single class which encompasses all of this data. What they totally fail to take into consideration is that it will be necessary, within the application, to write to or read from tables individually rather than collectively. The compound class will therefore require separate methods for each table within the collection, and these method names must contain the name of the table on which they operate and the operation which is to be performed. This in turn means that there must be special controllers which reference these unique method names, which in turn means that the controller(s) are tightly coupled to this single compound class. As tight coupling is supposed to be a bad thing, how can this structure be justified?
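To illustrate the coupling problem, here is a rough sketch. The table names `order_header` and `order_item` are assumptions standing in for the tables of a sales order, and the returned strings are placeholders rather than real SQL.

```php
<?php
// Hypothetical table names; the returned strings stand in for real queries.

// Compound class: needs one uniquely named method per table per operation.
class SalesOrder
{
    public function insertOrderHeader(array $data): string
    {
        return 'INSERT INTO order_header';
    }

    public function insertOrderItem(array $data): string
    {
        return 'INSERT INTO order_item';
    }
}

// A controller for the compound class must name the table-specific method,
// so it is tightly coupled to that class and cannot be reused elsewhere.
function addOrderItemController(SalesOrder $order, array $data): string
{
    return $order->insertOrderItem($data);
}

// Alternative: one class per table, each exposing the same generic method.
class OrderHeader
{
    public function insertRecord(array $data): string
    {
        return 'INSERT INTO order_header';
    }
}

class OrderItem
{
    public function insertRecord(array $data): string
    {
        return 'INSERT INTO order_item';
    }
}

// A single controller now works with any table class.
function addController(object $table, array $data): string
{
    return $table->insertRecord($data);
}

$a = addOrderItemController(new SalesOrder(), ['qty' => 1]);
$b = addController(new OrderItem(), ['qty' => 1]);
$c = addController(new OrderHeader(), ['customer' => 'Smith']);
```

With the compound class every new table or operation means a new method name and a new controller to call it; with one class per table the generic controller is untouched.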

On the other hand if you go too far you end up with ravioli code, a mass of tiny classes which end up being less readable, less usable, less efficient, less testable and less maintainable. This is like having an ant colony with a huge number of workers where each worker does something different. When you look at this mass of ants, how do you decide who does what? How do you identify the process flow? Where do you look to find the source of a bug, or where to make a change? A prime example of this is a certain open source email library which uses 100 classes, some of which contain single methods with a single line of code. 100 classes? For an email? WTF!!!

The only connection between "encapsulation" and "separation of concerns" is deciding which methods and properties go into which class for each of the separate areas of responsibility. Depending on the type of application you are writing you may identify different responsibilities which require different components, but deciding on this list of responsibilities is a totally separate exercise from encapsulating the methods and properties into suitable classes. The object of the exercise is to create classes which are neither too big (the "compound" class) nor too small (ravioli code).

We have tried to demonstrate by these examples that it is almost always incorrect to begin the decomposition of a system into modules on the basis of a flowchart. We propose instead that one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others.

If you bother to read that article you will see that by "design decisions" he actually means the details of the procedural steps necessary to accomplish the application. He is talking about hiding the logic, not the data. He is talking about implementation hiding, not information hiding. The words "implementation" and "information" may sound similar, but they mean different things.

Each class has both properties and methods, but it was never intended that properties had to be hidden and could only be accessed via a method instead of directly. If you don't believe me then take a look at the following articles:

- Encapsulation ensures that the behaviour of an object can only be affected through the object's API.
- Information hiding conceals how an object implements its functionality behind the abstraction of the object's API.
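The difference between hiding the implementation and hiding the information can be shown with a small sketch. The `Invoice` class, its property names and the tax rate are assumptions invented for this example.

```php
<?php
// Hypothetical class: the data is public, the implementation is hidden.
class Invoice
{
    public float $net = 0.0;      // can be read and written directly
    public float $taxRate = 0.2;  // assumed rate, for illustration only

    // Callers know WHAT this returns, not HOW it is calculated.
    public function getGross(): float
    {
        return $this->applyTax($this->net);
    }

    // The procedural steps can change without affecting any caller.
    private function applyTax(float $amount): float
    {
        return round($amount * (1 + $this->taxRate), 2);
    }
}

$invoice = new Invoice();
$invoice->net = 100.0;            // direct property access, no setter required
$gross = $invoice->getGross();
```

The calculation inside `applyTax()` could be rewritten completely and no calling code would notice, yet the data itself was never hidden.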

The third paragraph comes from a paper entitled Structured design written for the IBM Systems Journal by W. Stevens, G. Myers and L. Constantine:

The fewer and simpler the connections between modules, the easier it is to understand each module without reference to other modules. Minimizing connections between modules also minimises the paths along which changes and errors can propagate into other parts of the system, thus eliminating disastrous 'Ripple effects' where changes in one part causes errors in another, necessitating additional changes elsewhere, giving rise to new errors, etc

What they are talking about here is called coupling, how modules interact, and the degree of mutual interdependence between modules. If two modules interact they can either be loosely coupled (which is supposed to be good) or tightly coupled (which is supposed to be bad). This is only relevant when you are assembling an application from several modules/classes, and has nothing to do with the construction of an individual module/class which is what encapsulation is all about.

As shown previously in my implementation every Model class (which contains business logic) must have another entity (called a Controller in this document) which can instantiate it into an object so that a method can be called on that object. This is known as "coupling" as it defines how components interact with each other. "A" calls "B" therefore "A" and "B" are joined or coupled by that call. It is also referred to as "dependency" as it shows that the Controller ("A") is dependent on the Model ("B") in order to carry out its assigned task. "A" cannot work without "B", but "B" can be called by components other than "A". In this case "A" is dependent on "B", but "B" is not dependent on "A". How that coupling is implemented in the code decides whether it is "loose" or "tight". Lower coupling is better as it tends to create more reusable methods. Take a look at the following code samples which may be found in a typical controller:
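The original samples are not reproduced in this copy. As a hedged reconstruction based on the descriptions of Example 1 and Example 2 given below, they might look something like the following sketch. The `Person` class, its setter methods, the field names and the simulated `$_POST` contents are all assumptions.

```php
<?php
// Stand-in Model so that the sketch runs on its own (hypothetical).
class Person
{
    public array $fieldarray = [];

    public function setFirstName(string $value): void
    {
        $this->fieldarray['first_name'] = $value;
    }

    public function setLastName(string $value): void
    {
        $this->fieldarray['last_name'] = $value;
    }

    public function insertPerson(): array
    {
        return $this->fieldarray;
    }

    // Generic method: accepts the whole array as a single argument.
    public function insert(array $data): array
    {
        $this->fieldarray = $data;
        return $this->fieldarray;
    }
}

// Simulated form submission (in a real request PHP fills $_POST itself).
$_POST = ['first_name' => 'John', 'last_name' => 'Smith'];

// Example 1: hard-coded class name and hard-coded property names.
$dbobject = new Person();
$dbobject->setFirstName($_POST['first_name']);
$dbobject->setLastName($_POST['last_name']);
$result1 = $dbobject->insertPerson();

// Example 2: class name supplied at run time, whole array passed as-is.
$table_id = 'Person';   // would normally be supplied from outside the controller
$dbobject = new $table_id();
$result2  = $dbobject->insert($_POST);
```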

Example 1 contains a hard-coded class name which means that it can only work with that particular class, which makes it non-reusable when it comes to other classes. It also has hard-coded property names, which means that if the list of class properties ever changes then both the Controller and the Model will have to be changed at the same time.

Example 2 does not contain a hard-coded class name, so provided that the method name is available in other objects then it can be used with any of those other objects. It also does not contain any hard-coded property names. The $_POST array can contain any number of fields, and the insert() method can accept that array as a single argument instead of a separate method/argument for each field. This means that the number of fields in that array can fluctuate without ever requiring a change to either the controller or the model.

Here is today's question: which of those two examples demonstrates lower coupling and higher reusability? Answers on a postcard to .....

As well as giving what I consider to be incorrect definitions of encapsulation, the offending article also misses out on providing explanations for those other principles of OOP - inheritance and polymorphism.

Inheritance: The reuse of base classes (superclasses) to form derived classes (subclasses). Methods and properties defined in the superclass are automatically shared by any subclass.

What this means in real life is that you can create a "superclass" which contains sharable methods and properties, then create a "subclass" which inherits everything from the "superclass" by using the "extends" keyword, as in the following example:
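The original example is not reproduced in this copy. A minimal sketch of the idea, using the hypothetical class names `DefaultTable` and `Customer`, might be:

```php
<?php
// Superclass holding the sharable methods and properties (hypothetical names).
class DefaultTable
{
    public string $tablename = '';
    public array $fieldarray = [];

    public function getData(): array
    {
        return $this->fieldarray;
    }

    public function describe(): string
    {
        return 'table ' . $this->tablename . ' with ' . count($this->fieldarray) . ' field(s)';
    }
}

// Subclass: inherits everything via "extends" and only states what is different.
class Customer extends DefaultTable
{
    public string $tablename = 'customer';
}

$customer = new Customer();
$customer->fieldarray = ['name' => 'Smith'];
$text = $customer->describe();
```

The subclass contains no methods of its own, yet `getData()` and `describe()` are both available on it.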

The real skill here is in identifying what can be a "superclass" and how many "subclasses" can actually share this code through inheritance. I have seen many suggestions and examples, mostly containing little sharable code from lots of small superclasses, but to my mind the most productive approach is as follows:

Create a single abstract class which contains all the properties and methods which are common to entities of the same type.

Create multiple concrete classes which inherit from this abstract class.

Here the term "abstract" means a class that does not contain enough detail to be instantiated into a viable object. A "concrete" class adds in those missing details. In my own development framework, which deals with database applications, I have a single abstract table class which contains everything which I may need for accessing an unspecified database table, and hundreds of concrete classes, one for each physical database table.
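As a rough sketch of this arrangement (not the code of any actual framework), the abstract class below holds the generic behaviour while each concrete class supplies only the missing details. The names `AbstractTable`, `buildSelect()`, `Customer` and `Product`, along with the field lists, are assumptions.

```php
<?php
// Hypothetical abstract table class: generic behaviour, no table details.
abstract class AbstractTable
{
    protected string $tablename;
    protected array $fieldspec = [];

    // Shared by every concrete class: builds a query from the supplied details.
    public function buildSelect(): string
    {
        return 'SELECT ' . implode(', ', array_keys($this->fieldspec))
             . ' FROM ' . $this->tablename;
    }
}

// Concrete classes add the details which make the object viable.
class Customer extends AbstractTable
{
    public function __construct()
    {
        $this->tablename = 'customer';
        $this->fieldspec = ['customer_id' => 'int', 'name' => 'string'];
    }
}

class Product extends AbstractTable
{
    public function __construct()
    {
        $this->tablename = 'product';
        $this->fieldspec = ['product_id' => 'string', 'price' => 'float'];
    }
}

$queries = [];
foreach ([new Customer(), new Product()] as $table) {
    $queries[] = $table->buildSelect();
}
```

Adding another database table means adding another small concrete class; the shared code in the abstract class is written once.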

Polymorphism: Same interface, different implementation. The ability to substitute one class for another. This means that different classes may contain the same method names, but the result which is returned by a particular method will be different as the code behind that method (the implementation) is different in each class.

What this means in real life is that you have the same method name appearing in more than one class, usually but not necessarily through inheritance, so that the piece of code which calls that method (a page controller, for example) can be used with any class that contains that method. By enabling a controller to be used on more than one class you effectively increase the reusability and sharability of that controller. When a Controller is not tightly bound to a single Model you achieve what is known as loose coupling, which is supposed to be a good thing.

In my own development framework I have a series of pre-written page controllers which use the generic methods defined in my abstract table class. This means that I do not have to have a separate set of controllers for each class. This in turn gives me the ability to use any page controller with any concrete class, which then maximises the level of polymorphism and makes the inter-module coupling as loose as it can possibly be. I can add a new table to my database and a new table class to my application without requiring a new controller to communicate with that table class, which also reduces the development times.

I also have a set of pre-written data access classes which handle all communication with a particular database engine (MySQL, PostgreSQL, Oracle and SQL Server). Although they all share the same method names there is no inheritance from an abstract superclass. When a Model wishes to communicate with the database it talks to whatever DAO has been assigned and gives it an instruction which is equivalent to "using these arguments please construct and execute the relevant SQL query and give me the result". The Model does not know which DBMS is being used, and does not contain any code which is specific to a particular DBMS. The underlying DBMS can therefore be switched without making any changes to any Model. A different DBMS can be introduced into the mix simply by creating a new class file for that DBMS, and provided that it uses the same method signatures its integration will be totally seamless and transparent.
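The following sketch shows the general shape of such an arrangement under simplifying assumptions: the DAO classes return the SQL text instead of executing it, and the names `MysqlDao`, `SqlServerDao`, `select()` and `getRecent()` are invented for the example.

```php
<?php
// Hypothetical DAO classes: same method signature, no common superclass.
class MysqlDao
{
    public function select(string $table, int $limit): string
    {
        return 'SELECT * FROM `' . $table . '` LIMIT ' . $limit;
    }
}

class SqlServerDao
{
    public function select(string $table, int $limit): string
    {
        return 'SELECT TOP ' . $limit . ' * FROM [' . $table . ']';
    }
}

// The Model has no DBMS-specific code; it talks to whichever DAO it was given.
class Customer
{
    private object $dao;

    public function __construct(object $dao)
    {
        $this->dao = $dao;
    }

    public function getRecent(): string
    {
        return $this->dao->select('customer', 10);
    }
}

$mysql     = (new Customer(new MysqlDao()))->getRecent();
$sqlserver = (new Customer(new SqlServerDao()))->getRecent();
```

Supporting another DBMS would mean writing one more class with the same `select()` signature; the `Customer` class would not change.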

All this confusion is the result of people rewording a perfectly good definition for no good reason other than to impress others with their lexicological skills. By using different words which may have different meanings the original definition can mutate into something else entirely. After multiple iterations of this rewording and mutating the eventual definition could reach a point where it bears no relation to or is a complete corruption of the original.

People keep losing sight of the simple truth that Object Oriented Programming is supposed to increase code reuse and decrease code maintenance through the mechanisms of encapsulation, inheritance and polymorphism, yet they consistently fail to provide either simple definitions or simple examples of these principles being put into practice. As far as I am concerned if you are unable to provide a description that a novice can understand and follow, then your own understanding is less than it should be. By attempting to pass this lack of knowledge onto others, especially when it is disguised with clever words and phrases, instead of adding to the pool of universal knowledge you are actually muddying the waters with pseudo-knowledge. This then makes it harder for novices to filter out the good from the bad, the wheat from the chaff, the excellent from the excrement, so instead of producing well-crafted OOP they end up producing steaming piles of POO.