Saturday, 7 September 2013

Handling Global Data in PHP Web Applications

Almost every web application needs to handle global data. There
are certain things that just have to be available throughout the
entire code base, such as database connections, configuration
settings, and error handling routines. As a PHP developer, you may
have heard the mantra 'globals are evil', but this naturally raises
the question 'what should I use instead of global variables?'

There are several different strategies available to help cope with
the demand for global data - each has its advantages and
disadvantages and it can be a challenge to know which approach to use
for any given situation. Here I will try to outline what the options
are, how they work, their advantages and disadvantages, and the
circumstances in which each option might be used. The code samples
are not necessarily realistic; they are kept as simple as possible to
demonstrate the idea. The principles and design patterns that follow
are not specific to PHP - they can be applied to any object oriented
language.

Global Variables

Probably the easiest solution to understand and to implement is
the use of global variables. In PHP, any variable defined in the
global scope (outside of any function or class) is a global variable,
but it is not automatically visible inside functions. Typically you
declare the variable using the global keyword in every function
scope where you want to use it, although you can just use the
$GLOBALS 'super global' array to reference the value without
declaring it first.

Global variable example:

global $database; //Only has an effect here if this file is
                  //included from inside a function
$database = new Database();

function doSomething()
{
    global $database; //This has to be declared so that PHP knows
                      //you want to use the global variable, not
                      //a local one
    $data = $database->readStuff();
}

Global variable example using the $GLOBALS super global:
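
The following is a minimal sketch of what such an example might look like, not necessarily the code originally shown here. The Database class and its readStuff() method are hypothetical stand-ins for the ones used in the previous example.

```php
<?php
// Hypothetical stand-in for the Database class used in the example above.
class Database
{
    public function readStuff(): array
    {
        return ['row 1', 'row 2'];
    }
}

// Store the object in the global scope via the $GLOBALS array.
$GLOBALS['database'] = new Database();

function doSomething(): array
{
    // No "global $database;" declaration is needed: $GLOBALS is
    // automatically available in every scope.
    return $GLOBALS['database']->readStuff();
}

$data = doSomething();
```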

Global variable advantages

The main advantage of global variables is that they are easy
to understand, and easy to use in your code. Their ease of use makes
it tempting to use them a lot, even though there may be much better
options available!

Global variable disadvantages

The disadvantages of using global variables nearly always outweigh
the advantages.

They make your code hard to read and hard to understand. It
is not obvious what the variable is for, where or how it was
initialised, or what is the proper way to use it.

They make your code hard to maintain. It is very difficult to
make changes to global variables, as you have to search through your
entire code base looking for where they have been used.

It is easy to abuse a global variable and cause errors that
are hard to debug. Without any control mechanism for how the
variable is used, it is easy to populate it with invalid data which
can cause errors in other parts of the code (for example, if one
part of the code populates the variable with an array but another
part of the code expects it to contain an object).

It is easy to get confused regarding the variable's scope. If
you forget to declare that the variable is global, you can end up
unwittingly working with a local variable without noticing - until
your app breaks. This can also be hard to debug.

If you combine your code with someone else's code (e.g. by
using a third-party library or writing an extension for another
piece of software), and both systems use global variables, there is
a chance that the variable names could clash, causing errors in both
systems which are hard to debug.

All parts of your code that use a global variable are tightly
coupled and it becomes very difficult to separate out or re-use a
module elsewhere.

Unit testing is made more difficult, as the test has to know
which global variables are needed, and how to initialise all global
variables with valid values.
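
The scope confusion described in the list above can be sketched as follows. The variable and function names are invented for this illustration only.

```php
<?php
// Hypothetical global counter used only for this illustration.
$GLOBALS['requestCount'] = 5;

function resetCountForgettingGlobal(): void
{
    // Without a "global" declaration this assigns to a brand new local
    // variable, which is discarded when the function returns.
    $requestCount = 0;
}

function resetCount(): void
{
    global $requestCount; // Bind the local name to the global variable
    $requestCount = 0;
}

resetCountForgettingGlobal();
$afterForgetting = $GLOBALS['requestCount']; // The global is unchanged

resetCount();
$afterDeclaring = $GLOBALS['requestCount'];  // The global has been reset
```

PHP raises no error in the first function, which is why this kind of mistake tends to go unnoticed until something else breaks.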

When to use global variables

It is rarely a good idea to use global variables, especially in a
large application, but there are some occasions when their ease of
use and simplicity make them an acceptable option. In particular,
they may be acceptable if you are writing a short and relatively
simple plugin or small app, which is going to be easy to read and
understand, or perhaps a proof-of-concept or prototype script.

Static Classes (Helper Classes)

Using helper classes that just contain static members is another
easy way of dealing with global data, although they share many of the
same disadvantages as global variables. Static members can be both
properties and methods, as in a normal class, but the class does not
need to be instantiated before they are used, and static properties
retain their values for the lifetime of the request. They have more
in common with procedural code than object oriented code, despite the
use of classes.

Static class example:

class SmtpConfig
{
    public static $host = 'localhost';
    public static $port = 465;
    public static $user = 'me@example.com';
    public static $password = 'j4a!9Sd@aKP2f';
    public static $tls = true;
}

echo SmtpConfig::$user; //This and other values from the class
                        //are available everywhere as long as
                        //the file containing the class
                        //declaration has been included or can
                        //be autoloaded

Static class advantages

A static helper class enables you to group several related
pieces of data together.

Static classes are easy to use, easy to understand, and
not so prone to naming clashes as global variables (although now
that PHP supports namespaces, name clashes are not really an issue
except in legacy code).

It is easier to locate where the data was initially defined
(most IDEs will automatically locate the class definition for you,
whereas with global variables, it is not always possible to tell
where they were first declared), although this still doesn't stop
the values being initialised or changed anywhere throughout your
code base.

Static class disadvantages

If a static class has methods which have their own
dependencies, they can be more difficult to unit test than
instantiated classes with dependency injection (see below).

The main disadvantage of static classes is that they promote
close coupling - any object relying on the static members is closely
coupled with the code that initialises those members (and which in
turn may have its own dependencies).

Static classes do not have a constructor, so any static
methods have to do their own dependency checking, and the calling
code has to perform any initialisation beyond just using the default
values. This also typically requires error checking after the method
call - at which point it is difficult to ascertain which dependency
failed.

Dependencies are not enforced, so the code can fail with few
clues as to why, for reasons related to a dependency of a dependency
of a dependency rather than to the place in the code where the
failure occurred (a debugging nightmare).

When to use static classes

Static classes are best used in simple cases where there are no
dependencies, or where the dependencies are simple and fundamental
enough to the operation of your application that they can be taken as
read (since you have no way of enforcing them). An example of this
would be your application's global error handler. That could equally
well be a singleton (see below), but it is best to keep error
handling as simple as possible, as it needs to be bulletproof, and
static classes are arguably simpler than singletons. You could even
use procedural code in a bootstrap file, which is simpler still.

Static members are useful as private or protected members of an
instantiated class, and provide a way of storing data once for many
instances (thus reducing the amount of memory needed for each
instance) - for example by holding immutable metadata such as
database column information. They can also be used effectively to
provide small algorithms, for internal use within an instantiated
object, that are never likely to need changing or overriding. But a
class full of just static members is a bit of a 'code smell', and
there is usually a better way.
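
As a rough sketch of a private static member inside an instantiated class, the hypothetical Customer class below stores its column metadata once on the class rather than once per instance. The class, column names and methods are invented for illustration.

```php
<?php
class Customer
{
    // Column metadata is the same for every Customer, so it is stored
    // once on the class instead of once per instance.
    private static $columns = [
        'id'    => 'int',
        'name'  => 'string',
        'email' => 'string',
    ];

    // Per-instance data.
    private $values = [];

    public function set(string $column, $value): void
    {
        if (!isset(self::$columns[$column])) {
            throw new InvalidArgumentException("Unknown column: $column");
        }
        $this->values[$column] = $value;
    }

    public function get(string $column)
    {
        return $this->values[$column] ?? null;
    }
}

$customer = new Customer();
$customer->set('name', 'Ada');
```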

Singleton

A singleton is a class which is instantiated, but for which there
can only be a single instance. A singleton class cannot be directly
instantiated by the calling code (the constructor carries the private
or protected modifier) - it has to be accessed through a static
member which checks whether the class has already been instantiated,
and if so, returns the existing instance, otherwise, creates a new
one (which it holds on to in case another caller wants it). Using a
real instance is an effort to allow better support for inheritance,
and to allow the object to be passed around and treated like any
other object. The intention of the singleton pattern is not really to
provide a mechanism for global data, but to ensure that only one
object is created (it being globally available is a side effect).
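
A minimal sketch of the pattern described above might look like this. The Logger class and its methods are hypothetical; getInstance() is a conventional name for the static accessor, not a requirement of the language.

```php
<?php
class Logger
{
    private static $instance = null;
    private $messages = [];

    // A private constructor stops calling code from using "new Logger()".
    private function __construct()
    {
    }

    // Cloning would create a second instance, so it is blocked as well.
    private function __clone()
    {
    }

    public static function getInstance(): Logger
    {
        if (self::$instance === null) {
            self::$instance = new self(); // Created on first use only
        }
        return self::$instance;
    }

    public function log(string $message): void
    {
        $this->messages[] = $message;
    }

    public function getMessages(): array
    {
        return $this->messages;
    }
}

// Both calls operate on the same object.
Logger::getInstance()->log('first message');
Logger::getInstance()->log('second message');
```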

Singleton advantages

Ensures there is only one version of the object (allowing a
resource to be shared).

Can be used from anywhere in the code - if it is not already
instantiated, it will be on its first use.

Can support inheritance and polymorphism to a limited degree.

Singleton disadvantages

Enforcing a single instance of an object is rarely the
desired behaviour (for example, whilst it might seem like you would
only need one database connection in an application, requirements
might change, requiring the application to access more than one
database - perhaps for backup or synchronisation purposes).

There is no need for classes that rely on the singleton to
declare their dependency on it, so it is not obvious that they rely
on it and it creates a close coupling between them.

Inheritance and polymorphism are restricted, as there can
still only be a single instance per request (but the implementation
could be different for different types of request).

Once instantiated, the singleton will be held in memory for
the life of the request even if it is not needed again (this might
be desirable for objects that are expensive to instantiate and/or
that are used frequently, but can negatively impact memory usage if
used indiscriminately).

When to use a singleton

A singleton should only really be used if a single resource needs
to be shared among different objects. It is necessary to check that
even if the current requirements do not call for multiple instances
of the object, any likely or potential future requirements will also
not need to allow for multiple instances. A common use of the
singleton pattern is for combining with a global registry pattern
(see below), or for interacting with the operating system or host
that the application is running on (of which there will only ever be
one at a time).

Registry

The registry design pattern allows you to define an object
(usually a singleton) which holds references to various other
resources (typically as key/value pairs) that may be needed by your
application (for example, database connections and configuration
settings). Although the registry itself is usually a singleton (as
you only want a single registry available to the whole application),
the resources it stores are not expected to be singletons - it can
store several different instances of the same class. The resources it
stores do not even have to be objects - they can be primitive data
types or arrays.

Resources can be stored in a hash table (array), or if you know
that certain items will need to be in the registry, you can strongly
type them (which will help with the code autocomplete features of
your IDE). You could also have a mixture!

In these examples, the developer is allowed to overwrite existing
resources, but only if they make it clear that this was their
intention (by setting the $force_refresh flag).
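
A minimal sketch of a hash-table registry along these lines is shown below. It is not necessarily the code originally shown here: the class name Registry and the set() and get() methods are assumptions, and only the $force_refresh flag is taken from the description above.

```php
<?php
class Registry
{
    private static $instance = null;
    private $resources = [];

    private function __construct()
    {
    }

    // The registry itself is a singleton.
    public static function getInstance(): Registry
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    // Existing resources can only be overwritten when the caller makes
    // that intention explicit by passing $force_refresh = true.
    public function set(string $key, $resource, bool $force_refresh = false): void
    {
        if (array_key_exists($key, $this->resources) && !$force_refresh) {
            throw new RuntimeException("Resource '$key' is already registered");
        }
        $this->resources[$key] = $resource;
    }

    public function get(string $key)
    {
        if (!array_key_exists($key, $this->resources)) {
            throw new OutOfBoundsException("No resource registered under '$key'");
        }
        return $this->resources[$key];
    }
}

$registry = Registry::getInstance();
$registry->set('config', ['debug' => false]);
$registry->set('config', ['debug' => true], true); // Deliberate overwrite
```

A strongly typed variant could add dedicated methods such as a getConfig() accessor (again a hypothetical name) alongside the generic get(), so that an IDE can offer autocompletion.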

Registry advantages

If there are common dependencies that are used throughout
your code, you can use the global registry instead of passing an
individual parameter for each one.

A registry allows you the freedom to store and manage your
global data centrally, without restricting the implementation to a
single instance, and allowing full use of inheritance and
polymorphism for the resources it manages.

A strongly typed registry allows your IDE to help you avoid
typing mistakes.

A registry is somewhat easier to use than dependency
injection.

Registry disadvantages

A registry still hides dependencies and is tightly coupled to
objects that rely on it (or its contents), although not as tightly
as a global variable (because the resources can be replaced with
different subclasses).

When to use a registry

Some developers reject the use of a registry on the grounds that
it is just a global array in disguise, and (in particular with a
weakly typed implementation) gives no clue as to how its contents are
meant to be used. However, the generally preferred alternative
(dependency injection - see below) can get out of hand when you have
to inject lots of dependencies, many of which are the same ones over
and over again (you can use a dependency injection container to
manage this, but it is arguably more complex than using a global
registry). Used sparingly then, a registry can be an appropriate
vehicle for managing the most common dependencies that are
fundamental to the workings of your application (typically, one or
more databases, maybe a logger, and a configuration object), without
requiring an unreasonably long list of parameters, or repeated use of
the same parameters, for every object instantiation.

Dependency Injection

Dependency injection requires that the calling code supply all of
the dependencies to an object before use. In most cases, this is done
by passing parameters to the constructor - so that the object cannot
be instantiated unless it has been given all of the data it needs to
do its job. Optional dependencies are often injected using a separate
method call. Injected dependencies are often objects but they don't
have to be - any data the object requires to do its job is a
dependency and must be supplied by the calling code.
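
As a rough sketch of constructor injection with one optional dependency supplied through a separate method call, consider the following. The Database interface, ArrayDatabase and Report classes are all hypothetical names chosen for this example.

```php
<?php
// Hypothetical abstraction that the report depends on.
interface Database
{
    public function readStuff(): array;
}

// Hypothetical implementation backed by a plain array.
class ArrayDatabase implements Database
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function readStuff(): array
    {
        return $this->rows;
    }
}

class Report
{
    private $database;
    private $logger = null;

    // Required dependency: the object cannot exist without a Database.
    public function __construct(Database $database)
    {
        $this->database = $database;
    }

    // Optional dependency, injected through a separate method call.
    public function setLogger(callable $logger): void
    {
        $this->logger = $logger;
    }

    public function render(): string
    {
        $rows = $this->database->readStuff();
        if ($this->logger !== null) {
            ($this->logger)('Rendering ' . count($rows) . ' rows');
        }
        return implode(', ', $rows);
    }
}

$log = [];
$report = new Report(new ArrayDatabase(['alpha', 'beta']));
$report->setLogger(function (string $message) use (&$log): void {
    $log[] = $message;
});
$output = $report->render();
```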

Dependency injection advantages

This makes code re-use much easier, as you can just use the
same object in another application or in another setting in the same
application.

It also makes unit testing much easier, as a test can be set
up to inject real or dummy dependencies for the purposes of testing
the object.

It is obvious to the calling code what the dependencies are -
it can therefore supply everything that is needed without worrying
that there might be some hidden dependency which will break the
application if not supplied.

Inheritance and polymorphism can be used to great effect by
specifying the parent class (or interface) as a dependency - the
calling code can then supply any subclass and the object doesn't
need to know or care what the implementation is (allowing for easy
extensibility). For example, if a class has a constructor which
requires a database object to be injected, the calling code can
inject a MySQL database class or an SQLite database, or any other
subclass of database (perhaps even one that hasn't been invented
yet).

By passing dependencies in the constructor, any problems can
be caught early - the class can verify that it has valid dependency
data before it will allow instantiation. This makes debugging much
easier.
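
Two of the points above, injecting a dummy dependency for testing and validating dependency data in the constructor, can be sketched together as follows. All of the names (MailTransport, RecordingTransport, Notifier) are invented for this illustration.

```php
<?php
interface MailTransport
{
    public function send(string $to, string $body): bool;
}

// Dummy transport for tests: records messages instead of sending them.
class RecordingTransport implements MailTransport
{
    public $sent = [];

    public function send(string $to, string $body): bool
    {
        $this->sent[] = [$to, $body];
        return true;
    }
}

class Notifier
{
    private $transport;
    private $sender;

    public function __construct(MailTransport $transport, string $sender)
    {
        // Reject invalid dependency data before the object can be used.
        if (filter_var($sender, FILTER_VALIDATE_EMAIL) === false) {
            throw new InvalidArgumentException("Invalid sender: $sender");
        }
        $this->transport = $transport;
        $this->sender = $sender;
    }

    public function notify(string $to, string $message): bool
    {
        return $this->transport->send($to, "From {$this->sender}: $message");
    }
}

$transport = new RecordingTransport();
$notifier = new Notifier($transport, 'me@example.com');
$notifier->notify('you@example.com', 'Hello');
```

Because Notifier only depends on the MailTransport interface, a real transport could be swapped in later without changing the class.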

Dependency injection disadvantages

The calling code may have more work to do to initialise an
object, especially if the dependencies you are injecting have
dependencies of their own (if this gets out of hand you could look
into using a dependency injection container).

If there are lots of dependencies, you could end up with a
long list of parameters in your constructor which makes the code
difficult to read and understand.

When to use dependency injection

In most cases, dependency injection provides more advantages than
disadvantages, so it is becoming common practice to use it by default
and only avoid it if it is causing problems. Where certain objects
are used extensively throughout the code base (such as a database or
configuration object), injecting them into every object can become
laborious and inelegant. In such cases, it might be better to just
accept a certain amount of close-coupling for the sake of code
readability (and writability!), and to use the setup and teardown
features of your unit testing software to initialise and destroy the
most common dependencies.

Increasingly though, dependency injection containers are used to
handle multiple object dependencies. Using a container allows the
dependencies to be defined just once instead of at every
instantiation, and the dependencies can even be defined in a config
file or in annotations rather than in the code itself. There are
various frameworks available that provide dependency injection
containers, some of which are very lightweight and specialise in just
providing containers.
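
To give a feel for the idea, here is a deliberately tiny container sketch. It does not reproduce the API of any particular framework; the Container class, its set() and get() methods, and the example services are all assumptions made for illustration.

```php
<?php
class Container
{
    private $factories = [];
    private $instances = [];

    // Register a factory that knows how to build the service $id.
    public function set(string $id, callable $factory): void
    {
        $this->factories[$id] = $factory;
        unset($this->instances[$id]);
    }

    // Build the service on first request, then reuse the same instance.
    public function get(string $id)
    {
        if (!array_key_exists($id, $this->instances)) {
            if (!isset($this->factories[$id])) {
                throw new OutOfBoundsException("No service defined for '$id'");
            }
            $this->instances[$id] = ($this->factories[$id])($this);
        }
        return $this->instances[$id];
    }
}

// Hypothetical services used to show nested dependencies.
class Config
{
    public $dsn = 'sqlite::memory:';
}

class Database
{
    public $config;

    public function __construct(Config $config)
    {
        $this->config = $config;
    }
}

class UserRepository
{
    public $database;

    public function __construct(Database $database)
    {
        $this->database = $database;
    }
}

// Each dependency is defined once, not at every instantiation.
$container = new Container();
$container->set('config', function () {
    return new Config();
});
$container->set('database', function (Container $c) {
    return new Database($c->get('config'));
});
$container->set('users', function (Container $c) {
    return new UserRepository($c->get('database'));
});

$users = $container->get('users');
```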

In conclusion

The developer has to make a judgement call about when to use which
approach for handling global data. Each approach has its advantages
and disadvantages, and whilst some (dependency injection) are clearly
more desirable than others (global variables) in most situations, it
is not helpful to make blanket rules (like 'singletons are evil').

7 comments:

I think that the rule "never use global variables" should really be expressed as "never OVER-use, MIS-use or AB-use global variables".

There are two ways of applying a rule - indiscriminately or intelligently. Those who do the former just apply a rule without thinking, and it is this lack of thinking which causes me to doubt everything which follows.

I disagree that using global variables will ALWAYS make your code harder to read and harder to maintain. Sometimes the efforts taken to avoid them produce convoluted code, which makes the solution worse than the problem.

Great, great, great article! Thorough! Insightful! Academic. Practical. This piece of work thankfully came up near the top in my search engine results. I've grappled with several of the techniques you discussed over the past several years in my work and have done my best to reach what would be, for me, the optimal trade off (balance) between: maintainability (fixing), extensibility (enhancing), readability (easy to revisit without studying for a week), portability (re-using), and other good desirable stuff (you name what that is). It doesn't matter what I favor--so I won't say because, yes, the author must decide for themselves. I'd say a key element to any project is discipline and consistency--that will get you a long way despite any of the choices you make in your coding style. I think we all need to look at ourselves as more than just coders or programmers if we can. If we can--sometimes we work in a box and have a supervisor or a project manager or a boss who may not afford us our own discretion or the time to do things the best way (even if the best way better meets business objectives). I read this article to learn and reinforce some good concepts as I press forward in my work. I think, Russell, what you've written supports the notion that we might lift our heads up from our screens and realize that yes, we don't have to be just programmers. We can be architects. Build good foundations. Sound ones. Resilient ones. That can be built upon even further. I don't wanna just code. Thanks again for your article!

It did not answer the question I was googling, but this pedagogic article finally gives an overview of the context of classes, in a way one can relate to, as $GLOBALS['whatever'] is often the starting solution for new, upcoming PHP application programmers!

Somehow, all the variations of using classes are like a horror movie, "The Scope". I will use this article's approach for educational purposes.