Before we start this, let me say I'm well aware of the concepts of Abstraction and Dependency Injection. I don't need my eyes opened here.

Well, most of us say, (too) many times without really understanding, "Don't use global variables", or "Singletons are evil because they are global". But what really is so bad about the ominous global state?

Let's say I need a global configuration for my application, for instance system folder paths, or application-wide database credentials.

In that case, I don't see any good solution other than providing these settings in some sort of global space, which will be commonly available to the entire application.

I know it's bad to abuse it, but is the global space really THAT evil? And if it is, what good alternatives are there?

You can use a config class that handles access to those settings. I think it's good practice to have one and only one place where configuration settings are read from config files and/or databases, while other objects fetch them from that config object. It makes your design cleaner and more predictable.
– Hajo May 10 '12 at 19:37

Where's the difference with using a global? Your "one and only one place" sounds very suspiciously like a Singleton.
– Madara Uchiha May 10 '12 at 19:39

"for instance system folder paths, or application-wide database credentials ..." Storing credentials is a separate topic, but if the settings are read-only, and I mean read-only, then this is fine. In fact, Visual Studio lets you generate settings that are global to the application. If the settings can change, then that is a totally different story. By the way, dependency injection is an OOP concept. To truly understand why global state is evil, you need to drink some functional koolaid. I recommend pragmatic sources, such as SICP or some Clojure lectures.
– Job May 10 '12 at 20:12

17 Answers

To elaborate, imagine you have a couple of objects that both use the same global variable. Assuming you're not using a source of randomness anywhere within either module, then the output of a particular method can be predicted (and therefore tested) if the state of the system is known before you execute the method.

However, if a method in one of the objects triggers a side effect which changes the value of the shared global state, then you no longer know what the starting state is when you execute a method in the other object. You can now no longer predict what output you'll get when you execute the method, and therefore you can't test it.
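The two paragraphs above can be sketched in a few lines of Java. Every name here (SharedState, PriceCalculator, PromotionService) is made up for illustration; the point is only that the same call with the same argument returns different results depending on what some other object did to the global.

```java
// Hypothetical names throughout; a sketch of the scenario, not code from the answer.
class SharedState {
    // Mutable global: any code in the program can read or write it.
    static int discountPercent = 0;
}

class PriceCalculator {
    // The result depends on the argument AND on the hidden global.
    int finalPrice(int basePrice) {
        return basePrice - basePrice * SharedState.discountPercent / 100;
    }
}

class PromotionService {
    // Side effect: silently changes what PriceCalculator will return.
    void startSale() {
        SharedState.discountPercent = 20;
    }
}

class GlobalStateDemo {
    public static void main(String[] args) {
        PriceCalculator calculator = new PriceCalculator();
        System.out.println(calculator.finalPrice(200)); // prints 200
        new PromotionService().startSale();
        System.out.println(calculator.finalPrice(200)); // prints 160
    }
}
```

A test of PriceCalculator only passes reliably if it first resets SharedState, and it can still break if another test runs in between and starts a sale.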

On an academic level this might not sound all that serious, but being able to unit test code is a major step in the process of proving its correctness (or at least fitness for purpose).

In the real world, this can have some very serious consequences. Suppose you have one class that populates a global data structure, and a different class that consumes the data in that data structure, changing its state or destroying it in the process. If the processor class executes a method before the populator class is done, the result is that the processor class will probably have incomplete data to process, and the data structure the populator class was working on could be corrupted or destroyed. Program behaviour in these circumstances becomes completely unpredictable, and will probably lead to epic lossage.

Further, global state hurts the readability of your code. If your code has an external dependency that isn't explicitly introduced into the code then whoever gets the job of maintaining your code will have to go looking for it to figure out where it came from.

As for what alternatives exist, well it's impossible to have no global state at all, but in practice it is usually possible to restrict global state to a single object that wraps all the others, and which must never be referenced by relying on the scoping rules of the language you're using. If a particular object needs a particular state, then it should explicitly ask for it by having it passed as an argument to its constructor or by a setter method. This is known as Dependency Injection.

It may seem silly to pass in a piece of state that you can already access due to the scoping rules of whatever language you're using, but the advantages are enormous. Now if someone looks at the code in isolation, it's clear what state it needs and where it's coming from. It also has huge benefits regarding the flexibility of your code module and therefore the opportunities for reusing it in different contexts. If the state is passed in and changes to the state are local to the code block, then you can pass in any state you like (if it's the correct data type) and have your code process it. Code written in this style tends to have the appearance of a collection of loosely associated components that can easily be interchanged. The code of a module shouldn't care where state comes from, just how to process it. If you pass state into a code block then that code block can exist in isolation; that isn't the case if you rely on global state.
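As a rough illustration of passing state in through the constructor, here is a minimal Java sketch. DatabaseSettings, ReportRepository and the "jdbc:example" URLs are all invented for the example and do not refer to any real driver or library.

```java
// Hypothetical names; a sketch of constructor injection, not a definitive design.
final class DatabaseSettings {
    private final String url;
    private final String user;

    DatabaseSettings(String url, String user) {
        this.url = url;
        this.user = user;
    }

    String url() { return url; }
    String user() { return user; }
}

class ReportRepository {
    private final DatabaseSettings settings;

    // The dependency is visible in the signature instead of hidden in global scope.
    ReportRepository(DatabaseSettings settings) {
        this.settings = settings;
    }

    String describeConnection() {
        return settings.user() + "@" + settings.url();
    }
}

class InjectionDemo {
    public static void main(String[] args) {
        // Production and test code can each pass in whatever settings they need.
        ReportRepository production =
                new ReportRepository(new DatabaseSettings("jdbc:example://prod", "app"));
        ReportRepository test =
                new ReportRepository(new DatabaseSettings("jdbc:example://test", "tester"));
        System.out.println(production.describeConnection()); // prints app@jdbc:example://prod
        System.out.println(test.describeConnection());       // prints tester@jdbc:example://test
    }
}
```

Because ReportRepository never reaches into global scope, the two instances can coexist in one process with different settings.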

There are plenty of other reasons why passing state around is vastly superior to relying on global state. This answer is by no means comprehensive. You could probably write an entire book on why global state is bad.

@Truth I got the impression you were asking why global state was bad. You might want to update your question to make it clearer you're asking about alternative approaches as well. Am going to make an edit to the answer, but usually dependency injection is favoured these days.
– GordonM May 10 '12 at 19:49

Your main argument doesn't really apply to something that is effectively read-only, such as an object representing a configuration.
– frankc May 10 '12 at 21:17

None of that explains why readonly global variables are bad … which the OP explicitly asked about. It’s an otherwise good answer, I’m just surprised that the OP marked it as “accepted” when he explicitly addressed a different point.
– Konrad Rudolph May 11 '12 at 9:18

Bugs from mutable global state - a lot of tricky bugs are caused by mutability. Bugs that can be caused by mutation from anywhere in the program are even trickier, as it's often hard to track down the exact cause.

Poor testability - if you have mutable global state, you will need to configure it for any tests that you write. This makes testing harder (and people being people are therefore less likely to do it!). e.g. in the case of application-wide database credentials, what if one test needs to access a specific test database different from everything else?

Inflexibility - what if one part of the code requires one value in the global state, but another part requires another value (e.g. a temporary value during a transaction)? You suddenly have a nasty bit of refactoring on your hands

Function impurity - "pure" functions (i.e. ones where the result depends only on the input parameters and have no side effects) are much easier to reason about and compose to build larger programs. Functions that read or manipulate mutable global state are inherently impure.

Code comprehension - code behaviour that depends on a lot of mutable global variables is much trickier to understand - you need to understand the range of possible interactions with the global variable before you can reason about the behaviour of the code. In some situations, this problem can become intractable.

Concurrency issues - mutable global state typically requires some form of locking when used in a concurrent situation. This is very hard to get right (is a cause of bugs) and adds considerably more complexity to your code (hard/expensive to maintain).

Performance - multiple threads continually bashing on the same global state causes cache contention and will slow down your system overall.

Alternatives to mutable global state:

Function parameters - often overlooked, but parameterising your functions better is often the best way to avoid global state. It forces you to solve the important conceptual question: what information does this function require to do its job? Sometimes it makes sense to have a data structure called "Context" that can be passed down a chain of functions that wraps up all relevant information.

Dependency injection - same as for function parameters, just done a bit earlier (at object construction rather than function invocation). Be careful if your dependencies are mutable objects though, as this can quickly cause the same problems as mutable global state.

Immutable global state is mostly harmless - it is effectively a constant. But make sure that it really is a constant, and that you aren't going to be tempted to turn it into mutable global state at a later point!

Immutable singletons - pretty much the same as immutable global state, except that you can defer instantiation until they are needed. Useful for e.g. large fixed data structures that need expensive one-off pre-calculation. Mutable singletons are of course equivalent to mutable global state and are therefore evil :-)

Dynamic binding - only available in some languages like Common Lisp/Clojure, but this effectively lets you bind a value within a controlled scope (typically on a thread-local basis) which does not affect other threads. To some extent this is a "safe" way of getting the same effect as a global variable, since you know that only the current thread of execution will be affected. This is particularly useful in the case where you have multiple threads each handling independent transactions, for example.
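To make the "immutable singleton" alternative from the list above concrete, here is a minimal Java sketch using the initialization-on-demand holder idiom. PrimeTable and its limit of 1,000,000 are assumptions chosen for the example; the expensive pre-calculation runs once, on first use, and nothing can modify the table afterwards.

```java
// Hypothetical example of an immutable, lazily created singleton.
final class PrimeTable {
    private final boolean[] composite;

    private PrimeTable(int limit) {
        // Expensive one-off pre-calculation (sieve of Eratosthenes).
        composite = new boolean[limit + 1];
        for (int i = 2; (long) i * i <= limit; i++) {
            if (!composite[i]) {
                for (int j = i * i; j <= limit; j += i) {
                    composite[j] = true;
                }
            }
        }
    }

    // Holder idiom: the JVM initializes Holder, and so the table, on first use only.
    private static final class Holder {
        static final PrimeTable INSTANCE = new PrimeTable(1_000_000);
    }

    static PrimeTable instance() {
        return Holder.INSTANCE;
    }

    // Read-only query; the array is never exposed or changed after construction.
    boolean isPrime(int n) {
        if (n < 0 || n >= composite.length) {
            throw new IllegalArgumentException("out of range: " + n);
        }
        return n >= 2 && !composite[n];
    }
}

class PrimeTableDemo {
    public static void main(String[] args) {
        System.out.println(PrimeTable.instance().isPrime(97)); // prints true
        System.out.println(PrimeTable.instance().isPrime(91)); // prints false (7 * 13)
    }
}
```

Callers can share the instance freely across threads because there is no state transition to observe after construction.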

I think that passing a context object either by function parameter or dependency injection would cause problems if the context is mutable, same problem as using mutable global state.
– Alfredo Osorio May 11 '12 at 16:34

All good stuff, and amen! But the question is about immutable global state
– MarkJ May 11 '12 at 19:26

+1 for mutable/immutable. Immutable globals are ok. Even ones that are lazy loaded but never change. Of course, don't expose global variables, but a global interface or API.
– Jess Mar 7 '14 at 19:30

@giorgio The question makes it clear that the variables in question get their values at startup and never change afterwards during program execution (system folders, database credentials). I.e. immutable, it does not change once it has been given a value. Personally I also use the word "state" because it can be different from one execution to another, or on a different machine. There may be better words.
– MarkJ Mar 22 '14 at 14:47

Since your whole damn app can be using it, it's always incredibly hard to factor them back out again. If you ever change anything to do with your global, all your code needs changing. This is a maintenance headache, far more than simply being able to grep for the type name to find out which functions use it.

They're bad because they introduce hidden dependencies, which break multithreading, which is increasingly vital to increasingly many applications.

The state of the global variable is always completely unreliable, because all of your code could be doing anything to it.

They're really hard to test.

They make calling the API hard. "You must remember to call SET_MAGIC_VARIABLE() before calling API" is just begging for someone to forget to call it. It makes using the API error-prone, causing difficult-to-find bugs. By using it as a regular parameter, you force the caller to properly provide a value.

Just pass a reference into functions which need it. It's not that hard.
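The SET_MAGIC_VARIABLE point can be sketched like this in Java. LegacyFormatter and ExplicitFormatter are invented names; the first relies on a hidden setup step, while the second takes the value as an ordinary parameter.

```java
// Error-prone style: the caller must remember a hidden setup step.
class LegacyFormatter {
    static String magicVariable; // stays null until someone remembers to set it

    static String format(String message) {
        // Throws NullPointerException if magicVariable was never set.
        return magicVariable.toUpperCase() + ": " + message;
    }
}

// Parameter style: the compiler rejects any call that omits the value.
class ExplicitFormatter {
    static String format(String prefix, String message) {
        return prefix.toUpperCase() + ": " + message;
    }
}

class FormatterDemo {
    public static void main(String[] args) {
        System.out.println(ExplicitFormatter.format("warn", "disk almost full"));
        // prints WARN: disk almost full

        try {
            LegacyFormatter.format("disk almost full"); // forgot the setup step
        } catch (NullPointerException e) {
            System.out.println("forgot to set magicVariable");
        }
    }
}
```

With the parameter version the mistake is caught at compile time; with the global it only shows up when that code path actually runs.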

Well, you can have a global config class which encapsulates the locking, and IS designed to change state at whatever time possible. I would choose this approach over instantiating config readers from 1000x places in the code. But yes, unpredictability is the absolutely worst thing about them.
– Coder May 10 '12 at 23:23

@Coder: Note that the sane alternative to a global is not "config readers from 1000x places in the code", but one config reader, which creates a config object, which methods can accept as a parameter (-> Dependency injection).
– sleske May 10 '12 at 23:52

Nitpicking: Why is it easier to grep for a type than a global? And the question is about read-only globals, so point 2 and 3 aren't relevant
– MarkJ May 11 '12 at 19:24

@MarkJ: You don't grep for a type, you use a static analysis tool/IDE to find it for you. As to "grep": Ever tried grepping for a global named "config"? Good luck! And the question does not say anything about the values being read-only (maybe the conf can be changed at runtime).
– sleske May 11 '12 at 20:04

If you say "state", that is usually taken to mean "mutable state". And global mutable state is totally evil, because it means that any part of the program can influence any other part (by changing the global state).

Imagine debugging an unknown program: You find that function A behaves a certain way for certain input parameters, but sometimes it works differently for the same parameters. You find that it uses the global variable x.

You look for places that modify x, and find that there are five places that modify it. Now good luck finding out in what cases function A does what...

I would say that immutable global state is the well known good practice known as 'constants'.
– Telastyn May 10 '12 at 19:58

Immutable global state is not evil, it's just bad :-). It's still problematic because of the coupling it introduces (makes changes, reuse and unit testing harder), but it creates far fewer problems, so in simple cases it is usually acceptable.
– sleske May 10 '12 at 23:53

IFF one does use global variables then only one piece of code should modify it. The rest are free to read it. The issues of others changing it does not go away with encapsulation and access functions. Tis not what those constructs are for.
– phkahler May 11 '12 at 16:35

You sort of answered your own question. They're difficult to manage when 'abused,' but can be useful and [somewhat] predictable when used properly, by someone who knows how to contain them. Maintenance and changes to/on the globals is usually a nightmare, made worse as the size of the application increases.

Experienced programmers who can tell the difference between globals being the only option, and them being the easy fix, can have minimal problems using them. But the endless possible issues that can arise with their use necessitate the advice against using them.

edit: To clarify what I mean, globals are unpredictable by nature. As with anything unpredictable you can take steps to contain the unpredictability, but there are always limits to what can be done. Add to this the hassle of new developers joining the project having to deal with relatively unknown variables, and the recommendations against using globals should be understandable.

There are many problems with Singletons - here are the two biggest problems in my mind.

It makes unit testing problematic. The global state can become contaminated from one test to the next

It enforces the "One-and-only-one" hard rule, which, even though it couldn't possibly change, suddenly does. A whole bunch of utility code that used the globally accessible object then needs to be altered.

Having said that, most systems have some need for Big Global Objects. These are items which are large and expensive (eg Database Connection Managers), or hold pervasive state information (for example, locking information).

The alternative to a Singleton is to have these Big Global Objects created on startup, and passed as parameters to all of the classes or methods that need access to this object.

The problem here is that you end up with a big game of "pass-the-parcel". You have a graph of components and their dependencies, and some classes create other classes, and each have to hold a bunch of dependency components just because their spawned components (or the spawned components' components) need them.

You run into new maintenance problems. An example: suddenly your "WidgetFactory" component, deep in the graph, needs a timer object that you want to mock out. However, "WidgetFactory" is created by "WidgetBuilder", which is part of "WidgetCreationManager", and you need to have three classes knowing about this timer object even though only one actually uses it. You find yourself wanting to give up and revert to Singletons, and just make this timer object globally accessible.
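Here is roughly what that "pass-the-parcel" chain looks like when wired by hand in Java. The class names come from the example above; the WidgetTimer interface and every method name are assumptions made up for the sketch.

```java
// Hypothetical timer abstraction so that tests can substitute a fixed value.
interface WidgetTimer {
    long nowMillis();
}

class WidgetFactory {
    private final WidgetTimer timer;

    // The only class that actually uses the timer.
    WidgetFactory(WidgetTimer timer) {
        this.timer = timer;
    }

    String createWidget() {
        return "widget@" + timer.nowMillis();
    }
}

class WidgetBuilder {
    private final WidgetFactory factory;

    // Accepts the timer only to hand it down to WidgetFactory.
    WidgetBuilder(WidgetTimer timer) {
        this.factory = new WidgetFactory(timer);
    }

    String build() {
        return factory.createWidget();
    }
}

class WidgetCreationManager {
    private final WidgetBuilder builder;

    // Accepts the timer only to hand it down to WidgetBuilder.
    WidgetCreationManager(WidgetTimer timer) {
        this.builder = new WidgetBuilder(timer);
    }

    String create() {
        return builder.build();
    }
}

class PassTheParcelDemo {
    public static void main(String[] args) {
        WidgetTimer fixedTimer = () -> 42L; // stand-in for a mocked timer
        System.out.println(new WidgetCreationManager(fixedTimer).create()); // prints widget@42
    }
}
```

Three constructors mention WidgetTimer although only WidgetFactory calls it, which is the maintenance cost being described.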

Fortunately, this is exactly the problem that is solved by a Dependency Injection framework. You can simply tell the framework what classes it needs to create, and it uses reflection to figure out the dependency graph for you, and automatically constructs each object when it is needed.

So, in summary, Singletons are bad, and the alternative is to use a Dependency Injection framework.

I happen to use Castle Windsor, but you are spoilt for choice. See this page from back in 2008 for a list of available frameworks.

Well, for one, you can run into exactly the same issue that you can with singletons. What today looks like a "global thing I only need one of" will suddenly turn into something you need more of down the road.

For instance, today you create this global config system because you want one global configuration for the entire system. A few years down the road, you port to another system and someone says "hey, you know, this might work better if there were one general global configuration and one platform specific configuration." Suddenly you have all this work to make your global structures not global, so you can have multiple ones.

(This isn't some random example...this happened with our configuration system in the project I am currently in.)

Considering that the cost of making something non-global is usually trivial, it's silly to make it global. You're just creating future problems.

The other problem, curiously, is that they make an application difficult to scale because they are not "global" enough. The scope of a global variable is the process.

If you want to scale your application up by using multiple processes or by running on multiple servers you cannot. At least not until you factor out all the globals and replace them with some other mechanism.

First of all for dependency injection to be "stateful", you would need to use singletons, so people saying this is somehow an alternative are mistaken. People use global context objects all the time... Even session state for example is in essence a global variable. Passing everything around whether by dependency injection or not isn't always the best solution. I work on a very large application currently that uses a lot of global context objects (singletons injected via an IoC container) and it has never been a problem to debug. Especially with an event driven architecture it can be preferred to use global context objects vs. passing around whatever changed. Depends who you ask.

Anything can be abused and it also depends on the type of application. Using static variables for instance in a web app is completely different than a desktop app. If you can avoid global variables, then do so, but sometimes they have their uses. At the very least make sure your global data is in a clear contextual object. As far as debugging, nothing a call stack and some breakpoints can't solve.

I want to emphasize that blindly using global variables is a bad idea. Functions should be reusable and shouldn't care where the data comes from -- referring to global variables couples the function with a specific data input. This is why it should be passed in and why dependency injection can be helpful, although you are still dealing with a centralized context store (via singletons).

Btw... Some people think dependency injection is bad, including the creator of Linq, but that isn't going to stop people from using it, including myself. Ultimately experience will be your best teacher. There are times to follow rules and times to break them.

Languages that are designed for secure & robust systems design often get rid of global mutable state altogether. (Arguably this means no globals, since immutable objects are in a sense not really stateful since they never have a state transition.)

Analysis of who has access to an object and the principle of least privilege are both subverted when capabilities are stored in global variables and thus are potentially readable by any part of the program. Once an object is globally available, it is no longer possible to limit the scope of analysis: access to the object is a privilege that cannot be withheld from any code in the program. Joe-E avoids these problems by verifying that the global scope contains no capabilities, only immutable data.

So one way to think about it is

Programming is a distributed reasoning problem. Programmers on large projects need to divide up the program into pieces which can be reasoned about by individuals.

The smaller the scope, the easier it is to reason about. This is true both of individuals and static analysis tools that try to prove properties of a system, and of tests that need to test properties of a system.

Significant sources of authority that are globally available make properties of the system hard to reason about.

Therefore, globally mutable state makes it harder to

design robust systems,

harder to prove properties of a system, and

harder to be sure that your tests are testing in a scope similar to your production environment.

Global mutable state is similar to DLL hell. Over time, different pieces of a large system will require subtly different behavior from shared pieces of mutable state. Solving DLL hell and shared mutable state inconsistencies requires large-scale coordination between disparate teams. These problems would not occur had the global state been properly scoped to begin with.

Mutable global state is evil because it's very hard for our brain to take into account more than a few parameters at a time and figure out how they combine both from a timing perspective and a value perspective to affect something.

Therefore, we are very bad at debugging or testing an object whose behavior has more than a few external reasons to be altered during the execution of a program. Let alone when we have to reason about dozens of these objects taken together.

State is unavoidable in any real application. You can wrap it up any way you like, but a spreadsheet must contain data in cells. You can make cell objects with only functions as an interface, but that doesn't restrict how many places can call a method on the cell and change the data. You build whole object hierarchies to try to hide interfaces so other parts of the code can't change the data by default. That doesn't prevent a reference to the containing object from being passed around arbitrarily. Nor does any of that eliminate concurrency issues by itself. It does make it harder to proliferate access to data, but it doesn't actually eliminate the perceived problems with globals. If someone wants to modify a piece of state, they're going to do it whether it's global or through a complex API (the latter will only discourage, not prevent).

The true reason not to use global storage is to avoid name collisions. If you load multiple modules which declare the same global name, you either have undefined behavior (very hard to debug because unit tests will pass) or a linker error (I'm thinking C - Does your linker warn or fail on this?).

If you want to reuse code, you've got to be able to grab a module from another place and not have it accidentally step on your global because they used one with the same name. Or if you're lucky and get an error, you don't want to have to change all the references in one section of code to prevent the collision.

When it's easy to see and access all of the global state, programmers invariably end up doing so. What you get is unspoken and difficult-to-track dependencies (int blahblah means that array foo is valid in whatever). Essentially it makes it nearly impossible to maintain program invariants, since everything can be twiddled independently. If someInt has a relationship with otherInt, that's hard to manage and harder to prove if you can directly change either at any time.

That said, it can be done (way back when it was the only way in some systems), but those skills are lost. They revolve mostly around coding and naming conventions; the field has moved on for good reason. Your compiler and linker do a better job of checking invariants in protected/private data of classes/modules than relying on humans to follow a master plan and read source.
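As a small illustration of letting the compiler guard an invariant, consider this Java sketch. BoundedStack is an invented example in which a count must always match the valid portion of an array; because both fields are private, only push and pop can change them, and they always change them together.

```java
// Hypothetical example: 'count' must always equal the number of valid entries in 'items'.
final class BoundedStack {
    private final int[] items;
    private int count; // invariant: 0 <= count <= items.length

    BoundedStack(int capacity) {
        items = new int[capacity];
    }

    void push(int value) {
        if (count == items.length) {
            throw new IllegalStateException("stack is full");
        }
        items[count++] = value; // array and count are updated together
    }

    int pop() {
        if (count == 0) {
            throw new IllegalStateException("stack is empty");
        }
        return items[--count];
    }

    int size() {
        return count;
    }
}

class BoundedStackDemo {
    public static void main(String[] args) {
        BoundedStack stack = new BoundedStack(2);
        stack.push(10);
        stack.push(20);
        System.out.println(stack.pop());  // prints 20
        System.out.println(stack.size()); // prints 1
    }
}
```

If items and count were two globals, any code could bump count without touching the array, and the invariant would hold only by convention.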

"but those skills are lost"... not entirely yet. I recently worked at a software house that swears by "Clarion", a code generator tool that has its own BASIC-like language that lacks features such as passing arguments to sub-routines... The sitting developers were not happy with any suggestions about "change" or "modernizing", finally got fed up with my remarks and portrayed me as deficient and incompetent. I had to leave...
– Louis Somers May 13 '12 at 18:21

Globals aren't that bad. As stated in several other answers, the real problem with them is that what is, today, your global folder path may, tomorrow, be one of several, or even hundreds. If you're writing a quick, one-off program, use globals if it's easier. Generally, though, allowing for multiples even when you only think you need one is the way to go. It's not pleasant to have to restructure a large complex program that suddenly needs to talk to two databases.

But they do not hurt reliability. Any data referenced from many places in your program can cause problems if it changes unexpectedly. Enumerators choke when the collection they're enumerating is changed in mid-enumeration. Event queue events can play tricks on each other. Threads can always wreak havoc. Anything that is not a local variable or unchangeable field is a problem. Globals are this sort of problem, but you're not going to fix that by making them non-global.

If you are about to write to a file and the folder path changes, the change and the write need to be synchronized. (As one of a thousand things that could go wrong, say you grab the path, then that directory gets deleted, then the folder path is changed to a good directory, then you try and write to the deleted directory.) The problem exists whether the folder path is global or is one of a thousand the program is currently using.

There is a real problem with fields that can be accessed by different events on a queue, different levels of recursion, or different threads. To make it simple (and simplistic): local variables are good and fields are bad. But former globals are still going to be fields, so this (however critically important) issue does not apply to the Good or Evil status of Global fields.

Addition: Multithreading Problems:

(Note that you can have similar problems with an event queue or recursive calls, but multithreading is by far the worst.) Consider the following code:

if (filePath != null) text = filePath.getName();

If filePath is a local variable or some kind of constant, your program is not going to fail when running because filePath is null. The check always works. No other thread can change its value. Otherwise, there are no guarantees. When I started writing multithreaded programs in Java, I got NullPointerExceptions on lines like this all the time. Any other thread can change the value at any time, and they often do. As several other answers point out, this creates serious problems for testing. The above statement can work a billion times, getting it through extensive and comprehensive testing, then blow up once in production. The users won't be able to reproduce the problem, and it won't happen again until they've convinced themselves they were seeing things and forgotten it.
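A sketch of the race and of the usual local-copy remedy, in Java. The answer does not say what type filePath has, so java.io.File is an assumption here, as are the PathHolder class and its method names.

```java
// Hypothetical holder for the shared field discussed above.
class PathHolder {
    // Shared mutable field: another thread may set it to null at any moment.
    static volatile java.io.File filePath;

    // Racy: filePath is read twice, and it can change between the check and the use.
    static String unsafeName() {
        String text = "";
        if (filePath != null) text = filePath.getName();
        return text;
    }

    // Safer: read the field once into a local, which no other thread can touch.
    static String safeName() {
        java.io.File snapshot = filePath;
        return (snapshot != null) ? snapshot.getName() : "";
    }
}

class PathHolderDemo {
    public static void main(String[] args) {
        PathHolder.filePath = new java.io.File("report.txt");
        System.out.println(PathHolder.safeName()); // prints report.txt
        PathHolder.filePath = null;
        System.out.println(PathHolder.safeName().isEmpty()); // prints true
    }
}
```

The local copy removes the NullPointerException, but the name returned may already be out of date by the time it is used, so this narrows the problem rather than solving it.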

Globals definitely have this problem, and if you can eliminate them completely or replace them with constants or local variables, that's a very good thing. If you have stateless code running on a web server, you probably can. Typically, all your multithreading problems can be taken on by the database.

But if your program has to remember things from one user action to the next, you will have fields accessible by any running threads. Switching a global to such a non-global field will not help reliability.

Can you clarify what you mean by this?: "Anything that is not a local variable or unchangeable field is a problem. Globals are this sort of problem, but you're not going to fix that by making them non-global."
– Andres F. May 11 '12 at 0:40

@AndresF.: I extended my answer. I think I'm taking a desktop approach where most people on this page are more server-code-with-a-database. "Global" may mean different things in these cases.
– RalphChapin May 11 '12 at 14:37

The more globals you have, the greater the chance of introducing duplicates, and thus breaking things when duplicates get out of sync. Keeping all of the globals in your fallible human memory becomes both necessary and a pain.

Immutables/write once are generally OK, but watch out for initialization sequence errors.

Mutable globals are often mistaken for immutable globals …

A function that uses globals effectively has extra "hidden" parameters, making refactoring it harder.

Global state isn't evil, but it does come at a definite cost – use it when the benefit outweighs the cost.

I'm not going to say whether global variables are good or bad. What I want to add to the discussion is that if you are not using global state, you are probably wasting a lot of memory, especially when your classes store their dependencies in fields.

For global state, there is no such issue, everything is global.

For example, imagine the following scenario: you have a 10x10 grid made up of the classes "Board" and "Tile".

If you want to do it the OOP way, you will probably pass the "Board" object to each "Tile". Let's say that "Tile" has two byte-sized fields storing its coordinates. On a 32-bit machine, and ignoring alignment padding and any per-object overhead, one tile would take 1 + 1 + 4 = 6 bytes: 1 for the x coordinate, 1 for the y coordinate, and 4 for a pointer to the board. That gives a total of 600 bytes for the 10x10 setup.

Now consider the case where the Board is in global scope, a single object accessible from every tile. Each tile then needs only 2 bytes of memory, for the x and y coordinates. That gives only 200 bytes.

So in this case, using global state cuts memory usage to 1/3.

This, among other things, is I guess one reason why global scope still exists in (relatively) low-level languages like C++.