Better Understanding PHP’s Garbage Collection

It’s interesting how just a few years can make a difference in the names that are given to things. If this were to come up today, it would probably be called PHP Recycling Options, because rather than picking things up and throwing them into a landfill where they’ll never be seen again, we are really talking about grabbing things whose use has passed and setting them up to be useful again. But, recycling wasn’t le petit Cherie of society back when the idea was developed and so this task was given the vulgar name of ‘Garbage Collection’. What can we do but follow what history and common usage have given us?

Program Generated Garbage

Programs use resources; sometimes small ones, sometimes much bigger. An example would be a data field. A program may define a data field, say a sequence number, that is used in the program. And once defined, this data field will take up space in memory, probably only a few bytes, but space nonetheless. Since every machine or programming environment has a finite (albeit large) amount of space available, the remaining space that it has left will be reduced by the amount of space that this field takes up.

When the program ends, naturally, the program, and any space that it has tied up, will disappear and the total space available will expand back to it’s maximum size. But what happens if the program never ends?

--ADVERTISEMENT--

I’ve written a few of these such programs in my time. Works of beauty they were, and I was always pleased when everyone else in the shop noticed that I had created one. There’s nothing that points out your capabilities quite as much as bringing a big ol’ piece of IBM iron to a stand-still all by yourself, while from the surrounding cubicles one person after another says loudly, “hey, is there something wrong with the system?” The trick is to chime in second or third so to deflect attention from yourself.

But some programs are even meant to run forever, like daemons and other such things. And as they run, the amount of debris they generate can potentially keep growing. If the locked up resources are substantial, then it can have a real negative impact on your system.

As a result, every language must have a way of clearing out orphaned resources, making them available to other users and ensuring that the total available system space remains constant. Fortunately, PHP has a three tiered approach to garbage removal.

First Level – End of Scope

First, like most languages, whenever a scope of action ends, everything within that scope of action is destroyed, and any allocated resources are released. The scope of action can cover a function, a script, a session, etc. and when that scope ends, so does everything it is holding on to. Of course, you can always free up a resource any time you want by using the unset() function.

This is one reason why functions and methods are so very important, because they establish a scope of action, when particular memory usage begins and when it should end, and limits how long things can be around. They should be used whenever possible instead of global entities.

Second Level – Reference Counting

Second, like most scripting languages, PHP keeps track of how many entities are using a given variable using a technique called reference counting.

When a variable is created in a PHP script, PHP creates a little ‘container’ called a zval that consists of the value assigned to that variable plus two other pieces of information: is_ref and refcount. The zval containers are kept in a table where there is one table per scope of action (script, function, method, whatever).

is_ref is a simple true/false value that indicates if the variable is part of a reference set, thus helping PHP to tell if this is a simple variable or a reference.

The refcount is more interesting in that it holds a numeric value indicating how many different variables are using this value. That is, if you define variable $dave = 6, the refcount will be set to 1. If I then say $programmer = $dave, the refcount will be incremented to 2. PHP knows enough not to create a second zval for the value 6; it just updates the counter on the already existing value container. When the program ends, or when we leave the scope of the function, or when unset() is used, then this refcount will be decremented. When the refcount hits zero, the zval is destroyed and any memory that it was holding is now free.

Of course, this is a simple example for a simple variable. When you are talking about arrays or objects then it’s much more complicated for with multiple zrefs being created for the multiple values for an element in an array, but the basic processing is the same.

A problem occurs, however, if we use an array within another array, something that happens with some frequency in more complicated PHP scripts. In this case, the refcount for an array value is set to 1 when the original array value is set, then incremented to 2 when the array is associated with another array. If the scope of use of the second array then ends, then the refcount is decremented by 1. We are now in a situation where the value itself is no longer associated with anything, but the container (zval) that represents it still has a refcount greater than zero.

The end result is that the storage represented by the original array will not be freed up and that amount of memory is now unavailable for use by anything. Normally, we think of this amount of lost storage as being small, but often it isn’t. Arrays can be very big things today and it is especially problematic if the script in which this occurs is a daemon or other nearly continuously running function. In this case, the resultant ‘memory leak’ can have devastating consequences on performance and even the ability of a server to operate.

Third Level – Formal Garbage Collection

Obviously, reference count oriented clears have their limitations but fortunately, PHP 5.3 offered another option to help with this situation.

The specific situation that we want our garbage cycle to address is the case where the zval has been decremented, but it is still a non-zero value. Basically the cycle sees which values can be decremented further and then free up the ones that go to zero.

What really happens is that PHP keeps track of the all root containers (zvals). This is done whether garbage collection is turned on not (because it is faster for it to just do it rather than asking if garbage collection is on, yada, yada, yada). This root buffer holds up to 10,000 roots (fixed size, but this can be changed). When it fills up, then the garbage collection mechanism will kick off and it will begin analyzing this buffer.

The first thing the GC routine does is rip through the root buffer and decrement all of the zval counts by 1. As it does this, it marks each one with a little like check mark so that it only decrements a root once.

Then, it goes through again and marks (this time with a little squiggly line) all of the zvals whose reduced counts are zero. The ones that are not zero are incremented so that they resume their original values.

Finally, it will roll through there one more time, clearing out the non-zero zvals from the buffer, and freeing up the storage for the ones with a zero refcount.

Garbage collection is always turned on in PHP, but you can turn it off in the php.ini file with the directive zend.enable_gc. Or, you can do it within your script by calling the gc_enable() and gc_disable() functions.

As noted above, the garbage collection, if enabled, runs when the root is full, but you can override this and run the collection when you feel like it with the gc_collect_cycles() function. And, you can modify the size of the root buffer with the gc_root_buffer_max_entries value in the zend/zend_gc.c value in the PHP source code.

All in all, this allows you to control whether GC runs and when and were it does, which is a good thing because it is a bit resource intensive and so might not be the sort of thing you run just for the heck of it.

When Should You Use It

Because there is a performance hit attached to garbage collection, it is worth taking a minute to figure out when it should be used.

First, keep in mind that unless you overtly run it (with the gc_collect_cycles() function), the formal garbage collection will not happen until the root table (10,000 entries) is full, and since this table is at the scope level, that isn’t going to happen for small functions.

Should you use it on small scripts? That’s up to you. It’s hard to say that running something like garbage collection is a bad thing, but if you have small, quick running scripts that start and then end and are gone then there might not be much of a payback. But if your server is running a lot of small scripts that stay persistent, then it will probably be worth the effort. The only real way to know is to benchmark your application and see. And certainly, if you have long running scripts or especially scripts that do not end, then garbage collection is essential if you want to prevent the kind of memory leaking that we talked about above.

Perhaps most importantly, we should always try to follow good programming guidelines so that we minimize or eliminate global variables and tie our variables instead to scope, so that even if we have a long running script, we free up that memory when the function, rather than the script, ends. Also be aware of when you are using arrays within arrays, or objects referencing objects, since such situations can cause memory leaking and is the real target of the formal garbage collection process.

Dave Shirey is owner of Shirey Consulting Services which provides business and technical project management, EDI set up and support, and various technical services for the mainframe, midrange, and desktop worlds. In addition to PHP Master, he also writes frequently for MC Press and Pro Developer (formerly System iNews). In his free time, he tends to worry vaguely about the future and wonder obsessively how different his life might have been if he had just finished working on that first novel when he got out of college.