The Bug

Recently, I went on my own bug hunt. I’d seen some strange behavior in our
PHPUnit test suite. The tests were passing when run normally, but certain tests
would consistently fail with process isolation enabled. The behavior was similar
to what is shown below, which is a from a simple test suite I made to reproduce
the problem:

Uh-oh. That’s not good. I mean, the failure alone is bad enough, but the fact
that it only shows up with process isolation turned on is particularly
troubling. Process isolation starts a new PHP process for each unit test. If
tests are passing without process isolation and failing with process
isolation, it implies that the tests are changing global state. These type of
bugs are notoriously hard to track down.

The Search Begins

I started by simply debugging one of the failing tests. And I found rather
quickly that the error (shown with process isolation enabled) was correct. Very
simply, a function was being called with incorrect parameters. Now, that’s easy
to fix, but I didn’t want to fix it yet. Before I could fix it, I had to
understand why it appeared to pass when run as part of a larger test suite.
A test suite that hides failures is a really scary thing.

The scale of the problem I faced was overwhelming. The results shown above are
from a simple test suite designed to reproduce the problem. The suite where I
originally discovered the problem had more than 1000 different tests, more than
20 of which would fail with process isolation enabled. Figuring out which one of
the 1000+ tests was affecting the global state was a daunting task (because it
could literally be any of them).

Needle in a Haystack

My first step was to reduce the size of the problem. Disable as many test cases
as possible while still producing the behavior I originally saw. Basically, I
turned off different parts of the test suite until I figured out exactly which
combination of tests was causing the problem. This is a bit strange - imagine, I have a
test suite with more than 1000 tests that should be producing roughly 20 errors, but isn’t.
And I’m turning off entire chunks of the test suite, hoping that I still don’t
see any errors even though I should. Eventually, I was able to reduce the problem to 3
tests.

One of the tests was the test that is supposed to fail. The other two tests were
required to run before the failing test to reproduce the behavior that hides the
error. If I turned off either of the other two tests, the error would be shown.
These two tests were somehow affecting the global state of the PHP process. I
created a
gist on
Github that shows my own (simplified) reproduction of the problem.

Finding the Bug

At this point, although I knew which tests were causing the problem, I still
didn’t know why. In the codebase where I first discovered the problem, the
problematic function call was buried 15 layers down the call stack. To find it,
I took the same approach that I took to finding the 3 problematic tests. Comment
out large chunks of code ’til something changes. If I comment out a large chunk
of code and the error shows up because of it, I know that something in that
chunk is causing the problem. I didn’t care if I broke any of the other tests
along the way - I just needed to find out which line of code was preventing
PHPUnit from showing the correct error message.

After a bit of searching, I found it. And after finding it, it made perfect
sense. Our code was calling set_error_handler() to do some custom logging with
errors. In a production environment, our custom error handler would be
configured to send error messages to certain places, or act on them in other
ways. In PHPUnit, none of this config was loaded. So it would eat error messages
without reporting them anywhere.

The Solution

If I was writing all of this from scratch, I’d probably solve this problem with
dependency injection. In production, set_error_handler() would actually be
called. And in unit tests, I’d use a mock that would never actually call the
set_error_handler() function. But I wasn’t writing this from scratch. This
error handler was in a very old part of a very large codebase. I needed to be
sure that set_error_handler() would be called 100% of the time in
production. And, for fear of running into the same problem again, I needed to be
sure that it would never be called inside a unit test. The solution I came up
with is ugly. But it always works.

Lessons Learned

The biggest lesson I learned from this experience is really just reinforcing
something I already knew. PHP has several functions like ini_set(),
set_error_handler(), etc. that can affect the global state of a PHP process.
If you can’t avoid using these functions entirely, at least isolate them in your
code base so they’re not inter-mingled with other logic in your code.

Run your PHPUnit test suite with --process-isolation occasionally. It’s
slower, but it might show you problems you didn’t otherwise know about.