How to debug an infinite recursion

This is a little post for programmers. I'm sharing this because I think it's an incredibly useful tip that I use regularly, but I've never seen anyone else do it before.

One of the most difficult problems to debug is some kind of 'infinite recursion' in code. It's easy to accidentally create the conditions for a cyclic dependency, which makes a pattern of code repeat itself forever, e.g. A requires B, which requires C, which requires A. Well, not quite forever: it repeats until PHP runs out of memory, or crashes, or something else nasty happens.

This is particularly likely in a big flexible system like ocPortal, because the programmer isn't creating a website directly, they're creating an engine. All kinds of unexpected and untestable usage might happen, so it's easier for the programmer to make a mistake.

It is also particularly likely during startup, because interdependencies are very common at that stage. As a simple example, imagine what happens if the primary language file is corrupt, causing ocPortal to flag up an error, causing ocPortal's error handling system to load, which in turn tries to display an error in the current user's language. It's worse than just errors about language needing to be displayed using language: there are other outstanding questions, such as who is the current user, what language have they got set, and what is the default site language? As language probably loads before the database connection, the configuration, and the user system, naive code could easily go off on a dangerous infinite tangent. That's why the really core systems are designed to have various fallbacks, and to be able to recognise when those fallbacks are needed.
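To make the fallback idea concrete, here is a minimal sketch of a re-entrancy guard. It is not ocPortal's code: every function name below is made up for illustration, and the "corrupt language file" is simulated by a loader that always reports an error.

```php
<?php
// Hypothetical sketch: none of these names come from ocPortal.

/**
 * Look up a translated string, falling back to a hard-coded English
 * message if the lookup ends up being re-entered while still running.
 */
function lookup_string(string $key): string
{
    static $in_progress = false; // true while a lookup is already running

    if ($in_progress) {
        // We were re-entered (the error handler needs a string while the
        // language system is still failing), so break the cycle here.
        return 'Error (language system unavailable): ' . $key;
    }

    $in_progress = true;
    try {
        return load_from_language_file($key);
    } finally {
        $in_progress = false; // always reset, even if the loader throws
    }
}

/**
 * Stand-in for a loader whose language file is corrupt: it reports the
 * problem, and reporting the problem itself needs a translated string.
 */
function load_from_language_file(string $key): string
{
    return report_error('CORRUPT_LANGUAGE_FILE');
}

/** Stand-in error handler that wants its message in the user's language. */
function report_error(string $message_key): string
{
    return lookup_string($message_key);
}
```

Without the `$in_progress` check, `lookup_string()` and `report_error()` would call each other until PHP gave up. With it, the second entry notices that a lookup is already under way and returns the plain fallback text instead.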

But it's very easy for us to make a mistake, and hard for us to guard against all conditions the first time we write code (there's too much to test, and too much to surprise us).

Diagnosing these issues is really hard. At the point of the error, ocPortal has completely crashed. It's a wonder that we're able to show even limited stack traces in these events. To do just that we use a number of PHP tricks not many people know about, to force the basic information out even in fatal circumstances.
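I won't list ocPortal's tricks here, but one general PHP technique of this kind is a shutdown function that inspects `error_get_last()`. The sketch below is an illustration under that assumption, not ocPortal's actual code, and `describe_fatal_error()` is a made-up helper name.

```php
<?php
// Sketch only: a generic PHP technique, not ocPortal's actual code.

/**
 * Turn the array returned by error_get_last() into a one-line report,
 * or return null if the script did not end on a fatal error.
 */
function describe_fatal_error(?array $error): ?string
{
    $fatal_types = [E_ERROR, E_PARSE, E_CORE_ERROR, E_COMPILE_ERROR];
    if ($error === null || !in_array($error['type'], $fatal_types, true)) {
        return null;
    }
    return sprintf(
        'Fatal: %s in %s on line %d',
        $error['message'],
        $error['file'],
        $error['line']
    );
}

// Shutdown functions still run after most fatal errors, so this is a
// last chance to record what happened.
register_shutdown_function(function () {
    $report = describe_fatal_error(error_get_last());
    if ($report !== null) {
        error_log($report);
    }
});
```

This only gives the message, file and line of the final failure, not the chain of calls that led to it, which is why the report is so limited.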

The stack trace for a cyclic dependency event isn't very useful either, because it can't go back far enough, and it very rarely gives enough information about context.

The naive way to debug it (which I think most people do) is to step through the code. This sometimes works, but it can take hours or days. It doesn't always work, though, because often the problem happens in a function but not the first time that function is called, so you can't just trace through directly. Plus, you really don't want to be doing something so protracted and stressful on a live server.

So how to solve it? It's very simple: go to where the stack trace says the error happened (which will be some spot in the cyclic dependency, but not the actual trigger), and add some code like…

Code

And refresh. You will then get a proper stack trace, showing a much more useful context.

It works very simply. The IP address check is there just to stop regular users seeing the stack trace and being disrupted by it (an error in one context doesn't mean the code is broken everywhere: the code you're debugging is probably used successfully by working parts of the site too). The rand check is where the magic is: on average it will let the code pass 250 times, and then it will give a stack trace. If the code has run 250 times you know something is almost certainly very wrong, but it's not yet too late to generate a stack trace (that point probably only comes after thousands of calls). We could try to count our way through instead of using randomness, but a bit of statistical manipulation is quicker and easier than trying to create a proper re-entrant counter.
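The two checks described above could look something like the following. This is a sketch reconstructed from the description, not the exact snippet: the function names and the IP address are placeholders, and plain PHP's `debug_print_backtrace()` stands in for whatever stack trace routine your own system provides.

```php
<?php
// Hypothetical reconstruction: the helper names and IP address are placeholders.

/**
 * Decide whether this call should stop and dump a stack trace.
 * Only the developer's IP can trigger it, and then only about once
 * in every $one_in calls.
 */
function should_dump_trace(string $remote_ip, string $developer_ip, int $one_in = 250): bool
{
    return $remote_ip === $developer_ip && mt_rand(1, $one_in) === 1;
}

/**
 * Call this at the spot named in the crash report.
 */
function trap_runaway_recursion(string $developer_ip): void
{
    $remote_ip = $_SERVER['REMOTE_ADDR'] ?? '';
    if (should_dump_trace($remote_ip, $developer_ip)) {
        debug_print_backtrace(); // built-in: prints the current call stack
        exit(1);
    }
}
```

To use it, put a call such as `trap_runaway_recursion('203.0.113.5');` (with your own IP address) at the location the crash report points to, then refresh. Healthy requests that pass through that spot a handful of times will almost always continue untouched, while a runaway loop will hit the one-in-250 chance long before PHP runs out of memory.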