ExpressionEngine and the Mystery of 00o93H7pQ09L8X1t49cHY01Z5j4TT91fGfr

Recently, I came across a weird issue while developing an ExpressionEngine 2.0 module. The issue was that, for some reason, my module would output a random string instead of the intended content. For example, instead of displaying the HTML, that the module created, the module would display the string /2011/11/acm-interactions/ instead. Tracking down this issue was symptomatic of the proverbial needle in a haystack but I learned a lot about how ExpressionEngine's internals work and was once again reminded to be pragmatic about my development strategies.

My first and automatic reaction when coming up against some random and meaningless output string is to just Google it and see what happens. Unfortunately though, this particular search, at the time at least, returned around 35,000 sites that were outputting the string and only 1 that had a reference to the issue with a possible explanation. All of a a sudden this became an actual thing.

Putting aside the fact that there are around 35,000 sites with the string within Google for the moment (and that's really a problem to be discussed) it took a bunch of digging but I finally found a few things:

First, this is an issue that has been within ExpressionEngine for years. Doing a search on the ExpressionEngine site lists the first entry of this string at 2007 (sorry for the lack of a link; ExpressionEngine search is cached so old search URLs don't work sometimes). My initial reaction was that this was kind of sloppy; EllisLab updated ExpressionEngine to 2.0 recently, from the ground up to use CodeIgniter I believe, yet this issue got ported over?

After doing further research though, I came to believe the above is more to do with the design of ExpressionEngine than anything else like copying code. Turns out, there's some logic to that random and oblique string.

Basically, that string (/2011/11/acm-interactions/) is a template holder for the content you want to output. It's a little complicated in how it works but thanks to Derek Jones over at the ExpressionEngine forums there's a succinct, though still a little complicated but less complicated than I could explain it, explanation of what's going on.

...

The template parser is an extremely complex gizmo, so let me try to explain it stripped of what techno mumbo jumbo I can. This will still be long, so bear with me.

The template parser intelligently saves on resources by only processing a given tag once, no matter how many times it occurs on the template. This means that if your tag is exactly the same, including its tagdata, EE only execute’s that modules code once, and replaces all copies of that tag with the output.

So first, when the template parser goes through the template finding tags, it replaces them all with temporary markers (the gibberish), and adds each tag to a list of tags to be parsed, in the order they are encountered on the template, top down. Each marker starts with the letter “M” followed by the number (starting at 0) that tag is sitting in on The List.

Second, the parser goes down its list of tags. It parses tag #0, and then replaces all of #0’s marker with that tag’s output.

——————-

Ok that’s the setup. Here’s why it affected you, and why this side effect is not a problem (and also extremely, extremely rare). Remember, top down, tags will be assigned 0, 1, 2, etc. So I’m abstracting it out to show the nesting and the positioning only. Just focus on the numbers.

Step 1, the tags are grabbed and replaced with markers. The first tag encountered is replaced first.

M0
&#123;exp:tag1&#125;
bar
M0
&#123;/exp:tag1&#125;

And then the next tag:

M0
M1

All tags have been collected and put on The List, so step 2, the parser goes down The List, and parses tag #0:

foo
M1

See the problem? The second M0 is sitting protected inside the marker for tag #1, so it doesn’t get replaced. Then tag #1 gets parsed.

foo
bar
M0

And the marker is left in the output. EE doesn’t go back and parse tag #0 again, it’s already done that once, and will not do it again.

So the circumstances required for this problem to exist each are uncommon, and combined, very very rare:

1) Multiple exact copies of a given tag
2) One or more copies nested inside another tag (tag nesting itself is also rare, and usually not recommended)
3) The tag of which there are copies must sit at a position in the template above any occurrences of it being nested in another tag.

From the above it was easy to conclude that the template parsing stuff was breaking down in my module. Why, I had no idea, but at least I had a known starting point. While it was easy to conclude that my issue lay with the template parsing though, it was 100% off point for my particular issue which had to do with, of all things, the case of the module name. But, I digress.

The point is that for years now, and through multiple versions and rewrites, ExpressionEngine has consistently carried the same "issue". Understanding the intent of the design for the template parsing helps put some perspective on things: it's definitely a side effect of the particular approach. It bugged, and surprised, me that ExpressionEngine would have such oblique and useless output on such a widespread basis (35,000 site's with that string within Google is a bit much in my opinion) but the more I thought about it the more I came to understand the complexity of what was involved in being helpful when the issue cropped up. Aside from abandoning the approach what could really be done?

It's a nice reminder that development is often a balancing act between getting what you want and dealing with a counter issue. The ExpressionEngine team developed a strategy to enable them to effectively and efficiently meet their requirements but there's sometimes an issue; a random string gets output instead of the template data. Plus, since the cause could be for anything from bad module installation data (as I had) to an issue within a template there's no real way to provide helpful error messages when the template parsing does fail. But, on the other hand, the template parser does exactly what it was designed to do (and it does it well).