Careful with Naming in Your Platform — Part 2

I’ve written before on this blog about the importance of naming “things” in a tech platform/framework, and I feel this deserves a bit more attention, so here is a follow-up.

In the previous post I discussed naming class member data as well as database fields: poor names there can wreak havoc amongst your users (be they developers or customers simply running reports), not to mention confusing even your own team further down the line, once the reasons for choosing a name have faded from memory!

That’s not the only place where naming can hit you! Naming the components in your system wrongly can have disastrous effects. (And by components I mean anything from a simple class or collection of classes, through a library in your framework, to standalone server software running in your infrastructure.)

One such case that springs to mind is a company which had, deep in its framework, a component referred to as “email validator”. This was a (small, to be honest) library with a very simple interface to the outside: you supplied a string containing a potential email address and it returned true or false, indicating whether the string was a valid email address. At least that’s what everyone thought!

This component, as it turned out, was used heavily in their website’s registration and login validation, as well as in a few other places. It had been written a while back, following a DDoS attack after which they were forced to tighten a lot of their security and validation to prevent spam accounts being created. When it was released it shielded them from so many spammers that it started being used everywhere they needed to validate an email address.

New members joined their teams and new components using this service were added to the stack, on the assumption that it really did ensure the email address was valid. Some developers even believed it triggered a backend email to that address to verify that it was genuine and didn’t bounce. (Far from it, as it turned out!) The component ended up so deeply buried inside their platform, and so widely used, that no one had bothered to take a second look at its source; everyone assumed it did what it said on the tin, and the “tin” said “email validator”, or at least that was the name everyone knew the component by.

In time, though, they noticed their email servers beginning to struggle, and a huuuuuge queue of pending emails was occasionally observed. This, together with high disk activity, triggered an investigation, which revealed that every so often the email server was swamped trying to send emails to lots of invalid addresses. It would occasionally even ping-pong with the mail servers of some domains, which kept rejecting such emails, creating longer and longer chains of automatic replies and rejection notices; these were logged to disk and also generated more emails to a postmaster address that no one was checking. In short, the server was occasionally overwhelmed trying to keep up with the network and disk activity caused by these bad addresses.

A look at these bad addresses revealed such “email addresses” as a@b.c.d, x@y and me@my.address, plus a whole bunch that looked valid at first glance but, on closer inspection, contained misspellings (e.g. markt@cognitivematch.com worked but mark@cognitivematch.com failed). This is when someone said: “That’s impossible! Our email validator would reject those email addresses, so the error must be somewhere else.” And, luckily, this is also when someone else said: “Well, I’m not sure, to be honest; we haven’t looked at that email validator code in over a year.”

And off they went (thankfully!) to investigate the “email validator” code. And what do you know? It turned out to be a much (muuuuch!) simpler piece of code than they thought: it merely ran the string through a bunch of regular expressions to check whether it “has the structure of an email address” (a quote from a comment inside the code). In other words, it checked that:

the string is not empty and does not consist only of whitespace

it doesn’t contain any “funky” characters, by which I mean Cyrillic, Chinese or other characters deemed unacceptable

it contains an “@” symbol, with a few characters before it (the username) and a few after it (the domain name)

the domain name contains at least one dot
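To make the gap concrete, here is a minimal Python sketch of a check built only from the four rules above. The regular expressions and the function name `has_email_shape` are hypothetical, since the story doesn’t show the company’s actual code; in particular, reading “funky” as “anything outside printable ASCII, including spaces” is an assumption.

```python
import re

# Hypothetical reconstruction of the four structural rules; NOT the company's code.
_BLANK = re.compile(r"\s*")              # rule 1: empty or whitespace only
_PRINTABLE_ASCII = re.compile(r"[!-~]+")  # rule 2 (assumed): printable ASCII, no spaces
_AT_SPLIT = re.compile(r"(.+)@(.+)")      # rule 3: some text before and after an "@"


def has_email_shape(candidate: str) -> bool:
    """Return True if `candidate` merely has the shape of an email address.

    Nothing here checks that the domain exists, that a mail server accepts
    the address, or that it is spelled the way its owner intended.
    """
    if _BLANK.fullmatch(candidate):
        return False
    if not _PRINTABLE_ASCII.fullmatch(candidate):
        return False
    parts = _AT_SPLIT.fullmatch(candidate)
    if parts is None:
        return False
    domain = parts.group(2)
    return "." in domain  # rule 4: at least one dot in the domain part
```

Calling it on a@b.c.d or me@my.address returns True, just as it does for any misspelled but well-formed address, which is exactly the gap the story describes. A name that says what the check does (here, hypothetically, `has_email_shape`) keeps that limitation visible at every call site.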

According to the developer, this resembled the structure of an email address, and at the time of the DDoS attack it was quite likely good enough to stop a lot of the spam accounts. However, the code did not check whether the email address was valid or whether emails sent to it would go through; in fact, it didn’t even check whether the domain name was valid!

However, the name given internally to the component (“email validator”) led everyone to believe it did more than it did. Worse, the name, coupled with its immediate success in stopping spam accounts when it went live, had everyone convinced that this component not only guaranteed a valid email address but also made coffee, massaged your back and acted as your personal assistant 🙂

Needless to say, once its true functionality was revealed, this component was (rightly) refactored and incorporated into a bigger component labelled “basic checks”. This was used throughout login and registration to perform simple checks ensuring that no empty strings, whitespace or other such oddities could be supplied in input strings, including email addresses. However, the exercise cost the company a good few days of work from both sysops and developers, plus many hours of development needed to change all the other components that used the original “email validator” so that they checked supplied email addresses more thoroughly. All of this could have been avoided (or at least drastically reduced) if the chosen name had not been so misleading.

At the opposite end of the spectrum is a component named “rules engine manager server service”. Can anyone guess what that stood for? 🙂 As it turns out, the answer is simple: it was a standalone (server) application meant to monitor that another component (the rules engine) was properly managed by the “rules engine manager” component and didn’t leak errors when executing its rules. It did this by monitoring a few managed beans through JMX and checking some log file entries. But the name was so poorly chosen that many team members had no idea what the component did, and they ended up duplicating a lot of its code in separate components, which were given names like “rules error counter” (as it kept count of any errors raised when rules were executed) or “rules life cycle monitor” (since it collected much of the information that rules generate throughout their life cycle).
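For illustration only, the log-checking half of such a monitor (the part later duplicated as the “rules error counter”) might look something like the Python sketch below. The log line format and the name `count_rule_errors` are invented for this example, and the JMX side is not covered.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical log format assumed for this sketch, e.g.:
#   ERROR rule=weekend-discount message="division by zero"
_RULE_ERROR = re.compile(r"\bERROR\b.*?\brule=(?P<rule>[\w.-]+)")


def count_rule_errors(log_lines: Iterable[str]) -> Counter:
    """Count ERROR log entries per rule name; other log levels are ignored."""
    counts: Counter = Counter()
    for line in log_lines:
        match = _RULE_ERROR.search(line)
        if match:
            counts[match.group("rule")] += 1
    return counts
```

A function (or service) named for what it does, as here, is hard to misread; compare that with “rules engine manager server service”.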

Had the initial name been more “user friendly” while still reflecting what the service did, the code duplication would not have occurred. Nor would the company have ended up with duplicated hardware: the new components needed their own hardware to run on, so they ended up with two separate services running in parallel on separate machines but doing the same thing.

One last story comes to mind, from a friend’s company, which started (like most startups) with a working proof of concept, enough to land a couple of beta customers to validate their product and allow for the development of a “proper” product. It was at this stage that a lot of the existing code was refactored, which is to be expected, by the way, when moving from “proof of concept” to “product”! The engineers took the right approach: isolate components, then refactor and replace them one by one, to minimize risk and downtime.

As part of minimizing downtime, the engineers realised that one of the components they were replacing needed to run for a while in parallel with its new version, with traffic switched gradually from the old version to the new one. Perhaps for this reason, they decided to put “new” in the name of the component they rolled out to production alongside the old one. As a result, they ended up with two components doing the same thing in production:

“content cache”, or “old content cache” as they referred to it

“new content cache”, which would ultimately replace the old content cache entirely

Replacing the old content cache with the new one took them about a month, at the end of which they switched off the old content cache servers, happy with the new one (which, as it turned out, performed much, much better than the old one).
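The gradual switch-over described above is commonly done with a weighted router that sends a growing share of requests to the replacement. Below is a minimal Python sketch under that assumption; the story doesn’t say how the company actually did it, and the backend labels and the `rollout_percent` parameter are hypothetical. Hashing the key keeps each item pinned to one backend while the percentage is raised.

```python
import hashlib


def choose_backend(key: str, rollout_percent: int) -> str:
    """Send roughly `rollout_percent`% of keys to the "candidate" backend.

    Each key is hashed into a stable bucket (0-99), so the same key always
    gets the same answer for a given percentage, and a key that has moved to
    the candidate stays there as the percentage is raised.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "candidate" if bucket < rollout_percent else "current"
```

Raising `rollout_percent` from 0 to 100 over the migration moves traffic across gradually; at 100 the old servers receive nothing and can be switched off. Note that the labels describe roles (“current”, “candidate”) rather than age, so they stay meaningful when the next replacement comes along.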

The problem was that when they set out to redesign this component, with a view to replacing the old one, the developers bought so heavily into using “new” in the component name that they ended up with a whole bunch of class and method names containing “new”: NewContentItem, NewMessageQueue, NewContentMessage and so on. Pretty much the whole set of old classes (ContentItem, MessageQueue and so on) was replaced by New... classes. This, they said, made it clear to every developer working on the project, or accessing the component from outside, that they were dealing with classes from the “new content cache” component.

As such, code written around the “new content cache” explicitly invoked these classes and methods, and there was no confusion with the classes used by the old system. And everyone was happy… until one day a few months later!

What happened, as is normal when platforms evolve, was that a few months into using the “new content cache” it became necessary to replace it with a bigger, better, more scalable “content cache”. In other words, they needed a new “new content cache”. Now, how do you go about naming your classes and methods when you’re replacing an old component where all the names begin with “New…” but are in fact now old!?!?

What followed was a messy few months during which no one really knew which version people meant when they said “new content cache”. The confusion was reflected in the code and in deployments, and made a lot of people and servers unhappy 🙂 Luckily, one of the new (!!) developers in the company came up with the name “content aggregator”: as they pointed out, the cache was no longer a simple cache but had many other functions, one of them being to aggregate content from various sources, then store it (on disk) and cache it (in memory). Having arrived at that name, everyone understood that the “content aggregator” was the “new new content cache” and that the “new content cache” was in fact now old! That conclusion could have been reached much earlier if, rather than “new content cache”, the company had gone for something like “content store”, which would then have been easy to migrate to “content aggregator”!

Simple things like this make a huge difference, and getting them wrong ends up costing a lot of stress, time, money and resources. So don’t pick names in your platform carelessly: it might come back to bite you!