As an experienced software developer, I have learned to avoid magic strings.

My problem is that it is such a long time since I have used them, I've forgotten most of the reasons why. As a result, I'm having trouble explaining why they're a problem to my less experienced colleagues.

What objective reasons are there for avoiding them? What problems do they cause?

@Laiv: They're similar to magic numbers, yes.I like the definition at deviq.com/magic-strings: "Magic strings are string values that are specified directly within application code that have an impact on the application’s behavior.". (The definition at en.wikipedia.org/wiki/Magic_string isn't what I have in mind at all)
– KramiiFeb 5 '18 at 10:20

17

this is funny I have learned to detest... later What arguments can I use to persuade my juniors... The never end story :-). I wouldn't try to "convince" I would rather let'em learn on their own. Nothing last more than a lesson/idea reached by your own experience. What you are trying to do is indoctrinate. Don't do that unless you want a team of Lemmings.
– LaivFeb 5 '18 at 10:32

15

@Laiv: I'd love to let people learn from their own experience, but unfortunately that isn't an option for me. I work for a publicly funded hospital where subtle bugs can compromise patient care, and where we can't afford avoidable maintenance costs.
– KramiiFeb 5 '18 at 11:29

6

@DavidArno, that is exactly what he's doing by asking this question.
– user56834Feb 5 '18 at 12:40

7 Answers
7

In a language that compiles, a magic string's value is not checked at compile time. If the string must match a particular pattern, you have to run the program to guarantee it fits that pattern. If you used something such as an enum, the value is at least valid at compile-time, even if it might be the wrong value.

If a magic string is being written in multiple places you have to change all of them without any safety (such as compile-time error). This can be countered by only declaring it in one place and reusing the variable, though.

Typos can become serious bugs. If you have a function:

func(string foo) {
if (foo == "bar") {
// do something
}
}

and someone accidentally types:

func("barr");

This is worse the rarer or more complex the string is, especially if you have programmers that are unfamiliar with the project's native language.

Magic strings are rarely self-documenting. If you see one string, that tells you nothing of what else the string could / should be. You will probably have to look into the implementation to be sure you've picked the right string.

That sort of implementation is leaky, needing either external documentation or access to the code to understand what should be written, especially since it has to be character-perfect (as in point 3).

Short of "find string" functions in IDEs, there are a small number of tools that support the pattern.

You may coincidentally use the same magic string in two places, when really they are different things, so if you did a Find & Replace, and changed both, one of them could break while the other worked.

Regarding the first argument: TypeScript is a compiled language that can typecheck string literals. This also invalidates argument two to four. Therefore, not strings itself are the problem, but using a type that allows too many values. The same reasoning can be applied to using magic integers for enumerations.
– YoguFeb 5 '18 at 12:47

11

Since I have no experience with TypeScript I will defer to your judgement there. What I would say then, is that unchecked-strings (as is the case for all languages I've used) are the problem.
– Erdrik IronroseFeb 5 '18 at 13:14

23

@Yogu Typescript won’t rename all your strings for you if you change the static string literal type you’re expecting. You’ll get compile time errors to help you find all of them, but that’s only a partial improvement on 2. Not saying it’s anything less than absolutely amazing (because it is that, and I love the feature), but it definitely does not outright eliminate the advantage of enums. In our project, when to use enums and when not to remains a kind of open style question that we aren’t sure about; both approaches have annoyances and advantages.
– KRyanFeb 5 '18 at 16:25

30

One big one I've seen not for strings as much as numbers, but could happen with strings, is when you have two magic values with the same value. Then one of them changes. Now you're going through code changing the old value to the new value, which is work on its own, but you're also doing EXTRA work to make sure you're not changing the wrong values. With constant variables, not only do you not have to go through it manually, but you don't worry that you've changed the wrong thing.
– corsiKaFeb 5 '18 at 16:47

35

@Yogu I would further argue that if a string literal's value is being checked at compile time, then it ceases to be a magic string. At that point it's just a normal const/enum value that happens to be written in a funny way. Given that perspective, I would actually argue that your comment actually supports Erdrik's points, rather than refuting them.
– GrandOpenerFeb 6 '18 at 6:52

The summit of what the other answers have grasped at, is not that "magic values" are bad, but that they ought to be:

defined recognisably as constants;

defined only once within their entire domain of use (if architecturally possible);

defined together if they form a set of constants that are somehow related;

defined at an appropriate level of generality in the application in which they are used; and

defined in such a way as to limit their use in inappropriate contexts (e.g. amenable to type checking).

What typically distinguishes acceptable "constants" from "magic values" is some violation of one or more of these rules.

Used well, constants simply allow us to express certain axioms of our code.

Which brings me to a final point, that an excessive use of constants (and therefore an excessive number of assumptions or constraints expressed in terms of values), even if it otherwise complies with the criteria above (but especially if it deviates from them), may imply that the solution being devised is not sufficiently general or well-structured (and therefore we're not really talking about the pros and cons of constants anymore, but about the pros and cons of well-structured code).

High-level languages have constructs for patterns in lower-level languages which would have to employ constants. The same patterns can also be used in the higher-level language, but ought not to be.

But that may be an expert judgment based on an impression of all the circumstances and what a solution ought to look like, and exactly how that judgment will be justified will depend heavily on the context. Indeed it may not be justifiable in terms of any general principle, except to assert "I am old enough to have already seen this kind of work, with which I am familiar, done better"!

EDIT: having accepted one edit, rejected another, and having now performed my own edit, may I now consider the formatting and punctuation style of my list of rules to be settled once and for all haha!

I like this answer. After all "struct" (and every other reserved word) is a magic string to the C compiler. There are good and bad ways of coding for them.
– Alfred ArmstrongFeb 5 '18 at 16:08

6

As an example, if someone sees “X := 898755167*Z” in your code, they probably won’t know what it means, and even less likely to know that it’s wrong. But if they see “Speed_of_Light : constant Integer := 299792456” someone will look it up and suggest the correct value (and maybe even a better data type).
– WGroleauFeb 5 '18 at 16:14

26

Some people miss the point completely and write COMMA = "," instead of SEPARATOR = ",". The former doesn't make anything clearer, while the latter states the intended usage and allows you to change the separator later in a single place.
– marcusFeb 6 '18 at 13:54

1

@marcus, indeed! There is of course a case for using simple literal values in-place - for example, if a method divides a value by two, it may be clearer and simpler to simply write value / 2, rather than value / VALUE_DIVISOR with the latter defined as 2 elsewhere. If you intended to generalise a method handling CSVs, you'd probably want the separator passed in as a parameter, and not defined as a constant at all. But it's all a question of judgment in the context - @WGroleau's example of the SPEED_OF_LIGHT is something you'd want to explicitly name, but not every literal needs this.
– SteveFeb 6 '18 at 17:27

4

The top answer is better than this answer if need convincing that magic strings are a "bad thing." This answer is better if you know and accept that they're a "bad thing" and need to find the best way to meet the needs that they serve in a maintainable way.
– corsiKaFeb 7 '18 at 15:48

@ThomasMoors No he's talking about the way you have to build a new string every time you want to use an already existing magic string, point 2 is about changing the string itself
– Pierre ArlaudFeb 6 '18 at 10:29

Real life example: I am working with a third party system wherein "entities" are stored with "fields". Basically an EAV system. As it is fairly easy to add another field, you get access to one by using the field's name as string:

Field nameField = myEntity.GetField("ProductName");

(note the magic string "ProductName")

This can lead to several problems:

I need to refer to external documentation to know that "ProductName" even exist and its exact spelling

Plus I need to refer to that doc to see what the datatype of that field is.

Typos in this magic string will not get caught until this line of code is executed.

When someone decides to rename this field on the server (difficult while preventing dataloss, but not impossible), then I cannot easily search through my code to see where I should adjust this name.

So my solution for this was to generate constants for these names, organised by entity type. So now I can use:

Field nameField = myEntity.GetField(Model.Product.ProductName);

It is still a string constant and compiles to the exact same binary, but has several advantages:

After I have typed "Model.", my IDE shows just the available entity types, so I can select "Product" easily.

Then my IDE supplies just the fieldnames that are available for this type of entity, also selectable.

Auto-generated documentation shows what the meaning of this field is plus the datatype that is used to store its values.

Starting from the constant, my IDE can find all places where that exact constant is used (as opposed to its value)

Typos will be caught by the compiler. This also applies when a fresh model (possibly after renaming or deleting a field) is used to regenate the constants.

Next on my list: hide these constants behind generated strongly typed classes - then also the datatype is secured.

+1 you bring up a lot of good points not limited to code structure: IDE support and tooling, which can be a lifesaver in large projects
– kmdrekoFeb 7 '18 at 19:36

If some parts of your entity type is static enough that actually defining a constant name for it is worthwhile, I think it would be more apt to just define a proper data model for it so you can just do nameField = myEntity.ProductName;.
– Lie RyanFeb 7 '18 at 20:27

@LieRyan - it was much easier to generate plain constants and upgrade existing projects to use them. That said, I am working on generating static types so I can do precisely that
– Hans KeﬆingFeb 7 '18 at 20:43

Magic strings not always bad, so this might the reason you cannot come up with a blanket reason for avoiding them. (By "magic string" I assume you mean string literal as part of an expression, and not defined as a constant.)

In some particular cases, magic strings should be avoided:

The same string appears multiple times in code. This means you could have a spelling error one of the places. And it will be a hassle of the string changes. Turn the string into a constant, and you will avoid this issue.

The string may change independent of the code where it appears. Eg. if the string is text displayed to the end user it will likely change independent of any logic change. Separating such string into a separate module (or external configuration or database) will make it easier to change independently

The meaning of the string is not obvious from the context. In that case introducing a constant will make the code easier to understand.

But in some cases, "magic strings" are fine. Say you have a simple parser:

I think your definition of "magic string" is insufficient, it needs to have some concept of hiding/obscuring/making-mysterious. I wouldn't refer to the "+" and "-" in that counter-example as "magic", any more than I'd refer to the zero as magic in if (dx != 0) { grad = dy/dx; }.
– RupeFeb 5 '18 at 17:26

2

@Rupe: I agree, but the OP uses the definition "string values that are specified directly within application code that have an impact on the application’s behavior." which does not require the string to be mysterious, so this is the definition I use in the answer.
– JacquesBFeb 5 '18 at 17:38

7

With reference to your example, I have seen switch statements which replaced "+" and "-" with TOKEN_PLUS and TOKEN_MINUS. Every time I read it I felt like it was harder to read and debug because of it! Definitely a place where I agree that using simple strings is better.
– Cort AmmonFeb 5 '18 at 19:49

2

I do agree that there are times when magic strings are appropriate: avoiding them is a rule of thumb, and all rules of thumb have exceptions. Hopefully, when we are clear about why they can be a bad thing, we will be able to make intelligent choices, rather than doing things because (1) we've never understood that there can be a better way, or (2) we've been told to do things differently by a senior developer or a coding standard.
– KramiiFeb 5 '18 at 22:02

Internationalisation (i18n)

If the text to display on screen is hard-coded and buried within layers of functions, you're going to have a very difficult time providing translations of that text into other languages.

Some development environments (e.g. Qt) handle translations by lookup from a base language text string to the translated language. Magic strings can generally survive this - until you decide you want to use the same text elsewhere and get a typo. Even then, it's very hard to find which magic strings need translating when you want to add support for another language.

Some development environments (e.g. MS Visual Studio) take another approach and require all translated strings to be held within a resources database and read back for the current locale by the unique ID of that string. In this case your application with magic strings simply cannot be translated into another language without major rework. Efficient development requires all text strings to be entered into the resources database and given a unique ID when the code is first written, and i18n thereafter is relatively easy. Trying to backfill this after the fact will typically require a very large effort (and yes, I've been there!) so it's much better to do things right in the first place.

This is not a priority for everyone, but if you ever want to be able to calculate coupling/cohesion metrics on your code in an automated fashion, magic strings make this nearly impossible. A string in one place will refer to a class, method or function in another place, and there is no easy, automatic way to determine that the string is coupled to the class/method/function just by parsing the code. Only the underlying framework (Angular, e.g.) can determine that there is a linkage--and it can only do it at run-time. To obtain the coupling information yourself, your parser would have to know everything about the framework you were using, above and beyond the base language you are coding in.

Thank you for your interest in this question.
Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).