Introduction

Many programming languages such as C/C++, C#, Java, and Pascal provide the switch statement to let us implement selection logic. In some scenarios, it's a good alternative to if-then-else, making code clearer and more readable. When using switch in practice, you may want to know:

How is the switch block executed at runtime?

Does it run faster than an if-then-else for a long list of conditions?

For n conditions, what is the switch time complexity?

The C and C++ standards define the language elements, but they say nothing about how to implement the switch statement. Every vendor is free to use any implementation as long as it conforms to the standard. This article discusses what happens when a switch statement runs in Visual C++, with a few examples under different conditions. We'll analyze these examples with the Microsoft Visual Studio IDE, since it can generate the corresponding assembly listing at compile time. Thus, a general understanding of Intel (x86) assembly language is assumed. As you will see shortly, all results here are based on reverse engineering, so the article is by no means a comprehensive description of switch implementations in compilers. If you are learning assembly language programming, this article may serve as study material.

Our first example is switch1.cpp, a simple and commonly used block, shown below:

Is the worst case i=3 or i=20? How does it execute: does it test all nine cases and only then reach the default to call f3? Let's find the answer in the assembly translation.

In order to generate an assembly listing in Visual Studio, open the switch1.cpp Property dialog and select the Output Files category under C/C++. On the right pane, choose the Assembly With Source Code (/FAs) option as shown below:

Then, when you compile switch1.cpp, an assembly file called switch1.asm will be generated. With this option, the listing includes the C++ source code as comments, each introduced by a semicolon and tagged with its line number, as shown in the next section.

The Two-Level Jump Table

Let's analyze the assembly listing from top down. Here is where the switch starts:

Assume that the symbol _i$[ebp] is an alias of i and that tv64[ebp] is another name for i2, and rename $LN15@main as table1, $LN16@main as table2, and $LN1@main as Default_Label. The snippet then merely does this, in pseudocode:

Here, 17 corresponds to the last case value, because the integers 1 to 18 are mapped to 0 through 17. This is why i2 is decremented: it becomes a zero-based index. Now, if i2 is greater than 17 (e.g., i=20), control goes to the default. Otherwise, it goes to the address stored at table2[4*table1[i2]].

What about when i is zero? Then i2 becomes -1. Should we worry about an index-out-of-range error? No, it will never happen. Back in the assembly listing, you can see that i2 is held as a DWORD, a 4-byte double word that the range check treats as an unsigned integer. As an unsigned value, -1 is 0xFFFFFFFF (4294967295), so it must be greater than 17 and control goes to the default.
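The wrap-around argument can be checked with a few lines of C++. This is only a sketch of the range check as described above, assuming a 32-bit unsigned comparison; the function name takes_default is made up for illustration, and constexpr is used only so that the results can be verified with static_assert.

```cpp
#include <cstdint>

// Models the range check on i2 = i - 1, with i2 treated as an
// unsigned 32-bit value (an assumption based on the text above).
constexpr bool takes_default(int i) {
    return static_cast<std::uint32_t>(i) - 1u > 17u;
}

static_assert(takes_default(0),   "0 - 1 wraps to 0xFFFFFFFF, which is above 17");
static_assert(!takes_default(1),  "i = 1 maps to index 0");
static_assert(!takes_default(18), "i = 18 maps to index 17");
static_assert(takes_default(20),  "i = 20 maps to index 19, which is above 17");
```

A single unsigned comparison therefore rejects both values that are too large and values that are too small.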

Let's look at the two tables and see how they work together. table1 is pretty simple, with a starting address at $LN15@main, which you can just think of as an array name.

$LN15@main:
	DB	0
	DB	1
	DB	9
	DB	9
	DB	2
	DB	9
	DB	3
	DB	9
	DB	9
	DB	4
	DB	5
	DB	6
	DB	9
	DB	9
	DB	9
	DB	9
	DB	7
	DB	8

With this array, table1[0] is 0, table1[1] is 1, table1[2] and table1[3] are 9, etc. These values are created for the calculation of the index of table2, which starts at $LN16@main:

The above labels, from $LN10@main to $LN1@main, are the ten jump targets for the C++ code: nine cases plus one default. Notice that DB defines a byte (8 bits), while DD defines a double word of four bytes (32 bits). This is why we need to multiply by 4 in table2[4*table1[i2]]. With this formula, we calculate the target address via table1 and table2:

If i equals 1, i2 is 0 and table1[0] is 0, so control jumps to $LN10@main via table2[0], the first case.

If i equals 2, i2 is 1 and table1[1] is 1, so control jumps to $LN9@main via table2[4*1], the second case.

If i equals 3, i2 is 2 and table1[2] is 9, so control jumps to $LN1@main via table2[4*9], the default.

... ...
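The two-table lookup can be mimicked in C++ as a rough sketch. The contents of table1 are copied from the DB list above; table2 is an assumption that stores the number N of each label $LN<N>@main in place of the real 4-byte code address, and target_label is a made-up name. It needs C++14 for the multi-statement constexpr function.

```cpp
#include <cstdint>

// table1: one byte per zero-based case value, as in the DB list.
constexpr std::uint8_t table1[18] = {0, 1, 9, 9, 2, 9, 3, 9, 9,
                                     4, 5, 6, 9, 9, 9, 9, 7, 8};

// table2: ten targets. Each entry is the N of label $LN<N>@main,
// standing in for the code address that DD stores in the listing.
constexpr int table2[10] = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};

constexpr int target_label(int i) {
    const std::uint32_t i2 = static_cast<std::uint32_t>(i) - 1u;  // zero-based index
    if (i2 > 17u) return 1;     // Default_Label ($LN1@main)
    // C++ array indexing scales by the element size itself,
    // so the explicit *4 of the assembly version is not needed.
    return table2[table1[i2]];
}

static_assert(target_label(1) == 10, "case 1 goes to $LN10@main");
static_assert(target_label(2) == 9,  "case 2 goes to $LN9@main");
static_assert(target_label(3) == 1,  "i = 3 has no case and goes to the default");
```

However many cases there are, the work per dispatch stays the same: one range check and two array reads.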

Now we come to the code segment labeled from $LN10@main to $LN1@main, the jump targets:

Functions f1, f2, and f3 appear as decorated (name-mangled) assembly procedures prefixed with a question mark: ?f1@@YAXXZ, ?f2@@YAXXZ, and ?f3@@YAXXZ. Notice that $LN10@main is the label where case 1 calls f1, $LN9@main is where case 2 calls f2, and $LN1@main is where the default calls f3.

An additional label is $LN11@main, pointing to the location just after the final default clause. As soon as a case operation is done, control jumps to $LN11@main. This implements the break statement. Without it, control falls through to the next case. This is why the break statement is necessary in a C/C++ switch block.
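The effect of a missing break is easy to observe in C++ itself. In this small sketch (not taken from switch1.cpp), each "call" is recorded as a digit: 1 for f1, 2 for f2, and 3 for f3. The function names are made up, and constexpr (C++14) is used only so that the results can be checked with static_assert.

```cpp
// Every case ends with break: exactly one branch runs.
constexpr int with_break(int i) {
    int log = 0;
    switch (i) {
        case 1:  log = log * 10 + 1; break;
        case 2:  log = log * 10 + 2; break;
        default: log = log * 10 + 3; break;
    }
    return log;
}

// No break: control enters at the matching label and keeps going.
constexpr int without_break(int i) {
    int log = 0;
    switch (i) {
        case 1:  log = log * 10 + 1;  // falls through
        case 2:  log = log * 10 + 2;  // falls through
        default: log = log * 10 + 3;
    }
    return log;
}

static_assert(with_break(1) == 1,      "only f1 runs");
static_assert(without_break(1) == 123, "f1, f2, and f3 all run");
static_assert(without_break(2) == 23,  "f2 and f3 run");
```

In assembly terms, the second version simply lacks the jumps to the label that follows the switch block.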

With this two-level table mechanism, we have one comparison, one multiplication, and two table lookups (one per table) before the jump, regardless of which case matches. The time complexity of this switch pattern is therefore O(1). Putting this all together, we have a big picture like this:

Recall that we use DB to define the index values in table1, and DD for table2. In this example, table1 holds only 18 entries. But because DB defines a single byte, each index value can only range from zero to 255. What if there are more cases in the switch, with the number of targets exceeding 255?

The One-Level Jump Table

Here comes the second example, switch2.cpp with 1000 cases. For simplicity, we start from the condition value zero:

Likewise, assume that the symbol _i$[ebp] is an alias of i. Because the first case starts from zero, no mapping is needed, so i2 (from tv64[ebp]) equals i. The array $LN1006@main2 is the only jump table, and the label $LN1@main2 is Default_Label. This snippet simply does this:

i2 = i;
if i2 > 999 goto Default_Label;
goto table[4*i2];
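The same idea can be written in C++ with an array of function pointers. This is a scaled-down sketch with four cases instead of 1000; the handler names and return values are hypothetical, and the unsigned range check is assumed to work as in the first example. It needs C++14.

```cpp
using Handler = int (*)();

// Hypothetical handlers standing in for the code at each case label.
constexpr int case0() { return 100; }
constexpr int case1() { return 101; }
constexpr int case2() { return 102; }
constexpr int case3() { return 103; }
constexpr int on_default() { return -1; }

// The single jump table: entry i2 is the target for case i2.
constexpr Handler table[4] = {case0, case1, case2, case3};

constexpr int dispatch(int i) {
    const unsigned i2 = static_cast<unsigned>(i);  // no remapping: cases start at 0
    if (i2 > 3u) return on_default();              // one range check
    return table[i2]();                            // one table lookup, then the jump
}

static_assert(dispatch(0) == 100, "first case");
static_assert(dispatch(3) == 103, "last case");
static_assert(dispatch(4) == -1,  "out of range goes to the default");
```

Because the case value indexes the table directly, the byte-sized intermediate table of the first example is no longer needed.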

By searching $LN1006@main2 in switch2.asm, we find the code labels all defined by double words:

Here, $LN1@main2 leads to the default, and the additional label $LN1002@main2 implements break. Although the complexity is also O(1), this mechanism should be a little more efficient, since only one table lookup is needed before the jump. The following is a picture of this example:

So far, the switch examples have executed very efficiently. Is it really better to use switch than if-then-else? For n conditions without a match, an if-then-else chain must check each condition down to the last one, which is the worst case of O(n). But for switch, can we always expect its complexity to be O(1)? Or is there some kind of code that would push its execution beyond that?

Using Binary Search

The third example, switch3.cpp, has large gaps between its case values. Here the switch execution behaves like a binary search:

The logic is not too hard to understand. At first, the condition value i is saved in both _i$[ebp] and tv64[ebp]. To simplify, we keep all the labels but remove the leading "$" and the trailing "@main3". The snippet can then be rewritten like this:

Clearly, the compiler optimizes the code: it first compares against 700 rather than against 100, the first case value in the switch block. Although this switch is implemented with ten if-then statements for nine conditions, it actually applies the binary search mechanism. The following shows the equivalent decision tree of the above code, a binary search tree that you may be familiar with:

At the internal nodes, drawn as yellow circles, we make the comparisons, while the leaf nodes, drawn as ovals, represent the successful endings. Each failure goes to LN1, the default. The inorder traversal sequence of the comparison nodes is 100, 200, 250, 500, 700, 750, 800, and 900, while the inorder sequence of the leaves is exactly LN9, LN8, LN7, LN6, LN5, LN4, LN3, and LN2, as ordered in the previous table. Essentially, this is a binary search algorithm with a complexity of O(log n). When i=1, it will pass through the first six comparisons and then reach the default at LN1. But when i=500, just four comparisons are required.
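For readers who prefer code to trees, here is a textbook binary search over the same eight case values, written as a C++14 sketch. It is not the compiler's exact comparison sequence (this version probes 500 first, whereas the listing starts with 700), but it has the same O(log n) shape. The name find_label is made up, and the return value is the N of label LN<N>.

```cpp
// The eight case values in ascending order.
constexpr int kCases[8] = {100, 200, 250, 500, 700, 750, 800, 900};

// Returns N for the label LN<N> that handles i,
// or 1 (LN1, the default) when no case matches.
constexpr int find_label(int i) {
    int lo = 0;
    int hi = 7;
    while (lo <= hi) {
        const int mid = lo + (hi - lo) / 2;
        if (kCases[mid] == i) return 9 - mid;  // inorder position k maps to LN(9 - k)
        if (kCases[mid] < i) {
            lo = mid + 1;                      // continue in the upper half
        } else {
            hi = mid - 1;                      // continue in the lower half
        }
    }
    return 1;
}

static_assert(find_label(100) == 9, "smallest case value is handled at LN9");
static_assert(find_label(900) == 2, "largest case value is handled at LN2");
static_assert(find_label(1) == 1,   "no match ends at the default, LN1");
```

Each loop iteration halves the remaining range, so eight case values need at most four probes.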

As we know, binary search requires sorted input data. I wondered whether the result was simply due to the ascending case values I used in switch3.cpp. That curiosity led to the following switch4.cpp, in which the case values appear out of order.

To my surprise, the same binary search strategy appears in the code of switch4.asm, with exactly the same decision tree as shown above. The only difference is that the labels are renumbered, which is quite reasonable because we have just reordered them. You can examine the attached switch4.asm for details.

This experiment gives us a hint about how the compiler does its magic to sort the case values. Comparison-based sorting costs O(n log n), more than the O(log n) search itself, so it would not be worthwhile to do it at runtime. Notice that no sorting is visible in the generated assembly, which means that sorting is not among the instructions that execute at runtime. Also notice that all the case values are constants, available before the compiler translates the C++ to machine code. It is therefore reasonable to conclude that the sorting happens as a kind of preprocessing: with all case values known, it can simply be done during compilation. This is why the translated assembly code reflects only the binary search, without any sorting code. Such static sorting behavior (as opposed to dynamic behavior at runtime) could be implemented with a macro procedure, assembly directives, and operators. An example of such preprocessing can be found in the References at the end of the article.
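Modern C++ can illustrate the "sort during compilation" idea directly. The following C++14 sketch is not how Visual C++ works internally; it only shows that constants in an arbitrary order can be sorted before the program ever runs. The input order below is hypothetical (it is not copied from switch4.cpp), and Cases and sorted are made-up names.

```cpp
// A small wrapper so the array can be passed and returned by value.
struct Cases {
    int v[8];
};

// Insertion sort that the compiler can evaluate at compile time.
constexpr Cases sorted(Cases c) {
    for (int a = 1; a < 8; ++a) {
        const int key = c.v[a];
        int b = a - 1;
        while (b >= 0 && c.v[b] > key) {
            c.v[b + 1] = c.v[b];  // shift larger values one slot to the right
            --b;
        }
        c.v[b + 1] = key;
    }
    return c;
}

// Hypothetical out-of-order case values.
constexpr Cases kSource = {{700, 100, 900, 250, 800, 200, 750, 500}};

// Evaluated by the compiler: no sorting code is needed at runtime.
constexpr Cases kSorted = sorted(kSource);

static_assert(kSorted.v[0] == 100, "smallest value comes first");
static_assert(kSorted.v[7] == 900, "largest value comes last");
```

Because kSorted is a compile-time constant, the compiler is free to emit it as plain data, with nothing left to sort when the program starts.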

The Hybrid

Now I can show an example that combines a jump table with the binary search, as in switch5.cpp.

It consists of two parts: a one-level jump table first, and then the binary search code. Using the previous notation for i and i2 and the same label naming, let $LN18@main5 be the table and $LN1@main5 the default label. This snippet does this:

An interesting change is to remove case 6 from switch5.cpp. Just because of this, the hybrid becomes a combination of a two-level jump table and the binary search. For details, look at switch6.cpp and switch6.asm in the downloadable samples. If you try adding extra cases in a certain order, you can even find two separate jump tables combined with the binary search.
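To make the hybrid idea concrete, here is a rough C++14 sketch with made-up case values (they are not those of switch5.cpp): a dense range 0 to 7 served by a table, followed by a few sparse values found by binary search. All names and handler ids are hypothetical.

```cpp
// Hypothetical handler ids for the dense cases 0..7.
constexpr int kTable[8] = {10, 11, 12, 13, 14, 15, 16, 17};

// Hypothetical sparse case values, kept in ascending order.
constexpr int kSparse[4] = {100, 500, 900, 5000};

// Returns a handler id, or -1 for the default.
constexpr int dispatch(int i) {
    // Part 1: the dense range goes through the jump table.
    const unsigned i2 = static_cast<unsigned>(i);
    if (i2 <= 7u) return kTable[i2];

    // Part 2: the remaining values are located by binary search.
    int lo = 0;
    int hi = 3;
    while (lo <= hi) {
        const int mid = lo + (hi - lo) / 2;
        if (kSparse[mid] == i) return 100 + mid;
        if (kSparse[mid] < i) {
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return -1;
}

static_assert(dispatch(6) == 16,    "dense case found through the table");
static_assert(dispatch(500) == 101, "sparse case found by binary search");
static_assert(dispatch(8) == -1,    "neither dense nor sparse: default");
```

The table keeps the dense part at O(1), while the search keeps the sparse part from wasting table entries.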

More Questions and Beyond

Now you have learned something about switch from some typical examples. You should now understand why the C/C++ switch supports only integral types, such as char and enumeration types, for its controlling expression, rather than floating-point or string types: jump-table indexing and exact comparison against compile-time constants both rely on integer values. Besides these, more intuitive questions might come to mind:

Is it necessary to maintain the cases in the order of condition values for a jump table?

How about if we use negative integers as case values?

What if the default label is missing, or appears somewhere in the block other than at the end?
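At the language level, all three variations are legal, as the following C++14 sketch shows. It is a hypothetical stand-in, not the attached switch7.cpp, and the function names and return values are made up. How the compiler lays out its tables for such code is a separate matter that only the assembly listing can reveal.

```cpp
// Cases out of order, a negative case value, and default in the middle.
constexpr int unordered(int i) {
    switch (i) {
        case 5:  return 50;
        default: return 0;    // default does not have to come last
        case -3: return -30;  // negative case values are allowed
        case 1:  return 10;   // a smaller value may follow a larger one
    }
}

// No default label: unmatched values skip the whole block.
constexpr int no_default(int i) {
    int result = -1;
    switch (i) {
        case 2:  result = 20;  break;
        case -2: result = -20; break;
    }
    return result;
}

static_assert(unordered(1) == 10,   "label order does not affect matching");
static_assert(unordered(7) == 0,    "unmatched values reach default wherever it is");
static_assert(no_default(9) == -1,  "without default, nothing in the block runs");
```

The source order of the labels matters only for fall-through, not for which label a value selects.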

I believe you can answer these just by analyzing an assembly listing of the C++ code containing these questions. For convenience, I attached the following switch7.cpp with its switch7.asm in samples.

Although switch performs better than if-then-else in these examples, some questions remain unanswered. Obviously, it's not possible to enumerate all the ways a switch can execute in practice. A comprehensive analysis of switch implementations should be written by a compiler developer, not by a blind black-box tester. So we are not able to determine the worst case of switch execution, or whether it could reach O(n). Nor do we know under which conditions the compiler chooses one implementation over another, or even some other method not mentioned here.

As an example raised by a reader in the discussion, one concern is the memory wasted by the jump tables described above when the switch cases are sparsely populated. How about an extreme example with only two case entries, 1 and 0x7fffffff, as below? Would the compiler fill a huge jump table with mostly useless entries?

This is not true for our smart compiler. As mentioned, we don't know how the compiler chooses one implementation over another. As you can see here, this two-case example doesn't use a jump table. The compiler simply converts the switch to if-then tests without consuming any extra memory:
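In C++ terms, the transformation amounts to the following equivalence. This is a sketch of the idea rather than the compiler's listing; the function names and return values are hypothetical, a 32-bit int is assumed, and C++14 is needed for the constexpr bodies.

```cpp
// The sparse switch with only two case values.
constexpr int via_switch(int i) {
    switch (i) {
        case 1:          return 1;
        case 0x7fffffff: return 2;
        default:         return 0;
    }
}

// The same logic as two plain comparisons: no table, no wasted entries.
constexpr int via_if(int i) {
    if (i == 1) return 1;
    if (i == 0x7fffffff) return 2;
    return 0;
}

static_assert(via_switch(1) == via_if(1),                   "first case agrees");
static_assert(via_switch(0x7fffffff) == via_if(0x7fffffff), "second case agrees");
static_assert(via_switch(42) == via_if(42),                 "default agrees");
```

With only two values to test, two comparisons are cheaper than any table, which is presumably why the compiler picks this form.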

Again, there is no way to influence or predict the compiler's choices through such black-box testing. Whether the switch cases are sparsely or densely populated, the compiler evidently adapts to them very well.

Summary

By scrutinizing the above examples, we have exposed something that you may not have known about the switch at runtime. To analyze a C/C++ program in Visual Studio, we can use both static and dynamic analysis. For this article, we make use of the assembly listings generated by the compiler. On the other hand, we can also monitor binary execution at runtime with the VS Disassembly window while debugging, where the labels and procedures in the listing are translated to memory addresses. This way, you can trace registers and memory allocations to understand data endianness, stack frames, etc. Here, for our purpose, the assembly listing is sufficient and makes it easy to follow a jump from one place to another. The downloadable zip file contains seven examples, with both the .cpp source code and the .asm listings generated with VS 2010. The same listings can be generated by an earlier version of VS.

This article does not involve hot technologies like .NET or C#. Assembly language is comparatively traditional and attracts little attention these days. However, in applications, different types of assembly languages play an important role in new device development. Academically, Assembly Language Programming is considered a demanding course in Computer Science. I teach Assembly Language Programming, CSCI 241, at Fullerton College, where the concept of high-level languages interfacing with low-level execution is an essential part of the teaching. In particular, one of the main topics that interest students is how C/C++ or Java runs on a machine, as represented by low-level data structures and algorithms. Therefore, I hope this article can also serve as one of the basic problem-solving examples for students. Any comments and suggestions are welcome.

Acknowledgment

I would like to sincerely thank wtwhite, Charvak Karpe, and Harold Aptroot for their valuable discussions about the binary search here. Without them, the section "Using Binary Search" would not be what it is today.