C++/C# bad practices: learn how to write good code from bad examples

Fun and Bugs in Microsoft Word 1.1a

Microsoft has made a present to all programmers eager to dig into some interesting stuff: it has opened the source code of MS-DOS v1.1 and v2.0 and of Word for Windows 1.1a. MS-DOS is written in assembly language, so our analyzer cannot be applied to it. Word, however, is written in C. Word 1.1a’s source code is almost 25 years old, but we still managed to analyze it. There is no practical use in this, of course. It was just for fun.

Where to find the source files

One can download the source code of MS-DOS v1.1, v2.0 and Word for Windows 1.1a. Those interested in digging into the source files on their own should check the original source.

Checking Word 1.1a

Figure 1. Word for Windows 1.1a.

Word for Windows 1.1a was released in 1990. Its source code was made publicly available on March 25, 2014. Word has always been one of Microsoft’s flagship products, and I, like many other programmers, was very eager to peek inside the software product that contributed so much to Microsoft’s commercial success.

I decided to check Word 1.1a’s code with our tool PVS-Studio, a static analyzer for C/C++/C# code. The task was not that easy, of course, as the analyzer is designed to work with projects built in Visual Studio 2005 or later. And here I had C source code more than 20 years old, which we can fairly call a relic of prehistoric times. When most of this code was written, there was no C language standard yet (ANSI C was only ratified at the end of 1989), so every compiler was on its own. Fortunately, Word 1.1a’s source code turned out to be free of any special quirks and non-standard compiler extensions.

Before you can perform code analysis, you need to get preprocessed files (*.i). Once you have generated them, you can use the PVS-Studio Standalone tool to run the analysis and examine the diagnostic messages. Of course, the analyzer is not designed to check 16-bit programs, but the results I got were quite enough to satisfy my curiosity. After all, a meticulous analysis of a 24-year-old project just wouldn’t make any sense.

So the main obstacle was obtaining the preprocessed files. I asked a colleague to find a solution, and he approached the task with much creativity: he chose GCC 4.8.1 to produce the preprocessed files. I guess no one has ever mocked Word 1.1a’s source code in such a cruel way. How did it even occur to him to use GCC?

Surprisingly, it all worked out pretty well. He wrote a small utility that ran GCC 4.8.1’s preprocessor on each file from the folder where that file was stored. Whenever GCC reported that it could not locate or include a header file, we added an -I switch to the command line with the path to the required files. A couple of header files that we could not find anywhere were created empty. All the remaining #include problems were related to including resources, so we commented those lines out. We defined the WIN macro for preprocessing, since the code contained branches for both WIN and MAC.

After that, PVS-Studio Standalone and I came into play. I noted down a few suspicious code fragments that I want to show you. But first, let’s talk a bit more about the project itself.

A few words about Word 1.1a’s code

The most complex functions

The following functions showed the highest cyclomatic complexity:

CursUpDown – 219;

FIdle – 192;

CmdDrCurs1 – 142.

#ifdef WIN23

While looking through the source code, I came across “#ifdef WIN23” and couldn’t help smiling, so I noted that fragment down. I thought it was a typo for #ifdef WIN32.

When I saw WIN23 a second time, I grew somewhat doubtful. Then it struck me that I was looking at source files that were already 24 years old. WIN23 stood for Windows 2.3.

Harsh times

In one code fragment, I stumbled upon the following interesting line:

Assert((1 > 0) == 1);

It seems incredible that this condition could ever be false. But since the check is there, there must have been a reason for it. There was no language standard yet. As far as I understand, it was considered good style to check that the compiler behaved the way programmers expected.

Well, if we agree to treat K&R as a standard, the ((1 > 0) == 1) condition is, of course, always true: K&R defines relational expressions to yield 1 when true and 0 when false. But K&R was just a de facto standard, so this is simply a sanity check of the compiler.

Analysis results

Now let’s discuss the suspicious fragments I have found in the code. I guess that’s the main reason you are reading this article. So here we go.

It turned out that the first line, for some reason, contains the expression Fib.rgwSpare0[5]. That’s incorrect: the array has only 5 items, so the largest valid index is 4. The value ‘5’ is just a typo; index zero was most likely intended in the first line.

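The printf() function is variadic, so the compiler accepts a call whether or not the arguments required by the format string are passed. In this case, the programmer forgot the arguments. Calling printf() with fewer arguments than the format string requires is undefined behavior, and in practice it results in printing garbage every time.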

Uninitialized pointers

One of the auxiliary utilities included in the Word source package contains a very strange piece of code.

The ‘pfl’ variable is initialized neither before the loop nor inside it, yet fclose(pfl) is called multiple times. Passing an uninitialized pointer to fclose() is undefined behavior. Still, the code may well have worked in practice: the function would most likely return an error status, and the program would keep running.

And here’s another dangerous function which will most likely cause a program crash.

Now for another suspicious fragment. When the ‘qps’ variable is handled, the following values are written into ‘pcab->iCharIS’: 2, 1, 0.

The ‘hps’ variable is handled in a similar way, but here suspicious values are saved into ‘pcab->iCharPos’: 2, 1, 1.

It must be a typo: a zero was most likely meant to be used at the very end.

Conclusion

I found very few strange fragments, and there are two reasons for that. First, the code is skillfully and clearly written. Second, the analysis was necessarily incomplete, and teaching the analyzer the specifics of the old C language wouldn’t have been of any use.