Rejuvenating the Microsoft C/C++ Compiler

Our compiler is old. There are comments in the source from 1982, which was when Microsoft was just starting its own C compiler project. The comments of that person (Ralph Ryan) led me to a paper he published in 1985 called “The C Programming Language and a C Compiler”. It is an interesting read and some of what he describes is still reflected in the code today. He mentions that you can compile C programs with two floppy drives and 192K of RAM (although he recommends a hard drive and 256K of RAM). Being able to run in that environment meant that you couldn’t keep a lot of work in memory at a time. The compiler was designed to scan programs and convert statements and expressions to IL (intermediate language) as quickly as possible and write them to disk without ever having an entire function in memory at one time. In fact, the compiler will start emitting IL for an expression before even seeing the end of the expression. This meant you could compile programs that were quite large on a pretty small machine.

Note: Our compiler consists of two pieces (a front-end and a back-end). The front-end reads in source code, lexes, parses, does semantic analysis and emits the IL. The back-end reads the IL and performs code generation and optimizations. The use of the term “compiler” in the rest of this post pertains only to the front-end.

For C code (especially K&R C), this approach worked well. Remember, you didn’t even need to have prototypes for functions. Microsoft added support for C++ in C 7.0, which was released in 1992. It shared much of the same code as the C compiler and that is still true today. Although the compiler has two different binaries (c1.dll and c1xx.dll) for C and C++, there is a lot of source code shared between them.

At first, the old design of the compiler worked OK for C++. However, once templates arrived, a new approach was needed. The method chosen to implement this was to do some minimal parsing of a template and then capture the whole template as a string of tokens (this is very similar to how macros are handled in the compiler). Later, when a template is instantiated, that token stream would be replayed through the parser and template arguments would be replaced. This approach is the fundamental reason why our compiler has never implemented two phase lookup.

The design of our compiler also made it unsuitable for other purposes where you wanted to retain more information about a program. When we added support for static analysis (/analyze) in the compiler, it was added to the same code base as the actual compiler, but the code was under #if blocks and we generated separate binaries (c1ast.dll and c1xxast.dll). Over time, this resulted in more than 6,000 #if preprocessor blocks.

The static analysis tools built an AST for an entire function by capturing pieces as the regular compiler did its parsing. However, this captured AST was fundamentally different from the data structures the real compiler uses, which often led to inconsistencies. Also, as new language features were added, most had to be implemented twice: once for the compiler and again for static analysis.

About three years ago we embarked on a project to finally perform a major overhaul of our compiler codebase. We wanted to fix problems we have had for a long time and we knew new features such as constexpr were going to need a different approach. The goal was to fundamentally change the way our compiler parses and analyzes code.

We quickly decided on a few key tenets to guide our development. The most important tenet is that all rejuvenation work that we do will be done in the same development branch as features. We don’t want to “go dark” and have two divergent codebases that are difficult to reintegrate. We also want to see value quickly, and in fact, we need value quickly.

The first phase of this work has finally shipped in Visual Studio 2015. We have changed a lot of the guts in the compiler’s internal implementation, although not much is directly visible. The most visible change is that c1ast.dll and c1xxast.dll are no longer present. We now handle all compilation for static analysis using the same binary as the one we do for code generation. All 6,000+ #if blocks are gone and we have less than 200 runtime checks for analysis. This large change is why code analysis was disabled in some of the RC builds of the C++ compiler as we ripped out the #if blocks and then had to build the new infrastructure in its place.

The result of this is that we now generate a full tree for functions and can use that same data structure to generate code or to perform static analysis. These same trees are used to evaluate constexpr functions as well, which is a feature we just shipped. We also now track full source position information (including column) for all constructs. We aren’t currently using column information but we want to be able to provide better diagnostics in the future.

As we make these changes, we strive to provide as much backward compatibility as we can while fixing real bugs and implementing new features in our compiler. We have an automated system called Gauntlet, consisting of over 50 machines, that builds all versions of the compiler and runs many tests across all flavors of 32-bit, 64-bit, and ARM architectures, including cross compilers. All changes must pass Gauntlet before being checked in. We also regularly run a larger set of tests and use our compiler on “real world code” to build Visual Studio, Office, Windows, Chrome, and other applications. This work flushes out additional compatibility issues quickly.

Looking forward, we are continuing to invest in improving our compiler. We have started work on parsing templates into an AST (abstract syntax tree), which will yield some immediate improvements in our support for expression SFINAE and our parsing of “qualified names”. We will continue to invest in improving our compiler with the goal of making it fully standards compliant. That said, we are also very interested in improving our support for Clang. In fact, there is a presentation at CppCon on using the Clang front-end with our code generator and optimizer. Here is the link to that session: http://sched.co/3vc4

Join the conversation

That's very nice and really reassuring that the compiler will finally be close to conformance with the other major ones. However, reading about tests made me wonder how several warnings became broken in VC2015 even though the code for testing them is right there in MSDN :)

But anyway thank you for the hard work, I'm waiting eagerly for much required update 1 )

The 2 major overhauls I'd like to see in the compiler are both concerned with REALLY reducing build times.

**1**. Most important, for professionals, a way to compile a bunch of translation units under the opt-in or opt-out non-standard assumption that any given header will produce the same code in every inclusion. In effect a kind of temporary pre-compilation of each header used in this bunch of translation units. Usually when I introduce this notion (hey, we can have a really fast compilation just by letting the compiler assume what one typically assumes anyway, so that each header is just included once!) people argue that that's brain-dead because "modules" will fix all this. Sometime, real soon now. But, please, the new model would be really nice & save people much time.

**2**. Not so important, just also Very Nice To Have: for the hobbyist/student, a ditto way to compile without the compiler collecting any recovery information, with the compiler just stopping immediately on first error. That way one not only gets a potentially blazingly fast compilation (like, uh, Borland's Turbo products in the 90's), but one may also avoid or at least reduce the really long & ungrokable error message avalanches.

Cheers, & keep up the good work! :),

– Alf

Note: for (1) one may/should make an exception for the "assert.h" header (and "cassert"), because it's explicitly allowed to produce different code, one of two possibilities, on each inclusion.

There's a good technique for this, which is poisoning. If during compilation you encounter something that's not 100% OK, you don't emit a best guess, you emit a poison token. Anything using this poison token is itself discarded and replaced by another poison token, which quells nearly all of the follow-on warnings/errors. The compiler still emits diagnostics unrelated to the poison token, but ignores errors about things that are related to it.

I've implemented this in a GLSL frontend and for that purpose (without #include stuff) it works excellently.

Are there any plans to provide programmatic access to the AST for code analysis and generation like in Clang? It would be really cool if we could autogenerate code, like metadata, during compilation. It would be even better if Intellisense would pick up on it, but that's borderline psychic behavior.

I _knew_ you treated templates like macros! I remember having trouble with expression arguments to templates; I had to parenthesize them as if they were macro arguments.

Something I've always been curious about: I remember reading somewhere that the analyzer was written in Prolog, is it true? (if not the analyzer, one of its cousins, like PREfast – or are they the same code?)

Please please PLEASE make an open API for AST generation, the source code doesn't even need to be open, but the API does. I'd much rather use MSVC for this than clang for refactoring tools, because right now I'm using clang on MSVC targets.

+1 to Andrew Artz's comment about the AST. I assume you guys are aware to some degree of the gcc / clang schism, largely driven by RSM's aversion to providing an API for the AST due to license / religious reasons. It would be _great_ to have another entry in that battle.

I definitely hear all of you regarding a public API and thank you for voicing this. I couldn't agree more. The API we currently use for analysis is a COM API although it is not documented and is only used internally. While it serves its purpose, I would prefer to expose a C++ library for public consumption rather than our current COM API.

As to stopping on first compile error, that seems like a reasonable option to have.

@Junker – Our compiler is compiled as C++, although there is a lot of old C-style code still there too. We update the toolset frequently and we can use any new C++ feature as soon as it is in the toolset we build with. We do use templates pretty extensively.

@KJK – I don't believe any of the analyzers are written in Prolog.

@Jeison – We support designated initializers in C as of VS2013.

@Alf – There is work underway in the committee to define a module system for C++ that has the potential to do what you are asking.

Correction – I got an email from one of the early compiler developers who corrected me that C++ didn't show up until C7.0 in 1992.

You've done a remarkably good job of overhauling the C/C++ compiler. I expect to find and report several bugs in any compiler upgrade, because I have a large codebase that I can test pretty thoroughly. I only found 3 in the VS.2013 to VS.2015 upgrade (an omission from the new 32-bit run-time, a code generation bug that was apparently long-standing, but which I was able to make a usable report on this time round, and a command-line argument parsing bug, which always looked like the consequences of a code overhaul, but was fixed in the next CTP). By contrast, VS.2012 to VS.2013 showed up 12 bugs, four times as many.

My biggest wish for Visual Studio is proper C support. Yes, I am much happier with what's there, but it still isn't C99 compliant. It is "C with C99 features". A muddy dialect that doesn't even update __STDC_VERSION__.

Hopefully the refactoring effort brings you the agility you need to support specific language dialects and versions, similar to gcc's -std flag.

IMO VC2015 is better than 2013, even apart from the newly implemented C++11/14 features. But I also think it will take some time for people to accept it, because 2012 (and perhaps 2013 too, depending on whether a given user cares more about the better code generator or the better syntax parser) broke things and caused bad consequences.

Great update. It would greatly help if VC++ had complete support for C++11. I've had to use gcc for my latest project. Also, can VC++ be distributed alone, without VB, SQL, and the dozen or so development environments that I never use but which fill my hard disk? These are the two issues that hold me back from using VC++, even though IMO VS is the best C++ IDE out there bar none.

I support the idea of having public [Clang/LLVM/Roslyn/v8]-like API to interact with compiler internals and code generation.

I also support the idea of full C99 and C11 support: not only the parts which C++ requires, but the complete ISO standard (OK, <tgmath.h> can be an exception).

-> Compiler as an Isolated Service – CAAIS :)

When I read the title of this blog, the first thing that came to my mind was: visualstudio.uservoice.com/…/6742251-make-c-c-compiler-cl-exe-independent-of-ide. Please consider making the VC compiler a separate entity decoupled from Visual Studio. I think there is no open-source community that supports MSVC compilation and would not welcome this as a number 1 feature. Besides, with Windows Server 2016 bringing Docker support and the arrival of Windows Nano Server, a "VCL as a service" architecture is more imperative for build scenarios than ever before.

Once again, we all admire the VC group's efforts made in the last couple of years, especially in VS2015, and we look forward to more C and C++ standard compliance (as two separate languages with some overlap) and "compiler as a service" with a public API and an isolated presence decoupled from Visual Studio.

@Klemen There's some way to go before Clang can be a drop-in replacement for VC++, because of legacy issues in the Windows headers. Obviously, this could go faster if Microsoft were contributing, but it's not a small job.

I'm a developer from the D-lang community, and one of the holy grails for languages like D and Rust would be to use the MS codegen (implying debug info, and binary compatibility?) coupled with these languages' front-ends.

If this is something that is possible for the community to develop, can anyone link me to relevant resources, so I can get started immediately? Or does it require internal access to the MSC codebase?

Look, your code base is probably unmaintainable at this point; say that no new features will be added to it, deprecate it over 5-7 years, and switch over to llvm permanently. I am sure that no one at Microsoft actually enjoys working on the compiler anymore. There's no sin in killing old code.

@Just switch over to llvm, my thoughts exactly. And cross-platform developers would appreciate this as well. It's really a pain to write cross-platform code that's C++11 (or C++14) compliant and then write it again without the features Visual C++ doesn't support.

Besides, Clang is also C11 compliant, so Windows developers get a C11 compliant compiler "for free".

By switching to Clang you can still sell the Visual Studio IDE, so there's no loss of profit.

This is all Chinese to me, however I hope that the C++ syntax will not be affected and that the warnings and errors will continue to appear if the programmer makes mistakes, something really necessary while struggling to make our 30 or 40 lines of code workable and functional. VS Community C++ really works fine.

The compiler may be old internally but it doesn't really feel old to me, externally. I am glad that the internals of the parts we can't see are being improved but I am frustrated that the parts we do see (resource editor, APIs) are not being improved or upgraded.

So, finally somebody starts thinking about solving an almost 30-year-old problem we have fought with almost forever!

The C core was already stone-age old when Windows 95 appeared, and Microsoft not only obstructed suggestions and people willing to fix the code for free, but also continued to worsen it over time and compiled it into pretty much all Windows and application versions I am aware of.

These days, however, we face a general shift towards more free development tools and, thanks partly to open source, much, much better software. And for this reason I'll stop shouting here and watch it coming! :-)

It wasn't until recently, when I had to do some cross-platform C++ development (Linux/Windows), that I noticed a number of standard C++ features that aren't supported by the MS compiler. I'm glad MS is coming up to speed and supporting those features!

Well, as both Xcode and Visual C++ user I can say that I'm super-impressed by the VC++ compiler. The quality of the generated code and optimizations just runs loops around clang in Xcode. The code base may be old, but since Visual C++ 2005 (2003?) the generated code is top-notch. And that's what matters to me.

Clang is great, but I for one think it's best to have a completely independent compiler as well. And to those of you who think MS should throw away code just because there's a comment from 1982 in it, you've probably never worked on a large-scale software project. With this blog post, MS shows that it really understands how to continually invest in and improve a source base, instead of falling for the "start over" fallacy. This article is still as valid as ever: http://www.joelonsoftware.com/…/fog0000000069.html

Hmmm… "…we are also very interested in improving our support for Clang as well. In fact, there is a presentation at CppCon on using the Clang front-end with our code generator and optimizer."

What about using your FRONT-end to generate Clang IL, from which we could then generate machine code using Clang for any architecture that Clang supports? I.e. we could then use Visual Studio for any Clang-supported target architecture, including not-yet-released new architectures with a custom Clang back-end written by the guys on the new architecture's development team… that seems more useful to me.

@Memories: "256K of RAM in 1985? That's double what was available on any consumer PC".

I beg to differ. In 1983 I had a Compaq Portable with at least that much RAM (it might have been 512K, actually). Bootup was with DOS in the A: drive and developer tools (Lattice C compiler, SEE editor) in the B: drive; the boot sequence would create a RAM drive, copy the dev tools to it, and replace the B: drive with the source code floppy. Lattice C was eventually repackaged by Microsoft, but I don't think that it formed the basis of MS's own C compiler, which came out later.

No, it wouldn't be beyond idiotic. Clang support is already being added and supporting Clang doesn't mean you have to abandon your entire tool chain. The ability to cross compile to Linux and Mac or even back to XPe would be a very good thing. Moreover, this doesn't mean Microsoft would just accept what LLVM is doing, but would become active contributors to it.

Having a truly flexible IDE in regards to C++, allowing various parts of the tool chain to be replaced or augmented, would be a very good thing.

"Clang support is already being added and supporting Clang doesn't mean you have to abandon your entire tool chain."

That's exactly what is implied by "Wouldn't it have been far more effective and beneficial to simply adopt Clang and expand on it?"

"The ability to cross compile to Linux and Mac or even back to XPe would be a very good thing. Moreover, this doesn't mean Microsoft would just accept what LLVM is doing, but would become active contributors to it."

Typo: "We also (…) use our compiler on “real world code” to build Visual Studio, Office, Windows, CHROME, and other applications" Did you say this forbidden word??? You must write "Edge" 1000 times as punishment ;)

One of the most cumbersome things about the compiler is how it handles a missing semicolon.

Instead of compiling with an inserted semicolon and notifying the user that one was inserted, giving them the chance to fix it, it just stops compiling. Very cumbersome, and it has been this way for over 10 years.

+1 for the idea of making the compiler, libraries, Windows SDK etc available as a standalone download. I am involved with a project where we run a CI build server that compiles the code using msbuild and Visual C++. Being able to install just the compiler and bits without needing the entire IDE installed would be very useful.

+1 to the idea of having APIs available as well, especially more hooks into the PDB and debugger stuff.

As for Clang, I can't find the quote right now, but Microsoft has indicated that their fork of Clang (and the stuff needed to talk to the c2.dll backend and generate the correct IL to feed it) isn't going to be open.

And no, Microsoft shouldn't adopt clang as a front end or LLVM as a backend, if nothing else clang/LLVM doesn't support the MS C++ ABI and people (like me) need that support…

Microsoft will never do this, because it's not in Microsoft's DNA to do this, but both Microsoft and the rest of the world would benefit if Microsoft open sourced its own compiler. A lot of the crazy and unthinkable bugs that have always infected the compiler would stand a chance of going away.

Hi. Are you the one developing UTF-8 support? Supporting UTF-8 just in the parser is not enough. Using a UTF-8 execution charset with the current C Runtime Library, which uses ANSI/OEM charsets, produces garbled text. Using MultiByteToWideChar/WideCharToMultiByte and _wfopen is non-portable and reinvents the wheel. Cygwin works fine with UTF-8 using the method above, but it uses a POSIX filesystem rather than the native Win32 filesystem. A UTF-8 based CRT is needed to solve these problems.

Changing out the MS runtime conventions for the GNU/POSIX conventions is a fatal mistake (see last paragraph below for the horror example).

The best way to pass UTF-8 (or any other source charset other than the build machine’s “ANSI codepage”) through is:

1. In UNICODE char and string literals (wchar_t and the new explicit UTF-16 and UTF-32 etc. types, maybe if necessary a special _ms_char_utf8 type distinguishable by C++ overloading but miscible with char otherwise), map directly to the target charset without going through ANSI during compilation.

2. In OS interface strings (such as #include directives etc.), do the same unless actually running the MS compiler on some Win9x variant.

3. In comments etc., just pass the charset or Unicode equivalent through to the metadata interfaces (browse database, docgen, typelib gen etc.).

4. Finally, in generic char/string literals, pass through the input bytes if the input charset is char-based, so odd bytes in literal strings can hold any value, even if not charset-valid in the formal input charset. This is important when “magick” byte sequences were defined by pasting in using old compilers and tools, such as those that came with CL 6.0.

JUST NEVER EVER Autoconvert source code to different encodings when saving after edits, this wrecks existing code and generates noise in change control systems if you allow Visual Studio to touch the source control provider (a major bug in some older Visual Studio Versions, bad enough to make me still diligently hand check every file change outside Visual Studio before checking it in with an outside tool that won’t let Visual Studio do any last minute damage).

It is bad enough that the VS2015 runtime completely broke printf() of all things due to a horribly designed attempt to switch from the (very sane, very well tested, very widely used) MS support for wchar_t format string handling introduced in or before 1992 (the NT 3.1 SDK Beta), to the (braindead, barely tested, used by almost no one) GNU/POSIX pseudo-wchar_t support that got accepted by the C committee for some historic political reason. The fact is that no significant non-Microsoft operating system made any real use of the wchar_t stdio functions, except as a glue-on compliance checkbox that translated everything to the local 8-bit charset in underspecified ways. The C-supporting systems known to use the wchar_t stdio functions for anything real were Microsoft NT/Win32/Win64, Microsoft CE/Mobile/Embedded-compact, and Microsoft Nokia Symbian (which used the same fundamentals as Microsoft, but didn’t actually add stdio.h until shortly before coming under full Microsoft influence). There may also have been some stuff on HP Apollo systems running Aegis, though I am not sure.

Debugging programs on Windows XP has been made very difficult because something in the VC compiler console program makes it unrunnable on Windows XP. So if you need to change something in a program and run it again, you need an extra computer to build the program on, and then execute it on XP. A real nightmare!

Please remove the usage of new (post-XP) Windows kernel functions from the C++ compiler so it can run on old versions of Windows! The compiler is not the IDE, as has been said many times here, and we want them to be separated! And even if you don't want new IDEs to run on XP, why did you do the same thing to the compiler? It is just a console program that was started so long ago!

If this is part of a policy to sell more new versions of Windows (which is what it looks like, given the unreasonable Windows version requirements for the new cl.exe), these are unfair methods! Please stop such a policy and let users decide which Windows version they need (especially when new versions of the OS are incompatible with XP, i.e. do not allow working with very old special hardware devices).

Hi all! I have projects which should be compiled for several computers with different architectures (some only support SSE2, some AVX, and others AVX2). It would be great if the new compiler were able to generate machine code containing implementations of functions for all available architectures and select among them dynamically. Yes, code size will grow, but that is not critical.