Building extensions for Python 3.5

Some parts of this post have been superseded. If you are interested in the background, continue reading (I have marked the parts that are now incorrect). If you simply want to see how Python 3.5 and later are avoiding CRT compatibility issues, you can jump straight to part two.

As you read through the changelog or What’s New for Python 3.5, you may notice the following item:

Windows builds now use Microsoft Visual C++ 14.0, and extension modules should use the same.

For most Python users, this will not (and should not) mean anything. It doesn’t affect how you use or interact with Python, it isn’t new syntax or a new library, and generally you won’t notice any difference other than a small performance improvement (yay!).

However, Python extenders and embedders care a lot about this change, because it directly affects how they build their code. Since you are almost certainly using their work (numpy, anyone? Blender? MATLAB?) you should really hope that they care. However, this is a change, and no matter how good the end result is, change takes time. Please be patient with project maintainers, as they will have to spend more time supporting Python 3.5 than previous versions.

So while the most obvious benefit for most people may be a performance improvement, we haven’t even bothered benchmarking this precisely. Why not? Because the long-term benefits of the change are so good it would be worth sacrificing performance to get them. And we know it’s going to hurt some project maintainers in the short term, and again, the long-term benefits for the entire ecosystem – and those same maintainers – are worth it.

As I’m largely responsible for the compiler change (with the full support of the CPython developers, of course), this post is my attempt to help our ecosystem catch up and set the context so everyone can see just how the benefits are worth the pain. Python 3.5.0rc2 is already available with all of these changes, so project maintainers can be testing their builds now.

First, some definitions

While this post is intended for advanced audiences who probably know all of these terms, I’ll set out some definitions along the way just to make sure we’re all talking about the same thing.

MSVC is Microsoft Visual C++, the compiler used to build CPython on Windows. It’s often specified with a version number, and in this post I’ll refer to MSVC 9.0, MSVC 10.0 and MSVC 14.0.

CRT refers to the C Runtime library, which for MSVC is provided and supported by Microsoft and contains all of the standard functions your C programs can call. This is a heavily overloaded term, so I’ll be more specific and refer to DLL names (like msvcr90.dll) or import library names (like libucrt.lib) where it matters.

MSVCRT refers specifically to the CRT required by MSVC. Other compilers like gcc have their own CRT, typically known as libc, and even MSVCRT is made up of parts with their own distinct names.

MSVCRT’s Little Problem

The problem is rooted in a design decision made many years ago as part of MSVC (I don’t even know which version – probably the very earliest). While we can view it differently today, at the time it was clearly a good design. However, the long-term ramifications did not become apparent until the rise of the internet.

Each version of MSVC (the compiler) comes with a matched version of the CRT (the library), and the compiler has intimate knowledge of the library. This allows for some cool optimizations, like choosing different implementations of (say) memcpy automatically based on what the compiler knows about the variables involved – if it can prove the ranges never overlap, why bother checking for overlap at runtime?

However, it does mean that when you use a different compiler, you also have to use the matched CRT version or everything breaks down very quickly. Generally this is okay, since when a developer upgrades to a newer compiler they can rebuild all of their code. The reason the internet causes this to break down is the rise of plugins and the ease of updates.

Many applications support plugins that are loadable shared libraries (DLLs, often with a special extension such as .pyd). While the application may not consider or describe these as plugins – Python prefers “extensions” or “native modules” – it is still a plugin architecture. And with the internet, we have easier access than ever to download and install many such plugins, and also to update the host application.

The CRT comes into play because it is shared between the host application and every plugin. Or rather, it assumes that it is shared. Because of the way Windows loads DLLs, if the host application and all of its plugins are built with the same MSVC version, and hence use the same CRT version, the state kept within that CRT is shared.

Shared state includes things such as file descriptors, standard input/output buffering, locale, memory allocators and more. When that state is shared, these features can be used equally by the host and its plugins, without conflicts resulting in data corruption or crashes.

However, when a plugin is built with a different CRT, this state is no longer shared. File descriptors opened by the plugin do not exist (or worse, refer to a different file) in the host, file and console buffering gets confused, error handling is no longer synchronised, memory allocated in one cannot be freed in the other and so on. It is possible to safely use a plugin built with a different CRT, but it takes care. A lot of care.

This is the situation that Python 2.7 currently suffers from, and will continue to suffer from until it is completely retired. Python 2.7 is built with MSVC 9.0, and because of compatibility requirements, will always be built with MSVC 9.0 – otherwise a minor upgrade would break all of your extensions simultaneously, including the ones that nobody is able to build anymore.

Unfortunately, MSVC 9.0 is no longer supported by Microsoft and all the free downloads were removed, making it essentially impossible to build extensions for Python 2.7. The easiest mitigation was to keep making the compilers available in an unsupported manner, so we did that, but it still leaves projects in a place where they are using old tools, likely with unpatched bugs and vulnerabilities. Not ideal.

Python 3.3 and 3.4 were built with MSVC 10.0, which is in essentially the same position. The compiler is no longer supported and the tools are no longer easily available. Building extensions with later versions of MSVC results in CRT conflicts, and building with the older tools misses out on security fixes and other improvements.

One example of an improvement in MSVC 14.0 that is not in MSVC 10.0 or earlier is support for the C99 standard. I’m not claiming it’s 100% supported (it’s not), but even 90% support is much more useful than what was previously available.

The best mitigation we have for MSVC 10.0 builds of Python is to migrate to Python 3.5. Luckily, doing so does not require the same porting effort as moving from Python 2.7 would require, but it raises the question: why is Python 3.5 any better?

The answer is: UCRT.

The UCRT Solution

As part of Visual Studio 2015, MSVCRT was significantly refactored. Rather than being a single msvcr140.dll file, as would be expected based on previous versions, it is now split into several DLLs.

The most exciting one of these is ucrtbase.dll. Look carefully – there is no version number in the filename! This DLL contains the bulk of the C Runtime and is not tied to a particular compiler version, so plugins that reference ucrtbase.dll will share all the state we discussed above, even if they were built with different compilers.

Another great benefit is that ucrtbase.dll is an operating system component, installed by default on Windows 10 and coming as a recommended update for earlier versions of Windows. This means that soon every Windows PC will include the CRT and we will not need to distribute it ourselves (though the Python 3.5 installer will install the update if necessary).

It’s very important to clarify here that the compatibility guarantees only hold when linked through ucrt.lib. The public exports of ucrtbase.dll may change at any time, but linking through ucrt.lib uses API Sets to provide cross-version compatibility. Using the exports of ucrtbase.dll directly is not supported.

So the major issue faced by earlier versions of Python no longer exists. The next version of MSVC will be able to build extensions for Python 3.5, and it may even be possible for later versions of Python 3.5 to be built with newer compilers without affecting users. But while this is the start of the story, it isn’t the end, and the rest is not so pretty.

The UCRT Problems

While ucrt.lib is a great improvement over earlier versions, if you followed the link above or just read my comment carefully, you’ll see the rest of the problem. Besides ucrtbase.dll, there are other libraries we need to link with.

For pure C applications, the other DLL we need is vcruntime140.dll. Notice how this one includes a version number? Yeah, it depends on the version of the compiler that was used. Applications using C++ will likely depend on msvcp140.dll, which is also versioned. We have not yet completely escaped DLL hell.

Why weren’t these libraries also made version independent? Unfortunately, there are places where the compiler still needs intimate knowledge of the CRT. They are very few: almost every documented function that vcruntime140.dll exports has a preferred alternative in ucrtbase.dll (for example, memcpy may be imported from vcruntime140.dll, but memcpy_s from ucrtbase.dll should be preferred). However, much of the critical startup code is part of vcruntime140.dll, and this is so closely tied to what the compiler generates that it cannot reasonably be made compatible across versions.

Ultimately, depending on any version-specific DLL takes us right back to the earlier issues. Extensions for Python 3.5 will need to use MSVC 14.0 or else include the version-specific DLLs – Python 3.5 could include vcruntime140.dll, but if an extension depends on vcruntime150.dll then it is not easily distributable.

Luckily, this concern was raised as the UCRT was being developed, and so there is a semi-official solution for this that happens to work well for Python’s needs.

The End (of part one)

Remember how I said at the start that some of this blog is no longer valid for Python 3.5? Yeah, that’s from here to the end. To see what we’ve actually done, stop reading here and read part two instead.

The Partially-Static Solution

To avoid having a runtime dependency on vcruntime140.dll, it is possible to statically link just that part of the CRT. Effectively, the required functions, which tend to be a very small subset of the complete DLL, are compiled into the final binary. However, the functions from ucrtbase.dll are still loaded from the DLL on the user’s machine, so many of the issues associated with static-linking are avoided.

There are many downsides to static linking, especially of the core runtime, ranging from larger binaries through to not automatically receiving security updates from the operating system. Previously, applications including Python have avoided static linking by distributing the CRT as part of the application (“app local”), but while this avoids some of the bloat concerns, the application distributor is still responsible for providing updates to the CRT. Statically linking vcruntime140.dll also leaves responsibility with the distributor for some updates, but significantly fewer.

Warning: This is where things get technical. Skip to the next section if you just want to know what you’ll need to fix.

The difference between dynamic linking and static linking is based on a few options passed to both the compiler (cl.exe) and the linker (link.exe). Most people are familiar with the compiler option, one of /MD (dynamic link), /MDd (debug dynamic link), /MT (static link) and /MTd (debug static link). As well as automatically filling out the remaining settings, these also control some code generation at compile time – different code needs to be compiled for static linking versus dynamic linking, and this is how that option is selected at compile time.

For the linker, there are separate libraries to link with. If the compiler option is provided, these are selected automatically, but they can be overridden with the /nodefaultlib option. This table is adapted from the VC Blog post I linked above:

    /MD  (dynamic release)   msvcrt.lib, vcruntime.lib, ucrt.lib
    /MDd (dynamic debug)     msvcrtd.lib, vcruntimed.lib, ucrtd.lib
    /MT  (static release)    libcmt.lib, libvcruntime.lib, libucrt.lib
    /MTd (static debug)      libcmtd.lib, libvcruntimed.lib, libucrtd.lib

I will ignore the debug options for the rest of this post, as debug builds should generally not be redistributed and can therefore reliably assume all the DLLs they need are available. This is why the Python 3.5 debug binaries option requires Visual Studio 2015 – to make sure you have the debug DLLs.

For a fully dynamic release build, we’ve built with /MD. This enables codepaths in the CRT header files that decorate CRT functions with __declspec(dllimport), and so code is generated for calls to go through an import stub. Linking with vcruntime.lib and ucrt.lib provides the stubs that will be corrected at load time to refer to the actual DLLs.

For a fully static build, we use /MT, which omits the __declspec(dllimport) decorations and generates normal extern declarations. Linking with libvcruntime.lib and libucrt.lib provides the actual function implementations, and the linker resolves the symbols normally, just as if you were calling your own function in a separate .c file.

What we want to achieve is linking with libvcruntime.lib for the static definitions, but ucrt.lib for the import stubs. Unfortunately, the compiler does not know how to generate code for this case, so it will either assume import stubs for all functions, or none of them, which results in linker errors later on.

There is one case that works: if we compile with /MT so the CRT will be statically linked, the generated code assumes everything can be resolved through its regular name. When linking, if we then substitute ucrt.lib for libucrt.lib, the linker can generate the import stubs needed to call into the DLL.

We use /MT to select the static CRT. The /GL and /LTCG options enable link-time code generation, and the /NODEFAULTLIB:libucrt.lib ucrt.lib arguments ignore the static library and replace it with the import library. The linker then generates the code needed for this to work, and we end up with a DLL or an executable that only depends on ucrtbase.dll (via the API sets).
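To make that flag combination concrete, here is a sketch (in Python, purely as illustration) of the cl.exe and link.exe command lines the scheme amounts to. The helper and file names are hypothetical, and a real extension build would also need the Python include directory, python35.lib and a PyInit export – this only shows the CRT-related options described above:

```python
# Flags from the partially-static CRT scheme described in the text.
COMPILE_FLAGS = ["/nologo", "/O2", "/MT", "/GL"]        # static CRT + link-time codegen
LINK_FLAGS = ["/nologo", "/DLL", "/LTCG",
              "/NODEFAULTLIB:libucrt.lib", "ucrt.lib"]  # swap static lib for import lib

def build_commands(source, output):
    """Return illustrative cl.exe and link.exe command lines for one source file."""
    obj = source.rsplit(".", 1)[0] + ".obj"
    cl = ["cl.exe", *COMPILE_FLAGS, "/c", source, "/Fo" + obj]
    link = ["link.exe", *LINK_FLAGS, obj, "/OUT:" + output]
    return cl, link

cl_cmd, link_cmd = build_commands("spam.c", "spam.pyd")
```

The result depends only on ucrtbase.dll (via the API sets), because the vcruntime pieces are linked statically while the UCRT calls still go through import stubs.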

Unfortunately, there are some follow-on effects because of this change.

What else does this break?

In case you skipped the last warning, the rest of this post is now invalid. To see what we’ve actually done, stop reading here and read part two instead.

With Python 3.5, distutils has been updated to build extensions in a portable manner by default. Most simple extensions will build fine, and your wheels can be distributed to and will work on any machine with Python 3.5 installed. However, in some cases, your extension may fail to build, may produce a significantly different .pyd file from previously, or may need extra dependencies when distributed.

Static Libraries

The first likely problem is linking static libraries. Because of the compiler change, you will probably need to rebuild other static libraries anyway, and when you do, it is important to select the static CRT option (/MT). As discussed above, we don’t actually link the entire CRT statically, but if your library was built expecting to dynamically link the CRT then it will fail to link.

If your library requires C++, your resulting .pyd will statically link the parts of the C++ runtime it uses, and so it may be significantly larger than the same extension for Python 3.4. This is unfortunate, but not a critical issue, and it actually has the benefit that your extension will not be interfered with by other extensions that also use C++.

Of course, in some cases you really do not want to do this. In that case, I would strongly discourage you from uploading your wheels to PyPI, since you will also need to get your users to install the VCRedist version that matches the compiler you used. Currently, there is no way to check or enforce this through tools like pip.

Since it is so strongly discouraged, I’m not even going to show you how to do it, though I’ll give basic directions. In your setup.py file, you’ll want to monkeypatch distutils._msvccompiler.MSVCCompiler.initialize() (yes, the underscore in _msvccompiler means this is not supported and we may break it at any point), call the original implementation and then replace the '/MT' element in self.compile_options with '/MD'.

Ugly? Yep. By going down this path, you are making it near impossible for non-administrative users to use your extension. Please be very certain your users will be okay with this.

Dynamic Libraries

If you have binary dependencies that you can’t recompile but have to include, then your best option is to include the redistributable DLLs alongside them. Test thoroughly for CRT incompatibilities, especially if your dependencies use a different version of MSVCRT, and generally assume that only ucrtbase.dll will be available on your user’s machines. Dependency Walker is an amazing tool for checking binary dependencies.
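If you want to script that check, dumpbin /DEPENDENTS (included with the Visual Studio tools) lists the DLLs a binary imports, and you can filter its output for version-numbered CRT DLLs. A sketch, assuming dumpbin’s usual output format – the sample output below is fabricated for illustration:

```python
import re

# Version-numbered CRT DLLs tie a binary to one compiler version;
# ucrtbase.dll and the api-ms-win-crt-* API sets are fine to depend on.
_VERSIONED_CRT = re.compile(r"(?:msvcp|vcruntime|msvcr|concrt)\d+\.dll",
                            re.IGNORECASE)

def versioned_crt_deps(dumpbin_output):
    """Return version-specific CRT DLLs named in `dumpbin /DEPENDENTS` output."""
    return sorted({m.group(0).lower()
                   for m in _VERSIONED_CRT.finditer(dumpbin_output)})

sample = """
  Image has the following dependencies:
    python35.dll
    VCRUNTIME140.dll
    api-ms-win-crt-runtime-l1-1-0.dll
    MSVCP140.dll
"""
```

Here versioned_crt_deps(sample) flags vcruntime140.dll and msvcp140.dll, while the API set and python35.dll pass through unflagged.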

Incompatible Code

The third likely issue that will be faced is code that no longer compiles. There has been an entire deprecation cycle between MSVC 10.0 and MSVC 14.0, which means some functions may simply disappear without warning (because the warning was in MSVC 11.0 and MSVC 12.0, which Python never used). There have also been changes to unsupported names, and a number of non-standard names are now correctly indicated with a leading underscore.

Also, as with every release, the graph of header files may have changed, and so names that were implicitly #included previously may now require the correct header file to be specified. (This is not necessarily names moving into different header files, rather, one header file may have included another and the name was available that way. Dependencies within header files are not guaranteed stable – you should always include all headers directly when you require their definitions.)

A lot of code tries to fill gaps in various compilers and runtimes by defining functions under #ifdef directives. With the range of changes that have occurred, most of these should be checked and updated – _MSC_VER is now defined as at least 1900, and because of the switch to /MT, some declarations of CRT exports may need the __declspec(dllimport) removed (or remove the entire declaration and use the official headers).

MinGW

Finally, extensions that are built with gcc under MinGW are likely to have compatibility issues for some time yet, since the UCRT is not a supported target for those tools. Again, this pain is unfortunate, but long term it should be entirely feasible for the MinGW toolchain to support the Universal CRT better than the MSVC 10.0 CRT.

In Summary

If you made it this far, technically this part is still mostly correct. But then, it doesn’t really add anything new. To see what we’ve actually done, read part two.

By moving Python 3.5 to use MSVC 14.0 and the Universal CRT, we have (hopefully) removed the restriction to build extensions with matched compilers.

Extensions built with distutils can be distributed easily, though they may be larger or have more build errors as a result of different build settings.

Long term, we believe this change will avoid the problems currently faced by those building for Python 2.7 as toolchains are deprecated and retired.

The short-term pain we are going to experience would have occurred for any compiler change, but after this we should be largely insulated against the next.

Finally, please show some respect and grace towards the maintainers of projects you depend upon. It may take some time to see fully compatible releases for Python 3.5, and shouting at or abusing people online is not necessary or even helpful.

I personally want to thank everyone who distributes builds of their packages for Windows, which they don’t strictly need to do, and I apologise for the pain of transition. This change is meant to help you all, not to hurt you, though the benefits won’t be seen for some time. Thank you for your work, and for making the Python ecosystem exist for millions of users.

19 thoughts on “Building extensions for Python 3.5”

Thank you for your incredibly detailed and informative post. You touch upon MinGW, and I’m hoping you can clarify: why isn’t the whole Python on Windows toolchain built with it instead of being reliant on MS tools and runtimes?

We use MSVC because it’s the native compiler for the platform – Windows is also built with MSVC. MinGW also provides their own Python builds, and a MinGW install can easily get the toolset needed to build from source.

One thing that I’m left unsure about. When you divide the CRT into the universal and versioned parts, this is effectively saying that future versions of the compiler and CRT will continue to use the same universal runtime. That sounds great, except that it puts serious restrictions on what can be improved on. How will this work in the future? Will there be newer versions of ucrtbase that are backward-compatible with the current CRT and compiler, but include more features? And if so, how do you cope with a system that has a too-old ucrtbase (since it’ll be an OS component here)?

If this is something that hasn’t even been decided yet, feel free to say “Shut up, Rosuav”, but eventually, I’d like to know how this will work out. 🙂

I mentioned (and linked to the horrifically technical details of) API sets, which are a recent feature in Windows (Win7 I think?).

Basically, nobody really depends directly upon ucrtbase.dll – they actually depend on something like “api-ms-win-crt-runtime-l1-1-0.dll”, which the loader in the kernel knows how to resolve. Initially, these are all direct forwarders, but as APIs change in ucrtbase.dll, they can be converted to wrapper functions that hide the change from older code. Applications that want the changes will have to link against *-l2-1-0.dll or *-l1-2-0.dll (I’m not sure how the change is determined, but I’ve seen both).

On a fully updated system, everyone should have the latest ucrtbase.dll (with whatever API changes occur) and the full set of API schemas so that new and old programs will continue to run. (And we can force an update on install if necessary, as Python 3.5 does today.) In theory it’s safer to avoid depending on the newer APIs, but I suspect in practice it won’t be an issue except on old versions (e.g. Python 3.7 and Win7 may require an extra update, while Win10 is going to get the update and be fine).

Thanks for the writeup, this will be very helpful as I expand PyCA’s build matrices in the near future. Does VC 14 supply x64 compilers with the free version or is that still a feature they believe should be paid for?

Yes, they’re included, and were in the last few free versions too. The only downside right now is the free version is still a big chunky 4-8GB install 🙁

I wasn’t involved in the decision to leave them out initially, but I’d guess it was a combination of reducing download/install size and a feature that (in the late 2000’s) was quite specialized. 64-bit Windows wasn’t mainstream until Windows 7, so very few people desperately needed the 64-bit compilers (primarily driver writers). Also, IA64 still looked like it had potential at the time, so there wasn’t even a decided 64-bit architecture.

Obviously since then things have changed, but that was easy to miss with Python being stuck on the old tools.

Compiling against the static runtime then linking against the DLL runtime looks a little dangerous to me. Have you asked the MSVC runtime team what they think of this approach? Microsoft tend to have a fairly restrictive idea of what is a supported configuration and are less likely to include this combination in their testing.

If I understand correctly: one option for mingw-w64 to target the universal runtime instead of msvcrt.dll is to generate an import library referencing all the api-ms-win-crt-**-l1-1-0.dlls instead of ucrtbase.dll. Are there any legal concerns with using ucrt.lib to find the symbol names needed?

That certainly sounds like the right technical solution. I have no idea about the legal situation though. (Ideally, I’d love to be able to let mingw-w64 redist ucrt.lib and link directly, but I don’t think that’s legally feasible – it definitely scares people whenever I suggest it 🙂 )

I have a question regarding the use of the Windows SDKs as a source for the required VC compilers. I have studied the code in distutils/_msvccompiler.py and discovered that DISTUTILS_USE_SDK actually means that distutils assumes the environment is already set up. I find that kind of counterintuitive. The name suggests to me that the SDK compiler will be used instead of the Visual Studio one, i.e. instead of using VisualStudioXY/vcvarsall.bat to set up the environment and find the executables, SDK_XY/setenv.bat will be used to do so. I realize that the name and behavior are legacy ones, but perhaps they should be changed. On that note, is there any reason why distutils, after trying to find and use vcvarsall.bat, does not also try to find setenv.bat as a fall-back option? Especially with earlier versions of VS, it is probably easier to obtain and install the matching SDK. Starting with VS 2015 and its Community edition that isn’t as big a problem, but I always had trouble using the Express versions of the earlier VSs. Anyway, my main point here is to inquire about using the SDK and setenv.bat as a fall-back to vcvarsall.bat.

_msvccompiler.py assumes you will only be using VS 2015 or later, as no earlier versions of the compiler are supported, and the Express versions of VS have been fine since VS 2012. Currently there is no way to get the C++ compiler without getting Visual Studio (the SDK does not include it anymore, though I’m trying to get a standalone compiler download), so other ways of finding the compiler don’t really apply.

setenv.cmd is not part of VS 2015, but vsvars32.bat looks like an equivalent. However, it is part of Visual Studio and not VC, so it may be found even if the compilers are not installed – vcvarsall.bat is guaranteed to be there when the compilers are there (and the issues from 5+ years ago were resolved in VS 2012).

DISTUTILS_USE_SDK is the variable that’s been used for a long time (I can see it in Python 2.5’s distutils.msvccompiler), and given no compelling reason to change the name, I chose to keep it the same. This way, all the existing documentation on the internet continues to apply.

I think that my first post ended up somewhat rambling. I mixed up several points I wanted to address.

My comments were not exclusively aimed at the new _msvccompiler.py, which, as you pointed out, applies only to VS 2015 and Python 3.5 respectively. As 3.5 is rather new, my immediate concern is for Python 3.4 and VS 2010. I simply contacted you because, as the author of _msvccompiler.py, I assume you have some knowledge about the entire thing, and perhaps some of the underlying design decisions that have been made over the years. Whenever I end up having to compile some Python extensions under Windows, I have trouble getting everything set up correctly. Sometimes I have the impression that Windows is treated like a redheaded stepchild by the Python devs. I was just happy that I found someone who worked on this topic recently (as you are certainly aware, most of the distutils code concerning Windows is ancient).

I was not aware that VC is not included with the newest SDK. I know it is part of SDK 7.1, which is what I’m trying to use at the moment – sorry for the confusion. Perhaps I’m naive, but I’m somewhat averse to installing more than one version of any software, including VS. Hence my desire to install only VC (2010) via the SDK instead of the entire VS 2010.

From there follows my question as to why the setenv.bat of the SDK is not treated as equivalent to the vcvarsall.bat of the VS installation, as they (to my eyes at least) perform exactly the same function. The SDK’s VC is simply not as well integrated as the VC of a VS installation.

I personally find the name DISTUTILS_USE_SDK (and all the accompanying guides) particularly misleading, because it suggests to me that its use directs distutils to utilize the SDK to set up the environment. Instead it assumes that the environment is already set up! (I admit that has nothing to do with you, but I’m annoyed whenever I think about it, sorry.) While that may be of little concern when building a package, it is highly irritating when using a package that performs on-the-fly compilation (like SymPy or Numba or PyDy), as it requires that the entire setup be done before each session and cannot be done by the distutils config.