Dynamic by default

Currently, GHCi doesn't use the system linker to load libraries, but instead uses our own "GHCi linker". Unfortunately, this is a large blob of unpleasant code that we need to maintain; it already contains a number of known bugs, and new problems tend to arise as new versions of OSes are released. We are therefore keen to get rid of it!

Even removing it only on particular OSes, arches, or OS/arch pairs would be useful, as much of the code is used only for a particular platform. However, the best outcome would be to remove it on all platforms, as that would allow us to simplify a lot more code.

Our solution is to switch GHCi from using the "static way" to using the "dynamic way". GHCi will then use the system linker to load the shared library for each package (a .so, .dylib or .dll, depending on the platform), rather than using the GHCi linker to load the .a.

For this to work, there is technically no need to change anything else: ghc could continue to compile for the static way by default. However, there are 2 problems that arise:

cabal-install would need to install libraries not only for the static way (for use by ghc), but also for the dynamic way (for use by ghci). This would double library installation times and disk usage.

GHCi would no longer be able to load modules compiled with ghc -c. This would violate the principle of least surprise, and would make it harder to work around GHCi's limitations (such as performance, and lack of support for unboxed tuples).

Given these 2 issues, we think that if we make GHCi use dynamic libraries, we should also make ghc compile the "dynamic way" by default.

Windows

Currently, we don't know how to do dynamic-by-default on Windows in a satisfactory way. We can build dynamic libraries, but we don't have a way of telling them where to find their DLLs. This means that ghc --make foo; ./foo won't work unless we copy all the library DLLs into the current directory, which isn't very satisfactory.

We are currently working on this.

Performance

There are some performance questions to consider before making a decision.

(We don't have Windows performance numbers, as we don't have dynamic-by-default working on Windows yet.)

Binary sizes are way down across the board (around 94-95% smaller on every platform measured), as we are now dynamically linking to the libraries.

Things are rosiest on OS X x86_64. On this platform, -fPIC is always on, so using dynamic libraries doesn't mean giving up a register for PIC. Overall, performance is a few percent better with dynamic by default.

On OS X x86, the situation is not so nice. On x86 we are very short on registers, and giving up another for PIC means we end up around 15% down on performance.

On Linux x86_64 we have more registers, so the effect of giving one up for PIC isn't so pronounced, but we still lose a few percent performance overall.

For unknown reasons, x86 Linux suffers even worse than x86 OS X, with around a 30% performance penalty.

static -> dynamic       OS X x86_64   OS X x86   Linux x86_64   Linux x86

Binary Sizes
  -1 s.d.               -95.8%        -95.8%     -95.8%         -95.9%
  +1 s.d.               -93.1%        -92.8%     -92.6%         -92.4%
  Average               -94.6%        -94.5%     -94.5%         -94.4%

Run Time
  -1 s.d.               -1.2%         +11.7%     -2.5%          +16.6%
  +1 s.d.               +1.6%         +20.0%     +9.6%          +40.3%
  Average               +0.2%         +15.8%     +3.3%          +27.9%

Elapsed Time
  -1 s.d.               -6.9%         +10.3%     -2.5%          +16.6%
  +1 s.d.               -0.3%         +20.4%     +9.6%          +40.3%
  Average               -3.7%         +15.2%     +3.3%          +27.9%

Mutator Time
  -1 s.d.               -1.3%         +8.9%      -5.0%          +18.3%
  +1 s.d.               +1.9%         +18.3%     +7.5%          +46.8%
  Average               +0.3%         +13.5%     +1.1%          +31.8%

Mutator Elapsed Time
  -1 s.d.               -4.5%         +7.7%      -5.0%          +18.3%
  +1 s.d.               +0.3%         +18.8%     +7.5%          +46.8%
  Average               -2.1%         +13.1%     +1.1%          +31.8%

GC Time
  -1 s.d.               -1.4%         +16.3%     +5.6%          +13.4%
  +1 s.d.               +1.8%         +27.1%     +11.2%         +24.0%
  Average               +0.2%         +21.6%     +8.4%          +18.6%

GC Elapsed Time
  -1 s.d.               -1.5%         +15.8%     +5.6%          +13.4%
  +1 s.d.               +1.3%         +25.6%     +11.2%         +24.0%
  Average               -0.1%         +20.6%     +8.4%          +18.6%

Compile Times
  -1 s.d.               -11.7%        +6.2%      -1.8%          +27.0%
  +1 s.d.               -0.5%         +18.2%     +7.8%          +37.8%
  Average               -6.3%         +12.1%     +2.9%          +32.3%
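To help read the "-1 s.d.", "+1 s.d." and "Average" rows, here is a minimal Python sketch. It assumes the rows follow the usual nofib-analyse convention: the average is a geometric mean of per-benchmark ratios, and the band is one standard deviation either side of it in log space. The figures above fit that reading (the Linux x86 Run Time average is very close to the geometric midpoint of its band), but the page itself does not state the method. The helper names and the per-benchmark ratios passed to geometric_summary are hypothetical.

```python
import math

def to_ratio(percent_change):
    """Convert a percentage change such as +27.9 into a ratio such as 1.279."""
    return 1.0 + percent_change / 100.0

def to_percent(ratio):
    """Convert a ratio such as 1.279 back into a percentage change."""
    return (ratio - 1.0) * 100.0

def geometric_summary(ratios):
    """Return (-1 s.d., average, +1 s.d.) as ratios, computed in log space.

    Assumption: this mirrors how nofib-analyse summarises a column.
    """
    logs = [math.log(r) for r in ratios]
    mean = sum(logs) / len(logs)
    variance = sum((x - mean) ** 2 for x in logs) / len(logs)
    gm = math.exp(mean)                   # geometric mean
    sdf = math.exp(math.sqrt(variance))   # multiplicative s.d. factor
    return gm / sdf, gm, gm * sdf

# Run Time, Linux x86, taken from the table above.
low, avg, high = to_ratio(16.6), to_ratio(27.9), to_ratio(40.3)

# Under the geometric assumption, the average is the geometric
# midpoint of the band: sqrt(low * high).
implied_avg = math.sqrt(low * high)
print(f"{to_percent(implied_avg):+.1f}%")  # prints +27.9%

# Hypothetical per-benchmark slowdown ratios, for illustration only.
print(geometric_summary([1.10, 1.30, 1.45]))
```

Read this way, a Binary Sizes average of -94.6% corresponds to a ratio of about 0.054, so a dynamically linked binary is roughly one eighteenth the size of its statically linked counterpart.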

OS X x86 vs x86_64

Currently, some people use the x86 version of GHC on OS X for performance reasons. It's not clear for how much longer this will be viable, as other OS X libraries start dropping x86 support.

The left-hand column shows the status quo: x86_64 only beats x86 in mutator time, and that is a shallow victory as the higher GC time means that total runtime is worse for x86_64.

The right-hand column shows what the situation would be if we switched to dynamic instead. Allocations, memory use, etc. remain higher because all word-sized things are twice as big. However, the combination of x86_64's performance improving and x86's performance getting worse means that x86_64 is now faster overall.

x86 -> x86_64           when static by default   when dynamic by default

Binary Sizes
  -1 s.d.               +38.0%                   +7.4%
  +1 s.d.               +38.6%                   +30.6%
  Average               +38.3%                   +18.5%

Allocations
  -1 s.d.               +63.2%                   +63.2%
  +1 s.d.               +114.4%                  +114.4%
  Average               +87.0%                   +87.0%

Run Time
  -1 s.d.               -23.5%                   -31.6%
  +1 s.d.               +36.1%                   +14.7%
  Average               +2.1%                    -11.4%

Elapsed Time
  -1 s.d.               -18.2%                   -30.0%
  +1 s.d.               +40.1%                   +17.0%
  Average               +7.0%                    -9.5%

Mutator Time
  -1 s.d.               -32.4%                   -38.8%
  +1 s.d.               +20.1%                   +3.0%
  Average               -9.9%                    -20.6%

Mutator Elapsed Time
  -1 s.d.               -28.7%                   -37.9%
  +1 s.d.               +22.5%                   +4.4%
  Average               -6.6%                    -19.5%

GC Time
  -1 s.d.               +4.5%                    -11.9%
  +1 s.d.               +74.8%                   +54.1%
  Average               +35.2%                   +16.5%

GC Elapsed Time
  -1 s.d.               +7.9%                    -8.0%
  +1 s.d.               +75.1%                   +56.7%
  Average               +37.4%                   +20.0%

Total Memory in use
  -1 s.d.               -1.7%                    -1.9%
  +1 s.d.               +88.9%                   +88.9%
  Average               +36.3%                   +36.1%

Compile Times
  -1 s.d.               +11.9%                   -8.9%
  +1 s.d.               +21.1%                   +2.9%
  Average               +16.4%                   -3.1%
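The claim that x86_64 overtakes x86 once both are dynamic can be cross-checked against the earlier static -> dynamic table. The sketch below composes three published Run Time averages as ratios. It is only an approximation: it assumes the averages are geometric means over the same set of benchmarks, and the inputs are rounded to one decimal place, so it is not expected to match the reported figure exactly. The variable and function names are illustrative.

```python
def to_ratio(percent_change):
    """Convert a percentage change such as +15.8 into a ratio such as 1.158."""
    return 1.0 + percent_change / 100.0

def to_percent(ratio):
    """Convert a ratio back into a percentage change."""
    return (ratio - 1.0) * 100.0

# Run Time averages quoted in the two tables (OS X only).
x86_to_x86_64_static = to_ratio(+2.1)        # x86 -> x86_64, static by default
x86_64_static_to_dynamic = to_ratio(+0.2)    # static -> dynamic on OS X x86_64
x86_static_to_dynamic = to_ratio(+15.8)      # static -> dynamic on OS X x86

# dynamic x86_64 / dynamic x86
#   = (static x86_64 / static x86)
#     * (dynamic x86_64 / static x86_64)
#     / (dynamic x86 / static x86)
estimate = (x86_to_x86_64_static
            * x86_64_static_to_dynamic
            / x86_static_to_dynamic)

print(f"{to_percent(estimate):+.1f}%")  # prints -11.7%
```

The estimate of about -11.7% is close to the -11.4% reported in the right-hand column. The same composition tracks Mutator Time and Compile Times reasonably well, but not Binary Sizes, where the static -> dynamic ratios are so small that rounding dominates.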

Implications of the performance difference

If GHCi uses dynamic libraries by default, then ghci will need to be dynamically linked. It would therefore make sense to have ghc be dynamically linked as well. This means that any performance difference will also affect the performance of the compiler (this is already accounted for in the "Compile Times" rows of the nofib results).

It would still be possible to compile programs using the "static way" by giving ghc the -static flag, and users would be able to configure cabal-install to do so by default if they wish. Then programs would be exactly the same as they are today. However, this would have the drawback that cabal-install would need to be configured to install libraries for the static way as well as the dynamic way, so library installation would take twice as long.

Questions

In summary, we need to answer the following questions:

1. Should we enable dynamic by default on OS X x86_64?

2. Should we enable dynamic by default on OS X x86?

3. Should we enable dynamic by default on Linux x86_64?

4. Should we enable dynamic by default on Linux x86?

5. Should we enable dynamic by default on Windows x86_64?

6. Should we enable dynamic by default on Windows x86?

7. Should we enable dynamic by default on other platforms?

8. For platforms using dynamic by default, should Cabal also install static libraries by default?

For questions 1 and 3 (OS X x86_64 and Linux x86_64), the performance impact appears negligible and some bugs will be fixed, so we suggest that the answer should be yes.