Introduction

What is the performance difference between C# and native C++? C# is a much quicker development environment for producing usable utility apps, but it is generally accepted that for true performance, C++ is the way to go. This article investigates that premise; I expect the additional effort of developing in C++ to yield significant runtime performance benefits.

Disclaimer

The program is not representative of a typical application; it is meant to be a computationally intensive test case.

I opted for testing all the default release settings, which hampers C++ more than C#. This will be addressed in part 2.

There's a disproportionate amount of time spent in the square root library function; I'll change this to an approximation in part 2 (e.g. Carmack's approach), or remove it as it's not strictly required.

The code is native C++ talking to Win32, not CLR C++.

None of the rendering or window handling is timed; just the Mandelbrot creation.

Using the Code

I used Visual Studio 2013 for this project, but there's nothing special in the projects that should stop it running in Visual Studio 2012. The C# project is compiled against .NET4.5, and the C++ has all the default optimization settings. The code in both languages does the following:

Creates a simple borderless window

Generates a 640x640x256 Mandelbrot set 20 times

Displays the last generated result as the background image of the window (to make sure it is calculated correctly)

Waits for a mouse click

Displays a message box containing the average time taken to generate a Mandelbrot set in milliseconds

Closes

The code is not meant to be optimal, it is meant to be as simple and performance stressing as possible. The actual Mandelbrot routine itself is identical in both C++ and C#.

The window width and height, the number of iterations, and whether to use floats or doubles are easily configurable with a cursory glance at the code.

The Mandelbrot image is sized to be smaller than the smallest L2 cache among the test processors, to try to avoid any cache complexity.

There are three configurations to compile for the C# project - Any CPU, Win32, and x64

There are two configurations to compile for the C++ project - Win32 and x64

The projects are configured to output a different executable depending on the configuration, so the resulting five binaries will not overwrite each other.

I then ran all five executables, with both single- and double-precision settings, on five different machines to gauge performance. I also tried my Surface Pro 1 (Core i5) and Server 2012 (Core i3) machines, but the results were too inconsistent to include.

All the machines run Windows Update, so all had .NET4.5 installed and ran the C# versions out of the box. Only my machine had the VS2013 CRT redistributable installed, so I had to install it on all the other machines.

Caveats:

The machines were not 'clean' (they had background processes running, they weren't fresh OS installs, etc.); they are all machines that are in use. However, all the tests were done at the same time, so they were all equally 'unclean'.

The tests were run several times to validate the results; they were normally within 2%-3%.

Results

The times are the average number of milliseconds taken to create the Mandelbrot set, so lower is quicker.

640x640, average milliseconds per generation:

| Precision | Language | Config | Xeon E5-1620 v2 @ 3.7GHz (L2: 10MB) | Celeron 450 @ 2.2GHz (L2: 0.5MB) | Xeon E31245 @ 3.3GHz (L2: 8MB) | Celeron 743 @ 1.3GHz (L2: 1MB) | Xeon E5420 @ 2.5GHz (L2: 12MB) |
|---|---|---|---|---|---|---|---|
| Float  | C#  | AnyCPU | 105 | 790 | 136 | 363 | 186 |
| Float  | C#  | x86    | 107 | 791 | 136 | 363 | 185 |
| Float  | C#  | x64    | 122 | 579 | 135 | 346 | 176 |
| Float  | C++ | x86    | 150 | 874 | 158 | 933 | 473 |
| Float  | C++ | x64    | 116 | 288 | 117 | 370 | 188 |
| Double | C#  | AnyCPU | 98  | 570 | 136 | 331 | 168 |
| Double | C#  | x86    | 97  | 570 | 135 | 331 | 169 |
| Double | C#  | x64    | 121 | 581 | 135 | 346 | 177 |
| Double | C++ | x86    | 137 | 848 | 147 | 894 | 450 |
| Double | C++ | x64    | 135 | 584 | 136 | 391 | 198 |

Points of Interest

The results are not what I expected at all, even with this very synthetic test.

The C++ x64 was always quicker than its x86 equivalent, and sometimes very much quicker.

The double precision version was slightly quicker overall, except in the case of C++ x64 code.

The AnyCPU config has the 'Prefer x86' setting, so I would expect it to perform the same as the Win32 version. This is indeed the case.
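For reference, this behaviour comes down to a single MSBuild property; a sketch of how it might appear in the .csproj (the exact property group in the project may differ):

```xml
<!-- With AnyCPU plus Prefer32Bit, the JIT produces 32-bit code even on a
     64-bit OS, so the binary behaves like the explicit x86 build. -->
<PropertyGroup>
  <PlatformTarget>AnyCPU</PlatformTarget>
  <Prefer32Bit>true</Prefer32Bit>
</PropertyGroup>
```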

C# was quickest overall in 80% of the tests, and in all of those cases the AnyCPU configuration beat both C++ configurations (though C# x64 was the single quickest configuration in two cases).

The Xeon processors' performance was proportional to their clock speeds, which is the obvious, expected result.

The Celeron 450 at 2.2GHz ran C# significantly more slowly than the Celeron 743 at 1.3GHz. This is very confusing.

I did further comparisons of the Celeron processors with smaller Mandelbrot sets.

Results

Average milliseconds per generation:

| Size | Precision | Language | Config | Celeron 450 @ 2.2GHz (L2: 0.5MB) | Celeron 743 @ 1.3GHz (L2: 1MB) |
|---|---|---|---|---|---|
| 400x400 | Float  | C#  | AnyCPU | 309 | 142 |
| 400x400 | Float  | C#  | x86    | 358 | 145 |
| 400x400 | Float  | C#  | x64    | 225 | 135 |
| 400x400 | Float  | C++ | x86    | 343 | 359 |
| 400x400 | Float  | C++ | x64    | 114 | 145 |
| 400x400 | Double | C#  | AnyCPU | 226 | 128 |
| 400x400 | Double | C#  | x86    | 225 | 128 |
| 400x400 | Double | C#  | x64    | 225 | 135 |
| 400x400 | Double | C++ | x86    | 336 | 356 |
| 400x400 | Double | C++ | x64    | 231 | 153 |
| 200x200 | Float  | C#  | AnyCPU | 77  | 35  |
| 200x200 | Float  | C#  | x86    | 77  | 35  |
| 200x200 | Float  | C#  | x64    | 56  | 33  |
| 200x200 | Float  | C++ | x86    | 86  | 89  |
| 200x200 | Float  | C++ | x64    | 28  | 37  |
| 200x200 | Double | C#  | AnyCPU | 55  | 32  |
| 200x200 | Double | C#  | x86    | 55  | 32  |
| 200x200 | Double | C#  | x64    | 56  | 34  |
| 200x200 | Double | C++ | x86    | 84  | 89  |
| 200x200 | Double | C++ | x64    | 58  | 39  |

Points of Interest

C# was consistently about twice as quick on the 1.3GHz machine as on the 2.2GHz machine.

The x64 C++ was always quicker than the x86 C++.

Double precision x86 C++ tends to be a fraction quicker than the single precision. This is consistent with the previous tests.

Double precision x64 C++ tends to be slower than the single precision. This is also consistent with the previous tests.

Conclusions

This does not show that C# is quicker than C++: there isn't a wide enough sampling of processors, the test is not generic enough, and the testing procedure was too ad hoc. However, it does show that C# is potentially a performance competitor, and if different tests show similar results, C++ will become even less desirable to use.

Additional data points and/or critique would be greatly appreciated!

Further Work

Experiment with C++ optimization settings to see what difference they make (will be addressed in part 2)

Explain why the Penryn Celeron outperforms the Conroe Celeron that has nearly twice the processor speed.

I would like to see the results of the C++ run with full optimization during compile for a fairer comparison. The C# JIT optimizes the code - both at initial JIT compilation to machine code, and potential further optimizations as the code runs. So it seems to me that to perform a valid speed comparison, the C++ .exe should be fully optimized (in addition to using library routines that are not forced to do data conversions on top of the actual computations...C# will avoid these conversions).

For comparison: an Altera Cyclone III touch-screen development kit evaluates frames per second, comparing micro-controller and FPGA implementations; one of its tests uses the Mandelbrot set to compare the on-board NIOS II processor against the FPGA.

The results indicate that the FPGA runs the same test about 60 times faster than the processor can. This is because the FPGA takes advantage of parallel computing capabilities that the NIOS II processor simply does not have.

Modern cellphones contain FPGAs as well as processors, so that people can browse the internet that way.

I think all PCs should take the same route: instead of multiple parallel processors, there should be one or two processors and multiple FPGAs.

John - this is an interesting comparison, but the way it is done will not really tell you much about the speed difference between C++ and C# in a real application like a game engine.

First, I assume you are not including any window creation, graphics, etc. operations in your times.

You are doing a simple fractal-generation loop. The C# compiler will generate fairly simple math IL for that. The CLR will take that IL and compile it to native binary using the same Microsoft back-end component that their C++ compiler uses to translate C++ into binary. So, in that case, you are getting "better" binary from C# than from C++, because the CLR will automatically optimize the binary for many of the tricks available on the host platform.

This is probably the reason you get counter-intuitive results on different processors (the other being memory delays as data is fetched to the ALU). On some platforms, the CLR compiler may be doing a poor job of generating the binary.

I can assure you that in a complex application - like our game engine, which exists in both C++ and C# versions - the C++ implementation is an order of magnitude faster. Part of this is due to the fact that we are better at C++ than C#, but part is genuine...

I think that your method for comparing C# and C++ is not correct.
To compare two languages correctly, you must use only plain language constructs. You can't use library functions like a math lib and/or a graphics lib. For example, the authors of the C# graphics library might have optimized it for Windows, while the C++ authors might not have. Besides, both libraries might call Windows functions that are pure C.
If you compare two programs, the result can always be disputed.

Generally, in development of embedded applications the language of choice is C or C++ without virtual functions.

Your example has several issues.
First, you ignored the floating-point model in C++. If you change to /fp:fast, the C++ code will run faster.
Second, one of the most expensive calculations is performed in a library, not in the benchmarked code (sqrt).
And third, you used sqrt for float, not sqrtf.
Besides all that, your example doesn't even touch the weaknesses of C#.

I've read the debate threads about the size of doubles and using this optimization or that, but it's critical to remember that C# IL can and does get converted to x86 native code on a JIT basis. There are cases where C# can achieve C++ speeds for computation, so I'm not prepared to get nitpicky over this. (I write in both languages, as well as in Java.) C# and C++ are aimed at different operating environments (I don't know that I'd for sure peg C# for system-level programming, nor would I try to write a Python interpreter in it--but hang on, maybe I would). Some things are just easier in C++ and some are easier in C#, and this isn't the venue for religious arguments.

Might be the tests could stand a little polishing and bolt-tightening, but it's still a pretty darned interesting thing to look at. Wouldn't be the first time I'd seen someone use C# for graphical work or image processing with pretty good performance. (Dr. Michael Covington, formerly a professor at UGA, has done this sort of thing before and posted about it on his blog.)

Hey, John... I appreciate your intent and the result is interesting, but there are so many variables that it's really impossible to compare. Just to address the C++ side, you have to consider how you're linking (I didn't examine the project) to the libraries -- static or dynamic, what libraries you're using, whether you're performing stack checks, not to mention the use of the C++ GDI library... and so on. I would be willing to bet that you could get results that vary by at least a factor of two with either the C# code or C++ code, maybe a factor of 10 -- who knows. But thanks for the effort.

I'm not sure of this, but aren't C# and C++ using the same compiler back-end? Also, since the Intel i-family (and clones) incorporates the equivalent of the 8087 math co-processor, which is a native 8-byte device, shouldn't we expect double precision to be faster than single precision? Finally, to have a "real" benchmark, you need to test code generation, not the library, as libraries are often built with hand-tweaked assembler. Therefore, the test should avoid using canned library routines as much as possible.

Nice job, John.
One more confirmation of the good work of the C# team.
Many people have tried to implement dynamic collections faster than the C# library versions, and many other tricks, but the C# implementation is still the best.
Your test also shows that floating-point computation is highly effective in C#.

I am now testing evolutionary optimizations and evolutionary expression programming (in the general-purpose artificial-intelligence sense) and have found some interesting techniques based on caching objects.
For example, a simple loop computing the genes in a chromosome is about twice as slow if you just call methods rather than using cached objects. For example:

for (int i = 0; i < Chromosome.Length; ++i)
    Chromosome.Gene[i].ComputeResult();
// where the Chromosome.Gene[i].ComputeResult() call should be replaced by a cached delegate call

I agree it is easier to express knowledge in C# than in C++, and the .NET environment does a lot of additional work to simplify a developer's life.
There are many tricks that can simplify almost any job in C# and make your application run very fast.
Incidentally, Richter described similar conclusions in an MSDN blog some years ago; I believe it was around the time .NET 3.0-3.5 appeared.

I am interested in real-time machine learning on financial markets, and have found that even high-frequency algorithms are often based on C# (see Irene Aldridge).
C and C++ can't be avoided if you want to use the GPU (even GPGPU), for sure.
But I am still sure that many computations can be improved with C# before resorting to the GPU.
For example, I need to draw more than 20,000 elements, updated within at most 20 ms after some statistical and machine-learning computation; trust me, my WPF application doesn't have any problem and the GUI stays responsive.
Mathematicians love to show the super results of neural networks on a GPU.
But there are machine-learning methods that are much faster (200-400 times, in my experiments) than any NN with similar accuracy.
They are better implemented in plain C# without any unsafe code.
An additional plus: OOP, and good maintainability and development.
OK, if you really need super speed, use C++ AMP code through P/Invoke - no problem. But think twice: is it really needed?

I taught and wrote two flight simulators in assembly (80x86 and 6809) 32 years ago, including the very first simulator (that I knew about) with shading. By the time I was letting others use this, I found myself (on the 80x86 version) using QuickBasic to provide some of the interface. I modified the compiler to support higher-resolution graphics and added my own library for a mouse interface (mice were pretty new at the time).

What impressed me about QuickBasic was how fast it ran. Digging into the compiled QB code, I found that I couldn't have written it much faster in straight assembly! Other than the occasional indirect register reference, that code was TIGHT!

So in response to all the naysayers, especially those who complain about the libraries, I'd like to point out that those libraries are almost certainly written in C++ and, especially for "sqrt" and other such common functions, if you think you can write them faster in straight C++... maybe you can... but you'd better pack your lunch, because you're going to be there a while.

As a test of "is it worth it for C++", this is great! When you need some unmanaged coding, perhaps... but if you think speed is so critical that it must be in C++, I'd think you may as well marshal some memory and write assembly against that memory block rather than use C++.

...just one opinion from a guy who spent the first half of his software career tweaking code for speed.

I don't know that all of this is valid. Single-precision calculations should be just as fast as double-precision, and in some cases faster (e.g. 32-bit code lacking a SIMD instruction for a single-cycle 64-bit multiply).