Performance Improvements in .NET Core

There are many exciting aspects to .NET Core (open source, cross platform, x-copy deployable, etc.) that have been covered in posts on this blog before. To me, though, one of the most exciting aspects of .NET Core is performance. There’s been a lot of discussion about the significant advancements that have been made in ASP.NET Core performance, its status as a top contender on various TechEmpower benchmarks, and the continual advancements being made in pushing it further. However, there’s been much less discussion about some equally exciting improvements throughout the runtime and the base class libraries.

There are way too many improvements to mention. After all, as an open source project that’s very accepting of contributions, Microsoft and community developers from around the world have found places where performance is important to them and submitted pull requests to improve things. I’d like to thank all the community developers for their .NET Core contributions, some of which are specifically called out in this post. We expect that many of these improvements will be brought to the .NET Framework over the next few releases, too. For this post, I’ll provide a tour through just a small smattering of the performance improvements you’ll find in .NET Core, and in particular in .NET Core 2.0, focusing on a few examples from a variety of the core libraries.

NOTE: This blog post contains lots of example code and timings. As with any such timings, take them with a grain of salt: these were taken on one machine in one configuration (all 64-bit processes), and so you may see different results on different systems. However, I ran each test on .NET Framework 4.7 and .NET Core 2.0 on the same machine in the same configuration at approximately the same time, providing a consistent environment for each comparison. Further, normally such testing is best done with a tool like BenchmarkDotNet; I’ve not done so for this post simply to make it easy for you to copy-and-paste the samples out into a console app and try them.

Collections

Collections are the bedrock of any application, and there are a multitude of collections available in the .NET libraries. Not every operation on every collection has been made faster, but many have. Some of these improvements are due to eliminating overheads, such as streamlining operations to enable better inlining, reducing instruction count, and so on. For example, consider this small example with a Queue<T>:
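A minimal sketch of such a benchmark (the iteration count and the enqueue/dequeue pattern are my assumptions, not the original code):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    static void Main()
    {
        var queue = new Queue<int>();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 100_000_000; i++)
        {
            queue.Enqueue(i); // add at the tail...
            queue.Dequeue();  // ...and immediately remove from the head
        }
        Console.WriteLine(sw.Elapsed);
    }
}
```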

PR dotnet/corefx #2515 from OmariO removed a relatively expensive modulus operation from Enqueue and Dequeue, one that dominated the cost of those operations. On my machine, this code on .NET 4.7 produces output like this:

As this is “wall clock” time elapsed, smaller values are better, and this shows an ~2x increase in throughput!

In other cases, operations have been made faster by changing the algorithmic complexity of an operation. It’s often best when writing software to first write a simple implementation, one that’s easily maintained and easily proven correct. However, such implementations often don’t exhibit the best possible performance, and it’s often not until a specific scenario comes along that drives a need for better performance that such optimization happens. For example, SortedSet<T>‘s ctor was originally written in a relatively simple way that didn’t scale well, due to (I assume accidentally) employing an O(N^2) algorithm for handling duplicates. The algorithm was fixed in .NET Core in PR dotnet/corefx #1955. The following short program exemplifies the difference the fix made:
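A sketch of such a program, constructing a SortedSet<T> from an input containing duplicates (the element count matches the 400K figure cited below; the exact shape of the input is my guess):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        // 400K input elements, each value appearing twice, to stress duplicate handling
        IEnumerable<int> input = Enumerable.Range(0, 400_000).Select(i => i / 2);
        var sw = Stopwatch.StartNew();
        var set = new SortedSet<int>(input);
        Console.WriteLine($"{sw.Elapsed} (count: {set.Count})");
    }
}
```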

On my system, on .NET Framework this code takes ~7.7 seconds to execute. On .NET Core 2.0, that is reduced to ~0.013s, for an ~600x improvement (at least with 400K elements… as the fix changed the algorithmic complexity, the larger the set, the more the times will diverge).

The implementation of Min and Max in .NET 4.7 walks the whole tree underlying the SortedSet<T>, but that’s unnecessary for finding just the min or the max, as the implementation can traverse down to just the relevant node. PR dotnet/corefx #11968 fixes the .NET Core implementation to do just that. On .NET 4.7, this example produces results like:
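A sketch of a Min/Max benchmark of that sort (set size and iteration count are my choices):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        var set = new SortedSet<int>(Enumerable.Range(0, 1_000_000));
        long total = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 1_000; i++)
        {
            total += set.Min; // only needs to walk to the leftmost node
            total += set.Max; // only needs to walk to the rightmost node
        }
        Console.WriteLine($"{sw.Elapsed} ({total})");
    }
}
```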

To be sure, the fact that we can do 100 million such adds and removes from a list like this in just 0.3 seconds highlights that the operation wasn’t slow to begin with. But over the execution of an app, lists are often added to a lot, and the savings add up.
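The ConcurrentQueue<T> comparison that follows presumably reran the earlier Queue<T> loop with the collection swapped out, along these lines (iteration count assumed):

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

class Program
{
    static void Main()
    {
        var queue = new ConcurrentQueue<int>();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 100_000_000; i++)
        {
            queue.Enqueue(i);
            queue.TryDequeue(out int _); // single-threaded here, so this always succeeds
        }
        Console.WriteLine(sw.Elapsed);
    }
}
```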

Obviously the ConcurrentQueue<T> example on .NET 4.7 is slower than the Queue<T> version on .NET 4.7, as ConcurrentQueue<T> needs to employ synchronization to ensure it can be used safely concurrently. But the more interesting comparison is what happens when we run the same code on .NET Core 2.0:

This shows that the throughput using ConcurrentQueue<T> without any concurrency improves when switching to .NET Core 2.0 by ~30%. But there are even more interesting aspects. The changes in the implementation improved serialized throughput, but even more so reduced the synchronization between producers and consumers using the queue, which can have a more demonstrable impact on throughput. Consider the following code instead:
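A sketch of the producer/consumer variant being described (the item count and the spin-based consumer loop are my assumptions):

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        const int Items = 10_000_000;
        var queue = new ConcurrentQueue<int>();
        var sw = Stopwatch.StartNew();

        // Consumer: spin in a tight loop dequeueing until it has seen every produced item
        Task consumer = Task.Run(() =>
        {
            int remaining = Items;
            while (remaining > 0)
            {
                if (queue.TryDequeue(out int _)) remaining--;
            }
        });

        for (int i = 0; i < Items; i++) queue.Enqueue(i);
        consumer.Wait();

        Console.WriteLine(sw.Elapsed);
    }
}
```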

This example is spawning a consumer that sits in a tight loop dequeueing any elements it can find, until it consumes everything the producer adds. On .NET 4.7, this outputs results on my machine like the following:

That’s an ~3.5x throughput increase. But better CPU efficiency isn’t the only impact of the rewrite; memory allocation is also substantially decreased. Consider a small variation to the original test, this time looking at the number of GC collections instead of the wall-clock time:
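Such a variation presumably looked something like the following, reporting GC.CollectionCount deltas rather than elapsed time (again a sketch, with counts assumed):

```csharp
using System;
using System.Collections.Concurrent;

class Program
{
    static void Main()
    {
        var queue = new ConcurrentQueue<int>();
        int gen0 = GC.CollectionCount(0), gen1 = GC.CollectionCount(1), gen2 = GC.CollectionCount(2);
        for (int i = 0; i < 100_000_000; i++)
        {
            queue.Enqueue(i);
            queue.TryDequeue(out int _);
        }
        Console.WriteLine(
            $"Gen0={GC.CollectionCount(0) - gen0} " +
            $"Gen1={GC.CollectionCount(1) - gen1} " +
            $"Gen2={GC.CollectionCount(2) - gen2}");
    }
}
```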

That’s not a typo: 0 collections. The implementation in .NET 4.7 employs a linked list of fixed-size arrays that are thrown away once the fixed number of elements are added to each; this helps to simplify the implementation, but results in lots of garbage being generated for the segments. In .NET Core 2.0, the new implementation still employs a linked list of segments, but these segments increase in size as new segments are added, and more importantly, utilize circular buffers, such that new segments only need be added if the previous segment is entirely full (though other operations on the collection, such as enumeration, can also cause the current segments to be frozen and force new segments to be created in the future). Such reductions in allocation can have a sizeable impact on the overall performance of an application.

Similar improvements surface with ConcurrentBag<T>. ConcurrentBag<T> maintains thread-local work-stealing queues, such that every thread that adds to the bag has its own queue. In .NET 4.7, these queues are implemented as linked lists of one node per element, which means that any addition to the bag incurs an allocation. In .NET Core 2.0, these queues are now arrays, which means that other than the amortized costs involved in growing the arrays, additions are allocation-free. This can be seen in the following repro:
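A sketch of an add/take loop that exercises those thread-local queues (the counts are my choosing):

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

class Program
{
    static void Main()
    {
        var bag = new ConcurrentBag<int>();
        int gen0 = GC.CollectionCount(0);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10_000_000; i++)
        {
            bag.Add(i);             // goes into this thread's local queue
            bag.TryTake(out int _); // and comes right back out of it
        }
        Console.WriteLine($"Elapsed={sw.Elapsed} Gen0={GC.CollectionCount(0) - gen0}");
    }
}
```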

That’s an ~30% improvement in throughput, and a huge (complete) reduction in allocations and resulting garbage collections.

LINQ

In application code, collections often go hand-in-hand with Language Integrated Query (LINQ), which has seen even more improvements. Many of the operators in LINQ have been entirely rewritten for .NET Core in order to reduce the number and size of allocations, reduce algorithmic complexity, and generally eliminate unnecessary work.

For example, the Enumerable.Concat method is used to create a single IEnumerable<T> that first yields all of the elements of one enumerable and then all the elements of a second. Its implementation in .NET 4.7 is simple and easy to understand, reflecting exactly this statement of behavior:
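The iterator was roughly of this shape (my paraphrase; the real implementation also validates its arguments):

```csharp
using System.Collections.Generic;

static class MyEnumerable
{
    // Yield everything from the first sequence, then everything from the second.
    public static IEnumerable<TSource> Concat<TSource>(
        IEnumerable<TSource> first, IEnumerable<TSource> second)
    {
        foreach (TSource element in first) yield return element;
        foreach (TSource element in second) yield return element;
    }
}
```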

This is about as good as you can expect when the two sequences are simple enumerables like those produced by an iterator in C#. But what if application code instead had code like the following?
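For instance, code that builds one sequence by chaining many Concat calls together, something like this sketch:

```csharp
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Each Concat wraps the previous result in yet another iterator,
        // so the chain gets deeper with every source added.
        IEnumerable<int> combined = Enumerable.Empty<int>();
        for (int i = 0; i < 4; i++)
        {
            combined = combined.Concat(Enumerable.Range(i * 10, 10));
        }
        foreach (int item in combined) { } // every MoveNext tunnels through all the layers
    }
}
```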

Every time we yield out of an iterator, we return out of the enumerator’s MoveNext method. That means if you yield an element from enumerating another iterator, you’re returning out of two MoveNext methods, and moving to the next element requires calling back into both of those MoveNext methods. The more enumerators you need to call into, the longer the operation takes, especially since every one of those operations involves multiple interface calls (MoveNext and Current). That means that concatenating multiple enumerables grows exponentially rather than linearly with the number of enumerables involved. PR dotnet/corefx #6131 fixed that, and the difference is obvious in the following example, which concatenates 10K enumerables of 10 elements each:
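A reconstruction matching that description (10K enumerables of 10 elements each; the exact shape is an assumption):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        IEnumerable<int> source = Enumerable.Empty<int>();
        for (int i = 0; i < 10_000; i++)
        {
            source = source.Concat(Enumerable.Range(0, 10));
        }
        var sw = Stopwatch.StartNew();
        foreach (int item in source) { } // enumerate all 100K elements
        Console.WriteLine(sw.Elapsed);
    }
}
```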

On my machine on .NET 4.7, this takes ~4.12 seconds. On my machine on .NET Core 2.0, this takes only ~0.14 seconds, for an ~30x improvement.

Other operators have been improved substantially by eliminating overheads involved when various operators are used together. For example, a multitude of PRs from JonHanna have gone into optimizing various such cases and into making it easier to add more cases in the future. Consider this example:
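Based on the description that follows, the example was presumably along these lines (a sketch, not the original code):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        // 10,000,000 down to 0
        IEnumerable<int> source = Enumerable.Range(0, 10_000_001).Reverse();
        var sw = Stopwatch.StartNew();
        int result = source.OrderBy(i => i).Skip(4).First(); // the fifth-smallest: 4
        Console.WriteLine($"{result} in {sw.Elapsed}");
    }
}
```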

Here we create an enumerable of the numbers 10,000,000 down to 0, and then time how long it takes to sort them ascending, skip the first 4 elements of the sorted result, and grab the fifth one (which will be 4, as the sequence starts at 0). On my machine on .NET 4.7, I get output like:

That’s a sizeable improvement (~8x), in this case due primarily (though not exclusively) to PR dotnet/corefx #2401, which avoids most of the costs of the sort.

Similarly, PR dotnet/corefx #3429 from justinvp added optimizations around the common ToList method, providing optimized paths for when the source had a known length, and plumbing that through operators like Select. The impact of this is evident in a simple test like the following:
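A sketch of such a Select+ToList test (sizes and iteration counts assumed):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 100; i++)
        {
            // Range has a known length, and Select can flow that length through to ToList,
            // letting it allocate the backing array once, at the right size
            var list = Enumerable.Range(0, 1_000_000).Select(n => n * 2).ToList();
        }
        Console.WriteLine(sw.Elapsed);
    }
}
```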

In other cases, the performance wins have come from streamlining the implementation to avoid overheads, such as reducing allocations, avoiding delegate allocations, avoiding interface calls, minimizing field reads and writes, avoiding copies, and so on. For example, jamesqo contributed PR dotnet/corefx #11208, which substantially reduced overheads involved in Enumerable.ToArray, in particular by better managing how the internal buffer(s) used grow to accommodate the unknown amount of data being aggregated. To see this, consider this simple example:
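A sketch that forces ToArray to aggregate a source of unknown length (the Where clause hides the length; sizes are my choices):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 100; i++)
        {
            // Where hides the source length, so ToArray must grow its buffer(s) as it goes
            int[] array = Enumerable.Range(0, 1_000_000).Where(n => true).ToArray();
        }
        Console.WriteLine(sw.Elapsed);
    }
}
```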

There are over a hundred operators in LINQ, and while I’ve only mentioned a few, many of them have been subject to these kinds of improvements.

Compression

The examples shown thus far, of collections and LINQ, have been about manipulating data in memory. There are of course many other forms of data manipulation, including transformations that are heavily CPU-bound in nature. Investments have also been made in improving such operations.

One key example is compression, such as with DeflateStream, and several impactful performance changes have gone in here. For example, in .NET 4.7, zlib (a native compression library) is used for compressing data, but a relatively unoptimized managed implementation is used for decompressing data; PR dotnet/corefx #2906 added .NET Core support for using zlib for decompression as well. And PR dotnet/corefx #5674 from bjjones enabled using a more optimized version of zlib produced by Intel. These combine to a fairly dramatic effect. Consider this example, which just creates a large array of (fairly compressible) data:
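A sketch of such a compress-then-decompress round trip (the input size and fill pattern are assumptions):

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

class Program
{
    static void Main()
    {
        // A large, fairly compressible input
        byte[] data = new byte[100 * 1024 * 1024];
        for (int i = 0; i < data.Length; i++) data[i] = (byte)i;

        var sw = Stopwatch.StartNew();

        var compressed = new MemoryStream();
        using (var deflate = new DeflateStream(compressed, CompressionMode.Compress, leaveOpen: true))
        {
            deflate.Write(data, 0, data.Length);
        }

        compressed.Position = 0;
        using (var deflate = new DeflateStream(compressed, CompressionMode.Decompress))
        {
            deflate.CopyTo(Stream.Null);
        }

        Console.WriteLine(sw.Elapsed);
    }
}
```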

On .NET 4.7, for this one compression/decompression operation I get results like:

00:00:00.7977190

whereas with .NET Core 2.0, I get results like:

00:00:00.1926701

Cryptography

Another common source of compute in a .NET application is the use of cryptographic operations. Improvements can be seen here as well. For example, in .NET 4.7, SHA256.Create returns a SHA256 type implemented in managed code, and while managed code can be made to run very fast, for very compute-bound computations it’s still hard to compete with the raw throughput and compiler optimizations available to code written in C/C++. In contrast, for .NET Core 2.0, SHA256.Create returns an implementation based on the underlying operating system, e.g. using CNG on Windows or OpenSSL on Unix. The impact can be seen in this simple example that hashes a 100MB byte array:
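A sketch of such a hashing benchmark (the random fill is my assumption; the 100MB size matches the description):

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

class Program
{
    static void Main()
    {
        byte[] data = new byte[100 * 1024 * 1024]; // 100MB
        new Random(42).NextBytes(data);

        using (SHA256 sha = SHA256.Create())
        {
            var sw = Stopwatch.StartNew();
            byte[] hash = sha.ComputeHash(data);
            Console.WriteLine(sw.Elapsed);
        }
    }
}
```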

On .NET 4.7, I get:

00:00:00.7576808

whereas with .NET Core 2.0, I get:

00:00:00.4032290

Another nice improvement for zero code changes.

Math

Mathematical operations are also a large source of computation, especially when dealing with large numbers. Through PRs like dotnet/corefx #2182, axelheer made some substantial improvements to various operations on BigInteger. Consider the following example:
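A sketch of the kind of BigInteger workload involved; ModPow is used here as a representative heavy operation, and the operand sizes are my guesses:

```csharp
using System;
using System.Diagnostics;
using System.Numerics;

class Program
{
    static void Main()
    {
        var rand = new Random(42);
        BigInteger value = Create(rand, 8192), exponent = Create(rand, 8192), modulus = Create(rand, 8192);

        var sw = Stopwatch.StartNew();
        BigInteger.ModPow(value, exponent, modulus);
        Console.WriteLine(sw.Elapsed);
    }

    static BigInteger Create(Random rand, int bits)
    {
        var data = new byte[bits / 8];
        rand.NextBytes(data);
        data[0] |= 1;                  // ensure nonzero
        data[data.Length - 1] &= 0x7F; // keep it positive
        return new BigInteger(data);
    }
}
```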

On my machine on .NET 4.7, this outputs results like:

00:00:05.6024158

The same code on .NET Core 2.0 instead outputs results like:

00:00:01.2707089

This is another great example of a developer caring a lot about a particular area of .NET and helping to make it better for their own needs and for everyone else that might be using it.

Even some math operations on core integral types have been improved.

Serialization

Binary serialization is another area of .NET that can be fairly CPU/data/memory intensive. BinaryFormatter is a component that was initially left out of .NET Core, but it reappears in .NET Core 2.0 in support of existing code that needs it (in general, other forms of serialization are recommended for new code). The component is almost an identical port of the code from .NET 4.7, with the exception of tactical fixes that have been made to it since, in particular around performance. For example, PR dotnet/corefx #17949 is a one-line fix that increases the maximum size that a particular array is allowed to grow to, but that one change can have a substantial impact on throughput, by allowing for an O(N) algorithm to operate for much longer than it previously would have before switching to an O(N^2) algorithm. This is evident in the following code example:
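A sketch that serializes and deserializes a large object graph with BinaryFormatter (the graph shape and size are assumptions; the point is lots of individual objects for the formatter to track):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

class Program
{
    static void Main()
    {
        // A big graph: a million distinct objects to serialize and track
        var books = new List<string>();
        for (int i = 0; i < 1_000_000; i++)
        {
            books.Add("book " + i);
        }

        var formatter = new BinaryFormatter();
        var ms = new MemoryStream();
        var sw = Stopwatch.StartNew();
        formatter.Serialize(ms, books);
        ms.Position = 0;
        formatter.Deserialize(ms);
        Console.WriteLine(sw.Elapsed.TotalSeconds);
    }
}
```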

On .NET 4.7, this code outputs results like:

76.677144

whereas on .NET Core 2.0, it outputs results like:

6.4044694

showing an ~12x throughput improvement for this case. In other words, it’s able to deal with much larger serialized inputs more efficiently.

Text Processing

Another very common form of computation in .NET applications is the processing of text, and a large number of improvements have gone in here, at various levels of the stack.

Consider Regex. This type is commonly used to validate and parse data from input text. Here’s an example that uses Regex.IsMatch to repeatedly match phone numbers:
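A reconstruction of such a test (the exact phone-number pattern, input, and iteration count are my assumptions; the Elapsed/GenN output format matches the results shown below):

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        var regex = new Regex(@"^\d{3}-\d{3}-\d{4}$", RegexOptions.Compiled);
        int gen0 = GC.CollectionCount(0), gen1 = GC.CollectionCount(1), gen2 = GC.CollectionCount(2);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10_000_000; i++)
        {
            regex.IsMatch("555-867-5309");
        }
        Console.WriteLine(
            $"Elapsed={sw.Elapsed} " +
            $"Gen0={GC.CollectionCount(0) - gen0} " +
            $"Gen1={GC.CollectionCount(1) - gen1} " +
            $"Gen2={GC.CollectionCount(2) - gen2}");
    }
}
```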

On my machine on .NET 4.7, I get results like:

Elapsed=00:00:05.4367262 Gen0=820 Gen1=0 Gen2=0

whereas with .NET Core 2.0, I get results like:

Elapsed=00:00:04.0231373 Gen0=248

That’s an ~25% improvement in throughput and an ~70% reduction in allocation / garbage collections, due to a small change in PR dotnet/corefx #231 that made a fix to how some data is cached.

Another example of text processing is in various forms of encoding and decoding, such as URL decoding via WebUtility.UrlDecode. It’s often the case in decoding methods like this one that the input doesn’t actually need any decoding, but the input is still passed through the decoder in case it does. Thanks to PR dotnet/corefx #7671 from hughbe, this case has been optimized. So, for example, with this program:
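A sketch of such a program, decoding an input that contains nothing to decode (the URL and iteration count are hypothetical):

```csharp
using System;
using System.Diagnostics;
using System.Net;

class Program
{
    static void Main()
    {
        string input = "https://example.com/nothing-to-decode-here"; // no escaped characters
        int gen0 = GC.CollectionCount(0);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10_000_000; i++)
        {
            WebUtility.UrlDecode(input); // nothing to decode, so ideally no new string needed
        }
        Console.WriteLine($"Elapsed={sw.Elapsed} Gen0={GC.CollectionCount(0) - gen0}");
    }
}
```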

on .NET 4.7, I see the following output:

Elapsed=00:00:01.6742583 Gen0=648

whereas on .NET Core 2.0, I see this output:

Elapsed=00:00:01.2255288 Gen0=133

Other forms of encoding and decoding have also been improved. For example, dotnet/coreclr #10124 optimized the loops involved in using some of the built-in Encoding-derived types. So, for example, this code that repeatedly encodes an ASCII input string as UTF8:
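might be sketched as follows (string length and iteration count are my choices):

```csharp
using System;
using System.Diagnostics;
using System.Text;

class Program
{
    static void Main()
    {
        string text = new string('a', 1024); // pure ASCII input
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 1_000_000; i++)
        {
            byte[] bytes = Encoding.UTF8.GetBytes(text); // ASCII fast path in the encoder's loop
        }
        Console.WriteLine(sw.Elapsed);
    }
}
```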

These kinds of improvements extend as well to general Parse and ToString methods in .NET for converting between strings and other representations. For example, it’s fairly common to use enums to represent various kinds of state, and to use Enum.Parse to parse a string into a corresponding Enum. PR dotnet/coreclr #2933 helped to improve this. Consider the following code:
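A sketch of such a loop; the original post's choice of enum isn't preserved here, so DayOfWeek is my stand-in:

```csharp
using System;
using System.Diagnostics;

class Program
{
    static void Main()
    {
        int gen0 = GC.CollectionCount(0);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10_000_000; i++)
        {
            var day = (DayOfWeek)Enum.Parse(typeof(DayOfWeek), "Thursday");
        }
        Console.WriteLine($"Elapsed={sw.Elapsed} Gen0={GC.CollectionCount(0) - gen0}");
    }
}
```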

That’s an almost 3x increase in throughput and a whopping ~90% reduction in allocations / garbage collections.

Of course, there’s lots of custom text processing done in .NET applications, beyond using built-in types like Regex and Encoding and built-in operations like Parse and ToString; much of it is built directly on top of string, and lots of improvements have gone into operations on String itself.

It’s quite fun looking through all of the changes that have gone into String, seeing their impact, and thinking about the additional possibilities for more improvements.

File System

Thus far I’ve been focusing on various improvements around manipulating data in memory. But lots of the changes that have gone into .NET Core have been about I/O.

Let’s start with files. Here’s an example of asynchronously reading all of the data from one file and writing it to another (using FileStreams configured to use async I/O):
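A sketch of such a copy (file size and buffer sizes are assumptions; the key detail is `useAsync: true` on both streams):

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

class Program
{
    static void Main() => CopyAsync().GetAwaiter().GetResult();

    static async Task CopyAsync()
    {
        string sourcePath = Path.GetTempFileName(), destPath = Path.GetTempFileName();
        File.WriteAllBytes(sourcePath, new byte[100 * 1024 * 1024]); // 100MB input

        int gen0 = GC.CollectionCount(0), gen1 = GC.CollectionCount(1), gen2 = GC.CollectionCount(2);
        var sw = Stopwatch.StartNew();
        using (var input = new FileStream(sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, useAsync: true))
        using (var output = new FileStream(destPath, FileMode.Create, FileAccess.Write, FileShare.None, 4096, useAsync: true))
        {
            await input.CopyToAsync(output);
        }
        Console.WriteLine(
            $"Elapsed={sw.Elapsed} " +
            $"Gen0={GC.CollectionCount(0) - gen0} " +
            $"Gen1={GC.CollectionCount(1) - gen1} " +
            $"Gen2={GC.CollectionCount(2) - gen2}");

        File.Delete(sourcePath);
        File.Delete(destPath);
    }
}
```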

A bunch of PRs have gone into reducing the overheads involved in FileStream, such as dotnet/corefx #11569, which adds a specialized CopyToAsync implementation, and dotnet/corefx #2929, which improves how asynchronous writes are handled. When running this on .NET 4.7 I get results like:

Elapsed=00:00:09.4070345 Gen0=14 Gen1=7 Gen2=1

and on .NET Core 2.0, results like:

Elapsed=00:00:06.4286604 Gen0=4 Gen1=1 Gen2=1

Networking

Networking is a big area of focus now, and likely will be even more so moving forward. A good amount of effort is being applied to optimizing and tuning the lower-levels of the networking stack, so that higher-level components can be built efficiently.

One such change that has a big impact is PR dotnet/corefx #15141. SocketAsyncEventArgs is at the center of a bunch of asynchronous operations on Socket, and it supports a synchronous completion model whereby asynchronous operations that actually complete synchronously can avoid costs associated with asynchronous completions. However, the implementation in .NET 4.7 only ever synchronously completes operations that fail; the aforementioned PR fixed the implementation to allow for synchronous completions of all async operations on sockets. The impact of this is very obvious in code like the following:
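A sketch of such a ping-pong over a connected pair of loopback sockets; the original presumably used SocketAsyncEventArgs-based operations directly, while for brevity this uses the Task-returning Send/ReceiveAsync wrappers (which, in .NET Core, sit on top of SocketAsyncEventArgs):

```csharp
using System;
using System.Diagnostics;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Create a connected pair of loopback sockets
        using (var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        using (var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            listener.Bind(new IPEndPoint(IPAddress.Loopback, 0));
            listener.Listen(1);
            client.Connect(listener.LocalEndPoint);
            using (Socket server = listener.Accept())
            {
                RunAsync(client, server).GetAwaiter().GetResult();
            }
        }
    }

    static async Task RunAsync(Socket client, Socket server)
    {
        var sendBuffer = new ArraySegment<byte>(new byte[1]);
        var receiveBuffer = new ArraySegment<byte>(new byte[1]);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 1_000_000; i++)
        {
            // Most of these will complete synchronously: the data is tiny and local
            Task<int> send = client.SendAsync(sendBuffer, SocketFlags.None);
            await server.ReceiveAsync(receiveBuffer, SocketFlags.None);
            await send;
        }
        Console.WriteLine(sw.Elapsed);
    }
}
```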

This program creates two connected sockets, and then writes 1,000,000 times to one socket and receives on the other, in both cases using asynchronous methods but where the vast majority (if not all) of the operations will complete synchronously. On .NET 4.7 I see results like:

Elapsed=00:00:20.5272910 Gen0=42 Gen1=2 Gen2=0

whereas on .NET Core 2.0 with most of these operations able to complete synchronously, I see results instead like:

Elapsed=00:00:05.6197060 Gen0=0 Gen1=0 Gen2=0

Not only do such improvements accrue to components using sockets directly, but also to using sockets indirectly via higher-level components, and other PRs have resulted in additional performance increases in higher-level components, such as NetworkStream. For example, PR dotnet/corefx #16502 re-implemented Socket’s Task-based SendAsync and ReceiveAsync operations on top of SocketAsyncEventArgs and then allowed those to be used from NetworkStream.Read/WriteAsync, and PR dotnet/corefx #12664 added a specialized CopyToAsync override to support more efficiently reading the data from a NetworkStream and copying it out to some other stream. Those changes have a very measurable impact on NetworkStream throughput and allocations. Consider this example:
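A sketch matching the description that follows (connected loopback sockets wrapped in NetworkStreams; a million 1K writes on one side, a CopyToAsync draining the other):

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        using (var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        using (var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            listener.Bind(new IPEndPoint(IPAddress.Loopback, 0));
            listener.Listen(1);
            client.Connect(listener.LocalEndPoint);
            using (Socket server = listener.Accept())
            using (var clientStream = new NetworkStream(client))
            using (var serverStream = new NetworkStream(server))
            {
                int gen0 = GC.CollectionCount(0), gen1 = GC.CollectionCount(1), gen2 = GC.CollectionCount(2);
                var sw = Stopwatch.StartNew();

                Task writer = Task.Run(async () =>
                {
                    byte[] buffer = new byte[1024];
                    for (int i = 0; i < 1_000_000; i++)
                    {
                        await clientStream.WriteAsync(buffer, 0, buffer.Length);
                    }
                    client.Shutdown(SocketShutdown.Send); // let CopyToAsync observe EOF
                });

                serverStream.CopyToAsync(Stream.Null).GetAwaiter().GetResult();
                writer.GetAwaiter().GetResult();

                Console.WriteLine(
                    $"Elapsed={sw.Elapsed} " +
                    $"Gen0={GC.CollectionCount(0) - gen0} " +
                    $"Gen1={GC.CollectionCount(1) - gen1} " +
                    $"Gen2={GC.CollectionCount(2) - gen2}");
            }
        }
    }
}
```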

As with the previous Sockets one, we’re creating two connected sockets. We’re then wrapping those in NetworkStreams. On one of the streams we write 1K of data a million times, and on the other stream we read out all of its data via a CopyToAsync operation. On .NET 4.7, I get output like the following:

Elapsed=00:00:24.7827947 Gen0=220 Gen1=3 Gen2=0

whereas on .NET Core 2.0, the time is cut by 5x, and garbage collections are reduced effectively to zero:

Elapsed=00:00:04.9452962 Gen0=0 Gen1=0 Gen2=0

Further optimizations have gone into other networking-related components. For example, SslStream is often wrapped around a NetworkStream in order to add SSL to a connection. We can see the impact of these changes as well as others in an example like the following, which just adds usage of SslStream on top of the previous NetworkStream example:

On .NET 4.7, I get results like the following:

Elapsed=00:00:21.1171962 Gen0=470 Gen1=3 Gen2=1

.NET Core 2.0 includes changes from PRs like dotnet/corefx #12935 and dotnet/corefx #13274, both of which together significantly reduce the allocations involved in using SslStream. When running the same code on .NET Core 2.0, I get results like the following:

Elapsed=00:00:05.6456073 Gen0=74 Gen1=0 Gen2=0

That’s 85% of the garbage collections removed!

Concurrency

Not to be left out, lots of improvements have gone into infrastructure and primitives related to concurrency and parallelism.

One of the key focuses here has been the ThreadPool, which is at the heart of the execution of many .NET apps. For example, PR dotnet/coreclr #3157 reduced the sizes of some of the objects involved in QueueUserWorkItem, and PR dotnet/coreclr #9234 used the previously mentioned rewrite of ConcurrentQueue<T> to replace the global queue of the ThreadPool with one that involves less synchronization and less allocation. The net result is visible in an example like the following:
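A sketch of such a queue-throughput test (item count and the countdown mechanism are my assumptions):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class Program
{
    static void Main()
    {
        const int Items = 10_000_000;
        int remaining = Items;
        using (var done = new ManualResetEvent(false))
        {
            int gen0 = GC.CollectionCount(0), gen1 = GC.CollectionCount(1), gen2 = GC.CollectionCount(2);
            var sw = Stopwatch.StartNew();
            WaitCallback workItem = _ =>
            {
                if (Interlocked.Decrement(ref remaining) == 0) done.Set();
            };
            for (int i = 0; i < Items; i++)
            {
                ThreadPool.QueueUserWorkItem(workItem);
            }
            done.WaitOne();
            Console.WriteLine(
                $"Elapsed={sw.Elapsed} " +
                $"Gen0={GC.CollectionCount(0) - gen0} " +
                $"Gen1={GC.CollectionCount(1) - gen1} " +
                $"Gen2={GC.CollectionCount(2) - gen2}");
        }
    }
}
```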

That’s both a huge improvement in throughput and a huge reduction in garbage collections for such a core component.

Synchronization primitives have also gotten a boost in .NET Core. For example, SpinLock is often used by low-level concurrent code trying either to avoid allocating lock objects or minimize the time it takes to acquire a rarely contended lock, and its TryEnter method is often called with a value of 0 in order to only take the lock if it can be taken immediately, or else fail immediately if it can’t, without any spinning. PR dotnet/coreclr #6952 improved that fail fast path, as is evident in the following test:
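A sketch of an uncontended TryEnter(0) loop (the iteration count is my choice):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class Program
{
    static void Main()
    {
        var spinLock = new SpinLock(enableThreadOwnerTracking: false);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 100_000_000; i++)
        {
            bool taken = false;
            spinLock.TryEnter(0, ref taken); // 0 == don't spin: succeed or fail immediately
            if (taken) spinLock.Exit();
        }
        Console.WriteLine(sw.Elapsed);
    }
}
```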

Such an ~6x difference in throughput can make a significant impact on hot paths that exercise such locks.

That’s just one example of many. Another is around Lazy<T>, which was rewritten in PR dotnet/coreclr #8963 by manofstick to improve the efficiency of accessing an already initialized Lazy<T> (while the performance of accessing a Lazy<T> for the first time matters, the expectation is that it’s accessed many times after that, and thus we want to minimize the cost of those subsequent accesses). The effect is visible in a small example like the following:
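A sketch of such a test, timing repeated access to an already-initialized Lazy<T> (iteration count assumed):

```csharp
using System;
using System.Diagnostics;

class Program
{
    static void Main()
    {
        var lazy = new Lazy<int>(() => 42);
        int value = lazy.Value; // force initialization up front
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 1_000_000_000; i++)
        {
            value = lazy.Value; // the hot path: already-initialized access
        }
        Console.WriteLine($"{sw.Elapsed} ({value})");
    }
}
```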

What’s Next

As I noted earlier, these are just a few of the many performance-related improvements that have gone into .NET Core. Search for “perf” or “performance” in pull requests in the dotnet/corefx and dotnet/coreclr repos, and you’ll find close to a thousand merged PRs; some of them are big and impactful on their own, while others whittle away at costs across the libraries and runtime, changes that add up to applications running faster on .NET Core. Hopefully subsequent blog posts will highlight additional performance improvements, including those in the runtime, of which there have been many but which I haven’t covered here.

We’re far from done, though. Many of the performance-related changes up until this point have been mostly ad-hoc, opportunistic changes, or those driven by specific needs that resulted from profiling specific higher-level applications and scenarios. Many have also come from the community, with developers everywhere finding and fixing issues important to them. Moving forward, performance will be a bigger focus, both in terms of adding additional performance-focused APIs (you can see experimentation with such APIs in the dotnet/corefxlab repo) and in terms of improving the performance of the existing libraries.

To me, though, the most exciting part is this: you can help make all of this even better. Throughout this post I highlighted some of the many great contributions from the community, and I highly encourage everyone reading this to dig in to the .NET Core codebase, find bottlenecks impacting your own apps and libraries, and submit PRs to fix them. Rather than stumbling upon a performance issue and working around it in your app, fix it for you and everyone else to consume. We are all very excited to work with you on bringing such improvements into the code base, and we hope to see all of you involved in the various .NET Core repos.

.NET Standard doesn’t make a statement on performance. You are right that it would be odd if performance changes in .NET Core were breaking w/rt .NET Framework. That said, breaks in .NET Framework that only affect 0.001% of customers are still a major problem due to the .NET Framework deployment model (in-place updates and Windows Update delivery). So, there are a class of changes that we’ll take in .NET Core and not .NET Framework. The .NET Core side-by-side deployment model gives us a greater degree of freedom.

Folks are also working to merge more and more of corefx into Mono (and vice versa where applicable), such that wherever possible we use the same codebase across them. Then improvements to one accrue to the other.

Just as a side remark, there’s a nice tool Benchmark.NET which not only displays the benchmarks neatly but also summarizes the average/min/max and shows graphs so readers don’t have to skim 5 lines of 00:00….

First, this is a really great article and another gem by Mr. Toub. Also always glad to see Mr. Lander really owning the comments, too. I read the disclaimer against Benchmark.NET and was really surprised, as this is pretty common these days when benchmarking is viewed in .NET articles. It doesn’t seem right without it. BDN is also part of the .NET Foundation, so that adds to the surprise.

I feel bad for Stephen saying these things but, the reward for hard work is more hard work. 🙂 If my own personal development environment weren’t such a disaster (virtualizing everything in Hyper-V and standing up VS2017, FINALLY!) I would be all over getting some BDN magic for you, for sure. Maybe if no one else has done it in the next few days or so, when I am back on my feet.

No need to feel bad. Thanks for the feedback. I actually started with Benchmark.NET, since I often use it locally, and then made the explicit choice to switch to console apps for all of the examples, as I felt it made for a better reading/trying-it-out experience. That’s why I added the note at the beginning, highlighting that a tool like Benchmark.NET (or other benchmarking tools) is the way to go for actually doing performance work. For the purposes of this post, I just wanted simple snippets, where you didn’t need to acquire anything beyond the runtime/compiler, didn’t need to understand what the boilerplate meant, etc. I’ll keep the feedback in mind for the future, though; maybe I put too much value in those criteria.

It is a shame that performance tests are always left as an afterthought. Like test-driven development, performance tests should be written at the start. More importantly, there should be a *standard* set of performance tests defined against interfaces, which would make it trivial for implementations to be run against each other to see which one is faster. In other words, performance tests should be written once, in such a way that implementors can immediately test their implementations and easily compare their results against others.

Great article and thank you for sharing these improvements. I was looking for similar information for a while, but could not find anything tangible. I was hoping to see some architectural (performance) changes that could have helped reduce memory pressure and/or improve cpu utilization etc, unfortunately it seems like there are none. It is sad to see huge gaps in performance today after 10-30 years of existence (depending on dll/method) and these gaps are being addressed by the community and not Microsoft. IMO it just proves that performance was never important and always an afterthought.
I’m excited about and looking forward to new features (like Span, ref fields, etc.), but these are just bolted-on features. It would be nice to see something done under the hood. I’m glad to hear that performance may now become as important as language changes and new libs.
I was hoping .NET Core would become a smaller and significantly faster successor (to the full .NET Framework); instead it seems like the focus has shifted to just making it cross platform, highly compatible with the full .NET Framework, with all the legacy code retained.

What led you to this conclusion? It is false. There were many performance improvements to choose from. Stephen did a great job of picking a selection of both Microsoft and community contributions. He could have picked all his (of which there are MANY) and that would have been fine but he didn’t do that.

> I was hoping to see some architectural (performance) changes that could have helped reduce memory pressure and/or improve cpu utilization etc, unfortunately it seems like there are none

Again, this is a selection of improvements. It’s possible that the title of the post was too broad (in hindsight). We have other performance investments that are more architectural in nature but they are not in this post.

> I’m glad to hear that now performance may become as important as language changes and new libs.

Performance has always been at least as important as language changes and libs, particularly for the runtime. For example, our GC investments over the last 10+ years have all been performance-oriented. Little secret … our GC changes have resulted in very non-trivial reductions in the cost of running Microsoft services. We have similar feedback from customers, too. This post demonstrates that we haven’t done enough performance work in the BCL. We intend to bring much of that value to the .NET Framework.

> I was hoping .net core will become smaller and significantly faster successor (to full .Net), instead it seems like focus has shifted to just make it cross platform highly compatible with full .Net and all the legacy code retained.

Looks like you read a different blog post. The point of this post was to demonstrate the opposite.

Love it. I’ve developed a hobby of watching perf PRs being merged into Kestrel. Not sure why, but it’s very pleasant to see 30%-3000% perf increases, a sort of meditation. Also it makes obvious how much work is required to write optimized code.

Wow. Quite the tome! This must’ve taken you forever to write, Stephen!

It’s amazing all these perf improvements were able to be achieved, especially in such core parts of the BCL! (Who would’ve thought IndexOf had room for further optimization?) Great job to the team and all the contributors!

While I don’t expect to ever see WinForms and WPF become a part of .NET Core, I really would like to see them extracted from .NET Framework so they’ll work WITH Core when the target is known to be Windows. Ideally, the .NET Framework should simply be the Windows build of .NET Core along with whatever additional modules are needed to fill out the non-portable API surface area of the Framework.

Great blog Mr. Toub! This work by MS teams and community are creating a bright story for .NET Core. I appreciate the work you have also done as I try to keep up with your GitHub progress.. along with asking myself when do you and others (@davidfowl, etc.) sleep?!?!

If I could have one request… when the new performance APIs like Span (and others from CoreFxLab) move into a released version of .NET Core/Standard, would it be possible to release a development series (e.g. Channel9, tutorials, etc.) explaining how/when to use the APIs effectively? I have current projects (and will have more when better tooling/platform support come for microservices and IoT) that could benefit using the performance APIs… just need a good overview for scenarios when to use (or not to use).

Thanks, OmariO. There’s still more opportunity for improvement in System.Net.Sockets, but we’ve collectively made great progress, and I think the results in .NET Core 2.0 will be very welcome to a large number of apps.

Asynchronicity, thanks for the feedback. Changing a reference type to a value type is a breaking change, so that’s not something I’d expect to see happen. There is also value (no pun intended) in having it be a class, in that it allows you to pass the instance around without concern for whether you’re accidentally copying and then mutating a copy rather than the original. But I agree it would be nice if there were a way to have similar functionality but without the allocation. If you have suggestions for how to best design that, please open an issue in the corefx repo. For example, Stopwatch currently provides a static GetTimestamp method, but interpreting the result of that isn’t super easy. We could add a method like `public static TimeSpan Stopwatch.GetElapsed(long timestamp)`, which would let you effectively start a stopwatch by calling GetTimestamp and stop it by passing the result to GetElapsed in order to get the elapsed time. That’s not necessarily the right answer, just an example of the kind of thing that could be done. We’ll look forward to hearing your suggestions in the corefx repo.

Due to the ticking of the stopwatch that can potentially slow things down (Fractionally), it’s generally considered better to use DateTime.Now before and after the benchmark and get difference between the two as a TimeSpan 🙂

Last has had a lot of work done on it for corefx as well; just one of the many sets of improvements I didn’t mention. That said, if you see ways these can be improved further, please open issues / submit PRs / etc. in the corefx repo. We look forward to your contributions!

Question, I’ve read the performance of 64bit vs 32bit is very different and encourages people to stick to 32bit…which is IMO, alarming considering 64bit computing is the future and clearly, few platforms ask developers for such compromises.

Wow! Great improvements! One question… it was said in the past that throwing exceptions and “try catch finally” constructions should be avoided. How does that stand at this moment? Has it improved in .NET Core, or is it still as bad in some scenarios as was claimed a while ago?

Possible correction:
“That means that concatenating multiple enumerables grows exponentially rather than linearly with the number of enumerables involved”
I think it should become:
“That means that concatenating multiple enumerables grows quadratically rather than linearly with the number of enumerables involved”

This is fantastic, is there a step by step guide on how one would go about obtaining / changing / compiling / testing / and finally submitting a code change? I’m an experienced developer but I wouldn’t really know where to begin.

Our base libraries include highly-parallelized computational code, encryption code, compression code, and so on that would benefit heavily from these improvements. But our UI built on top of that is based on WinForms. So, is it possible to compile those base libraries on .NET Core 2.0 (to get those benefits) but still use them in a .NET Framework app?

Question: Do we have any timeline when these .net core perf improvements will trickle back to the full .net framework? I suppose someone is helping merge these back already. Desperate to see these improvements come back to enterprise (especially who use .net for GUI dev)