Supplemental to ASP.NET Project “Helios”

This is a supplemental document to my earlier Introducing ASP.NET Project “Helios” post. It contains extra information that might be of interest to the advanced developer but which didn’t make it into the main post. I encourage reading the original post before continuing.

On performance and resource consumption

When most web developers discuss performance, they’re thinking in terms of requests per second (RPS) throughput. As a rule of thumb, the lower you go in the stack, the more raw throughput you’re able to achieve. An application that opens a raw socket to listen for and process web requests will always outperform a higher-level framework like ASP.NET when it comes to “Hello World” scenarios. This is invariant. But let’s be honest: no real web application is a simple “Hello World” application. The goal of serving static content to visitors as quickly as possible is better served by a web cache than by a web server proper.

Real web applications perform non-trivial request processing. They hit databases and file systems. Perhaps they make calls to backend services. Attach a profiler to such an application, and you’ll see that the cost of the application logic and everything it calls dwarfs the ASP.NET runtime overhead. In my nearly seven years on the ASP.NET team, I have never once heard a customer complain that the ASP.NET runtime was simply too slow (in terms of throughput) for their needs when compared with other frameworks or hosts.

Throughput is just one aspect of performance measurement. A web server has a finite number of resources available to it: CPU, memory, hard drive space and I/O speed, and so on. Each request consumes some chunk of these shared resources, and the server must be mindful of how resources are allocated to each request. Administrators measure the resources required for each request and use this to determine how many concurrent requests the server can handle while still meeting availability and reliability goals.

And once you know the number of concurrent requests each machine can handle, how do you go about scaling the application? The traditional way to do this in ASP.NET applications is to simply throw more hardware at the problem: build out a web farm. If the backend database is the bottleneck, cluster it. Perhaps add a backend cache such as Redis. The particular course of action taken depends on the application.

When you think of improving performance as solving a resource allocation problem rather than as boosting throughput, you might be surprised where this train of logic leads. Let’s take a moment to consider one resource for now – memory.

Comparing memory usage of System.Web and Helios

To compare memory usage of a System.Web-based application versus a Helios-based application, we need a reference application. Any application based on OWIN is an ideal candidate for such a comparison. This allows us to leave the application code the same, so the only thing really changing between the runs is the underlying runtime.

Consider the following Web API controller whose Get() method simply holds the connection open while releasing the request thread. If we make several thousand requests to this application, this mimics an application processing many concurrent requests with long-running asynchronous operations.

Note: This is a very simple example. In practice you would use a realtime framework like SignalR to achieve this goal, but this simple example is still useful for determining the minimum amount of memory required to maintain a single persistent connection. In the case of SignalR running atop the System.Web OWIN host, the WebSocket transport generally consumes less memory than the other available transports, so actual per-request memory usage in that scenario may be lower than what is reported here.

In this test, I created a simple OWIN-based (via Microsoft.Owin.Host.SystemWeb) Web API application with the above controller. No other middleware was added, and I did not change ASP.NET configuration (other than increasing the maximum allowed concurrent connection count from its default value). The web application was deployed to a 64-bit application pool in IIS 8.5 (Windows Server 2012 R2). I then hit this endpoint with 50,000 connections and monitored the # Bytes in all heaps performance counter for the w3wp.exe process. (I also forced garbage collections throughout the test to reclaim unreachable memory.)

The performance counter showed 1,480,856,008 allocated bytes in all heaps for the worker process. Divided by 50,000 requests, this gives an amortized overhead of 28.9 KiB per request. We can’t treat this number as absolutely golden when performing capacity planning exercises. For instance, it doesn’t account for any unmanaged per-request memory usage. But it can tell us a few things, such as that on a machine with 8 GB of RAM and running a 64-bit worker process, the amount of physical memory will become a bottleneck at the 300k concurrent request level or earlier.
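The arithmetic behind these figures can be reproduced with a short back-of-the-envelope calculation. The sketch below is written in Python purely for convenience, and its function and variable names are illustrative. It assumes that "8 GB" means 8 GiB and that all of that memory is available to the managed heap, which no real server can offer, so the result is an upper bound rather than a capacity plan.

```python
# Figures reported above for the System.Web-hosted run.
HEAP_BYTES = 1_480_856_008   # "# Bytes in all heaps" for w3wp.exe
CONNECTIONS = 50_000         # concurrent requests held open

# Assumption: "8 GB" is read as 8 GiB, all of it usable by the managed heap.
RAM_BYTES = 8 * 1024**3


def per_request_kib(heap_bytes, connections):
    """Amortized managed-heap overhead per request, in KiB."""
    return heap_bytes / connections / 1024


def max_concurrent_requests(ram_bytes, heap_bytes, connections):
    """Upper bound on concurrent requests if managed memory were the only limit."""
    return ram_bytes * connections // heap_bytes


overhead = per_request_kib(HEAP_BYTES, CONNECTIONS)
ceiling = max_concurrent_requests(RAM_BYTES, HEAP_BYTES, CONNECTIONS)

print(f"{overhead:.1f} KiB per request")
print(f"memory-bound ceiling: about {ceiling:,} concurrent requests")
```

Under these assumptions the ceiling lands a little below 300,000, which is why memory becomes the bottleneck at that level "or earlier" once unmanaged memory and the rest of the system are accounted for.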

The ASP.NET code paths are optimized so that they’re just background noise in throughput measurements, but they definitely show up when other resources like memory are considered. Every application pays for the runtime’s features, even those it never uses. This is one of the tradeoffs of the “everything and the kitchen sink” mantra followed by the ASP.NET runtime.

We then reran the test with the Helios OWIN package installed. Installing the Microsoft.Owin.Host.IIS NuGet package is the only change we made to this project. In this new run, the performance counter showed 53,295,232 allocated bytes, which divided by 50,000 concurrent requests gives an amortized overhead of 1.04 KiB per request. Given that the System.Web overhead for this same test is 28.9 KiB per request, the Helios architecture provides a 96.4% reduction in per-request managed memory overhead compared with the full ASP.NET pipeline.

Memory overhead (amortized per request)

System.Web-based        Helios-based            Difference
28.9 KiB / request      1.04 KiB / request      -96.4%

Let’s put this another way. In absolute numbers, the Helios architecture allowed our sample application to sustain 50,000 concurrent requests with approximately 1.4 GB less managed-heap overhead than the standard ASP.NET pipeline. And since the sample application was designed to be a minimum baseline, one can reasonably expect a similar absolute saving in any non-trivial application as well.
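The Helios per-request figure, the percentage reduction, and the absolute saving can all be recomputed from the two counter readings quoted earlier. The following is a minimal sketch in Python; the helper names are illustrative, and the only inputs are the two heap sizes and the connection count from the test runs.

```python
# Counter readings from the two test runs described above.
SYSTEM_WEB_HEAP_BYTES = 1_480_856_008
HELIOS_HEAP_BYTES = 53_295_232
CONNECTIONS = 50_000


def per_request_kib(heap_bytes, connections=CONNECTIONS):
    """Amortized managed-heap overhead per request, in KiB."""
    return heap_bytes / connections / 1024


def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


helios_kib = per_request_kib(HELIOS_HEAP_BYTES)
reduction = reduction_percent(SYSTEM_WEB_HEAP_BYTES, HELIOS_HEAP_BYTES)
saved_bytes = SYSTEM_WEB_HEAP_BYTES - HELIOS_HEAP_BYTES

print(f"Helios: {helios_kib:.2f} KiB per request")
print(f"Reduction: {reduction:.1f}%")
print(f"Saved: {saved_bytes / 10**9:.2f} GB ({saved_bytes / 1024**3:.2f} GiB)")
```

The absolute difference between the two runs is 1,427,560,776 bytes of managed heap. Because the calculation only looks at managed memory, it says nothing about unmanaged allocations in either host.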

And no. We’re not just pulling a sleight-of-hand and making unmanaged memory allocations in place of managed allocations. We’re not that sneaky. 🙂

Saving memory has beneficial ripple effects. Fewer page faults put less pressure on the page file. Because there are fewer per-request managed allocations, there is also less pressure on the CLR garbage collector. Collections occur less frequently, and when they do occur they tend to complete much more quickly. In one of our internal “Hello World” performance runs (warning: unrealistic workload!), the full ASP.NET pipeline spent around 2.0% of its time performing garbage collection (see the % Time in GC performance counter). That same application, when Helios-hosted, averaged 0.06% time in GC.

Using the Helios runtime without OWIN

The Helios runtime (Microsoft.AspNet.Loader.IIS.dll) is a standalone assembly and doesn’t have any direct integration with the OWIN pipeline. An application is free to use the APIs exposed by the Helios runtime directly rather than use the OWIN extensibility points provided by the Microsoft.Owin.Host.IIS package.

If you’d like to use the Helios APIs directly, follow these steps to get started:

In the New ASP.NET Project dialog, select the Empty template, then hit OK.

Install the Microsoft.AspNet.Loader.IIS NuGet package into the project. Do not install the Microsoft.Owin.Host.IIS package; otherwise the OWIN compatibility layer will initialize, and you may see weird runtime behaviors due to multiple HttpApplicationBase instances being available.

Add a class which subclasses the Microsoft.AspNet.Loader.IIS.HttpApplicationBase type. At minimum, your derived type must override the ProcessRequestAsync method.

Add an assembly-level Microsoft.AspNet.Loader.IIS.HttpApplicationAttribute which points to the type of your HttpApplicationBase-derived type.

A sample MyHeliosApplication.cs file which combines the last two steps is provided below:

There is no official documentation as yet, but the Microsoft.AspNet.Loader.IIS NuGet package includes some limited IntelliSense for these APIs. They roughly correspond to a slimmed-down version of the APIs on System.Web.HttpContext.