How GWB Found Hidden Exceptions and Application Performance Problems

Exceptions in .NET can be very expensive when it comes to CPU cycles. Read how Geeks with Blogs (http://geekswithblogs.net) used Retrace to improve the performance of their application.

About Geeks with Blogs (GWB)

Thousands of software developers use Geeks with Blogs (GWB) to host their own personal blog. The code base has slowly morphed and has been modified by several developers over 15 years. As a free service, GWB decided to use Stackify to try improve the overall page load times and CPU usage with the hopes of also being able to use fewer servers to cut hosting costs and improve application performance and user experience. Here’s their story:

How GWB uses Stackify Retrace to improve its application performance

We installed Retrace on our web servers and configured log4net to send errors and logs to Stackify. This allowed us to get visibility to errors being logged by log4net, basic server monitoring, key .NET performance counters, and full code profiling details via Retrace. I have to admit, it was pretty amazing to see all of this data in one place.

By sending our log4net logs to Stackify we were able to immediately identify a couple simple bugs that needed to be fixed. Retrace’s page level performance tracking confirmed that two pages (the blog list and individual blog page view) were generating the vast majority of all the traffic to the site. We also found a couple places in our code where we could do a better job of caching database queries.

Out-of-the-box Stackify’s default monitoring included some .NET performance counters. One of which was the number of .NET CLR exceptions being thrown per minute. We noticed that the number was over 10,000 on each server but none of those errors were being logged via log4net. The exceptions were being caught and thrown away and we had no way to know what they were.

This shows total errors being about 400,000 to 700,000 every 30 minutes. Yikes!

Preventing and finding hidden exceptions in your code

It is very common for developers to make helper methods for common tasks like grabbing a column from a SqlDataReader and converting it to the correct type as shown below. This type of code seems innocent but can potentially cause huge performance problems as you will see below.

The root cause of our performance problems was actually #1, trying to access a column name that was not available in the SqlDataReader. We have several web pages that do different types of searches of blog posts and they use different stored procedures. A long time ago someone made the decision to have one code path that took the stored procedure results and populate them in to a common object that all the pages used. This worked perfect except all the stored procedures returned slightly different columns. This caused a lot of exceptions to occur for the missing columns, but we had no idea that the exceptions even existed since our code did not log the exceptions.

Stackify’s automatic Retrace exception collection

The only way we were able to find these exceptions was from Retrace. Because it does code profiling within the CLR, it is able to get visibility to all exceptions that are thrown. So even though our code was swallowing the exceptions and not logging them via log4net, Stackify was able to give us visibility to them with no coding or logging changes. We were able to immediately find problems in a 15 year old application.

GWB’s Optimization Results

With a few code changes we were able to eliminate several exceptions that we had no idea existed until we used Retrace. This cut CPU usage and enabled us to scale down the # of web servers we were using which will save us monthly hosting costs as well as to improve overall application performance.

This chart shows how the number of errors has dramatically lowered

Page load times improved as well. It is amazing how much of an effect hidden exceptions had on overall page load time across the app. Before the changes, average load times were 500-800 milliseconds and have dropped to 250-500ms depending on site traffic.

This chart below shows request volume by user satisfaction over the same time period. You can see that the yellow sluggish requests have lessened.

Exceptions are slow. Avoid them!

Exceptions in .NET are expensive, even when your users don’t know they are happening. The primary cause of it is capturing the stack trace at the time the exception is thrown. The .NET CLR has to stop and walk back through the calling frames and this requires CPU cycles and blocks the thread. At Stackify we would suggest never using exceptions to control application logic flow and to use good defensive coding techniques to avoid issues like those which Geeks with Blogs ran into.

Disclaimer: Geeks with Blogs is owned by Watson Technology Group, which is also an investor in Stackify

Matt Watson :Matt is the Founder & CEO of Stackify. He has been a developer/hacker for over 15 years and loves solving hard problems with code. While working in IT management he realized how much of his time was wasted trying to put out production fires without the right tools. He founded Stackify in 2012 to create an easy to use set of tools for developers.