Why Tracking and Monitoring All Exceptions is Important

Modern applications are complex multi-tier and multi-layer systems that consist of multiple client-side apps, web servers, application servers, and database servers. The applications use third-party libraries, communicate with other applications using different communication mechanisms, and rely on cloud services and hosting providers. The complexity goes on and on and on.

In other words, modern applications are Complicated with a capital C.

How can you manage that complexity? Your first thought probably goes to application monitoring and performance tools. Yet those tools alone are not a complete solution. To fully manage application complexity, you also need exception monitoring. Without it, you’ll never stay ahead of the complexity of modern applications.

Let me explain why…

Access to failure data

Lots of monitoring tools exist to help you identify and troubleshoot issues across all application layers. However, most of these tools are designed to show you the overall system state. Only logs, and exceptions in particular, offer insight into a finer level of detail.

By definition, an exception is an event which occurs during the execution of a program that disrupts the normal flow of the program’s instructions.

Exceptions signal many different types of error conditions across all application layers: out of memory, stack overflow, I/O failures, null pointer dereferences, network timeouts, and many others. They can point to system or operating system problems, flaws in application logic, connectivity problems, and so on.

When an exception occurs within a method, the method creates an object and hands it off to the runtime system. The exception object contains information about the error, including its type and the state of the program when the error occurred. Therefore, exceptions, handled or not, provide access to very useful raw data that can be used for:

Early error detection and prevention

Root cause analysis when problems occur

Reconstructing events after a problem occurred

Identifying security problems

Forensic evidence
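To make the “type and state” point concrete, here is a small Java sketch. The article doesn’t tie itself to a language; Java is an assumption, chosen because its terminology matches the description above. The names `ExceptionAnatomy`, `parsePort`, and `describe` are hypothetical; the `Throwable` methods it calls (`getMessage`, `getStackTrace`, `getCause`) are standard Java. It shows that even a caught exception still carries its type, message, captured stack frames, and chained cause.

```java
// Hypothetical example: what an exception object records, handled or not.
class ExceptionAnatomy {
    // Hypothetical helper: parses a port number and wraps any parse failure with context.
    static int parsePort(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            // Keep the original exception as the cause so its details are not lost.
            throw new IllegalArgumentException("invalid port: " + raw, e);
        }
    }

    // Summarize the raw data carried by an exception: type, message,
    // number of captured frames, innermost frame, and the chained cause.
    static String describe(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        String top = frames.length > 0 ? frames[0].toString() : "<no frames>";
        Throwable cause = t.getCause();
        return "type=" + t.getClass().getName()
                + " message=" + t.getMessage()
                + " frames=" + frames.length
                + " top=" + top
                + " cause=" + (cause == null ? "none" : cause.getClass().getName());
    }

    public static void main(String[] args) {
        try {
            parsePort("80a");
        } catch (IllegalArgumentException e) {
            // The exception was handled, yet it still carries the full diagnostic record.
            System.out.println(describe(e));
        }
    }
}
```

An exception monitoring tool collects essentially this record, plus surrounding context such as the request or user, every time an exception occurs.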

Early problem detection

Many production problems build up gradually. Monitoring tools can often signal that something is awry by tracking metrics such as CPU load, memory and storage usage, and network activity and connectivity, but that information alone generally cannot pinpoint the source of a problem. As a result, you run the risk of waiting until a problem becomes critical before you start looking for its source.

Exception monitoring offers a better way to gain early insight into production problems. By monitoring exceptions, you can trace a performance issue to its root cause, which gives you the time and the means to fix the problem before it reaches a tipping point and affects your users or your company.
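One simple way to turn raw exceptions into an early-warning signal is to count them per type over fixed time windows and flag types whose volume jumps. The Java sketch below assumes a hypothetical `ExceptionRateMonitor` whose spike factor and minimum count are tuning knobs; comparing only against the previous window is a deliberately naive baseline, and a real tool would use a longer history and persist the counts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: counts exceptions per type in fixed windows and flags
// types whose count jumps well above their count in the previous window.
class ExceptionRateMonitor {
    private final double spikeFactor; // e.g. 3.0 = flag when the count more than triples
    private final int minCount;       // ignore types seen fewer times than this per window
    private Map<String, Integer> previous = new HashMap<>();
    private Map<String, Integer> current = new HashMap<>();

    ExceptionRateMonitor(double spikeFactor, int minCount) {
        this.spikeFactor = spikeFactor;
        this.minCount = minCount;
    }

    // Call from catch blocks or an uncaught-exception handler.
    synchronized void record(Throwable t) {
        current.merge(t.getClass().getName(), 1, Integer::sum);
    }

    // Close the current window and return the exception types that spiked, sorted by name.
    synchronized List<String> closeWindow() {
        List<String> spiking = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : current.entrySet()) {
            int now = entry.getValue();
            int before = previous.getOrDefault(entry.getKey(), 0);
            // A type absent from the last window is compared against a baseline of 1.
            if (now >= minCount && now > spikeFactor * Math.max(before, 1)) {
                spiking.add(entry.getKey());
            }
        }
        Collections.sort(spiking);
        previous = current;
        current = new HashMap<>();
        return spiking;
    }

    public static void main(String[] args) {
        ExceptionRateMonitor monitor = new ExceptionRateMonitor(3.0, 5);
        // Window 1: a steady trickle of timeouts, below the minimum count.
        for (int i = 0; i < 4; i++) monitor.record(new TimeoutException("upstream slow"));
        System.out.println(monitor.closeWindow()); // prints []
        // Window 2: the same exception suddenly appears five times as often.
        for (int i = 0; i < 20; i++) monitor.record(new TimeoutException("upstream slow"));
        System.out.println(monitor.closeWindow()); // prints [java.util.concurrent.TimeoutException]
    }
}
```

In an application, `record` could be called from catch blocks or from a hook installed with `Thread.setDefaultUncaughtExceptionHandler`, and `closeWindow` from a scheduled task.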

Impact on application performance

Quite often, an app throws a lot of exceptions that might be harmless in terms of application functionality. Sometimes they are even referred to as “good” exceptions.

However, throwing an exception typically involves capturing a stack trace: the runtime pauses to walk the call stack and record each frame so the trace can be reported later, and then unwinds the stack until a matching handler is found. While this pause in thread execution is insignificant as a single event, at large scale it can add substantial performance overhead.

Therefore, even “good” exceptions can have a major impact on an application’s performance and shouldn’t be left unattended.
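In Java, the stack capture happens when the exception object is constructed, which is why the cost is paid even by exceptions that are caught and ignored. The sketch below, with hypothetical names `StackTraceCost` and `LightweightException`, compares throwing ordinary exceptions with throwing one built through the protected four-argument `RuntimeException` constructor, whose `writableStackTrace = false` flag skips recording frames. Treat the printed timings as rough indications only; a single timed loop is not a rigorous benchmark, and a harness such as JMH is the better tool for real measurements.

```java
// Hypothetical sketch: how much of an exception's cost is stack trace capture?
class StackTraceCost {
    // Hypothetical control-flow exception that opts out of stack trace capture.
    static class LightweightException extends RuntimeException {
        LightweightException(String message) {
            // enableSuppression = false, writableStackTrace = false: no frames are recorded.
            super(message, null, false, false);
        }
    }

    // Written on every iteration so the JIT cannot treat the exceptions as dead code.
    static volatile Object sink;

    // Throw and catch n exceptions, with or without stack capture; return elapsed nanoseconds.
    static long timeThrows(int n, boolean captureStack) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            try {
                if (captureStack) {
                    throw new IllegalStateException("expected");
                }
                throw new LightweightException("expected");
            } catch (RuntimeException e) {
                sink = e;
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 200_000;
        System.out.printf("with stack traces:    %d ms%n", timeThrows(n, true) / 1_000_000);
        System.out.printf("without stack traces: %d ms%n", timeThrows(n, false) / 1_000_000);
    }
}
```

Disabling stack traces throws away exactly the diagnostic data this article argues for, so for “good” exceptions the more useful fix is usually to stop throwing them on hot paths (for example, by validating input before parsing it) and to keep an eye on their volume.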

Exception tracking and security

Also worth noting is how important exception monitoring is from a security perspective. Following are a couple of reasons why.

Hackers can use information exposed by error messages

When an application fails, it often throws an exception. Detailed error messages can give attackers useful information such as stack traces, private data, and sometimes even passwords, which can open the door to enumeration, buffer overflow attacks, sensitive information disclosure, and more. Even an HTTP 404 error page can reveal details about your server that help an attacker.

While well-written code shouldn’t let exceptions go unhandled, in practice it’s nearly impossible to anticipate every case in which an exception could be thrown and to control its content. And, in all honesty, error handling is rarely robust enough to survive a penetration test. Actively monitoring your logs for this kind of exposure, and fixing it as soon as you find it, can help prevent attacks and significantly improve your system’s security.
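A common mitigation is to log the full exception server-side and give the client only a generic message with a reference it can quote to support. The Java sketch below assumes a hypothetical `SafeErrorResponder` and uses `java.util.logging` and a random `UUID` as the incident ID; in a web framework, the same idea usually lives in a global exception handler.

```java
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: keep exception details in server-side logs, not in client responses.
class SafeErrorResponder {
    private static final Logger LOG = Logger.getLogger(SafeErrorResponder.class.getName());
    private static final String CLIENT_PREFIX = "An internal error occurred. Reference: ";

    // Log the full exception (message and stack trace) under a random incident ID,
    // and return only a generic message plus that ID for the client.
    static String toClientMessage(Throwable t) {
        String incidentId = UUID.randomUUID().toString();
        LOG.log(Level.SEVERE, "Unhandled exception, incident " + incidentId, t);
        return CLIENT_PREFIX + incidentId;
    }

    public static void main(String[] args) {
        try {
            // Simulated failure whose message reveals internal infrastructure details.
            throw new IllegalStateException("connection to orders-db at 10.0.0.5:5432 refused");
        } catch (IllegalStateException e) {
            // The client sees no host, port, or stack trace.
            System.out.println(toClientMessage(e));
        }
    }
}
```

The sensitive details now live in the logs rather than in the response, so the logs need access controls of their own.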

Identifying hacker attacks

If, in spite of all your efforts, your system has been hacked, logs are often the only record of suspicious behavior, even while the system still works fine and all your monitoring tools show no problems. For this reason, monitoring exceptions may be your best shot at identifying when your system is being hacked.

The specifics of identifying such suspicious behavior depend on your application’s functionality, but in general you should look for abnormalities in your log files: new error messages and exceptions that could indicate privilege escalation, bulk downloads, privacy violations, and so on.
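One concrete way to surface new error messages and exceptions is to reduce each exception to a signature and raise a flag the first time a signature appears. In this hypothetical Java sketch, `NewSignatureDetector` defines a signature as the exception type plus the class and method of the innermost stack frame, and `exportAllCustomers` stands in for a sensitive operation. Line numbers are left out of the signature so an ordinary redeploy doesn’t make every known exception look new; that choice, and keeping the seen set in memory, are assumptions a production tool would revisit.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: flag exception signatures that have never been seen before.
class NewSignatureDetector {
    private final Set<String> seen = new HashSet<>();

    // Signature: exception type plus the innermost frame's class and method (no line number).
    static String signature(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        String where = frames.length > 0
                ? frames[0].getClassName() + "." + frames[0].getMethodName()
                : "<unknown>";
        return t.getClass().getName() + "@" + where;
    }

    // Returns true only the first time a signature is observed, i.e. a candidate for review.
    synchronized boolean isNew(Throwable t) {
        return seen.add(signature(t));
    }

    // Hypothetical sensitive operation that rejects unprivileged callers.
    static void exportAllCustomers(String user) {
        if (!"admin".equals(user)) {
            throw new SecurityException("bulk export denied for " + user);
        }
    }

    public static void main(String[] args) {
        NewSignatureDetector detector = new NewSignatureDetector();
        for (int attempt = 0; attempt < 3; attempt++) {
            try {
                exportAllCustomers("guest");
            } catch (SecurityException e) {
                // Only the first occurrence is reported as new; repeats are already known.
                System.out.println(detector.isNew(e) + " " + signature(e));
            }
        }
    }
}
```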

Logs are also useful in reconstructing events after a problem has occurred, and as forensic evidence. Event reconstruction can allow a security administrator to determine the full extent of an intruder’s activities and expedite the recovery process.

Conclusion

Exception monitoring is one of the most important and powerful tools for identifying and preventing problems, both emerging and already existing. To get the most out of it, make exception monitoring an integral part of your overall application monitoring effort, in both development and production environments.

Don’t simply deploy infrastructure monitoring tools and call it a day. Instead, monitor exceptions throughout all the layers of your stack. Retrace’s error and log management solution gives developers information about everything that happened before, during, and after an error was generated, providing a complete picture that makes it easier to reproduce and resolve the issue.