4 Subsystems responsible for Application stability (Don’t blame it on the Network)

As an Application Support Engineer or a Lead Developer, you are expected to save the day when things hit the fan. That is easier said than done. I’ve dealt with these issues on a daily basis for a couple of decades. In any enterprise application, there are 4 major subsystems that cause performance degradation or outages. Even though application platforms have changed drastically in the past few years, these 4 subsystems still account for the majority (if not all) of the issues. Once you understand this obvious yet often overlooked fact, you will be able to ask intelligent questions during troubleshooting.

Application Serving Platform

Backend Database or other Remote Service(s) that your application depends on

The Operating System that hosts the Application

Application Code

And no, the Network is NOT one of the major subsystems to point a finger at. That is not to say that Network issues never occur. But in practice, I look at the Network only after all other suspects have been eliminated.

Also note that the 4 subsystems are not listed in any particular order. That is, each has an equal chance of being the culprit.

Image: 4 Subsystems responsible for Application stability

(C) Karun Subramanian

Application Serving Platform

What is it ?

This is the runtime environment where your application executes. In the Java world, it is typically a JEE Application Server such as WebSphere, WebLogic, JBoss or Tomcat. It also includes components like Web Servers (Apache, IIS etc.), caching proxies and the JVM.

What can go wrong with it ?

A lot! Application Servers have grown so complex that corporations have entire teams dedicated to maintaining these Middleware components (Note: Application Servers and associated server components are often called Middleware). The issues arising out of Middleware include the following:

Backend Database or Remote Service(s)

What is it ?

Your application will almost always depend on some sort of backend system: a Database or a remote Web Service provider. The connectivity between your application and the Backend plays a crucial role in the stability of your application. Depending on the remote system, there can be a slew of configuration to do on both sides. To troubleshoot this subsystem, you may have to involve DBAs, Messaging Administrators and the remote system’s engineers.

What can go wrong with it ?

The primary issue you may face is the availability of the remote system. If the remote web service provider is down, your application may not just lose some functionality; it can die completely if the failed calls consume all the system resources. The issues in this subsystem include the following:

Availability of the remote system

Incorrect configuration on either side (Example: MaxConnections configuration of your JDBC connection pool)

A change in the remote system that breaks your application

Unexpected SSL certificate expiry
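To make the "failed calls consume all the system resources" point concrete, here is a minimal Java sketch of one common defence: run remote calls on a small, fixed pool of threads and give up after a timeout. The class name BoundedRemoteCall, its parameters and the fallback value are all hypothetical names I made up for illustration; a real application would more likely rely on the timeout and pool settings of its JDBC pool or HTTP client.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not from any library): caps the number of threads
// that can be stuck in remote calls and stops waiting after a timeout.
final class BoundedRemoteCall {
    private final ExecutorService pool;
    private final long timeoutMillis;

    BoundedRemoteCall(int maxWorkerThreads, long timeoutMillis) {
        // At most maxWorkerThreads calls run at once; extra calls wait in the pool's queue.
        this.pool = Executors.newFixedThreadPool(maxWorkerThreads);
        this.timeoutMillis = timeoutMillis;
    }

    // Returns the remote result, or the fallback if the call fails or is too slow.
    <T> T call(Callable<T> remoteCall, T fallback) {
        Future<T> future = pool.submit(remoteCall);
        try {
            // The timeout covers time spent queued as well as time spent running.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so the thread is freed
            return fallback;
        } catch (ExecutionException e) {
            return fallback;     // the remote call itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

With something like this in place, a dead backend costs you a few worker threads and a fallback response instead of every request thread in the server. Whether a fallback is acceptable at all depends on the call, so treat this as a starting point, not a recipe.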

OS (Operating System)

What is it ?

Your application must run on a host, whether physical, virtual or in the cloud. The OS running on the host plays a very important role in the stability of your application. The most popular OSes for hosting applications are Linux and Windows. A few other flavors of Unix, such as Solaris, AIX and HP-UX, are also heavily used. To troubleshoot issues in this subsystem you will need Systems Administrators (hopefully experienced ones) and possibly Storage Administrators.

What can go wrong with it ?

More than you might think! For starters, the host can run out of resources (CPU, Memory and Disk), which puts direct and immediate stress on your application. What about OS patches (especially the ones installed without informing the application owners!) that break your application? To be fair, this is one of the subsystems that has gotten a lot better in recent years due to the rise of the Cloud. Here are the major issues in the OS layer that can cause instability in your application:

OS running out of system resources

OS getting rebooted unexpectedly

OS libraries not compatible with the application

Issues arising from other applications that run on the same host
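For the first item, a quick look at host resources does not always require a Systems Administrator. Below is a rough Java sketch that takes a snapshot of CPU count, load average and disk space using standard JDK calls. The class HostResourceSnapshot and its thresholds are my own hypothetical names, the "load average above CPU count" check is only a rule of thumb, and the numbers are the JVM's view of the host, which may differ inside a container.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper (not from any library): a point-in-time view of host resources.
final class HostResourceSnapshot {
    final int cpus;
    final double loadAverage;   // negative when the platform does not report it (e.g. Windows)
    final long usableDiskBytes;
    final long totalDiskBytes;

    HostResourceSnapshot(int cpus, double loadAverage, long usableDiskBytes, long totalDiskBytes) {
        this.cpus = cpus;
        this.loadAverage = loadAverage;
        this.usableDiskBytes = usableDiskBytes;
        this.totalDiskBytes = totalDiskBytes;
    }

    // Reads the current values for the partition that contains the given path.
    static HostResourceSnapshot take(File path) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return new HostResourceSnapshot(
                os.getAvailableProcessors(),
                os.getSystemLoadAverage(),
                path.getUsableSpace(),
                path.getTotalSpace());
    }

    // Fraction of the partition in use; 0.0 if the size is unknown.
    double diskUsedFraction() {
        if (totalDiskBytes <= 0) {
            return 0.0;
        }
        return 1.0 - (double) usableDiskBytes / totalDiskBytes;
    }

    boolean diskNearlyFull(double threshold) {
        return diskUsedFraction() >= threshold;
    }

    // Rule of thumb only: more runnable work than CPUs over the last minute.
    boolean cpuSaturated() {
        return loadAverage >= 0 && loadAverage > cpus;
    }
}
```

A call such as HostResourceSnapshot.take(new File("/var/log")).diskNearlyFull(0.90) could be logged at startup or exposed through a health check, so the question "is the host out of disk?" is answered before the finger-pointing starts.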

Application Code

What is it ?

This is the bread and butter of your application. Developers write code, typically as part of a team, and work with Middleware Administrators (or Release Engineers) to deploy the application in the production environment.

What can go wrong with it ?

One of the major themes in this subsystem is this: “The application works fine in the Developer’s local environment, but it does not work in production. Therefore it is NOT a code issue”. Well, maybe. A thorough analysis of the issue is required to determine one way or the other. The major issues in this subsystem are:

Code is leaking memory

Code does NOT close remote connections properly and hence floods the application and OS with network connections

Code encounters an infinite loop and stalls the application

Code is logging too much/too little.
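The connection leak is the easiest of these to show in a few lines. The sketch below uses a hypothetical TrackedConnection class as a stand-in for a JDBC connection or socket (it only counts how many instances are open) and contrasts a leaky method with one that uses Java's try-with-resources.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a JDBC Connection, socket or similar handle.
// It only counts how many instances are currently open.
final class TrackedConnection implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();

    TrackedConnection() {
        OPEN.incrementAndGet();
    }

    String query(String sql) {
        if (sql.isEmpty()) {
            throw new IllegalArgumentException("empty query");
        }
        return "result of " + sql;
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();
    }
}

final class ConnectionHandling {
    // Leaky: if query() throws, close() is never reached and the connection stays open.
    static String leaky(String sql) {
        TrackedConnection connection = new TrackedConnection();
        String result = connection.query(sql);
        connection.close();
        return result;
    }

    // Safe: try-with-resources calls close() on both the normal and the exceptional path.
    static String safe(String sql) {
        try (TrackedConnection connection = new TrackedConnection()) {
            return connection.query(sql);
        }
    }
}
```

Calling ConnectionHandling.leaky("") leaves one connection open every time the query fails, which is exactly the kind of code that works fine on a developer's machine and slowly floods a production server. ConnectionHandling.safe("") throws the same exception but leaves the open count unchanged.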

There you have it. The 4 subsystems that can wreak havoc on your application are the Application Serving Platform, Backend Systems, the OS and Application Code. They all have equal chances of being the root cause of instability. While varied expertise is required to investigate issues in these subsystems, an Application Support Engineer has a unique opportunity to analyze all 4 subsystems at a high level and arrive at a reasonable root cause.

In the next few blog posts, I’m going to talk about one of the most dreaded issues in Java: Java memory. Can’t wait to show you.