Like an automobile, a web application needs occasional maintenance and management over its life cycle. Although it doesn't need oil changes, it will probably need version upgrades. There may not be manufacturer recalls, but sometimes servers fail or hang. An application doesn't need to be washed and detailed, but it does need to be backed up. And both cars and applications need occasional performance tuning.

This article provides a complete list of the system management functions that need to be performed on a standard architecture web application, with a particular emphasis on doing so in an Infrastructure-as-a-Service environment.

1. EvaluationAnyone who has implemented an application without sufficient evaluation, only to realize too late that it does not solve the business problem, will understand why evaluation is part of the application lifecycle.

Evaluation is facilitated with two primary components: information about the application and a try-before-you-buy capability. Many questions about an application can be answered efficiently with basic feature and function information, and ideally a competitive comparison from several similar applications will give visibility to their strengths and weaknesses. But these are prerequisites rather than substitutes for actually trying and using the product. Ideally, a "test drive" will not require any setup or configuration, since the goal is only to determine whether it meets your needs. You want to spend your evaluation time using the software, not learning how to deploy and configure it.

2. DeploymentDeployment is the tip of the system management iceberg - it is the most visible procedure because you cannot even get started without it.

Automating a deployment has many benefits, even if it is superficially a one-time deployment, because the automation script provides documentation and a kind of checklist to ensure that configuration details are handled properly the next time. If the upgrade is performed by re-deploying to a new server entirely, (this is much easier with virtual machines and cloud servers), then the upgrade process is just a matter of re-running the automation.

Another benefit of automating deployments is that best practices are made repeatable and documented, thereby reducing the chance of human error.

3. BackupAs soon as you begin to use your application, you should begin backing up the data it stores in a location that is both physically and logically separate from the primary data store.

Ideally, a backup contains the minimum unique data necessary to reproduce the state of the system. This keeps the cost of transporting and storing the backups low, which in turn encourages a higher backup frequency. However, sometimes this minimization should be traded off against the amount of time required to restore the system to working order.

4. MonitoringApplications and servers fail or bog down unpredictably. Persistent automated monitoring, with appropriate forms of notification (email, text message) frees you from having to explicitly check on the status of the application, but still ensures that you hear about problems when they happen, rather than when they are reported by users hours later.

Importantly, applications must be monitored at the application level - by robotic access through the application itself. It is common for servers and virtual machines to seem perfectly fine while the application is unresponsive. Remember that users and customers do not care about "server uptime" - they just want to use the application or site.

Deeper monitoring can signal trends that suggest that an imminent failure before it happens. For example, by tracking memory utilization and number of web server processes, a monitoring system may be able to predict that a server is about to overload. This type of deeper monitoring can also be useful for automated scaling procedures.

If the application has this requirement, there must be an easy, flexible, and reliable method of scheduling and automatically performing these jobs. It is common to use cron or Windows Task Scheduler for these procedures, and as long as these tools are accessible this is a workable solution. Even better is an off-server job scheduling mechanism, so that the status of the server and application does not affect whether the job runs and whether failure notifications can be delivered.

6. UpgradesMost application software and its supporting technology stack are subject to occasional version upgrades and patches.

It is extremely convenient to be able to easily duplicate the entire application environment and perform the upgrade first on a copy. Running manual or automated tests to confirm that the upgrade worked can improve reliability. If the upgrade failed, because (for example) a step was left out or a configuration change conflicts with the new version, the duplicate environment can be used to check and repair these issues and the upgrade process repeated until it works properly. This best practice minimizes the downtime associated with the upgrade.

7. RecoveryMany environments assume that backups will only rarely be used, so accessing them is expensive and possibly time-consuming. In an IaaS environment, with the right tools, it can be relatively easy to retrieve and restore backups to either a production system or to a copy.

Obviously, when a server or application does fail, the first thing to try is to restore the operation of the application in place. The next thing to try is deploying a new application environment, then restoring a backup or turning a replication slave into the master. The former will result in a loss of data based on how long ago the backup was performed. The latter will typically result in only the very last transaction being lost. DNS entries must be updated.

Sometimes, a server failure is actually a consequence of an entire data center experiencing downtime. In this case, it becomes clear why the backups must be kept offsite. The attempt to deploy a new application will fail in the original data center, so it must be performed elsewhere.

Ideally, a management system will provide the optional ability to sequence and automate all these procedures in connection with the monitoring. This can minimize downtime and avoid the need to have staff on call 24x7.

8. ScalingThe cost of frequently changing resources to match load must be weighed against the cost of having excess resources for some time. Burst scaling is much less common and substantially more challenging to handle well.

In single server application deployments, scaling consists of redeploying the application on a server with more memory and/or compute resources. Multi-server deployments are scaled by adding or removing servers from a homogeneous horizontally scalable tier, usually a web tier and possibly a separate application server tier.

In addition to deploying fully configured web or application servers, they must be properly added to (or removed from) a load balancer queue, and this must be done in a way that does not affect active connections. Thus, whether these scale changes are initiated manually or dynamically in response to monitoring output, it is crucial that the deployment (or un-deployment) of resources be automated to avoid configuration errors and to ensure a transparent user experience on the production environment.

9. TuningSometimes application deployments can be tuned to perform better independent of resource scaling. Typically this involves changing configuration parameters and restarting the web server or rebooting the server.

If system management for the application is largely automated, any manual changes need to be reflected in the automated deployment procedures to ensure that they are reflected in later re-deployments (including restoring backups, deploy from scratch upgrades, and the like). A very sophisticated management system might actually perform tuning automatically based on load and performance characteristics of the application. However, this is unusual because it is typically very application-specific.

10. Utility ManagementMany application deployments include utility software that provides, for example, security, log analysis, caching, or email delivery. These utilities are often more challenging to install even than the technology stack or the application itself, and configuring them to connect to the application is almost always tricky. Consequently, a compatibility matrix along with automated deployment procedures to allow independent installation of each utility is an enormous time-saver. Automated removal of these utilities is also crucial, as it can be even more difficult than installation.

ConclusionWe have seen that there are numerous system management activities to be performed in a typical web application deployment. Accomplishing these tasks manually is relatively burdensome and requires a fair amount of skill. In the Infrastructure-as-a-Service world, most of these procedures can be automated or automated with manual initiation; and, further, they can be performed in ways that are more reliable and testable than in a bare-iron data center. With an appropriate IT Process Automation system, a single-tenant application deployment in the cloud can be almost as easy as Software-as-a-Service, but without the attendant loss of control and flexibility.

About Dave JilkDave Jilk has an extensive business and technical background in both the software industry and the Internet. He currently serves as CEO of Standing Cloud, Inc., a Boulder-based provider of cloud-based application management solutions that he cofounded in 2009.

Dave is a serial software entrepreneur who also founded Wideforce Systems, a service similar to and pre-dating Amazon Mechanical Turk; and eCortex, a University of Colorado licensee that builds neural network brain models for defense and intelligence research programs. He was also CEO of Xaffire, Inc., a developer of web application management software; an Associate Partner at SOFTBANK Venture Capital (now Mobius); and CEO of GO Software, Inc.

Dave earned a Bachelor of Science degree in Computer Science from the Massachusetts Institute of Technology.