Abstract

Computerized services are the driving force behind everyday business for many companies. It is of the utmost importance that these services remain available during business hours, because downtime is expensive. Most computerized services today are built on a distributed architecture because of the many benefits such an architecture offers. Distributed architectures have a downside, though: they suffer from incomplete observability, which makes decision making hard and makes systems built according to such an architecture difficult to control. This paper describes the design of a business continuity monitoring model, developed to cope with software, hardware, and operator failures by reducing the time required to detect, diagnose, and repair a problem in a distributed architecture. The model combines a three-tier model with five monitoring domains distilled from a standard distributed architecture. A prototype was developed to test the model in a real environment.