Many computer-based services in everyday life are implemented by distributed systems. The failure of these systems, especially in electronic banking, e-commerce, health care, and process control, may lead to loss of profit or even serious property damage. Some of these systems are implemented using object-oriented (OO) software technology. The need for dependable OO distributed systems has led to the development of middleware (e.g. the Fault Tolerant CORBA of the Object Management Group) and commercial components that offer robust support for applications. Such middleware provides well-defined interfaces and mechanisms for deploying and managing redundant server objects in order to avoid single points of failure in the application. However, the complexity of the standards and the lack of mechanized dependability and performability analysis are obstacles to the widespread use of this middleware. The designer needs support to construct optimal application-controlled fault tolerance strategies and to select the parameters of the fault-tolerant infrastructure (e.g. the number of replica objects and redundant servers, the replication style).
The project aims at the development of algorithms and techniques that support the mechanized dependability and performability analysis of robust object-oriented distributed systems. These analyses allow the comparison of alternative solutions, the estimation of the effects of selected parameters and the identification of dependability bottlenecks.
The project plan consists of the following tasks:

Construction of the dependability model of distributed OO applications that are designed according to the FT-CORBA specification. The dependability model is elaborated in the form of stochastic Petri nets or stochastic reward nets. The technique is demonstrated on pilot applications.

Overview and comparative analysis of the various behavioral equivalences that are defined in the field of automata theory. These equivalences will be used to compare the service (observable behavior) provided by the redundant server object group in the presence of faults with the service (observable behavior) expected by the client. In this way the effects of faults are examined and fault coverage is proved.

Construction of performance models that can be used to analyze both synchronous and asynchronous communication in distributed OO systems. Our target formalism is the layered queuing network. The relationship between performance under fault-free behavior and performance under anticipated faults is examined; in this way, performability measures are derived.

Elaboration of profiling and measurement techniques in order to obtain the parameters of the performability model. Profiling in the distributed environment is supported by a monitoring technique based on assigned signatures.
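
As an illustration of the kind of measures such analyses yield, the following minimal Python sketch computes the steady-state availability and a simple performability measure (expected reward rate) for a group of n replica objects. It is not the project's stochastic Petri net or layered queuing network model. It assumes that each replica fails and is repaired independently with exponentially distributed times (rates lam and mu), which reduces the model to a birth-death Markov chain with a closed-form solution. The function names, the rates, and the throughput rewards are hypothetical values chosen only for illustration.

```python
from math import comb


def steady_state(n, lam, mu):
    """Steady-state distribution of a birth-death Markov chain.

    State i is the number of failed replicas (0..n).
    Assumption: each replica fails at rate lam and is repaired at rate mu,
    independently of the others, so the probability of state i is
    proportional to C(n, i) * (lam / mu) ** i.
    """
    rho = lam / mu
    weights = [comb(n, i) * rho ** i for i in range(n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def availability(pi):
    """The service is up unless all replicas have failed (last state)."""
    return 1.0 - pi[-1]


def expected_reward(pi, rewards):
    """Performability measure: reward rate averaged over the states.

    rewards[i] is the (hypothetical) throughput delivered while
    i replicas are failed.
    """
    return sum(p * r for p, r in zip(pi, rewards))


if __name__ == "__main__":
    lam, mu = 1.0, 9.0  # hypothetical failure and repair rates
    for n in (1, 2, 3):
        pi = steady_state(n, lam, mu)
        # Hypothetical reward: throughput degrades linearly with failures.
        rewards = [100.0 * (n - i) / n for i in range(n + 1)]
        print(n, availability(pi), expected_reward(pi, rewards))
```

Running the loop for several values of n is a small-scale version of the comparison of alternative configurations described above: the availability shows the effect of adding replicas, while the expected reward combines the fault model with a performance figure for each degraded state.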