Predictive Self-Healing

Sun has developed a new architecture for building and deploying systems and services designed for Predictive Self-Healing. Self-healing technology enables Sun systems and services to maximize availability in the face of software and hardware faults and facilitates a simpler and more effective end-to-end experience for system administrators, which can reduce cost of ownership. The first self-healing features are available as part of the Solaris 10 OS.

The first set of technologies includes Solaris components that implement predictive self-healing for CPU, memory, and I/O bus nexus components. The architecture is used to facilitate a simplified administration model wherein traditional error messages intended for humans are replaced by binary telemetry events consumed by software components that automatically diagnose the underlying fault or defect. The results of the automated diagnosis are used to initiate self-healing activities such as administrator messaging, isolation or deactivation of faulty components, and guided repair.
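The flow described above, from binary telemetry events through automated diagnosis to self-healing actions, can be pictured with a small toy model. This is only a sketch under stated assumptions: the `TelemetryEvent` type, the component names, the threshold rule, and the action names are all hypothetical and are not taken from the Solaris implementation, whose diagnosis logic is not described here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TelemetryEvent:
    """Hypothetical stand-in for a binary telemetry event."""
    component: str    # e.g. "cpu0" (illustrative name)
    error_class: str  # e.g. "correctable" (illustrative class)


def diagnose(events, threshold=3):
    """Return components whose correctable-error count reaches the threshold.

    The threshold rule is an assumption made for illustration only.
    """
    counts = Counter(
        e.component for e in events if e.error_class == "correctable"
    )
    return sorted(c for c, n in counts.items() if n >= threshold)


def respond(faulty_components):
    """Map a diagnosis to the kinds of actions the text lists."""
    actions = []
    for component in faulty_components:
        actions.append(("notify_admin", component))  # administrator messaging
        actions.append(("isolate", component))       # isolation or deactivation
    return actions


events = [TelemetryEvent("cpu0", "correctable")] * 3
events.append(TelemetryEvent("mem1", "correctable"))

faulty = diagnose(events)
actions = respond(faulty)
```

In this toy run only `cpu0` crosses the assumed threshold, so it is the only component that receives a notification and an isolation action.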

The second set of technologies is delivered as a set of Solaris components that implement the Solaris Service Manager, which makes software services on the system participate in the Predictive Self-Healing architecture. The Service Manager provides a consistent administration model for long-running software services on a Solaris system. Hardware faults that affect software services, as well as software failures, will cause the affected services to be restarted automatically, along with any services that declared a need to be restarted when the directly impacted services are restarted.
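The restart behavior described above can be sketched as a walk over declared dependencies. The sketch below is not the Service Manager's actual algorithm: the service names, the `restart_on` mapping, and the assumption that restarts propagate transitively are all illustrative.

```python
from collections import deque


def services_to_restart(failed, restart_on):
    """Return every service that should restart when `failed` services restart.

    `restart_on` maps a service to the set of services whose restart it has
    declared should also restart it. Propagation is assumed to be transitive.
    """
    # Invert the declarations: for each service, who asked to follow it?
    followers = {}
    for service, triggers in restart_on.items():
        for trigger in triggers:
            followers.setdefault(trigger, set()).add(service)

    to_restart = set(failed)
    queue = deque(failed)
    while queue:
        current = queue.popleft()
        for follower in followers.get(current, ()):
            if follower not in to_restart:
                to_restart.add(follower)
                queue.append(follower)
    return to_restart


# Hypothetical services: "webapp" follows "database", "reporting" follows "webapp".
restart_on = {
    "webapp": {"database"},
    "reporting": {"webapp"},
    "mail": set(),
}

affected = services_to_restart({"database"}, restart_on)
```

Here a fault that hits `database` also restarts `webapp` and `reporting`, while `mail`, which declared no such dependency, is left running.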

When appropriate, a self-healing system may direct an administrator to a knowledge article to learn more about a problem impact or repair procedure. You can access the knowledge article corresponding to a self-healing message by taking the Sun Message Identifier (SUNW-MSG-ID) and appending it to the link http://www.sun.com/msg/ in your web browser. By the time you read the article, other agents participating in the self-healing system may have already offlined an affected component and taken other action to keep your system and services available.
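Building the knowledge article link is a simple string append, which a short helper can illustrate. The function name and the sample identifier `EXAMPLE-0000-XX` are placeholders invented for this sketch; substitute the SUNW-MSG-ID value printed in the actual self-healing message.

```python
from urllib.parse import quote

BASE_URL = "http://www.sun.com/msg/"


def knowledge_article_url(msg_id: str) -> str:
    """Append a message identifier to the knowledge article base link."""
    msg_id = msg_id.strip()
    if not msg_id:
        raise ValueError("empty message identifier")
    # Percent-encode anything unexpected so the result stays a valid URL.
    return BASE_URL + quote(msg_id, safe="")


# Placeholder identifier, not a real message ID.
url = knowledge_article_url("EXAMPLE-0000-XX")
```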

The Predictive Self-Healing model is scalable, extensible, and portable, and will be used across Sun's product line to deliver a common experience for service and administration, reducing cost of ownership for Sun systems. You can learn more about self-healing technology using the following links:

How to Create a Service Management Facility (login required)
This Service Management Facility (SMF) How-To Guide explains what a basic SMF service manifest is and shows how to develop a service manifest for a given application. The service manifest for the PostgreSQL database included in Solaris 10 is the example in this guide.

How-To Guide on Service Management Facility (August 2007)
This How-To Guide on sun.com shows system administrators who are unfamiliar with the Solaris 10 OS how to use the Service Management Facility (SMF) to monitor and manage a Solaris 10 system.