16 Enterprise Manager Diagnosability

This chapter introduces diagnostic capabilities in Enterprise Manager that extend to Oracle Management Service (OMS) and Management Agents.

Fault Diagnostics Framework

Enterprise Manager includes a fault diagnostics framework for collecting and managing diagnostic data. Diagnostic data includes trace files, dumps, and core files as well as other information that enables customers and Oracle Support to identify, investigate, track, and resolve problems quickly and effectively.

Administrators use the Incident Packaging Service (IPS) to package incident data and diagnostic results in a zip file for upload to My Oracle Support (MOS).

Administrators receive confirmation that a service request (SR) was created, which they can use to track problem resolution.

Fault Diagnosability Infrastructure

For critical errors, the ability to capture error information at first failure greatly increases the chance of a quick problem resolution and reduced downtime. An always-on, memory-based tracing system proactively collects diagnostic data from many Enterprise Manager components and can help isolate the root causes of problems. This system of data collection is similar to that of airplane "black box" flight recorders. When a problem is detected, alerts are generated and the fault diagnosability infrastructure is activated to capture and store diagnostic data.
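The "black box" idea can be illustrated with a bounded in-memory buffer that always records but keeps only the most recent entries. The following is a minimal sketch, not Enterprise Manager code: the `FlightRecorder` class, its methods, and the component name are hypothetical and chosen only for illustration.

```python
from collections import deque


class FlightRecorder:
    """Hypothetical in-memory trace buffer; not an Enterprise Manager API."""

    def __init__(self, capacity=1000):
        # A deque with maxlen discards the oldest entry once it is full.
        self._buffer = deque(maxlen=capacity)

    def trace(self, component, message):
        """Record one trace entry in memory."""
        self._buffer.append((component, message))

    def dump(self):
        """Return a snapshot of the buffered entries, oldest first."""
        return list(self._buffer)


recorder = FlightRecorder(capacity=3)
for step in range(5):
    recorder.trace("loader", f"step {step}")

# Only the three most recent entries remain available at failure time.
snapshot = recorder.dump()
```

Because the buffer is bounded, tracing can stay enabled without unbounded memory growth, and a dump taken when a problem is detected contains the activity that immediately preceded it.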

The fault diagnosability infrastructure aids in preventing, detecting, diagnosing, and resolving problems. The problems that are targeted in particular are critical errors such as those caused by code bugs, metadata corruption, and customer data corruption.

When a critical error occurs, it is assigned an incident number, and diagnostic data for the error (such as trace files) is immediately captured and tagged with this number. The data is then stored in the Automatic Diagnostic Repository (ADR), where it can later be retrieved by incident number and analyzed.
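Tagging diagnostic data with an incident number and retrieving it later by that number can be sketched as a simple keyed store. This is an illustrative model only, assuming sequential incident numbers; the `IncidentRepository` class and the file names are hypothetical and do not reflect the actual ADR layout or interfaces.

```python
import itertools


class IncidentRepository:
    """Hypothetical stand-in for a repository keyed by incident number."""

    def __init__(self):
        self._next_id = itertools.count(1)  # assumed numbering scheme
        self._files = {}

    def open_incident(self):
        """Assign a new incident number and create an empty entry for it."""
        incident_id = next(self._next_id)
        self._files[incident_id] = []
        return incident_id

    def attach(self, incident_id, path):
        """Tag a diagnostic file with the incident number."""
        self._files[incident_id].append(path)

    def files_for(self, incident_id):
        """Retrieve every file tagged with the given incident number."""
        return list(self._files.get(incident_id, []))


repo = IncidentRepository()
first = repo.open_incident()
second = repo.open_incident()
repo.attach(first, "oms_trace_1.trc")   # made-up file names
repo.attach(first, "oms_core_1.dmp")
repo.attach(second, "agent_trace_1.trc")
```

Looking up `repo.files_for(first)` returns only the two files attached to the first incident, which is the property that lets later tools find the relevant data without scanning everything stored.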

Incident Manager

The Incident Manager provides a central point of control for managing events, incidents, and problems detected within Enterprise Manager.

The Incident Manager gives you in-context access to diagnostic and resolution capabilities. You also have in-context access to My Oracle Support, where you can research knowledge base articles and create service requests.

The Guided Resolution region offers recommendations and provides links to diagnostics and resolutions.

Support Workbench

The Enterprise Manager Support Workbench (Support Workbench) is a facility that enables you to investigate, report, and in some cases, repair problems (critical errors), all with an easy-to-use graphical interface. The Support Workbench provides a self-service means for you to gather first-failure diagnostic data, obtain a support request number, and upload diagnostic data to Oracle Support quickly and with a minimum of effort, thereby reducing time-to-resolution for problems.

The Support Workbench allows you to view and process the contents of ADRs. From the Home and Problem Details pages, you can do the following:

View recent and historical problems

View and create diagnostic packages

Create user-reported problems

Review checker findings

Search MOS knowledge base

Perform Health Checks and Run Diagnostics Kit

Perform Health Checks and Run Diagnostics Kit

Health checks test the viability of various system components. Health checks run automatically in response to an incident. You can also perform targeted checks proactively. The diagnostic framework includes a comprehensive set of 26 out-of-box health checks to test components such as Jobs, Credential, Event, Loader, Plugin, ASLM, and so forth. Health check results are stored in the ADR.
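The distinction between running every check and running a targeted subset can be sketched with a small registry of named checks. This is a conceptual illustration only: the registry, the function names, and the check logic are hypothetical placeholders, and only the component names Jobs and Loader are borrowed from the text.

```python
def check_jobs():
    """Placeholder check; a real check would inspect the job system."""
    return True


def check_loader():
    """Placeholder check that reports a failure, for demonstration."""
    return False


# Hypothetical registry mapping a component name to its check function.
HEALTH_CHECKS = {"Jobs": check_jobs, "Loader": check_loader}


def run_health_checks(names=None):
    """Run the named checks (all of them when names is None)."""
    if names is None:
        selected = HEALTH_CHECKS
    else:
        selected = {name: HEALTH_CHECKS[name] for name in names}
    results = {}
    for name, check in selected.items():
        try:
            results[name] = "passed" if check() else "failed"
        except Exception as exc:  # a crashing check is itself a finding
            results[name] = f"error: {exc}"
    return results


all_results = run_health_checks()            # full sweep
targeted = run_health_checks(["Jobs"])       # proactive, targeted check
```

In this sketch the results are returned as a dictionary; in the framework described above, results are instead stored in the ADR.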

The Enterprise Manager Diagnostics Kit is a set of Oracle-supplied scripts specifically designed to identify inconsistencies in Enterprise Manager that are known to contribute to errors. In some cases, a script may be able to resolve the issue.

The scripts run repository diagnostics against system modules. You can run diagnostics against all or selected modules. The kit is accessible via a link in the Support Workbench. Diagnostic output is stored in the ADR with other dump files.

Incident Packaging Service (IPS)

The IPS enables you to automatically and easily gather the diagnostic data (traces, dumps, health check reports, and so forth) pertaining to a critical error and package the data into a zip file for transmission to Oracle Support.

Because all diagnostic data and files related to a critical error are tagged with that error's incident number, you do not have to search through all the stored information to determine the files required for analysis. The IPS identifies the required files automatically and adds them to the zip file.

Before creating the zip file, the IPS first collects diagnostic data into an intermediate logical structure called an incident package (package) and stores it in the ADR, where you can view the package and modify its contents. For example, you may want to add additional diagnostic data or remove existing data before uploading the zip file to Oracle Support.
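The two-stage flow described above (select files by incident number into an editable package, then write the zip file) can be sketched with the Python standard library. This is an assumption-laden illustration and not the IPS implementation: the helper functions `build_package` and `write_zip`, the file names, and the incident numbers are all hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def build_package(tagged_files, incident_id):
    """Select the files tagged with incident_id (the logical package)."""
    return [path for path, tag in tagged_files if tag == incident_id]


def write_zip(package, zip_path):
    """Write every file in the package into a zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in package:
            archive.write(path, arcname=Path(path).name)
    return zip_path


# Create a few made-up diagnostic files, each tagged with an incident number.
workdir = Path(tempfile.mkdtemp())
tagged = []
for name, incident in [("trace_a.trc", 7), ("dump_a.dmp", 7), ("trace_b.trc", 8)]:
    file_path = workdir / name
    file_path.write_text(f"diagnostic data for incident {incident}\n")
    tagged.append((file_path, incident))

package = build_package(tagged, incident_id=7)
# The package can be edited before it is zipped, for example to drop a file.
package = [p for p in package if p.suffix != ".dmp"]
zip_path = write_zip(package, workdir / "incident_7.zip")
```

Keeping the package as an intermediate list is what allows contents to be reviewed, added, or removed before the archive is produced.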

Draft Service Request Acknowledgment

After you upload the zip file, you receive a confirmation that the file was successfully generated and that a Draft Service Request has been created. You are advised to go to My Oracle Support to finalize and submit the SR to Oracle Support.

Using the Support Workbench to Investigate Problems

When Enterprise Manager encounters a critical error that prevents you from completing a task, Enterprise Manager logs an error and generates an incident for this critical error, which then generates an alert. Enterprise Manager stores incident details, including dump and trace files where applicable, in the Automatic Diagnostic Repository so that Support Workbench can access this information and display it.

After receiving an alert notifying you of a problem or incident, take the following steps:

From the Enterprise menu, select Monitoring, then select Support Workbench.

In the list of targets that support ADR, locate the target about which you were notified and click the target link.

On the Support Workbench page, perform any of the following actions as appropriate:

View problem or incident details

View, create, or modify incident packages

View health checker findings

Close resolved problems

For details on performing these actions, see the Cloud Control online help.