RAS; Reliability; Section 2.4, "RAS" - Fujitsu Sun Oracle SPARC Enterprise M3000 Overview Manual

2.4 RAS

RAS refers to the functions related to Reliability, Availability, and Serviceability.
The RAS functions minimize system downtime by performing error checking at
appropriate locations and by centrally monitoring and controlling that error
checking. They also accurately identify faulty locations and allow faulty
components to be replaced during operation, which further reduces downtime.

Reliability
Availability
Serviceability
2.4.1 Reliability
Reliability represents the length of time the server can operate normally without
failure.
Reliability is equally important for hardware and software.
To improve quality, appropriate components must be selected with consideration
given to the product service life and the required response in case of a failure. In
evaluations such as stress tests that check the service life, components and products
are inspected to determine whether they meet the target reliability levels.
Furthermore, software problems can be attributable not only to programming
errors but also to hardware faults. These factors need to be taken into consideration
to improve the reliability of the entire system.
The M3000 server provides the following functions to implement high reliability:
Periodic software diagnosis (host watchdog monitoring)
Cooperates with XSCF firmware to periodically check whether the software,
including the Oracle Solaris OS, is running in the domain.
Periodic memory patrol
Periodically performs memory patrol to detect memory soft errors and stuck-at
faults, even in memory areas not normally used. Memory patrol prevents
faulty memory areas from being used by the OS or application software and
thereby prevents system failures.
Status checking of components
Continuously checks the status of each component to detect signs of an imminent
fault, such as one that could bring the system down, and thereby prevents
system failures.
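
The idea behind periodic software diagnosis (host watchdog monitoring) can be illustrated with a minimal sketch. It is not the XSCF firmware interface or the actual M3000 mechanism, which this guide does not describe in detail. It assumes a generic heartbeat scheme: the monitored software reports that it is alive, and a monitor periodically checks whether the last report is recent enough. The names HostWatchdog, heartbeat, is_alive, and run_checks, and the 30-second timeout, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HostWatchdog:
    """Hypothetical monitor-side view of a heartbeat watchdog (not the XSCF API)."""

    timeout_s: float               # assumed: longest allowed gap between heartbeats
    last_heartbeat_s: float = 0.0  # time of the most recent heartbeat

    def heartbeat(self, now_s: float) -> None:
        """Called whenever the monitored software reports that it is alive."""
        self.last_heartbeat_s = now_s

    def is_alive(self, now_s: float) -> bool:
        """Periodic check: did a heartbeat arrive within the timeout window?"""
        return (now_s - self.last_heartbeat_s) <= self.timeout_s


def run_checks(watchdog, heartbeat_times, check_times):
    """Replay heartbeats and periodic checks in time order; return check results."""
    # Kind 0 = heartbeat, kind 1 = check; at equal times the heartbeat is applied first.
    events = [(t, 0) for t in heartbeat_times] + [(t, 1) for t in check_times]
    results = []
    for t, kind in sorted(events):
        if kind == 0:
            watchdog.heartbeat(t)
        else:
            results.append((t, watchdog.is_alive(t)))
    return results


if __name__ == "__main__":
    wd = HostWatchdog(timeout_s=30.0)
    # The software reports at 10 s and 20 s, then stops responding.
    for t, alive in run_checks(wd, [10.0, 20.0], [15.0, 45.0, 75.0]):
        state = "running" if alive else "NOT responding"
        print(f"t={t:5.1f}s  software {state}")
```

In this sketch the checks at 15 s and 45 s still see a recent heartbeat, while the check at 75 s does not, which is the point at which a real monitor would raise an alarm or start recovery. Passing timestamps in explicitly, rather than reading a clock, keeps the example deterministic.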
SPARC Enterprise M3000 Server Overview Guide • March 2012