A system failure occurs when a component or service fails to perform its specified task at the appropriate time. This section covers the following aspects of failure:
· Standard failures
· Basic failure classification
· Single points of failure (SPOFs)
The following factors leading to failure are common to all services:
· Hardware
Hardware includes the central processing unit (CPU), memory, network interface card (NIC), and so on. The different kinds of service might reside on physically different hardware, so the failure of a single machine can affect one or more SAP services. Hardware failure is a common cause of service failure.
· Operating system services
SAP services depend in turn on operating system services. If an operating system service fails, the SAP services that depend on it fail as well. One example is the socket layer: if it fails, the SAP message service is affected.
· Software applications
As with any software, programming errors in software applications can lead to failure of an SAP service.
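The dependency of the message service on the socket layer can be illustrated with a simple reachability probe. The following is a minimal sketch, not an SAP tool: the function name `tcp_service_reachable` is an assumption made for illustration, and it only tests whether a TCP connection can be opened, which is the most basic sign that the socket layer and the listening service are both working.

```python
import socket


def tcp_service_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within timeout.

    Hypothetical helper for illustration only. A successful connect shows
    that the socket layer works and that something is listening on the
    port. It does not prove that the service behind the port is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

A call such as `tcp_service_reachable("sapci.example.com", 3600)` would then report whether the port answers. Both the host name and the port number here are placeholders; the actual message service port depends on the installation.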
The following graphic shows the categories for classifying failures:
Basic Failure Classification
When thinking about fault tolerance, you can look at failure in the following ways:
Here we divide the system into layers with their associated components, using the categories shown in the graphic above.
This section discusses failure of the SAP system services in detail:
· How to detect failure
· The effects of failure
· How to recover from failure
The database, enqueue, and message services in a standard SAP system cannot be made redundant by configuring multiple instances of them on different host machines: this means that they are single points of failure (SPOFs). The remaining services (that is, dialog, update, background, gateway, and spool) can all be configured redundantly (in other words, on multiple host machines) to provide improved availability.
In a high availability SAP system, you can protect vulnerable services, such as the enqueue, message, and database services, by using, for example, cluster environments with switchover solutions. For more information, see:
In an SAP installation, Network File System (NFS) (for UNIX-based application hosts) and shares (for Microsoft Windows-based application hosts) are SPOFs. Some installations also use an Internet Domain Name Service (DNS), which is likewise a single point of failure.
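The idea of a single point of failure can be made concrete with a small sketch. It assumes a hypothetical deployment map from service name to the hosts that run it. The host names, the layout, and the function names `find_spofs` and `services_lost_if_host_fails` are all invented for illustration and do not describe any real installation or SAP interface.

```python
from typing import Dict, List


def find_spofs(deployment: Dict[str, List[str]]) -> List[str]:
    """Return the services that run on fewer than two distinct hosts."""
    return sorted(
        service
        for service, hosts in deployment.items()
        if len(set(hosts)) < 2
    )


def services_lost_if_host_fails(
    deployment: Dict[str, List[str]], failed_host: str
) -> List[str]:
    """Return the services that have no surviving host once failed_host is down."""
    return sorted(
        service
        for service, hosts in deployment.items()
        if not (set(hosts) - {failed_host})
    )


# Hypothetical layout: the database, enqueue, and message services exist once,
# while the remaining services are spread over several hosts.
deployment = {
    "database": ["hostA"],
    "enqueue": ["hostA"],
    "message": ["hostA"],
    "dialog": ["hostA", "hostB", "hostC"],
    "update": ["hostA", "hostB"],
    "background": ["hostB", "hostC"],
    "gateway": ["hostA", "hostB", "hostC"],
    "spool": ["hostB", "hostC"],
}

print(find_spofs(deployment))
# ['database', 'enqueue', 'message']
print(services_lost_if_host_fails(deployment, "hostA"))
# ['database', 'enqueue', 'message']
print(services_lost_if_host_fails(deployment, "hostB"))
# []
```

In this assumed layout, losing hostB removes no service completely because every service on it also runs elsewhere, whereas losing hostA takes down all three single-instance services at once.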
Finally, see Failure Recovery for more information on how SAP systems recover following failure:
· Automatic recovery of SAP processes
· Logon load balancing (prevents users from logging on to a dialog host that has failed)
· HTTP load balancing with the SAP Web Dispatcher
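The principle behind logon load balancing, namely that new logons are never directed to a failed dialog host, can be sketched as follows. This is a simplified illustration under stated assumptions and not the selection logic that SAP actually uses: the function `pick_logon_host`, the session counts, and the host names are all hypothetical.

```python
from typing import Dict, Optional, Set


def pick_logon_host(sessions: Dict[str, int], failed: Set[str]) -> Optional[str]:
    """Pick the available host with the fewest sessions, or None if none is left.

    Illustrative only: hosts in `failed` are skipped, and ties are broken
    by host name so that the result is deterministic.
    """
    candidates = {host: n for host, n in sessions.items() if host not in failed}
    if not candidates:
        return None
    return min(candidates, key=lambda host: (candidates[host], host))


# Hypothetical session counts per dialog host.
sessions = {"hostA": 40, "hostB": 12, "hostC": 25}

print(pick_logon_host(sessions, failed=set()))
# hostB
print(pick_logon_host(sessions, failed={"hostB"}))
# hostC
```

When every dialog host is marked as failed, the function returns `None`, which corresponds to the case in which no logon is possible until at least one host recovers.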