System Failure (SAP NetWeaver AS)

Definition

A system failure occurs when a component or service fails to perform its specified task at the appropriate time. This section covers the following topics:

·        Standard failures

·        Basic failure classification

·        Single points of failure (SPOFs)

Standard Failures

The following factors leading to failure are common to all services:

·        Hardware

Hardware includes the central processing unit (CPU), memory, network interface card (NIC), and so on. SAP services can be distributed across different hosts, and several services can share one host, so the failure of a single machine can affect one or more SAP services. This is a common cause of failure.

·        Operating system services

SAP services in turn depend on operating system services, so if an operating system service fails, the SAP services that rely on it fail too. For example, the SAP message service depends on the socket layer services of the operating system; if these fail, the message service fails as well.

·        Software

As with any software, programming errors can cause an SAP service to fail.
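To illustrate how these causes propagate, the following Python sketch models components as a dependency graph and finds everything that fails, directly or indirectly, when one component fails. The component names and the DEPENDENTS map are hypothetical and do not reflect an actual SAP configuration: host_b shows how one machine failure can take down several SAP services at once, and os_sockets_a mirrors the socket layer example above.

```python
from collections import deque

# Hypothetical dependency map: each component lists the components that
# depend on it directly. Names are illustrative, not an SAP configuration.
DEPENDENTS = {
    "host_a": ["os_sockets_a"],
    "os_sockets_a": ["message_service"],
    "host_b": ["os_sockets_b", "dialog_b"],
    "os_sockets_b": ["gateway_b"],
    "message_service": [],
    "dialog_b": [],
    "gateway_b": [],
}


def affected_by(failed: str, dependents: dict[str, list[str]]) -> set[str]:
    """Return every component that fails when `failed` fails.

    Uses a breadth-first traversal over the dependency map, so both direct
    and transitive dependents are included, along with `failed` itself.
    """
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(affected_by("os_sockets_a", DEPENDENTS)))
print(sorted(affected_by("host_b", DEPENDENTS)))
```

A hardware failure (host_b) reaches both the operating system layer and the SAP services above it, while an operating system failure (os_sockets_a) reaches only the SAP services that rely on it.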

Basic Failure Classification

The following graphic shows the categories for classifying failures:

[Figure: Basic Failure Classification]

When considering fault tolerance, you can look at failure in the following ways:

·        Component failure

Here we divide the system into layers with their associated components, using the categories shown in the above graphic.

·        Service failure

This section discusses failure of the SAP system services in detail:

o        How to detect failure

o        The effects of failure

o        How to recover from failure

Single Points of Failure

In a standard SAP system, the database, enqueue, and message services cannot be made redundant by configuring multiple instances of them on different host machines, so they are single points of failure (SPOFs). The remaining services (dialog, update, background, gateway, and spool) can all be configured redundantly, that is, on multiple host machines, to improve availability.

In a high availability SAP system, you can protect vulnerable services, such as the enqueue, message, and database services, by using, for example, cluster environments with switchover solutions. For more information, see:

·        Cluster Technology

·        Microsoft Cluster Server on Windows

·        Switchover Software for High Availability

·        Replicated Enqueue Server

In an SAP installation, the Network File System (NFS) used by UNIX-based application hosts and the shares used by Microsoft Windows-based application hosts are also SPOFs. Some installations use the Domain Name System (DNS), which is likewise a single point of failure.
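The following Python sketch shows the basic check behind the SPOF idea: any service that runs on fewer than two distinct hosts is flagged. The LANDSCAPE map and its host names are hypothetical; it simply encodes the standard setup described above, where the database, enqueue, and message services run once while the remaining services run on several hosts, plus a single NFS server and a single DNS server.

```python
# Hypothetical landscape: service name -> hosts on which an instance runs.
# Host names are illustrative only.
LANDSCAPE = {
    "database": ["db01"],
    "enqueue": ["ci01"],
    "message": ["ci01"],
    "dialog": ["app01", "app02"],
    "update": ["app01", "app02"],
    "background": ["app01", "app02"],
    "gateway": ["app01", "app02"],
    "spool": ["app01", "app02"],
    "nfs": ["nfs01"],
    "dns": ["dns01"],
}


def single_points_of_failure(landscape: dict[str, list[str]]) -> list[str]:
    """Return, sorted by name, the services that run on fewer than two distinct hosts."""
    return sorted(
        name for name, hosts in landscape.items() if len(set(hosts)) < 2
    )


print(single_points_of_failure(LANDSCAPE))
# ['database', 'dns', 'enqueue', 'message', 'nfs']
```

The check counts instances only. It does not recognize a service that is protected by a switchover cluster, and it misses shared dependencies, such as two application hosts that both rely on the same NFS server, so those cases need a dependency-aware analysis.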

Failure Recovery

Finally, for more information on how SAP systems recover from failure, see Failure Recovery, which covers:

·        Automatic recovery of SAP processes

·        Logon load balancing (prevents users from logging on to a dialog host that has failed)

·        HTTP load balancing with the SAP Web dispatcher
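As a rough illustration of how logon load balancing keeps users off a failed dialog host, the following Python sketch picks a logon target only from hosts marked as available. The DialogHost type, the availability flag, and the use of the logged-on user count as the load metric are assumptions for illustration; this is not SAP's actual logon group selection algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DialogHost:
    name: str
    available: bool  # hypothetical health flag, e.g. set by a failure detector
    users: int       # hypothetical load metric: currently logged-on users


def pick_logon_host(hosts: list[DialogHost]) -> Optional[str]:
    """Return the least-loaded available host, or None if every host has failed."""
    candidates = [h for h in hosts if h.available]
    if not candidates:
        return None
    return min(candidates, key=lambda h: h.users).name


hosts = [
    DialogHost("app01", available=False, users=3),  # failed: never chosen
    DialogHost("app02", available=True, users=40),
    DialogHost("app03", available=True, users=25),
]
print(pick_logon_host(hosts))  # app03
```

Filtering out failed hosts before applying the load criterion is what prevents a logon to a failed host, even when that host reports the lowest load.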