SPOF Checklist

Purpose

Most systems contain a number of single points of failure (SPOFs), and you should be aware of them before you start to build high availability into your system. What constitutes a SPOF, however, depends on your particular system configuration. For example, a disk drive might be a SPOF in one configuration but no longer a SPOF once it is mirrored. The major SPOFs are listed below, grouped by main system area.
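As a rough illustration of why mirroring removes a SPOF, the following Python sketch computes the combined availability of mirrored copies. It assumes that the copies fail independently of each other, which might not hold for real disks that share a controller or power supply. The function name and the 99% figure are hypothetical and serve only as an example.

```python
def mirrored_availability(single: float, copies: int = 2) -> float:
    """Availability of a mirror set that works while at least one copy works.

    Assumption: the copies fail independently of each other.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The set is down only when every copy is down at the same time.
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    # Hypothetical figure: one disk that is available 99% of the time.
    print(f"single disk:   {mirrored_availability(0.99, copies=1):.4f}")
    print(f"mirrored pair: {mirrored_availability(0.99, copies=2):.4f}")
```

Anything that the mirrored copies still have in common, such as a controller, cabling, or power supply, is not covered by this calculation and remains a candidate SPOF.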

See also System Failure (SAP NetWeaver AS).

Prerequisites

SAP suggests that, for each component of a planned or installed SAP system listed in this process, you assess the following:

·        Is the component a SPOF in your particular system configuration?

·        Can you afford the risk of failure for a particular SPOF?

Risk analysis using checklists

Before you start to build high availability into the systems at your site, SAP strongly advises you to try to quantify which of the items listed in this SPOF checklist and in the general checklist are relevant for you. Once you have ranked the vulnerable aspects of your system according to the costs of failure (for example, whether you need to call in an external engineer, order a replacement part, and so on), you are in the best position to start improving availability.
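One possible way to carry out such a ranking is sketched below in Python. This is only an illustration, not an SAP tool: the class, the field names, and all figures are hypothetical, and the cost model (failures per year × hours to repair × cost per hour of downtime) is a simplifying assumption that you would replace with your own estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpofItem:
    """One checklist entry with estimated failure figures (all assumptions)."""
    name: str
    failures_per_year: float  # estimated failure frequency
    hours_to_repair: float    # includes waiting for an engineer or a part
    cost_per_hour: float      # estimated cost of one hour of downtime

    @property
    def expected_annual_cost(self) -> float:
        return self.failures_per_year * self.hours_to_repair * self.cost_per_hour


def rank_by_cost(items):
    """Return the items ordered from most to least costly."""
    return sorted(items, key=lambda item: item.expected_annual_cost, reverse=True)


if __name__ == "__main__":
    # Hypothetical entries and figures, for illustration only.
    checklist = [
        SpofItem("single NIC", 0.2, 2.0, 1000.0),
        SpofItem("unmirrored data disk", 0.5, 8.0, 1000.0),
        SpofItem("message service host", 0.1, 6.0, 1000.0),
    ]
    for item in rank_by_cost(checklist):
        print(f"{item.name}: {item.expected_annual_cost:.0f}")
```

The items at the top of the resulting list are the ones where removing the SPOF is likely to pay off first.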

Process Flow


       1.      You consider redundant configuration of the SAP services dialog, update, batch, gateway, and spool (that is, running each of them on multiple host machines) to improve availability. With this configuration, these services are no longer single points of failure.

You can improve the availability of the message service by the use of switchover software.

For more information, see SAP NetWeaver AS ABAP: High Availability and SAP NetWeaver AS Java: High Availability.
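The check in this step can be sketched in Python as follows. The sketch assumes that you have an inventory of which service runs on which host. The host names, the layout, and the function name are hypothetical, and the code does not query a real SAP system.

```python
from collections import defaultdict


def single_host_services(instances):
    """Return the services that run on only one distinct host.

    instances: iterable of (host, service) pairs taken from your own
    inventory (hypothetical input format).
    """
    hosts_by_service = defaultdict(set)
    for host, service in instances:
        hosts_by_service[service].add(host)
    return sorted(
        service for service, hosts in hosts_by_service.items() if len(hosts) < 2
    )


if __name__ == "__main__":
    # Hypothetical layout: dialog and update are redundant, the rest are not.
    layout = [
        ("hostA", "dialog"), ("hostB", "dialog"),
        ("hostA", "update"), ("hostB", "update"),
        ("hostA", "batch"),
        ("hostA", "message"),
    ]
    print(single_host_services(layout))
```

In this hypothetical layout, the batch service could be made redundant by adding a second host, whereas the message service would need switchover software, as described above.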

       2.      You consider configuration of the database service to overcome its single points of failure:

○        Loss of connection between application service and database service. Use DB Reconnect to overcome this problem.

○        Loss of database data. For more information about this problem, see Replicated Databases. This is also discussed for each database manufacturer below.
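The general idea of reconnecting after a lost database connection can be illustrated with the following Python sketch. It is not the implementation of SAP DB Reconnect and does not use any SAP interface: the function, its parameters, and the connect callable are hypothetical, and the sketch assumes that a lost connection is reported as a ConnectionError.

```python
import time


def connect_with_retry(connect, trials=3, retry_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the number of trials is used up.

    connect: hypothetical callable that returns a connection object or
    raises ConnectionError while the database is unreachable.
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    last_error = None
    for attempt in range(1, trials + 1):
        try:
            return connect()
        except ConnectionError as err:
            last_error = err
            # Wait before the next attempt, but not after the last one.
            if attempt < trials:
                sleep(retry_delay)
    raise ConnectionError(f"gave up after {trials} attempts") from last_error
```

A retry loop of this kind only bridges short interruptions. If the database service itself is lost, you still need the measures described for the database instance and the database data.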

       3.      You consider the database-specific recommendations in the following table:

Database

Single Points of Failure (SPOFs)

Oracle

·        Database instance

○        Database background processes (DBWR, LGWR, SMON, PMON...)

○        Memory structures (SGA, semaphores)

You can protect the database instance using Switchover Software or Replicated Database Servers (only available for certain databases).

·        Database files

○        Control file

○        Current online redo log file

○        Data files

You can protect the control file and the current online redo log file by using Oracle or proprietary disk mirroring. You can protect the data files by using disk mirroring. You should also protect all files by doing backups.

You can use Oracle Standby Databases for a more comprehensive high availability solution that can withstand a disaster at one site.

Informix

·        Database instance

You can protect the database instance using Switchover Software.

·        Database data

You can protect all relevant files by using Informix or proprietary disk mirroring. SAP strongly recommends some form of mirroring (preferably Informix mirroring) for at least the critical dbspaces (logdbs, physdbs, and rootdbs). In any case, you should also perform regular archives and backups.

See also Informix High-Availability Data Replication (HDR).

MaxDB

·        Database instance

You can protect the database instance using Switchover Software.

·        Database data

You can protect all relevant devices by using proprietary disk mirroring (RAID 1 preferred) for all data volumes and log volumes. If you want to use log volumes without RAID mirroring, consider the possibility of mirroring by using the log mode. In any case, you should also perform regular backups.

You can use the MaxDB Standby Database or MaxDB Hot Standby for a more comprehensive high availability solution that can withstand a disaster at one site.

IBM DB2 Universal Database for UNIX and Windows

·        Database instance

You can protect the database instance using Switchover Software.

·        Database data

You should always perform regular backups.

See also Replicated Standby Database for DB2 UDB for UNIX and Windows.

IBM DB2 Universal Database for z/OS

·        Database instance

You can protect the database instance using Data Sharing for DB2 UDB for z/OS.

·        Database data

You can protect the data by performing regular backups and using disk mirroring. You can also use a standby database to protect the data against disaster.

See also Replicated Standby Database for DB2 UDB for z/OS and Data Sharing for DB2 UDB for z/OS.

IBM DB2 Universal Database for iSeries

·        Database instance

You can protect the database instance using Switchover Software.

·        Database data

You should always perform regular backups.

MS SQL Server

See the following high availability solutions:

·        Microsoft Cluster Server on Windows

·        Microsoft SQL Server Standby Database

·        Comprehensive Microsoft SQL Server High Availability Solution

       4.      You consider network-specific recommendations for:

○        Cabling

○        Active components (hubs, switches, routers)

○        Network Interface Card (NIC)

○        SAProuter

○        Network File System (NFS) – see System Failure (SAP NetWeaver AS)

       5.      You consider hardware and system software. For more information, see System Failure (SAP NetWeaver AS).

       6.      You consider disk technology.

Possible single points of failure in the hardware of a disk system include the following:

○        Power supply

○        Fan and cooling

○        Internal/external cabling

○        SCSI path from host machine to device

○        Internal system bus

○        Write-cache:

▪         Non-volatile SIMMs or battery backup to address power failure

▪         Mirrored SIMMs to address SIMM failure

○        Read-cache: non-volatile SIMMs optional

○        Battery power for the device to store cache to disk in case of power failure

○        Controller

○        Microcode

○        Disk-internal storage processors

○        RAID internal storage maps

○        Disk spindles

○        Spindle mechanism

Possible single points of failure in the disk-based data are the following:

○        SAP user data

○        SAP system data

○        Software components:

▪         The SAP system

▪         DBMS and log files

▪         Operating system and swap space

○        Root file system