General Checklist

Purpose

This checklist covers questions from Important Questions About Your Setup, with cross-references to other sections in this documentation. It is difficult to define reasonable thresholds that “trigger” specific recommendations because there are many complex, interrelated dependencies to consider. Many of the recommendations are therefore formulated in general terms and need to be adapted to your particular installation.

Risk analysis with checklists

Before you start to build high availability into the systems at your site, SAP strongly advises you to try to quantify which of the items listed in this general checklist and in the single point of failure (SPOF) checklist are relevant for you. Having ranked the vulnerable aspects of your system according to the costs of failure (for example, whether you need to call in an external engineer, order a replacement part, and so on), you are then in the best position to start improving availability.

Process Flow


       1.      You consider how much system uptime your business requires:

Ў        5 days a week / 12 hours a day

§         Sufficient offline time to do system maintenance and offline database backups during operational days

§         Database size might require online database backups during operational days or partial offline backups

§         Upgrade (AS-ABAP), system maintenance, offline database backups can be done during non-operational days (for example, at weekends)

For information on database backups, see Backup with Oracle, Archive and Backup with Informix, Backup with MaxDB, Archive and Backup with DB2 UDB for UNIX and Windows, Backup with DB2 UDB for z/OS, or Backup with DB2 UDB for iSeries.

Ў        5 days a week / 24 hours a day

§         Online database backups are required during operational days

§         Upgrade (AS-ABAP) and general system maintenance have to be done during non-operational days

§         Offline database backup has to be done during off-days

Ў        7 days a week / 12 hours a day

§         Sufficient non-operational time available during each day to do system maintenance and offline database backups

§         Database size might require online database backups

§         Scheduling SAP Upgrade (AS-ABAP) might become an issue

Ў        7 days a week / 24 hours a day

§         Special time slots have to be defined to do upgrades, both Upgrade (AS-ABAP) and database software upgrades. For more information about database upgrades, see Upgrade with Oracle, Upgrade with Informix, Upgrade with MaxDB, Upgrade with DB2 UDB for UNIX and Windows, or Upgrade with DB2 UDB for z/OS.

§         Database backups have to be online. Refer to Backup with Oracle, Archive and Backup with Informix, Backup with MaxDB, Archive and Backup with DB2 UDB for UNIX and Windows, Backup with DB2 UDB for z/OS, or Backup with DB2 UDB for iSeries.

§         Redundant hardware components are worth considering (see Disk Technology, Switchover Software, and Uninterruptible Power Supply (UPS)).

§         Disaster Recovery is also worth considering
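The four uptime patterns in step 1 differ mainly in how much scheduled non-operational time remains for maintenance and offline backups. The following minimal Python sketch makes that arithmetic explicit. The function names are illustrative only and are not part of any SAP tool, and the sketch assumes that the operational hours are the same on every operational day.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def weekly_offline_hours(days_per_week: int, hours_per_day: int) -> int:
    """Scheduled non-operational hours per week for a given uptime pattern."""
    if not 0 <= days_per_week <= 7 or not 0 <= hours_per_day <= 24:
        raise ValueError("expected 0-7 days per week and 0-24 hours per day")
    return HOURS_PER_WEEK - days_per_week * hours_per_day


def offline_hours_on_operational_day(hours_per_day: int) -> int:
    """Non-operational hours that remain on each operational day."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("expected 0-24 hours per day")
    return 24 - hours_per_day


# The four patterns from step 1 (days per week, hours per day).
for days, hours in [(5, 12), (5, 24), (7, 12), (7, 24)]:
    print(
        f"{days} x {hours}: "
        f"{weekly_offline_hours(days, hours)} h offline per week, "
        f"{offline_hours_on_operational_day(hours)} h per operational day"
    )
```

A 7 x 24 pattern leaves no scheduled window at all, which is why the checklist calls for online backups and specially defined upgrade slots in that case.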

       2.      You consider how much system downtime your business can tolerate until there is only a minor business effect:

Ў        Several hours or one business day

§         Probably no special measures necessary for protection against hardware or operating system failure

§         Standard recovery procedures for databases are most likely sufficient

Ў        Less than above

§         Redundant hardware components might become necessary, such as Disk Technology, Network High Availability, Switchover Software, and Uninterruptible Power Supply (UPS).

§         Database backup frequency, restore and recovery times have to be evaluated.

If restore takes too long, backup devices need to be replaced with faster ones or more devices added. Alternatively, you might need to increase the backup frequency. See Recovery with Oracle, Recovery with Informix, Recovery with MaxDB, Recovery with DB2 UDB for UNIX and Windows, Recovery with DB2 UDB for z/OS or Recovery with DB2 UDB for iSeries.

If the data volume is simply too large to finish restore and recovery in an acceptable period, you need to evaluate your Disk Technology and employ, for example, mirrored disks to avoid restore and recovery altogether (except in the event of multiple simultaneous failures).
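The evaluation of restore and recovery times in step 2 comes down to comparing an estimated duration with the downtime your business can tolerate. The sketch below is a rough estimate only. It assumes that restore throughput scales linearly with the number of backup devices used in parallel, which real hardware rarely achieves exactly, and all names and figures are hypothetical.

```python
def restore_hours(data_gb: float, gb_per_hour_per_device: float,
                  devices: int = 1) -> float:
    """Estimated restore time, assuming linear scaling across devices."""
    if data_gb < 0 or gb_per_hour_per_device <= 0 or devices < 1:
        raise ValueError("invalid data volume, throughput, or device count")
    return data_gb / (gb_per_hour_per_device * devices)


def fits_tolerance(data_gb: float, gb_per_hour_per_device: float,
                   tolerated_hours: float, devices: int = 1,
                   log_recovery_hours: float = 0.0) -> bool:
    """True if restore plus log recovery fits into the tolerated downtime."""
    total = restore_hours(data_gb, gb_per_hour_per_device, devices)
    return total + log_recovery_hours <= tolerated_hours


# Hypothetical figures: 400 GB database, 50 GB/h per device, 6 h tolerated.
print(restore_hours(400, 50))                                # one device
print(restore_hours(400, 50, devices=2))                     # two devices
print(fits_tolerance(400, 50, tolerated_hours=6))            # one device
print(fits_tolerance(400, 50, tolerated_hours=6, devices=2)) # two devices
```

The `log_recovery_hours` parameter stands for the time needed to apply logs after the restore. More frequent backups shorten this part, which is the reason the checklist mentions backup frequency as an alternative to faster devices.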

       3.      You consider how much system downtime your business can tolerate until there is a major business effect (for example, loss of business):

If your business can only tolerate a few hours of downtime:

Ў        You must make sure that a restore/recovery can be completed in the time available. Refer to Recovery with Oracle, Recovery with Informix, Recovery with MaxDB, Recovery with DB2 UDB for UNIX and Windows, Recovery with DB2 UDB for z/OS, or Recovery with DB2 UDB for iSeries.

If this cannot be guaranteed, your disk technology has to be looked at (see next point).

Ў        Redundancy for hardware components becomes important, so evaluate the use of:

§         Special Disk Technology (for example, consider disk mirroring, RAID, or LVM). Disks are generally the most vulnerable of all hardware components so it makes sense to start with them.

§         Redundant network components. Refer to Network High Availability.

§         Cluster CPUs with switchover solutions to protect the database server and/or the central application server. Refer to Switchover Software for High Availability.

§         Uninterruptible Power Supply (UPS) is cheap and worth considering.

Ў        If your database is Oracle or DB2 UDB for z/OS, an alternative to switchover software is to use Replicated Database Servers together with the DB Reconnect feature.

Ў        More than one node should be available as an application server. Also, at least two nodes should be prepared to act as the central application server after a manual reconfiguration. This means that, if the node where the central application server is running becomes unavailable, another node should be prepared to start the central application server.

       4.      You consider how much system downtime your business can tolerate until it faces collapse:

Ў        Evaluate your system for single points of failure. Even if 90% of the system is equipped for high availability (for example, mirrored disks, redundant network components, and so on), this is worthless if one of the components in the unprotected remaining 10% fails.

Ў        If the time expected to replace critical hardware components or reconfigure critical software components is more than the period that would put your entire business at risk, you should seriously consider Disaster Recovery using a backup site.
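The point about single points of failure in step 4 can be illustrated with the standard availability formulas for components in series and in parallel. The sketch below assumes that component failures are independent, which is a simplification, and the availability figures are hypothetical. The last function restates the disaster recovery rule from the second bullet as a simple comparison.

```python
from math import prod


def series_availability(availabilities: list[float]) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant_availability(availability: float, copies: int = 2) -> float:
    """Availability of identical redundant copies, any one of which suffices."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 - (1 - availability) ** copies


def needs_disaster_recovery_site(replace_hours: float,
                                 critical_downtime_hours: float) -> bool:
    """True if replacing or reconfiguring takes longer than the business survives."""
    return replace_hours > critical_downtime_hours


# Hypothetical system: nine protected components and one unprotected one.
protected = [0.9999] * 9
unprotected = 0.99
total = series_availability(protected + [unprotected])
print(f"overall availability: {total:.4f}")

# Making the weak component redundant raises its availability.
print(f"weak component mirrored: {redundant_availability(unprotected):.4f}")
```

Because the overall figure is a product, it can never exceed the availability of the weakest unprotected component, no matter how well the other components are protected.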

       5.      You consider whether your business has periods with special availability requirements.

If there are critical periods of an application that cannot be interrupted at all, or during which an interruption can be tolerated for no more than a couple of minutes, you might need to take extra precautions. Consider the following:

Ў        Disk Technology (for example, consider disk mirroring, RAID, or LVM)

Ў        Network components

Ў        Switchover Software or the use of Replicated Database Servers together with the DB Reconnect feature (only available for certain databases)

Ў        Uninterruptible Power Supply (UPS) or redundant power supplies

       6.      You consider factors concerning your installation:

Ў        Age of system

The approach you need to take depends on whether a new hardware system is being installed with the SAP system or whether it is to be installed with existing hardware:

§         If a new system is being set up, you should evaluate the high availability requirements at an early stage, and design the new system accordingly.

§         If the SAP system is being installed on an existing system, you need to investigate the system for weak points.

Ў        Evaluate installation options

You evaluate the installation options for application servers and the database server. Depending on the desired installation and high availability requirements, you need to carefully consider the mapping of SAP system services since this might not be a straightforward task.

Ў        Expected Data Volume

The database setup needs to be carefully planned:

§         Proper planning of database layout makes space management a lot easier. See Space Management with Oracle, Space Management with Informix, Space Management with MaxDB, Space Management with DB2 UDB for UNIX and Windows and Space Management with DB2 UDB for z/OS.

§         Large databases (> 100 GB) might already cause problems when it comes to backup, and standard backup, restore, and recovery procedures might take too long. You should evaluate your disk technology since mirrored disks give additional options for backups (see RAID and LVM).

For more information about backup and recovery of databases, see Database High Availability.

Ў        Expected Transaction Load

The transaction load influences the installation options for application servers and the database servers.

       7.      You consider your internal resources.

Ў        Budget available to finance improvements

SAP would normally expect that improvements are undertaken in the following order, as finances permit:

§         Disk Technology

§         Uninterruptible Power Supply (UPS)

§         Server Network

§         Switchover Software for database and central application hosts

§         Access Network

Ў        Availability of qualified personnel

The level of qualified personnel available to monitor the system during “normal operation” hours might influence the level of redundancy you choose:

§         Qualified personnel not always available

Certain technologies, such as disk technology, switchover solutions, and redundant network components, reduce the need to have personnel available to handle errors.

§         Qualified personnel always available

You can probably rely on your personnel to handle errors.

Be sure to have the appropriate processes in place to get your staff involved in the event of an error (for example, hotline, on-call, standby).
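The improvement order given under "Budget available to finance improvements" in step 7 can be read as a simple prioritized spending plan. The sketch below is one possible interpretation: it works through the list in the stated order and stops at the first item the remaining budget cannot cover. Both this stopping rule and the cost figures are assumptions made for illustration; they do not come from SAP.

```python
# Order as listed in step 7 of the checklist.
IMPROVEMENT_ORDER = [
    "Disk Technology",
    "Uninterruptible Power Supply (UPS)",
    "Server Network",
    "Switchover Software",
    "Access Network",
]


def plan_improvements(budget: float,
                      costs: dict[str, float]) -> tuple[list[str], float]:
    """Return the improvements affordable in priority order and the leftover budget."""
    planned: list[str] = []
    remaining = budget
    for item in IMPROVEMENT_ORDER:
        cost = costs[item]
        if cost > remaining:
            break  # assumption: do not skip ahead to lower-priority items
        planned.append(item)
        remaining -= cost
    return planned, remaining


# Hypothetical costs in arbitrary budget units.
example_costs = {
    "Disk Technology": 40,
    "Uninterruptible Power Supply (UPS)": 5,
    "Server Network": 20,
    "Switchover Software": 60,
    "Access Network": 30,
}
planned, leftover = plan_improvements(70, example_costs)
print(planned, leftover)
```

Stopping at the first unaffordable item keeps the plan faithful to the stated order. A site could equally decide to skip an item and fund a cheaper, lower-priority one instead.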

       8.      You consider your external resources.

Ў        Support contracts

The level of support contracts in place might influence your approach to high availability:

§         “Special” maintenance contracts in place

If you have such contracts with hardware and software vendors, for example, a guaranteed replacement of faulty hardware components within 24 hours, you might choose not to have a disaster recovery site or to implement less comprehensive redundant hardware, such as disk technology, networks, and so on.

§         “Standard” maintenance contracts in place

If your maintenance contracts are only “standard”, you might choose to have a higher level of availability to cover gaps in maintenance. Then you might choose to set up a disaster recovery site and to implement more comprehensive redundant hardware, such as disk technology, networks, and so on.

Ў        Access to your system for remote support and maintenance

Before implementation, consider using the GoingLive service. For proactive and highly qualified SAP system administration, consider using the EarlyWatch service to help prevent problems from arising. SAP provides both services.

For more information on GoingLive and EarlyWatch, see SAP Safeguarding.

       9.      You consider environmental or other factors.

Examples of these factors are:

Ў        Unstable power supply in your area

Consider using Uninterruptible Power Supply (UPS) or redundant power supplies.

Ў        Likelihood of disaster such as earthquake

Consider using a disaster recovery site.

Ў        High temperature

Consider using a reliable air-conditioning facility.

Ў        Switch from summer to winter time

Consider using the DST safe kernel.