Disk Technology

Hard disks form the basis of data storage and conceptually represent a critical single point of failure in an SAP system. Disk backup schemes, including the database backup described in the sections on database backup and recovery in Database High Availability, only meet the very minimum of data availability requirements. Disk data in an SAP system must often be made significantly more available. This section gives you an overview of suitable disk technology and practical guidelines to help you improve the availability of your disks in an SAP system environment.

SAP advises you to consult your hardware vendor for more detailed information on specific solutions. This is because the area is complex and the information in this documentation, although written for the SAP system, is of a general nature.

The Need for High Availability of SAP System Data

You need to be clear about the minimum level of SAP system data availability that your business can tolerate after a disk hardware crash. The following considerations help you to establish this:

·        Minimum high availability requirements

If your high availability requirements are quite low, you might be able to afford the unplanned downtime taken to recover the SAP system database from backups following a disk failure. You then need to consider the following steps:

-        Replace the failed disk (is one immediately available?)

-        Restore backup data from storage media (often slow tape devices)

-        Recover the database to guarantee data integrity in your SAP system

The restore and recovery operations depend on the size of the database and the quality of backups available (for example, replaying transaction log data is more time-consuming than using more recent backups).

·        Maximum high availability requirements

Installations requiring a more rigorous level of availability can identify two differing levels of online data redundancy:

-        One level redundancy

Data remains online and fully available despite a single failure in a disk drive. This means that an extra copy of the data is constantly maintained for use in the event of a failure.

-        Two level redundancy

Data remains online and fully available despite two independent disk failures. This means that two extra copies of the data are constantly maintained.

There are particular problems with very large databases (for example, excessively long times to back up or restore the database) that can be alleviated by using two levels of disk redundancy. Backup is made easier and the chances of having to do a time-consuming restore are much reduced.

Use two levels of redundancy if downtime must be avoided at all costs

For highly sensitive applications you should consider migrating to two levels of redundancy for your disks (for example, three-way mirroring, see below).

Performance and Costs of High Availability Disk Systems

Few customers are willing to sacrifice system performance for high data availability. However, the requirement for high availability implies redundancy, which in turn implies that data must be checked and/or written multiple times to disk. To maintain performance in a high availability environment, you often need to consider improving the underlying hardware in terms of:

·        Speed of data handling

·        Intelligent disk controlling

·        Increased disk caches

·        Data bus throughput

·        Total disk capacity

·        CPU computing power

Consider high availability for disk systems seriously, despite costs

The costs involved in upgrading hardware often pay back very well. Remember that, if the system is not available, you incur costs through lost income and through the support and service required.

Data Availability in a Failure Scenario

The crucial discriminator of quality for high availability purposes is what happens in the event of failure. How does the disk system in a high availability environment react (in terms of software and hardware), if the redundant part of the data must be accessed to continue processing?

It is almost inevitable that online performance is degraded in the event of failure, because a heavy workload is imposed on the disk hardware itself, on the host machine CPU, or on both, for the following reasons:

·        Data must be re-constructed from check information

·        On-disk redundancy of the data must be re-created (mirror rebuild)

·        A failed disk must ultimately be replaced and then recommissioned.

Depending on the kind of disk technology used, a period of decreased performance must be expected. Choosing an unsuitable disk device for high availability purposes can prevent you from accessing the data in the event of failure, simply due to unacceptably impaired performance.

Note that it is a common feature of disk systems with one level of redundancy that the risk of data loss is increased as soon as one failure has occurred. This is mainly for the following reasons:

·        Active drive redundancy is lost

·        Recovery work puts an additional heavy workload on the system

·        Complex situations might arise in hardware or software

Mean Time Between Failures

A key characteristic of a hard disk is its “Mean Time Between Failures” (MTBF), given in hours (as an approximate rule, 100,000 hours = 11 years). This is the statistical mean of the time a disk device operates until it fails. For example, assume a disk device with an MTBF of 50 years. This implies that a company with 25 disks of this brand suffers roughly one disk failure every two years. Statistically, there is no guarantee whatsoever that a specific disk does not crash within the first week of its life. Statistics only tell you that this event has a comparatively low probability.

Note that there might be a significant difference between the MTBF of one disk spindle itself and the MTBF of a disk system taken as an entity. A disk device contains many components that might all fail. For example, many disk devices contain several disk spindles (JBOD and RAID systems). In this case the total effective MTBF of the device is roughly the MTBF of a single spindle divided by the number of spindles in the array.
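
The arithmetic behind such estimates can be sketched as follows. This is only a rough model that assumes statistically independent spindle failures at a constant rate; the figures are the illustrative values used above, not measured data.

    # Rough MTBF arithmetic, assuming independent spindles with a constant
    # failure rate (illustrative figures from the text above).

    HOURS_PER_YEAR = 24 * 365

    def expected_failures_per_year(mtbf_hours, number_of_disks):
        """Expected number of disk failures per year across a group of disks."""
        failure_rate_per_hour = 1.0 / mtbf_hours
        return failure_rate_per_hour * number_of_disks * HOURS_PER_YEAR

    def effective_array_mtbf_hours(spindle_mtbf_hours, spindles):
        """Approximate MTBF of an array: single-spindle MTBF divided by spindle count."""
        return spindle_mtbf_hours / spindles

    mtbf_50_years = 50 * HOURS_PER_YEAR
    print(expected_failures_per_year(mtbf_50_years, 25))            # 0.5, that is, one failure every two years
    print(effective_array_mtbf_hours(100000, 10) / HOURS_PER_YEAR)  # roughly 1.1 years for a 10-spindle array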

Specified values of MTBF for a given device are often calculated from the statistics of pure hardware crashes. In reality, however, data unavailability can arise from more complex problems that are not included in these figures. Therefore the MTBF can only serve as a guideline for comparing pure hardware resilience.

How to Achieve High Availability

Since every sub-component of a disk system can fail, how can you maximize system availability? In general, high availability is achieved by hardware and data redundancy (software is considered as data in this sense).

As a guide to hardware aspects, consider the following:

·        Use hardware with high MTBF (“Mean Time Between Failures”)

·        Avoid single points of failure

·        Use fault-tolerant hardware and/or redundant components

There are two basic methods of redundant data storage:

·        Mirroring, where identical data is physically stored twice (or multiple times)

·        Check bits, where check information is computed from input data and stored to retrieve the full set of data in case part of it is lost (the check-bit principle is illustrated in the sketch after this list)
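
To illustrate the check-bit principle, the following minimal sketch uses simple XOR parity, as employed by several RAID levels. It is an illustration of the idea only, not the algorithm of any particular product.

    from functools import reduce

    # Three data blocks striped across three disks plus one parity block (check information).
    data_blocks = [0b10110010, 0b01101100, 0b11100001]
    parity = reduce(lambda a, b: a ^ b, data_blocks)

    # Assume the disk holding the second block fails: the block can be rebuilt
    # from the surviving data blocks and the parity.
    surviving = [data_blocks[0], data_blocks[2], parity]
    rebuilt = reduce(lambda a, b: a ^ b, surviving)

    assert rebuilt == data_blocks[1]      # the lost block is recovered from the check information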

The disk-based components listed below must be considered as critical to the availability of your SAP system. Depending on whether the component is read or write intensive, you need to select a suitable method for protecting the data (see discussion below).

·        SAP system user data (normally write-intensive but depends on application)

·        SAP system data  (normally write-intensive but depends on application)

·        Software components:

-        SAP system (normally read-intensive)

-        Database (read-intensive)

-        Database log files (very write-intensive)

-        Operating system (read-intensive)

-        Operating system swap space (write-intensive)

·        Root file system (normally read-intensive)

The pros and cons of the different mechanisms of redundant data storage and the hardware aspects are discussed in detail below.

General Options for High Availability Disk Technology

There are two principal technologies for storing data redundantly in high availability disk systems:

·        Hardware-driven redundancy, that is, RAID  

This comprises the whole range of RAID technology (“Redundant Array of Inexpensive / Independent Disks”):

-        Hardware-emulated mirroring on RAID disks (RAID level 1)

-        Error-correcting technology (check bit computing) on RAID systems (levels > 1)

The intelligent disk device (that is, its controller, storage processors, and microcode) acts independently of the host machine CPU to make sure that data can be retrieved from the device even if one of the disk spindles fails.

·        Software-driven redundancy, that is, LVM

This is the regime of add-on software for the operating system, namely “Logical Volume Manager” software (LVM). CPU cycles are stolen from the host machine to run the LVM software, ensuring that data is written to physically redundant (mirrored) locations on disk(s). The underlying disk technology can be either several physically independent (standalone) disks or an array of disks sharing common hardware resources (disk tower, rack or cabinet, JBODs) but with no intelligent RAID controller installed on the device.

Hardware-Driven Redundancy with RAID

RAID stands for “Redundant Array of Inexpensive/Independent Disks”. While appearing logically to the operating system as a single disk drive (that is, user-transparent), a RAID device internally consists of an array of disk spindles with one single intelligent disk controller. The controller and its software (the microcode) handle the data distribution onto the disk array, the redundant storage of data and the computation of check bits inside the array. Often RAID arrays contain optional storage processors (CPUs) to assist in performing data handling inside the array.

David Patterson, Garth Gibson, and Randy Katz set out this classification of RAID “levels” in their 1988 SIGMOD paper. They defined RAID levels to distinguish different methods for creating check information and distributing the data (and check bits) across the disk spindles. Note that level numbers do not imply any rating of redundancy quality or performance whatsoever. An overview of the RAID levels most relevant for the SAP system is given in the table below:

RAID Levels Relevant for the SAP System

Systems with RAID levels 3 and 4 have also become commercially available in recent years. However, they normally do not suit OLTP (online transaction processing) application software like the SAP system. In addition to the initial RAID definitions, further vendor-specific levels have been introduced. Several hardware vendors supply devices with configurable RAID level, so adding a further degree of flexibility to the configuration of the disk system.

Performance of RAID Systems

The general benefit of RAID systems (that is, hardware-driven redundancy) is that data storage is effectively handled solely by the disk hardware. There is no additional overhead on the host machine CPU since optimization of data storage occurs independently in the array.

The performance of disk devices in general depends on the following factors, several of which are particularly relevant to RAID:

·        Ratio of read-to-write access

·        I/O size

·        Access pattern  (random or sequential)

·        Parallelism in data bus transfer and disk spindle access

·        Possible bottlenecks

·        Caching

Read-to-write Ratio

Write performance is a critical point in any disk system that tries to hold information redundantly. This is simply due to the time overhead implied by:

·        Computing the check bit from the data

·        Storing the additional check bit

·        Writing the data to physically different locations (that is, maintaining mirror copies)

Therefore, the read-to-write ratio is of special importance to high availability disk systems. Decision support systems typically show a very high read-to-write ratio (nearly all reads, with write access often by periodic batch update), whereas SAP systems with a heavy OLTP workload are often far more write-intensive. A simple model of the resulting back-end I/O load is sketched below.
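
The sketch below is a rough model of this effect. It assumes the commonly quoted back-end costs of four I/Os per small RAID 5 write and two per mirrored write, and it ignores caching; the workload figures are purely illustrative.

    # Back-end I/Os generated per host I/O, ignoring caches:
    #   small RAID 5 write: read data + read parity + write data + write parity = 4
    #   RAID 1/10 write:    write both mirror sides                             = 2
    #   read:               roughly 1 back-end I/O in either case

    def backend_ios_per_second(host_iops, read_fraction, write_penalty):
        reads = host_iops * read_fraction
        writes = host_iops * (1.0 - read_fraction)
        return reads + writes * write_penalty

    # Illustrative OLTP-like workload: 1000 host I/Os per second, 60% reads.
    print(backend_ios_per_second(1000, 0.6, 4))    # RAID 5:    2200 back-end I/Os per second
    print(backend_ios_per_second(1000, 0.6, 2))    # RAID 1/10: 1400 back-end I/Os per second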

RAID 5

The write-penalty of RAID 5 is well known.  Generally, RAID 5 performance is very sensitive to the characteristics of the application as follows:

Characteristics of RAID 5

Access mode       Comments

Small I/O         Fast in random reads (parallelism)

Large I/O         Slow, cannot be parallelized

Writes            Slow

Small writes      Inefficient, need a read-modify-write sequence (that is, read stripe, modify appropriate stripe region, compute new parity and write back)

Large writes      More efficient, calculate check and rewrite the entire stripe

Reads             Relatively fast (parallelism), fastest in small random chunks

The type of access mode that dominates in a given SAP system depends on how the system is set up and cannot be stated in general terms. However, note that RAID 5 provides low performance in write-intensive environments.
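
The read-modify-write sequence for a small RAID 5 write can be pictured as follows. This is a sketch of the standard parity update rule only, not of any vendor's microcode.

    def raid5_small_write(old_data, old_parity, new_data):
        """Parity update for a small write: two reads and two writes on the back end."""
        new_parity = old_parity ^ old_data ^ new_data
        return new_data, new_parity

    # Consistency check: the updated parity equals the XOR over the updated stripe.
    stripe = [0x1A, 0x2B, 0x3C]                     # data blocks on three spindles
    parity = stripe[0] ^ stripe[1] ^ stripe[2]
    stripe[1], parity = raid5_small_write(stripe[1], parity, 0x55)
    assert parity == stripe[0] ^ stripe[1] ^ stripe[2]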

RAID 0 and 1, RAID 10

Striping (pure striping is RAID 0) is an efficient way to parallelize access and speed up reads and writes by concurrent I/O. It can handle small I/Os very effectively. However, RAID 0 alone does not provide any redundancy and is not a high availability configuration. Note that striping is also used in RAID 5.

The read performance of RAID 1 (hardware mirroring) can be increased by reading from either side of the mirror. Its performance can be further increased in combination with RAID 0, which is then called RAID 1/0 or RAID 10 (striped mirroring). Due to its simplicity in implementation (compared to RAID 5), RAID 10 is a robust technology. Its chief drawback is the 100% increase in spindle costs, because mirroring requires double the disk space. RAID 10 also seems to provide performance advantages in most read/write environments in comparison to RAID 5, but this needs to be verified in concrete OLTP benchmarking for a given installation.
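
The capacity side of this comparison can be quantified roughly as follows. The sketch assumes equal spindle sizes, a single RAID 5 group with one spindle's worth of parity, and mirrored pairs for RAID 10; the example sizes are illustrative only.

    def usable_capacity_gb(spindles, spindle_gb, layout):
        """Approximate usable capacity for simple single-group layouts."""
        if layout == "RAID 10":                    # mirrored pairs: half the raw capacity
            return (spindles // 2) * spindle_gb
        if layout == "RAID 5":                     # one spindle's worth of parity per group
            return (spindles - 1) * spindle_gb
        raise ValueError(layout)

    # Illustrative example: 8 spindles of 9 GB each.
    print(usable_capacity_gb(8, 9, "RAID 10"))     # 36 GB usable, 100% redundancy overhead
    print(usable_capacity_gb(8, 9, "RAID 5"))      # 63 GB usable, less overhead but write penalty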

RAID with Write-caching

Write-caching is an appropriate way to compensate for write-penalties encountered with RAID arrays. This is because the host machine CPU only needs to wait for the write cache to accept the data, as the RAID device then takes over the writing of data from cache to disk. Large write caches can completely avoid the write penalty. Note that caches can be critical single points of failure. See below for further discussion of caching.

Failure Scenarios with RAID

In the event of failure, some degradation of performance is to be expected with high availability solutions for disk systems. The following points are specific to RAID systems:

·        Overall RAID 5 performance is degraded, because controller computations are required and access requests are queued in front of the controller, causing a bottleneck (writes are especially affected).

·        Read performance of RAID 5 is heavily degraded, because data must be recomputed from check bits.

·        RAID 1 (and 10) read performance is not necessarily degraded, because all information is stored twice and can still be directly accessed.

·        RAID 1 (and 10) performance is degraded during mirror rebuild, when the failed disk has been replaced.

·        Replacement of disks can often be undertaken “hot”, without shutting down the system (automatic swap to a dynamically configured standby disk is possible).

·        RAID systems employ an internal map to track the actual physical location of the data, and this map is a critical single point of failure.

·        Redundancy rebuild might last an extended amount of time, depending on the present workload and the amount of information lost.

·        A failed RAID drive spindle must ultimately be replaced so service might be needed.

Controller and Microcode

It is the task of the controller and its microcode to create the check information and/or distribute data and check bits across the disk spindles in the array. It is clear that both hardware and software are critically important single points of failure. Configuration inconsistencies and incompatibilities between microcode and hardware can cause unforeseen problems and risks. Due to its internal complexity, RAID 5 has been prone to microcode problems, while RAID 1 and 10 have been relatively robust in this respect.

When upgrading RAID microcode, test using a non-production system

If RAID microcode needs to be upgraded it is strongly recommended to check its performance and reliability on the contents of an uncritical disk first. Updating microcode on a productive SAP system without previous checks might put vital data at risk.

Summary of RAID

This section gives you a quick overview of the advantages and disadvantages of RAID systems followed by an approximate guide as to when RAID might be a good idea.

·        Advantages of RAID systems

-        No host machine overhead as hardware works independently of CPU

-        Large theoretical storage capacity (generally true for disk arrays, either RAID or JBOD)

-        Hot spindle swapping possible

-        Configuration flexibility

-        Potential for high performance in read-intensive environments

-        Financial costs per effective megabyte only moderately higher in RAID 5 than without redundancy

-        Performance benefits of RAID 10

·        Disadvantages of RAID systems

-        Write penalty of RAID 5

-        Performance dependence on I/O specification in RAID 5

-        Little control over data placement for performance tuning

-        Microcode is a critical single point of failure

RAID can be a good solution in the following situation:

-        Application almost 100% CPU-bound

-        Read-intensive application

-        Costs must be kept right down

-        Hot spindle swapping required without system administrative action

Software-Driven Redundancy with LVM

The use of standalone disks with software-driven mirroring using LVM (“logical volume manager”) is the chief competitor to hardware-driven redundancy using RAID systems.

Disk Devices for Software Mirroring

There are two major types of disk devices used with LVM:

·        “Single Large Expensive Disk” (SLED)

Such standalone disk devices have their own controller, own SCSI interface, own power supply and so on. Each standalone disk is a physically separate device.

·        “Just a bunch of disks” (JBOD)

This uses an array of several disks sharing one rack, power supply and a standard disk controller (instead of a RAID-like controller) to handle the data I/O. There is only one physical SCSI path to the whole device, which is why, in contrast to SLEDs, JBODs offer a much higher limit on total storage capacity. Devices like these appear as a simple standalone disk to the outside world.

Software Mirroring with LVM

LVM (Logical Volume Manager) software mostly comes as an add-on to or part of the operating system. It provides a general address space for a number of distributed hardware disks and/or partitions. When used for mirroring, the software associates a single logical volume (which can hold either a file system or a raw device) with multiple physical copies in a way that is transparent to users and applications (there are also RAID 5-like configurations of LVM, but these are generally CPU-intensive). Commercial relational database systems support LVM software.

The two (or three) mirror copies in a mirror disk system should be placed on disk spindles that are physically as independent as possible (different standalone disks or at least different spindles in an array). If sectors of a disk (or the disk as a whole) containing one copy of the data should fail, the data still remains accessible via another copy (on another disk).
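
Conceptually, the mirroring layer can be pictured as in the following sketch. It is a highly simplified model of the principle, not the interface of any actual LVM product.

    class MirroredLogicalVolume:
        """Toy model of one logical volume backed by two or three physical copies."""

        def __init__(self, copies=2):
            self.copies = [dict() for _ in range(copies)]   # block number -> data per mirror side

        def write(self, block, data):
            for side in self.copies:                        # every write goes to all mirror sides
                side[block] = data

        def read(self, block):
            for side in self.copies:                        # any surviving side can serve the read
                if block in side:
                    return side[block]
            raise IOError("block lost on all mirror sides")

    lv = MirroredLogicalVolume(copies=2)
    lv.write(42, b"SAP data")
    lv.copies[0].clear()                                    # simulate loss of one disk
    assert lv.read(42) == b"SAP data"                       # data remains available from the mirror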

Although not necessary for mirroring, some LVM software is also capable of handling redundant SCSI paths to the device (that is, switchover to the redundant path if one path fails). It is recommended to take advantage of this feature for high availability.

Performance with LVM

One disadvantage of LVM is that, as part of the operating system, it steals CPU cycles from the host machine. This is in contrast to hardware-driven redundancy solutions like RAID where no CPU overhead is involved. In mirroring mode LVM instructs the operating system to physically send data to the storage media twice (or three times in three-way mirroring). The loss in overall performance is claimed to be in the order of only a few percent. This means that the LVM overhead only impacts your SAP system strongly if it is almost 100% CPU-bound.

With LVM (as with RAID) performance degradation depends on the read-to-write ratio, with write-intensive applications causing the most significant deterioration in performance, as described below:

·        Write access is degraded due to the need to write data multiple times. Write performance depends on whether file updates are configured to be handled in “asynchronous” or “synchronous” mode. Asynchronous mode means that control returns to the executing program after the file has been written once; once one copy has been written to disk, it is safe to write the mirror copy asynchronously in the background.

·        Read access can be improved with LVM, since either side of the mirror can serve the read request.

LVM (like RAID 0) can be configured to stripe data and so parallelize disk access for improved performance. This is especially effective if work can be striped to several independent disk controllers and/or to different SCSI channels with different disks attached. The applications that benefit most from striping have a large number of random accesses and large sequential reads/writes (if your version of LVM enables you to parallelize a large I/O request internally). Applications with large database tables on different disks that can be accessed in parallel also normally benefit from striping. The address mapping behind striping is sketched below.
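
The mapping works roughly as follows. The stripe unit of 64 KB and the number of disks are assumed values for illustration; real LVM implementations differ in detail.

    STRIPE_UNIT = 64 * 1024          # assumed stripe unit of 64 KB
    DISKS = 4                        # assumed number of striped disks (or SCSI channels)

    def locate(logical_offset):
        """Map a logical byte offset to (disk index, offset on that disk)."""
        stripe_number = logical_offset // STRIPE_UNIT
        disk = stripe_number % DISKS                        # stripe units rotate round-robin over the disks
        offset_on_disk = (stripe_number // DISKS) * STRIPE_UNIT + logical_offset % STRIPE_UNIT
        return disk, offset_on_disk

    # A large sequential read touches all four disks, so the I/O can be served concurrently.
    print({locate(i * STRIPE_UNIT)[0] for i in range(8)})   # {0, 1, 2, 3}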

Three-way Mirroring using LVM

Whereas LVM software is normally configured for two-way mirroring, providing one level of redundancy (that is, one disk failure can be tolerated), it is also possible to configure it for three-way mirroring (that is, one original plus two redundant copies of the data). The result is two levels of redundancy, since two disk failures can be tolerated. This configuration can be used for SAP systems with very strict high availability requirements and allows the following operation modes:

·        Three data disks online in normal operation

Redundancy is not lost after a single disk failure, since two simultaneous disk failures can be tolerated (that is, two levels of redundancy).

·        Two disks online, one split off for backup

SAP system stays fully online with one level of redundancy during backup (that is, one additional disk failure remains tolerable during the backup period).

Three-way mirroring with LVM can be used to enable an offline database backup with a very short interruption to operations, since the database must only be shut down very briefly, as described in Backup with Oracle. To do this kind of backup, the third mirror copy is split off for the backup, allowing the two remaining copies to continue providing normal database service. Three-way mirroring provides the best data availability but is also the most expensive solution in terms of performance and financial costs. Data needs to be written three times to disk in total, with a corresponding impact on CPU performance. The cost per megabyte of storage space is increased by a factor of approximately three, because three times the normal disk space is required.

Three-way mirroring is one of the few remaining options for backing up, in an acceptable period of time, very large SAP system databases (VLDBs) that are permanently under heavy transaction processing load.

Failure Scenarios with LVM

After a failed disk or logical volume has been replaced, LVM must resynchronize the mirrors. Direct impact on CPU performance is said to be minor but all I/O on the failed logical volume is inevitably impeded. Since SAP system data is highly integrated, a general loss of performance is to be expected in this situation. In this respect, LVM is similar to RAID, which also runs into a bottleneck during failure, although the problem with RAID is situated in the disk array rather than in the host machine.

A drawback of current LVM-based disk systems is that hot disk swapping is generally not as well supported as with RAID. Even though the host machine often need not be completely shut down, the SAP system must normally be quiescent during disk replacement. It is sometimes possible to configure hot spare spindles on each side of the mirror to emulate some kind of hot swapping mechanism with LVM. Automatic swapping is rarely supported with LVM configurations.

Summary of LVM

This section gives you a quick overview of the pros and cons of LVM systems followed by a short guide as to when LVM might be a good idea.

·        Pros of LVM systems

-        Two or three-way mirroring possible

-        Backups possible by splitting off one mirror while data stays online at full performance (with additional redundancy if three-way mirroring is used)

-        Each disk has its own full set of hardware components (that is, component failure can only affect one disk), giving more hardware redundancy

-        Total physical control over data placement for performance tuning

-        Multiple independent disk controllers, no bottleneck, concurrent I/O possible, no single point of failure

-        LVM read performance boost by reading from either side of the mirror

-        Striping possible, that is, concurrent I/O on different data channels or disks

·        Cons of LVM systems

-        LVM in principle leads to performance degradation because of host machine CPU overhead, but only of the order of a few percent

-        LVM write performance degraded because the operating system needs to send data multiple times to the disks

-        Each standalone disk consumes one SCSI target address, so total capacity on the bus is limited if SLEDs are used (JBODs can be used to increase capacity)

-        Failed disk replacement less convenient, might require shutting down the SAP system

-        Probably more administrative action required (compared with RAID systems) when a swapped disk returns to service

-        Hardware costs per MB increased, since twice as many disk spindles are used (100% cost increase for SLEDs, less for JBODs)

LVM might be a good solution in the following situation:

-        Application well under 100% CPU-bound

-        Control over data placement required for performance tuning

-        You wish to eliminate all single points of failure

-        Optimal write performance to disk required

Comparison of Software-Driven and Hardware-Driven Redundancy

The decision as to whether you employ software-driven (LVM) or hardware-driven (RAID) redundancy can be a difficult one and depends on the details of your particular installation.

Analyze your installation thoroughly before choosing a solution

This documentation can only give you guidelines as to whether software or hardware redundancy is preferable for your installation. You should make the final decision only after consultation with your hardware vendor, having taken the details of your installation fully into account.

The following describes two extreme cases to illustrate the issues involved:

·        Maximal CPU performance required, write access to disk less critical

RAID systems (preferably RAID 10) are normally preferable due to the following factors:

-        Data handling is done by hardware situated internally on the disk array

-        CPU performance is unaffected

-        Rebuild does not put load on CPU in case of failure

-        Ease of service with disk replacement (disks can be swapped “hot”)

-        RAID 5 offers low prices per usable effective megabyte of disk space but low write performance

-        Devices with a large write cache are preferred

·        Maximum write performance to disk required, CPU performance less critical

LVM mirroring on standalone disks is normally preferable due to the following factors:

-        CPU cycles for LVM can be sacrificed

-        LVM striped over several disks to parallelize output and maximize performance

-        LVM configured to read from either side of the mirror

-        Preferably use a device with a large write cache

-        Possible option to split off the third mirror for backups in three-way mirroring

-        For minimal disruption in case of failure, have hot spare disks ready on each side of the mirror

If you plan to use both types of disk redundancy, LVM and RAID 5, then SAP generally recommends adopting disk technology according to the predominant access mode for the data (that is, read or write intensive). See the list above in “How to Achieve High Availability”.

As an example of this principle, SAP recommends placing transaction logs (for example, Oracle redo log files or Informix dbspace LOGDBS) on  mirrored disks using LVM since this data requires heavy write access. The remaining database data should be placed on disks using RAID 5 technology, assuming the write access to this data is not so heavy.

It is also possible to split SAP system data according to its access requirements but this requires some expertise in SAP system installation tuning. For further information, see the SAP database installation guides.

Caching on Disk Devices

Disk devices can greatly benefit from caches. However, the performance benefits must be balanced against the availability implications in the event of failure, as discussed below.

Performance Benefits

Disk I/O performance can be greatly improved by utilizing caches. The access times of disks and SIMM modules (used in caches) differ by a factor of up to one million (around 10 milliseconds with disks compared to 10 nanoseconds with cache SIMMs). Write and read caches are available, as described below:

·        A write cache accepts the data sent by the SAP system very rapidly, so freeing up the CPU for new work. The slow process of actually writing data onto the disk spindle is then handled internally on the disk device.

·        A read cache can serve SAP system read requests without actually accessing the disk if the cache is large enough to allow a high hit rate, that is, identical data items are requested multiple times with high frequency (a simple estimate of the effect of the hit rate is sketched after this list).
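
The following estimate is a simple weighted average using the approximate access times quoted above; the hit rates are illustrative values only.

    DISK_ACCESS_MS = 10.0            # approx. 10 milliseconds per physical disk access
    CACHE_ACCESS_MS = 0.00001        # approx. 10 nanoseconds per cache (SIMM) access

    def effective_access_time_ms(hit_rate):
        """Average access time for a given cache hit rate."""
        return hit_rate * CACHE_ACCESS_MS + (1.0 - hit_rate) * DISK_ACCESS_MS

    for hit_rate in (0.0, 0.5, 0.9, 0.99):
        print(hit_rate, round(effective_access_time_ms(hit_rate), 4))   # 10.0, 5.0, 1.0 and 0.1 ms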

Consider caching as a means of improving high availability disk systems

Generally, large disk caches should be able to compensate for the write penalty encountered with high availability disk systems. Disk devices with cache sizes up to 4 GB are currently available. However, the quantified payback of the cache increment is highly dependent on the SAP system usage pattern.

Failure Scenarios with Caches

Particular attention needs to be paid to write caches since they hold data that has not yet been permanently stored on the disk spindle itself. They represent a single point of failure and so deserve fault-tolerant measures to increase their redundancy, as follows:

·        To protect against power failure, data can be retained by using cache with non-volatile SIMMs or a battery backup system for powering the SIMMs.

·        Mirrored SIMMs can be used so that a redundant copy is held in case of SIMM hardware failure.

For read caches redundancy precautions are in principle far less important since data is still permanently available from the disk itself.

Bus Architecture

Bus speed is not going to be a bottleneck for performance, as long as the limiting factor lies in the actual write process of the data onto the disk spindle. However, large caches could shift the bottleneck more to the bus architecture side. Then a high-speed, state-of-the-art bus architecture helps to maximize the performance of disk data storage with high availability options. However, note that maximum performance requires that the disk device supports the bus transfer rates internally.

Exploit OS support for redundant bus architectures if available

In high availability solutions for the SAP system, SAP recommends taking advantage of operating system support for redundant bus architectures. For example, if two separate channels (SCSI paths) exist to connect to a given disk, a suitable operating system device driver can be configured to try a second channel if the first one fails. Some LVM implementations support this high availability feature.

The following sections discuss SCSI bus technology and fiber channel (FCS) technology with respect to high availability. There are additional vendor-specific bus architectures with high performance, which might be suitable for high availability purposes. However, note that standardized equipment has clear benefits in terms of general compatibility. Consult your hardware vendor for detailed advice about the most suitable bus architecture.

SCSI

SCSI stands for “Small Computer System Interface”. Key factors for SCSI bus architectures include the following:

·        SCSI-1 versus SCSI-2  (both ANSI-standards)

·        Single-ended versus differential bus

·        Width of data bus connection ( 8, 16 or 32 bit)

·        Speed of data bus connection (5, 10, 20, 40 MB/sec on the specified bus width)

Note that the distinction between single-ended and differential SCSI bus is made at the electrical level. This implies that, although connectors might be the same, these systems are not compatible and can harm each other electrically when connected without conversion facilities.

Generally SAP recommends using the bus technology that offers you the highest robustness, bus width/speed, and device connectivity. Therefore, differential-bus SCSI-2 at the largest width and highest speed is the option of choice. Differential busses offer the benefit of improved robustness and less noise compared with single-ended busses. Generally, SCSI-2 can also be run at the highest transfer speeds.

Note the following:

·        A given bus can run at 5 MegaTransfers per second (conversion to MB/sec depends on bus width) and still claim to be SCSI-2.

·        Bus transfer rates are normally only given as theoretical, maximal values that you can use to compare different architectures. In practice the realistic and sustained transfer rates are approximately 30-50% lower. Only the latter, corrected values are useful for quantitative estimation of the expected performance (a simple conversion is sketched after this list).

·        Bus architectures also differ in their maximum bus length and the maximum number of connectable devices. These are important figures when planning the expansion of your disk system for future requirements (see below).
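
The conversion from transfer clock to nominal and sustained throughput can be sketched as follows. The 40% derating is an assumed value within the 30-50% range mentioned above; actual figures are hardware-dependent.

    def nominal_mb_per_sec(megatransfers_per_sec, bus_width_bits):
        """Theoretical bus throughput: transfer rate multiplied by the bus width in bytes."""
        return megatransfers_per_sec * (bus_width_bits // 8)

    def sustained_mb_per_sec(nominal, derating=0.4):
        """Realistic sustained rate, roughly 30-50% below the theoretical value."""
        return nominal * (1.0 - derating)

    # Illustrative examples: 5 MT/s and 20 MT/s on a 16-bit wide bus.
    for mt in (5, 20):
        nominal = nominal_mb_per_sec(mt, 16)
        print(nominal, round(sustained_mb_per_sec(nominal), 1))   # 10 -> 6.0 MB/s, 40 -> 24.0 MB/s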

SCSI bus termination

Large installations often use shared SCSI buses to which several SCSI disk devices can be attached. Refer to Cluster Technology. Attaching disks properly is a high availability issue in terms of permanent SCSI bus termination. Any SCSI bus has to be terminated properly at exactly two ends to make sure of correct bus operation. It is common practice to provide SCSI termination directly on board the SCSI disk device (on the SCSI controller). However, this means that the disk concerned becomes a single point of failure. If the disk providing one of the on-board terminators has to be taken off the bus for repair or service, the SCSI bus is “open” or unterminated, leading to unavailability of all SCSI disks on the shared bus.

It is easy to avoid this problem by always terminating a SCSI bus externally, that is, by attaching all disks through Y-cables and plugging terminators directly into the cable outside any device. Then, even if one of the disk devices on the bus has to be taken offline, the bus still remains fully operational.

Fiber Channel Standard  (FCS)

FCS can supply massive performance benefits. FCS is an ANSI-standardized, high-speed interface that can span large distances (up to several km) and sustain transfer rates of up to 100 MB/sec (speed and distances are hardware-dependent). A larger number of connectable devices and support for error detection and correction at the hardware level are also provided. The benefits of this technology are maximized if a given disk can also support these high transfer speeds internally.

Capacity Options

The maximal number of devices on the bus determines how much disk capacity can be attached to your SAP system directly. FCS can potentially support more devices than SCSI. In many cases, however, CPU and bus performance sets limits to the maximum possible number of disks.

It is important to consider capacity and growth options at an early stage when choosing new disk equipment. A system with future growth potential performs better and more reliably, and is easier to expand as required. A system whose capacity is fully utilized at the outset causes problems when trying to accommodate increased needs.

In this context, note that disk arrays (either JBODs or RAID systems) have the advantage of supplying a large amount of disk space on few SCSI channels, that is, the potential storage capacity is very large. In this respect it is a drawback of single standalone disks that each disk consumes one SCSI channel, restricting future growth potential (at least theoretically, since disk performance might already impose more stringent limits).

Disk Replacement Procedures 

Even in a high availability disk system, failed disks must ultimately be replaced. The degree of service disruption required to finally return to normal operation varies greatly.

Retain at least one spare disk in case of failure

This is a simple recommendation but can save a large amount of time if failure occurs. The disk should be ready to “plug in and go” (that is, configuration is not required). However, it is worthwhile testing this before a real failure occurs.

The following terms are used to qualify the way in which the replacement can be provided:

·        Cold swap – system stop, power off

·        Warm swap – system stop, power on

·        Hot swap – system running, but operator action needed

·        Automatic swap – system running and automatic online disk replacement

The replacement method is determined by the fundamental design of the disk device. That is, devices with lower levels of "swappability" do not generally provide upgrade options. The “hot swap” feature increases the cost of disk space by up to 50% per megabyte. This should, however, pay back in terms of reduced downtime and service costs.

The ability to do a hot swap is common with disk arrays (either JBODs or RAID systems). In general, RAID systems automatically start resynchronizing after a failed disk has been hot-swapped, without any subsequent administrative intervention. LVM working on JBODs can also take advantage of hot-swapping of failed spindles but possibly requires shutting down the application and some administrative action. For LVM-driven mirrors, it is best to keep a hot spare disk on either side of the mirror to minimize disruption in case a disk fails.

Journaled File Systems (JFS)

Journaled file systems (JFS) have been designed with failure resilience, decreased boot time after system crashes and increased online performance in mind.

Consider JFS for high availability of your SAP system

It is recommended to use a journaled file system instead of a normal file system if high availability is an issue for your SAP system, since this can considerably reduce the reboot time after failure. It also facilitates many administrative tasks and might improve runtime performance.

System failures (for example, node power failure) can lead to inconsistencies in the file system after a crash. This is why UNIX systems must check and, in case of corruption, reconstruct the file system during startup. Normally this procedure includes a full scan of the entire file system, greatly extending total downtime (especially for large systems).

The obvious advantage of JFS lies in the fact that only files that are corrupted and need to be reconstructed are checked when the system is started. Checking the file system therefore only requires a short time.

While accessing a file in normal system operation, JFS employs a synchronous logging mechanism similar to the redo logging employed by relational database systems. Metadata containing the changes to the file is written to a reserved area and applied to the real file only after a “commit”. Corrupted files can be reconstructed from the logs after a system failure has occurred.
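
The logging principle can be sketched as in the following toy model. It is a conceptual illustration only; real journaled file systems log metadata changes in more elaborate on-disk structures.

    class ToyJournaledFS:
        """Toy model of metadata journaling: log the change first, apply it after commit."""

        def __init__(self):
            self.files = {}          # committed state of the file system
            self.journal = []        # intent log in the reserved journal area

        def write(self, name, data):
            self.journal.append((name, data))     # 1. record the change synchronously in the log

        def commit(self):
            for name, data in self.journal:       # 2. apply the logged changes to the real files
                self.files[name] = data
            self.journal.clear()

        def recover_after_crash(self):
            # Only the (short) journal has to be examined, not the whole file system:
            # uncommitted changes are discarded, committed files remain consistent.
            self.journal.clear()

    fs = ToyJournaledFS()
    fs.write("example.dat", "old contents")
    fs.commit()
    fs.write("example.dat", "half-written update")
    fs.recover_after_crash()                       # crash happened before the commit
    assert fs.files["example.dat"] == "old contents"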

Due to the internal architecture of JFS, many administrative tasks on the file system can be performed online, such as increasing storage space, defragmenting and backing up. JFS also often delivers higher runtime performance. These features make JFS the file system of choice for high availability requirements.

Checklist for Single Points of Failure for Disk Systems

Check the disk aspects of your SAP system for single points of failure

In order to render your SAP system highly available, you should aim to avoid single points of failure as far as possible. You should identify and prioritize possible points of failure within your SAP system in terms of how likely they are, what effects failure would have and how expensive a solution providing redundancy would be. Finally, when a given solution is implemented, you should test it to ascertain how reliable it actually is.

Possible single points of failure in the hardware of a disk system include the following:

·        Power supply

·        Fan and cooling

·        Internal/external cabling

·        SCSI path from host machine to device

·        SCSI bus terminators

·        SCSI disk devices with internal SCSI terminators

·        Internal system bus

·        Write-cache:

-        Non-volatile SIMMs or battery backup serve to address power failure

-        Mirrored SIMMs to address SIMM failure

·        Read-cache, non-volatile SIMMs optional

·        Battery power for the device to store cache to disk in case of power failure

·        Controller

·        Microcode

·        Disk-internal storage processors

·        RAID internal storage maps

·        Disk spindles

·        Spindle mechanism

Possible single points of failure in the disk-based data are the following:

·        SAP system user data

·        SAP system data

·        Software components:

-        SAP system

-        Database and log files

-        Operating system and swap space

·        Root file system