Recovery with DB2 UDB for z/OS


To minimize downtime in the event of failure, you must make sure that you can quickly restore database data. DB2 UDB for z/OS performs some recovery operations automatically, without any outside intervention. For example, after an operating system or database failure, DB2 returns the database to a consistent state. This automatic recovery takes place the next time DB2 UDB for z/OS is started.

DB2 provides the RECOVER utility for data recovery. This lets you recover DB2 objects such as tablespaces, indexes, partitions, individual datasets, and individual pages. With the RECOVER utility you can recover data to:

·        The state captured in a particular backup (the TOCOPY option)

·        The state at the time corresponding to a relative byte address (the TORBA option) or a log record sequence number (the TOLOGPOINT option). The TORBA option is used in non-data-sharing environments and the TOLOGPOINT option in data sharing environments.

·        The current state, if you specify none of the above options

The RECOVER utility also has the LOGONLY option, which allows you to recover the data using the log only, starting with a backup that was created outside DB2 (for example, RVA SnapShot).
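The recovery targets listed above can be illustrated with a small sketch that assembles the text of a RECOVER control statement. The helper name build_recover_statement, the tablespace name A000X.TS1, the data set name, and the hexadecimal values are hypothetical; only the keywords TOCOPY, TORBA, TOLOGPOINT, and LOGONLY come from the text. The sketch handles one option at a time and does not model which option combinations DB2 accepts.

```python
def build_recover_statement(tablespace, option=None, value=None):
    """Return the text of a RECOVER control statement for one tablespace.

    option: None (recover to the current state), "TOCOPY", "TORBA",
            "TOLOGPOINT", or "LOGONLY".
    value:  backup data set name for TOCOPY, hexadecimal string for
            TORBA and TOLOGPOINT; not used otherwise.
    """
    if option not in (None, "TOCOPY", "TORBA", "TOLOGPOINT", "LOGONLY"):
        raise ValueError(f"unsupported option: {option}")

    statement = f"RECOVER TABLESPACE {tablespace}"
    if option is None:
        # No target option: recover to the current state.
        return statement
    if option == "LOGONLY":
        # Apply log records only, on top of a backup taken outside DB2.
        return f"{statement} LOGONLY"
    if value is None:
        raise ValueError(f"{option} needs a value")
    if option == "TOCOPY":
        return f"{statement} TOCOPY {value}"
    # TORBA and TOLOGPOINT take a hexadecimal log position.
    return f"{statement} {option} X'{value}'"


# Hypothetical object names and log positions, for illustration only.
current = build_recover_statement("A000X.TS1")
to_copy = build_recover_statement("A000X.TS1", "TOCOPY", "BACKUP.A000X.TS1.D001")
to_rba = build_recover_statement("A000X.TS1", "TORBA", "00000551BE7D")
log_only = build_recover_statement("A000X.TS1", "LOGONLY")
```

Generating the statements from one function keeps the choice of recovery target explicit. Check the generated text against the IBM utility documentation before using it in a job.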

There are the following types of recovery:

·        Recovery to the current state

A recovery to the current state is generally less demanding and is usually needed more often than a point-in-time recovery. A typical example is the failure of a direct access storage device (DASD) volume, resulting in data loss. You need to find out which tablespaces and indexes resided on the volume and recover only these objects, or even only the affected partitions or individual datasets. The rest of the system is already at the current state and need not be recovered.

·        Recovery to a prior point in time

This type of recovery is used to reinstate the database to the condition it was in at a prior point in time. All changes after that time are lost. You must carefully consider the decision to set the system back in time. Typically, a recovery to a prior point in time is needed when an application program logic error introduced unwanted and irreversible changes into the system.

This type of recovery is explained in "Process Flow" below.
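For the volume failure example given under recovery to the current state, the selection of affected objects can be sketched as follows. The placement dictionary is a hypothetical inventory that maps each object to the volumes holding its datasets. How you obtain this information on a real system is outside the scope of the sketch, and the object and volume names are invented.

```python
def objects_on_volume(placement, failed_volume):
    """Return the sorted names of objects with a dataset on failed_volume."""
    return sorted(
        name for name, volumes in placement.items() if failed_volume in volumes
    )


# Hypothetical inventory: object name -> volumes that hold its datasets.
placement = {
    "A000X.TS1": {"VOL001"},
    "A000X.TS2": {"VOL002", "VOL003"},
    "A000X.IX1": {"VOL003"},
}

# Only these objects need to be recovered; the rest are already current.
affected = objects_on_volume(placement, "VOL003")
```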

You can speed up any recovery by splitting the job into multiple parallel recovery streams to reduce disk contention. Note the restrictions described in the IBM documentation DB2 and IMS Tools Overview (GC27-1070-08). In addition, the REUSE option of the RECOVER and REBUILD utilities significantly reduces the overall elapsed time of the recovery.
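One possible way to plan parallel recovery streams is sketched below. It assumes that you know an approximate size for each object and simply balances the total size per stream, assigning the largest objects first. The function name, object names, and sizes are hypothetical, and the sketch models neither the physical placement of datasets on disk nor the restrictions in the IBM documentation.

```python
import heapq


def split_into_streams(object_sizes, stream_count):
    """Distribute objects over streams so that total sizes stay balanced.

    Greedy approach: take objects from largest to smallest and give each
    one to the stream with the smallest total size so far.
    """
    if stream_count < 1:
        raise ValueError("stream_count must be at least 1")

    # Heap entries are (total size assigned so far, stream index).
    heap = [(0, index) for index in range(stream_count)]
    heapq.heapify(heap)
    streams = [[] for _ in range(stream_count)]

    ordered = sorted(object_sizes.items(), key=lambda item: (-item[1], item[0]))
    for name, size in ordered:
        load, index = heapq.heappop(heap)
        streams[index].append(name)
        heapq.heappush(heap, (load + size, index))
    return streams


# Hypothetical object sizes in arbitrary units.
sizes = {"A000X.TS1": 40, "A000X.TS2": 30, "A000X.TS3": 20, "A000X.TS4": 10}
plan = split_into_streams(sizes, 2)
```

Each inner list of the result corresponds to one recovery job. A real plan would also have to keep objects that compete for the same disk in different streams.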

Process Flow


       1.      You consider the following when choosing a method for a point-in-time recovery:

·        How soon the data must be available again

·        Which point in time you want to use for the recovery

·        Whether offline backups are available

·        Whether indexspaces were included in an offline backup

·        Whether quiesce points are available

       2.      You choose one of the following methods for recovery to a prior point in time:

·        Recovery with conditional restart

This method causes the least interruption to production operation. It requires neither offline backups nor quiesce points, which makes it the best choice in high availability environments. It can also bring the system closest to the latest time at which the database is known to be consistent, which reduces unnecessary data loss.

For more information, see the SAP documentation SAP Database Administration Guide for SAP NetWeaver on IBM DB2 UDB for z/OS.

·        Recovery to a consistent offline backup

This is the simplest and fastest method, but it is also the most restrictive, for the following reasons:

-         It requires offline database backups, which means planned system downtime that some SAP installations cannot tolerate. Note that all backups used in a recovery must be created during the same downtime.

-         It can set the system further back than necessary, depending on the backup frequency. For example, assume that offline backups of the database are scheduled weekly on Sundays. If the data is damaged on Friday and Sunday's offline backup is used for the recovery, all changes made between that Sunday and Friday are lost.

·        Recovery to a system quiesce point

This method depends on the existence of "system quiesce points". These are points in time at which there are no uncommitted update transactions in the system. Such a point is specified by the corresponding relative byte address (RBA) or log record sequence number (LRSN).

Recovery to a system quiesce point is preferable to recovery to a consistent offline backup, because establishing a quiesce point is less disruptive to the SAP system than creating an offline backup. Quiesce points can therefore be taken more often, so it is more likely that you can recover to a point close to the required time, which reduces data loss. However, establishing frequent quiesce points can significantly reduce performance.
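The difference in data loss between recovery to an offline backup and recovery to a system quiesce point can be worked through with a short sketch. It uses the weekly Sunday backup from the example above and assumes, purely for illustration, that a quiesce point was established every evening. All dates and times are invented, and timestamps stand in for the RBA or LRSN values that identify such points on a real system.

```python
from datetime import datetime


def latest_at_or_before(points, target):
    """Return the most recent point in time not later than target, or None."""
    candidates = [point for point in points if point <= target]
    return max(candidates) if candidates else None


# Hypothetical timeline: data is damaged on a Friday morning.
damage_time = datetime(2024, 1, 12, 10, 0)

# Weekly offline backups on Sundays.
offline_backups = [datetime(2023, 12, 31, 2, 0), datetime(2024, 1, 7, 2, 0)]

# Assumed quiesce points on Wednesday and Thursday evening.
quiesce_points = [datetime(2024, 1, 10, 23, 0), datetime(2024, 1, 11, 23, 0)]

backup_point = latest_at_or_before(offline_backups, damage_time)
quiesce_point = latest_at_or_before(quiesce_points, damage_time)

# Changes made in these intervals are lost by the respective method.
lost_with_backup = damage_time - backup_point
lost_with_quiesce = damage_time - quiesce_point
```

With these assumed times, recovery to the Sunday backup loses more than five days of changes, while recovery to the Thursday quiesce point loses 11 hours.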