Closing Dataguard Transfer and Apply Gaps

In the last week, I had two customers that had some failures with their standby databases and contacted me about closing their DG gaps. Since this kind of problems is common, and since the solutions are fairly easy I thought it worth a post to document this for their and your use.

Before we begin, let’s understand what dataguard gaps are. There are two types of gaps: transport and apply gaps. The transport gaps problem usually starts after a network disconnection between the primary and standby databases. At this time, the archive logs are not being shipped between the databases – and this is called a transport gap. The other case is the apply gap: in this case, we have all the files but the standby didn’t finish applying them all yet.

Our problem begins when during a transport gap, the archives we need to close the gap (in the primary) are removed. In this case the dataguard will not be able to continue rolling the log files and will hang. This is exactly what happen to my customers, and their question was what to do next – preferably, without rebuilding the entire standby database.

FAL Settings

In general, when the dataguard (in this case, the standby database) recognizes that it has a transport gap between the archive logs it received (from the primary database) and what was applied, it will automatically request the missing files using the FAL (Fetch Archive Log) service. This require us to set two parameters during the dataguard deployment. The parameters are FAL_SERVER (i.e. where to close the gap from – the other servers in the configuration) and FAL_CLIENT (i.e, who is being checked, which is usually the local server).

This service works very nicely, but happens when the primary (or any other server we set in the FAL_SERVER) doesn’t have the archive at hand? What do we do then?

Manually Restoring Archive Logs

The easiest approach will be to restore a backup of the missing archives and manually move them to the standby database. This is usually my go to recommendation. In our first case, this is what my customer did. They restored the files, copied them to the standby database and… nothing. The files were there, but the database didn’t know about them. What we need to do now, is to register the files manually to the control file.

We can do that in two ways. Using SQL*Plus:

SQL> alter database register logfile '/tmp/tmp_arc_from_bck/arch_1_12311.arc';

or by using RMAN. We use this method when we’re registering a large number of files:

rman> connect target /
rman> catalog start with '/tmp/tmp_arc_from_bck/';

Once we do that, the managed recovery process will automatically recover the data from the new files and continue its ongoing work.

Closing the Gaps using Incremental Backup

In some cases, we don’t have the primary database archive logs on backup and we can’t restore them. In that case we can either recreate the standby database (which we don’t want) or try to restore the gap using incremental backup.

Closing the gap using incremental backup is fairly easy:

  1. Find the current SCN
  2. Backup the primary database from the point (it might take some time, depending on the size of the database)
  3. Recover the incremental backup in the standby database:

Finding the current SCN

On the standby database, stop the managed recovery (it doesn’t work anyway – we have a gap we can’t close automatically):

alter database recover managed standby database cancel;

Find what is the current SCN for the standby database:

select to_char(current_scn) from v$database;

This query will return a number (we casted it into string since sometimes the number is big and it shows up at scientific notation that doesn’t help us). For example, let’s say the number is 14792011.

Backup the primary database

Using the number we found, we go to the primary database and create a new incremental backup from the point that the standby is missing and until now. The longer the gap was, the bigger the incremental file will be.

run {
  allocate channel ch1 type disk format '/u01/backup/INC_%U.bck';
  backup incremental from scn 14792011 database;
 }

Since when restoring, we will be skipping ahead the data files – we will also take an backup of the controlfile for standby from the primary database:

alter database create standby controlfile as '/u01/backup/STBY_controlfile.ctl';

Once we got everything right – we transfer the newly created files to the standby server.

Restoring the database

Using RMAN, we restore the control file

rman> connect target /
rman> startup nomount;
rman> restore controlfile from '/u01/backup/STBY_controlfile.ctl';

At that point, we can start the database to mount, and catalog the backup files:

rman> sql 'Alter database mount';
rman> catalog start with '/u01/backup/';

We are all set and ready to recover:

rman> recover database;

When the recovery is done, it will “fail” – telling us that it’s missing more files for the recovery – but that’s okay because the FAL will take care of it from here.

We can now start the managed recovery:

alter database recover managed standby database using current logfile disconnect from session;

Avoiding the Scenario

One thing we can do to avoid the scenario is protect our archivelogs in production from being removed before they are applied in the standby databases.
For that, all we need to do is set a parameter in our RMAN configuration and make sure to remove archive files using RMAN (and not manually removing them):

CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON STANDBY;
2 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply