Moving a 3TB Database Datafiles With Only 2 Minute Downtime

In the last week, I had couple of my customer ask me the same question: how can we move our data files between disks with minimum downtime?

This is actually a really good question. When we want to move a database around (let’s say to a new storage or a new disk device) we will need to do it when the files are closed (offline or database down) in order to keep them consistent. We can do it tablespace by tablespace or the entire database but at the end, it all depends on the size of the files that needs to be transferred. By the way, in Oracle 12c this problem is obsolete since there is an online move command

In the example I’m giving here, the customer had a 3 TB database with most of it in the same tablespace. The downtime allowed by the customer SLA to his customers was around 5-7 minutes so we obviously couldn’t take the database down or take the tablespaces offline. We needed another solution.

Using Data Guard

The first solution came to mind is setting up a second database in a data guard configuration. Adding a physical standby, moving the database to there, thus allowing the application to continue working sounded like a perfect solution.

Since the database is quite big – setting up the data guard would require another 3TB of storage (in addition to the 3TB it already taken and the new 3TB we were to move the data files to) so this became a bit of an issue. Another thing with this solution was the need for a second server with sufficient resources (CPU and RAM); we also needed the application to know how to connect to the temporary server.

Another idea regarding data guard was to setup a second database on the same server as the first one and switching over to it when we are done. This would of course mean to change the database name but that not a big deal in most cases.

We started talking to the customer about the data guard setup processes and the parameters needs to be changes and the customer asked us to think of other solution with some downtime but with less interaction and configuration changes.

Using RMAN Backup

I tried to think about the problem and cut it into smaller pieces. What we wanted to do is move the files to another disk and then remount the new disk as the old disk. In order to do that we need to duplicate the data files. We can’t stop the database in order to copy the files and copying the file might take a long time (around 20 hours) in which the application generates more data.

Suddenly I came to the realization that solution was staring me in the face while doing the data guard testing – I could take an image copy full backup to the new location and then just switch the mount points while the database is down.

When adding to the equation the fact that I need to apply the changes gathered when the database was copying I decided that the best solution is to use the Merge Incremental Backup.

Merge Incremental Backup

Merge incremental backup is one of the best solutions for backing up VLDB (very large databases) and having them restored very fast (without using external or storage based backups/snapshots).

The idea is simple: keep a second set of your DATAFILES at hand at all times ready to be used by your instance. Every day (or hour or whatever time window you decide), take an incremental backup of the changes happened on that time slice and apply them to the second set of datafiles.

When there is a problem with the first set, just redirect the instance to work with the second set, recover the changes done since the last apply and open it for users’ use.

This is very simple to set up once you understand the basic logic and the recovery is even easier than simple restore-recover.

Using Merge Incremental for Moving Files Around

In my case, I wanted to keep the original directory structure and the same database file name but build it in another mount point.

I mounted the new disk and built the directory structure. The next step was to copy the data files to the new location. In the original implementation of the solution, we don’t really control the first image copy set – it is done by the BACKUP INCREMENTAL LEVEL 1 command in our script so I had to build the set myself.

run {
        ALLOCATE CHANNEL T1 DEVICE TYPE DISK;
        ALLOCATE CHANNEL T2 DEVICE TYPE DISK;
        ALLOCATE CHANNEL T3 DEVICE TYPE DISK;
        ALLOCATE CHANNEL T4 DEVICE TYPE DISK;
        BACKUP incremental level 0 AS COPY
			tag='incr_update'
			DB_FILE_NAME_CONVERT = ('oranfs','oranfs_new')
			database;
    }

In this command we allocate 4 channels to copy the files (this only work on enterprise edition – one of the databases I needed to move was standard edition so it didn’t work) and create a level 0 incremental backup as copy images of the files. I also used the DB_FILE_NAME_CONVERT to copy the data from /oranfs to /oranfs_new mount point with the original file names. It was easy to do because all of the files were at the same location but if they weren’t – that would have been an easy task to change.

The last thing was to setup a tag to the backup since I knew I will want to use it later. The name of the tag was “incr_update”.

The second stage was done after finishing the initial online copy. In this stage I need to capture the changes done since the beginning of the copy and merge them into the image copy itself.

RUN {
        ALLOCATE CHANNEL T1 DEVICE TYPE DISK;
        ALLOCATE CHANNEL T2 DEVICE TYPE DISK;
        ALLOCATE CHANNEL T3 DEVICE TYPE DISK;
        ALLOCATE CHANNEL T4 DEVICE TYPE DISK;
        BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'incr_update' DATABASE;
        RECOVER COPY OF DATABASE WITH TAG 'incr_update';
        sql "ALTER SYSTEM ARCHIVE LOG CURRENT";
        DELETE NOPROMPT OBSOLETE;
        BACKUP CURRENT CONTROLFILE;
}

Explanations:

First we open the 4 channels for better performance.
Then we take an incremental backup (level 1) for recover of copy with the tag we decided in the initial backup. This will take an incremental backup based on the image copies themselves.
After finishing the backup, we recover the copy using the same incremental backup we just took.
We switch logfile to clear the archive log
We delete the archive logs and the obsolete backup (since we just applied it).
Backup the control file (which we need for full recovery – not really our plan here).

We can run this part as many times as we’d like – keeping our files closer to the real files until we decide to switch them. The problem is that this stage might be very long since this is a very big database.

Improving Performance

In order to make the incremental go faster, we decided to enable the block change tracking. This will create a new file outside the database that will mark which blocks were changed since the last backup. This makes the incremental backup much much quicker (couple of minutes instead of hours).

In order to do that, we enabled the feature:

alter database enable block change tracking using file '?/dbs/prod_cbt.dbf';

Switching the files

The next step was to take the database to its minimal required downtime. In this stage, we stopped the database, dismount the mount point, switched the mount definition and remount them back. Since our control files are in a different location and so as our redo logs – our datafiles had just been switched. We are all ready to go, right? Wrong. The datafiles still need to be recovered since the control file has a higher SCN number than our image copy files but this is easy since we have the archive logs and original redo logs.

We start the database in mount mode and recover it:

Recover database;

Once the recovery is over, we can open and we are done.

Pro tips

This solution only copied data files (including UNDO). Missing temp files will be created automatically.
Make sure all the control files are at their location before starting the database. Copy them if they are missing.
If the redo logs reside with the data files, we will need to copy them to the new location. They still contain data we need in order to recover to the last SCN in the control file.
Do not open resetlog the database unless you are willing to lose some data.
When running the RMAN commands, use this export in the command line before to get RMAN to show time as well

export NLS_DATE_FORMAT="dd-mm-yyyy hh24:mi:ss"

Don’t forget to clean up the old files later…

Summary

With this solution, we were able to move the 3TB database with a minimal under-2-minute downtime (most of it because of shutdown taking so long).

This procedure was tested on 11.2.0.3 and 11.2.0.4 Enterprise edition databases (with parallel backup). We also ran it on 11.1.0.7 SE (with parallel) and 11.2.0.3 SE (without parallel). The BCT feature only works on Enterprise Edition.

I guess there are some other solution out there for doing the same thing – I’d love to hear about them in the comments!

02/09/2015/10 Comments/by Zohar Elkayam/Estimated Reading Time: 7 Minutes

10 replies

Rotem Adhoh says:
02/09/2015 at 12:38

Great post! The process is explained very well.
Merge incremental backup is definitely one of my favorite features for backing up VLDBs.
However, ASM based storage would have been much more useful in this case, since you can migrate data to new disks without downtime at all. I wonder why this customer chose NFS mount instead..
Anyway, thank you very much for sharing!
Reply
yusuf nar says:
02/09/2015 at 15:17

what about using ASM? you can add new disks to ASM diskgroup and then drop old ones with no downtime?
Reply
- Zohar Elkayam says:
  02/09/2015 at 15:43
  
  It depends on what you mean.
  
  If you want to move data files between disk groups, you can’t do it with this exact solution: you’ll need to tweak it a bit. I will try to make a follow-up post on that, if you’d like.
  
  If you want to move the files between physical disks (move the disk group to another storage machine for example), it’s possible without any downtime to the application as a built in feature of ASM. To do that, you just add new disks to that disk group, rebalance the disks and remove the old devices (it will rebalance the data to the new disks, leaving the old ones empty). The only thing you need to look for is the ASM rebalance power since it might be a little stressful on the server itself. We did that solution for a customer when he wanted to move from EMC storage device to NetAPP (or vica versa) so this work very nicely.
  Reply
Fonfon says:
04/09/2015 at 02:03

Nice post.
Reply
Kamran Saeed says:
11/09/2015 at 12:54

Nice Sharing and i pray for your success
Reply
jens Linnemann says:
12/09/2015 at 11:36

Don’t you need double space copying with Merge Incremental Backup?
Reply
- Zohar Elkayam says:
  12/09/2015 at 11:45
  
  You’re right – we do need double space but in this case, we needed to move the database to a new set of disks (new mount point) so we had the double space.
  Reply
Bob says:
31/12/2020 at 16:25

Hi Zohar,

Did you write up moving the data files to different ASM disks?
Reply
rehan says:
20/02/2021 at 23:57

was it complete recovery in the end ?
Reply

Trackbacks & Pingbacks

Oracle 12c Caching and In Memory Databases - Real DBA Magic says:

09/05/2016 at 11:08

[…] a long explanation about several hybrid solutions (such as the MySQL memory storage engine, and Oracle 12c database In-Memory option) but this post will focus on the Force Full database cache, which will be explained in the second […]

Reply

Want to join the discussion?
Feel free to contribute!

Moving a 3TB Database Datafiles With Only 2 Minute Downtime

Using Data Guard

Using RMAN Backup

Merge Incremental Backup

Using Merge Incremental for Moving Files Around

Improving Performance

Switching the files

Pro tips

Summary

Like this:

Trackbacks & Pingbacks

Leave a Reply

Leave a ReplyCancel reply

Moving a 3TB Database Datafiles With Only 2 Minute Downtime

You might also like

Trackbacks & Pingbacks

Leave a Reply

Leave a ReplyCancel reply