In the last week, two customers who had failures with their standby databases contacted me about closing their Data Guard (DG) gaps. Since this kind of problem is common, and since the solutions are fairly easy, I thought it was worth a post to document this for their use and yours.
Before we begin, let’s understand what Data Guard gaps are. There are two types of gaps: transport gaps and apply gaps. A transport gap usually starts after a network disconnection between the primary and standby databases. While the connection is down, archive logs are not being shipped between the databases, and the standby falls behind on files it has never received. The other case is the apply gap: here the standby has all the files but hasn’t finished applying them yet.
Our problem begins when, during a transport gap, the archives needed to close the gap are removed from the primary. In this case Data Guard will not be able to continue applying the log files and will hang. This is exactly what happened to my customers, and their question was what to do next – preferably without rebuilding the entire standby database.
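Before deciding how to close a gap, it helps to confirm which kind of gap you are looking at. The following is a minimal sketch, assuming a physical standby and a session connected to the standby instance. Both views are standard Oracle dynamic performance views, but how much they report depends on your configuration:

```sql
-- Run on the standby database.

-- Transport gap: archived log sequences the standby knows it is missing.
-- No rows returned means no gap is currently detected.
SELECT thread#, low_sequence#, high_sequence#
FROM   v$archive_gap;

-- Transport and apply lag as measured by the standby.
SELECT name, value, time_computed
FROM   v$dataguard_stats
WHERE  name IN ('transport lag', 'apply lag');
```

If the first query returns a range of sequences that no longer exist on the primary, you are in the situation described above.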
A customer called me up and said that he was hitting “ORA-14766: Unable To Obtain A Stable Metadata Snapshot” and that his Google searches were coming up dry.
This was the first time I encountered this problem, so I thought I’d investigate and write a few words about it, because the problem seems like it might be pretty common and the solution was a bit weird.
It’s been a while since the Israeli user group (ilOUG) had a technology meetup (SIG meeting). The last time that happened was over two years ago – and since then, we only had the bigger conferences with guests from all over the world. Yesterday we renewed that long-standing tradition and held such a meetup.
Although I am not part of the OUG board (and not for lack of trying, there just haven’t been elections for a very long time), I volunteered to help host the meeting together with Oracle ACE Associate Oren Nakdimon (@dboriented, http://db-oriented.com). I also presented a new session: “Oracle 12c New Features For Better Performance” (see below for the agenda and slide deck).
So I’ve been using Oracle 12cR2 for a couple of weeks now (getting ready for the ilOUG meetup in a few weeks), and I decided to share my favorite non-important feature of the new version: the History command for SQLPlus.
As most of you already know, I’m a huge fan of SQLcl (aka SQL Developer Command Line Interface). I’ve written about it a little, and talked about it at conferences a lot. At the last conference someone came up and asked me, “Okay, sqlplus can do most of that stuff – but what do you use the most?” and I answered that the history command actually changed the way I work.
That was the case until Oracle 12cR2. Starting with 12.2.0.1, SQLPlus has the ability to show a list of the last commands you ran. Now, this might sound like a small change, but it’s so useful I’m surprised SQLPlus took only 31 years to implement it…
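A short sketch of what this looks like in a session, assuming a SQLPlus client from 12.2 or later. The two queries are only placeholders to give the history something to record:

```sql
-- History is off by default, so switch it on first.
SET HISTORY ON

SELECT sysdate FROM dual;
SELECT user FROM dual;

-- List the commands recorded so far, each with an index number.
HISTORY

-- Re-run entry number 1 from that list.
HISTORY 1 RUN
```

The same command also accepts `EDIT` and `DEL` after the index number, and `HISTORY CLEAR` empties the list.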
Working on a data warehouse system can be quite challenging, as I mentioned in yesterday’s post. One of the things we need to take care of is the number of parallel processes that are in use at any given time. Yesterday I wrote about how to locate downgraded sessions. Today we will look at another aspect – who “steals” parallel processes and what we can do to solve it.
Some of the biggest thieves of parallel processes in a data warehouse environment are actually developers and DBAs. Sometimes, while developing or just handling the system, DBAs run queries in GUI tools (TOAD, PL/SQL Developer, SQL Developer). That in itself isn’t a big deal, but those tools have a common feature: instead of returning the entire data set, they sometimes return only the first few records (the number depends on the tool). In that case, the cursor is kept open, waiting for the next fetch or until a different query is run. While it waits, the parallel processes stay reserved for that query, even though the query coordinator is marked as idle.
If there are a couple of DBAs running those queries from multiple windows and they “forget” their queries (because sometimes they run for a very long time), that becomes a problem: “real” application queries are downgraded, causing the mess I described yesterday.
To solve that, we created the following query. We actually took that query and wrapped it in a shell script that automatically kills the session, but let’s keep it to the basic query we used.
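As a rough illustration of the idea (a sketch under my own assumptions, not the exact script used at the customer site), a query along these lines lists inactive query coordinators that are still holding parallel servers. It assumes a single instance (on RAC you would use the `gv$` views and match on instance as well), and the 600-second idle threshold is an arbitrary example value:

```sql
SELECT s.sid,
       s.serial#,
       s.username,
       s.program,
       s.last_call_et AS idle_seconds,     -- seconds since the session went inactive
       COUNT(*)       AS px_servers_held
FROM   v$session    s
JOIN   v$px_session px ON px.qcsid = s.sid
WHERE  px.sid <> px.qcsid                  -- count the PX servers, not the coordinator row
AND    s.status = 'INACTIVE'
AND    s.last_call_et > 600                -- hypothetical threshold: idle for 10+ minutes
GROUP  BY s.sid, s.serial#, s.username, s.program, s.last_call_et
ORDER  BY px_servers_held DESC;
```

The `sid` and `serial#` columns are what `ALTER SYSTEM KILL SESSION` needs if you decide to release the parallel servers by killing the coordinator session.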
I was working with the data warehouse team at a customer site, and at some point we realized that some parallel executions were not getting enough resources (they were being downgraded).
Not getting enough parallel processes in such a complex environment is really bad. Since everybody is hogging the CPU, some sessions will not be able to complete inside the nightly ETL time frame. If that happens, some ETLs will run on into the day, providing wrong data to the customer and even worse performance for the morning shift. Another aspect is the memory usage for large sorts or hash joins. Using fewer processes means some of the data will not fit in memory and will have to spill to the temporary tablespace.
The customer asked how he could find those downgraded sessions (meaning, sessions not getting enough parallel processes) in real time. This is the script we created for that.
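One way to approach this (a minimal sketch, not necessarily the script we ended up with) is to compare the requested and actual degree of parallelism in `v$px_session`. It assumes a single instance; on RAC the `gv$` views would be needed:

```sql
-- Parallel queries currently running with fewer PX servers than they asked for.
SELECT DISTINCT
       px.qcsid,                -- SID of the query coordinator
       s.username,
       s.sql_id,
       px.req_degree,           -- degree of parallelism that was requested
       px.degree                -- degree of parallelism actually granted
FROM   v$px_session px
JOIN   v$session    s ON s.sid = px.qcsid
WHERE  px.degree < px.req_degree
ORDER  BY px.qcsid;
```

The coordinator’s own row in `v$px_session` has no degree values, so the comparison in the `WHERE` clause only matches rows for the parallel servers themselves.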