Thursday, January 9, 2014

Block Change Tracking Not Used Though Enabled

During a routine check of the backups, we noticed that they were taking longer than usual. These are QA databases that are not backed up on a schedule; we run ad hoc backups on these hosts to conserve space, usually a monthly full backup and L1 incrementals on the remaining days. Block change tracking (BCT) had been enabled on these hosts recently. We confirmed that earlier runs with BCT finished in about a couple of hours, whereas run times had now gone back to the previous 7-8 hours. So why did this happen?

The following checks confirmed what we were seeing. BCT is enabled and active.
We then checked whether the datafile backups were actually using BCT.

select completion_time, datafile_blocks, blocks_read, blocks, used_change_tracking
from v$backup_datafile where file#=8 order by 1;

We also checked the OS process responsible for writing changed blocks to the BCT file (CTWR) and found it running on both nodes of the cluster.

[oracle@  rman]$ ps -ef | grep ctwr

oracle    6284     1  0 Jul26 ?        00:00:00 ora_ctwr_xxxxx1

The alert log also shows that the process is active:
CTWR started with pid=65, OS id=2653

Block change tracking service is active.

So everything looked healthy, yet BCT was still not being used. We researched a bit on MOS and found the reason.

Cause -
If more than 8 L1 backups are taken between full backups, you need to increase the parameter _bct_bitmaps_per_file (set in init.ora or the spfile) from its default of 8 to the number of incrementals taken between full backups. Also, if you have already taken more than 8 incremental backups since the last full backup, you need to take another full backup before the change takes effect, because the historical bitmaps have already been deleted.

When I checked, the number of L1 backups taken since the last full backup was in excess of 8. So we took a full backup, and when we checked the incremental backups after that, the issue was resolved.

Lesson learned -
If 8 or more L1 backups have been taken since the last L0, BCT will not be used for the backups.
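To catch this before the backup window blows up again, one could count the L1 backups taken since the last L0 and warn when the count reaches the bitmap limit. The following is only a sketch: the input format (one line per backup of a single datafile, oldest first, as "level completion_time") is an assumption about how you spool the INCREMENTAL_LEVEL column of v$backup_datafile, and the function names are made up for illustration. It warns at 8, the stricter reading of the limit.

```shell
#!/bin/sh
# Sketch only. Assumed input: one line per backup of one datafile, oldest
# first, formatted as "<incremental_level> <completion_time>".

# Print how many level 1 backups follow the most recent level 0.
count_l1_since_l0() {
  awk '$1 == 0 { n = 0 } $1 == 1 { n++ } END { print n + 0 }'
}

# Succeed while the count is still below the bitmap limit
# (default 8, the default of _bct_bitmaps_per_file quoted above).
bct_within_limit() {
  count=$1
  limit=${2:-8}
  [ "$count" -lt "$limit" ]
}

# Demo with made-up data: one L0 followed by three L1 backups.
n=$(printf '0 2014-01-01\n1 2014-01-02\n1 2014-01-03\n1 2014-01-04\n' | count_l1_since_l0)
if bct_within_limit "$n"; then
  echo "$n L1 backups since last L0: within limit"
else
  echo "$n L1 backups since last L0: schedule a new L0"
fi
```

If _bct_bitmaps_per_file has been raised, pass the new value as the second argument to bct_within_limit.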


OEM Targets are in Pending State, Not Monitored

This was a really weird problem. Usually, when we configure a new target in the Grid Monitoring Console, its status is detected automatically after the first upload from the agent. Sometimes it takes a little longer for the correct status to be reflected.
It can also happen that one has to clear the previous state of the agent and discard old data that has not yet been uploaded to the OEM repository, because of a host/agent crash or a communication link failure. In that case, shut down the agent and issue the following commands to get it back in working order.

#<AGENT_HOME>/bin/emctl stop agent
#<AGENT_HOME>/bin/emctl clearstate agent
#<AGENT_HOME>/bin/emctl start agent

However, in this particular case it gave us all sorts of headaches. We tried the following fixes to get it working, and none of them helped.

1. Removing and re-adding the targets
2. Agent resync
3. Agent clearstate
4. Adding the targets under a new name
5. Stopping the agent, waiting 10 minutes and starting it again (a workaround suggested for a bug in 11g OEM)

One solution would be to remove the agent completely, reinstall it and reconfigure the targets, which is obviously a long and tedious process. It also risks leaving the production environment without monitoring, which change management and reviewers find difficult to accept.
This happened on one node of an 8-node cluster. The instances running on the other nodes were detected and showed the correct state, but on this one node the targets always showed a pending state. This node also hosts some pre-prod targets, which were under blackout to avoid false alarms.

To identify the root cause, we first needed to understand why the targets were pending.
We checked the following table under the SYSMAN schema in the repository database.

select target_name, owner, discovered_name, display_name, dynamic_property_status, broken_reason, broken_str from mgmt_targets where target_type='oracle_database' and target_name='xxxx';

If you don't have access to the OEM repository database, you can use the following agent URL as a workaround to get the data.
http(s)://host.domain:port/emd/browser/main

Another option is to ask the agent directly, as follows.

[oracle@/oracle/ent11gr2/agent11g/bin]: ./emctl status agent target xxxPRD,oracle_database

Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Target Name : xxxPRD
Target Type : oracle_database
Current severity state
----------------------
Metric          Column name             Key             State           Timestamp
--------------------------------------------------------------------------------
DeferredTrans   errortrans_count        n/a             CLEAR           2014-01-21 03:30:04
DeferredTrans   deftrans_count          n/a             CLEAR           2014-01-21 03:30:04
UserLocks       maxBlockedSess          UL              CLEAR           2014-01-21 03:29:51
UserLocks       maxBlockedDBTime        UL              CLEAR           2014-01-21 03:29:51
UserLocks       maxBlockedSess          TX              CLEAR           2014-01-21 03:29:51
UserLocks       maxBlockedDBTime        TX              CLEAR           2014-01-21 03:29:51
UserLocks       maxBlockedSess          TM              CLEAR           2014-01-21 03:29:51
UserLocks       maxBlockedDBTime        TM              CLEAR           2014-01-21 03:29:51
Response        Status                  n/a             CLEAR           2014-01-21 03:29:39
Response        State                   n/a             CLEAR           2014-01-21 03:29:39
health_check    Status                  n/a             CLEAR           2014-01-21 03:32:12
health_check    Unmounted               n/a             CLEAR           2014-01-21 03:32:12
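On a busy target this listing gets long, so it can help to filter it down to the rows that are not CLEAR. Below is a minimal sketch that assumes the column layout shown above (state is the third field from the end, followed by date and time); the function name is hypothetical and the WARNING row in the demo is made-up sample data, not real agent output.

```shell
#!/bin/sh
# Sketch only. Reads "emctl status agent target ..." output on stdin and
# prints "metric column state" for every row whose state is not CLEAR.
# A data row is recognised by a YYYY-MM-DD date in the second-to-last field.
non_clear_metrics() {
  awk '
    NF >= 5 &&
    $(NF-1) ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ &&
    $(NF-2) != "CLEAR" {
      print $1, $2, $(NF-2)
    }'
}

# Demo with two sample rows (the second one is invented for illustration).
printf '%s\n' \
  'Response        Status                  n/a             CLEAR           2014-01-21 03:29:39' \
  'UserLocks       maxBlockedSess          TX              WARNING         2014-01-21 03:29:51' |
  non_clear_metrics
```

In practice the input would come from piping the emctl command above into non_clear_metrics; an empty result means every listed metric is CLEAR.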


Once we had data about the target status, we found that the targets were in a broken state because their dynamic properties could not be computed.
Hence, one needs to execute the command below from the Agent home on the host where the target database resides.

#<AGENT_HOME>/bin/emctl reload agent dynamicproperties -upload_timeout 240 <Name of the problematic database>:rac_database

[oracle@ bin]$ ./emctl reload agent dynamicproperties -upload_timeout 240 xxxPRD:rac_database
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Skipping -upload_timeout, not in required <tgtName>:<tgtType> form
Skipping 240, not in required <tgtName>:<tgtType> form
EMD recompute dynprops completed successfully

One can see that it completed successfully, although the -upload_timeout argument was skipped. One can also try the following.

[oracle@ bin]$ ./emctl reload agent dynamicproperties  XXXPRD:rac_database
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
EMD recompute dynprops completed successfully

Now try to collect the data for the Response metric.

[oracle@tryprorarac2f bin]$ ./emctl control agent runCollection xxxPRD:rac_database Response
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD runCollection error:
Response::Couldn't find collection item

So there was an error. I decided to go ahead with the remaining steps and see whether they resolved the issue.

Next, follow the steps below from the console.

a) Make sure the test connection on the target database page works.

Click on "Targets/Databases"
Select the database which is showing 'status pending'
Click on "configure"
Click on "test connection"
Make sure that the message "The connection test was successful" is received.

b) Make sure all repository jobs are working.

Click on "setup"
Click on "management services and repository"
Click on "Repository Operations"
Check that the "Repository Scheduler Jobs Status" jobs are all running on schedule.

Even after doing this, the targets were not in a monitored state. Meanwhile, researching on MOS, I came across a note suggesting that targets under blackout on the same node may also cause issues during data collection.
Taking a hint from this, I ended the blackout on all the targets. Next I did a clearstate followed by an upload from the agent, and bingo: the targets were now green, i.e. monitored.

Note - If the issue still persists even after this, try removing the targets and adding them back manually. It will work after that.

Thursday, January 2, 2014



OEM Startup Issue Due to Wrong Repository

Recently, after our upgrade to 12c, the OMS failed to start with the following error.

[oracle@ushofsvpracbl3 bin]$ ./emctl start oms
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.
Starting Oracle Management Server...
Starting WebTier...
WebTier Successfully Started
Oracle Management Server is not functioning because of the following reason:
Failed to connect to repository database. OMS will be automatically restarted once it identifies that database and listener are up.
WARNING: Limit of open file descriptors is found to be 1024.
For proper functioning of OMS, please set "ulimit -n" to be at least 4096.

Upon checking, we found that the DB was down and started it up. However, even after the DB was up, the OMS did not come up. We tried everything from listener reconfiguration to re-registration, but all in vain.

So we began to question which database the OMS actually needs to connect to for its repository. The following command gives all those details.

[oracle@ushofsvpracbl3 bin]$ ./emctl config oms -list_repos_details
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.
<Dec 30, 2013 6:54:42 AM CST> <Info> <Security> <BEA-090905> <Disabling CryptoJ JCE Provider self-integrity check for better startup performance. To enable this check, specify -Dweblogic.security.allowCryptoJDefaultJCEVerification=true>

Repository Connect Descriptor : (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=hofsvdorarac2e.intra.searshc.com)(PORT=1525)))(CONNECT_DATA=(SID=OMSUPG)))
Repository User : SYSMAN

This information is fetched from one of the properties files, namely emoms.properties.

[oracle@ushofsvpracbl3 gc_inst]$ pwd
/u01/app/oracle/Middleware/gc_inst/em/EMGC_OMS1/sysman/ocm/

[oracle@ushofsvpracbl3 gc_inst]$ vi emoms.properties


oracle.sysman.emSDK.svlt.ConsoleServerName=ushofsvpracbl3.searshc.com\:4889_Management_Service
oracle.sysman.eml.mntr.emdRepConnectDescriptor=(DESCRIPTION\=(ADDRESS_LIST\=(ADDRESS\=(PROTOCOL\=TCP)(HOST\=hofsvdorarac2e.intra.searshc.com)(PORT\=1525)))(CONNECT_DATA\=(SID\=OMSUPG)))

From the connect descriptor we realised that there were two databases with the same name on different clusters, and we had been starting the right database name on the wrong cluster. Once this was identified, we started the correct database and the OMS came up nicely.
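When same-named databases exist on several clusters, it helps to pull the host, port and SID out of the connect descriptor before starting anything. The sketch below does that with sed; the function name is hypothetical, the host name in the demo is a placeholder, and it assumes a descriptor that uses SID (not SERVICE_NAME) with HOST, PORT and SID in that order, as in the output above.

```shell
#!/bin/sh
# Sketch only. Reads lines containing an Oracle Net connect descriptor on
# stdin and prints "host port sid" for each descriptor found.
repo_target() {
  sed -n 's/.*(HOST=\([^)]*\)).*(PORT=\([^)]*\)).*(SID=\([^)]*\)).*/\1 \2 \3/p'
}

# Demo with a placeholder host name.
printf '%s\n' \
  'Repository Connect Descriptor : (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=dbhost.example.com)(PORT=1525)))(CONNECT_DATA=(SID=OMSUPG)))' |
  repo_target
```

Feeding it the output of ./emctl config oms -list_repos_details gives one line naming the host the OMS will actually try to reach, which is the one whose database and listener must be up.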

[oracle@ushofsvpracbl3 bin]$ ./emctl status oms
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.
WebTier is Up
Oracle Management Server is Down

[oracle@ushofsvpracbl3 bin]$ ./emctl start oms
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.
Starting Oracle Management Server...
Starting WebTier...
WebTier Successfully Started
Oracle Management Server Already Started
Oracle Management Server is Up
WARNING: Limit of open file descriptors is found to be 1024.
The OMS has been started but it may run out of descriptors under heavy usage.
For proper functioning of OMS, please set "ulimit -n" to be at least 4096.

[oracle@ushofsvpracbl3 bin]$ ./emctl status oms
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.
WebTier is Up
Oracle Management Server is Up
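Both start-up logs above also carry a warning that the open file descriptor limit is 1024, with 4096 as the recommended minimum. A small pre-start check along these lines could flag that; this is just a sketch, and the function name is made up.

```shell
#!/bin/sh
# Sketch only. Succeeds when the given open-file limit is "unlimited" or at
# least the required value (default 4096, the figure from the OMS warning).
nofile_ok() {
  current=$1
  required=${2:-4096}
  [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]
}

# Check the limit of the current shell, i.e. the one emctl would inherit.
current=$(ulimit -n)
if nofile_ok "$current"; then
  echo "ulimit -n is $current: OK"
else
  echo "ulimit -n is $current: raise it to at least 4096 before starting the OMS"
fi
```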


Tuesday, December 31, 2013


Apply PSU 9 on OEM 11.1.0.1 Base Release


While running OEM 11.1.0.1, we ran into a few issues over time. With 12c Cloud Control out for a while, we are considering an upgrade of the existing OEM 11g. Given the complexity of the 12c upgrade, and with Oracle suggesting two test cycles, the production upgrade is still a long way off.
Meanwhile, the option we had was to apply a PSU on top of the base release of OEM 11g.
Looking around, we found that the latest PSU available was PSU 9, i.e. the 11.1.0.1.9 patch.

The following is an outline for anyone considering applying the PSU.

If you do not have the latest version of OPatch, download it from patch# 6880880 for the 11.1.0.0.0 release.

Pre-requisite Patch 12620174

This is an auto-update patch. That means any new install of the OMS will have it by default and you don't need to apply it manually. For older installations, however, one has to apply it manually.

It is a generic patch, i.e. it can be applied on top of the 11.1 release. Its purpose is to enhance the patching process for the OMS. If you remember, earlier patches came with two post-patch SQL files to introduce the SQL changes. After applying this patch, you will only need one. This reduces both the time taken and user errors, as many people forgot to apply the second file because of the similar-sounding names.
Another important point is that this patch only affects how SQL changes are rolled out when applying a patch; it doesn't affect how Java changes are rolled out.

Steps to apply Patch 12620174

1. Prerequisites: Make sure opatch and unzip are in the path. Also ensure you are using the latest OPatch version.

2. Copy the patch to the server and unzip

$ unzip p12620174_111010_Generic.zip

3. Shut down services running from the ORACLE_HOME.

$ emctl stop oms
Oracle Enterprise Manager 11g Release 1 Grid Control
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
Stopping WebTier…
WebTier Successfully Stopped
Stopping Oracle Management Server…
Oracle Management Server Successfully Stopped
Oracle Management Server is Down

$ $AGENT_HOME/bin/emctl stop agent
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
Stopping agent … stopped.

4. Set your current directory to the directory where the patch is located:

$ cd 12620174
$ opatch napply
OPatch succeeded.

5. Connect to rcuJDBCEngine as SYS and execute the following SQL file. Make sure you set $ORACLE_HOME to the OMS home before you connect to rcuJDBCEngine.


$ORACLE_HOME/bin/rcuJDBCEngine sys/welcome1@myhost.myorg.com:1521:sid JDBC_SCRIPT 10154264/patch_10154264.sql $PWD $ORACLE_HOME


Completed SQL script execution normally.
1 scripts were processed

6. Start the OMS using the following command. In a multi-OMS environment, start it on all OMS machines.

$ emctl start oms
Oracle Enterprise Manager 11g Release 1 Grid Control
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
Starting WebTier…
WebTier Successfully Started
Starting Oracle Management Server…
Oracle Management Server Successfully Started
Oracle Management Server is Up

Steps to apply Patch 16572176

This is the actual PSU that rolls out the changes to the OMS (Java + SQL).

1. Prerequisites: Make sure opatch and unzip are in the path. Also ensure you are using the latest OPatch version.
Set ORACLE_HOME to the OMS home.

2. Ensure that the PSU does not conflict with the already-installed one-off patches. To do so, run the following command to generate a report that lists all conflicting patches.

$ opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir ./16572176
Invoking OPatch 11.2.0.1.1

Oracle Interim Patch Installer version 11.2.0.1.1
Copyright (c) 2009, Oracle Corporation. All rights reserved.

PREREQ session
Oracle Home : /u01/app/oracle/Middleware/oms11g
Central Inventory : /u01/app/oraInventory
from : /etc/oraInst.loc
OPatch version : 11.2.0.1.1
OUI version : 11.1.0.8.0
OUI location : /u01/app/oracle/Middleware/oms11g/oui
Log file location : /u01/app/oracle/Middleware/oms11g/cfgtoollogs/opatch/opatch2011-08-04_11-06-55AM.log

Patch history file: /u01/app/oracle/Middleware/oms11g/cfgtoollogs/opatch/opatch_history.txt
OPatch detects the Middleware Home as "/u01/app/oracle/Middleware"
Invoking prereq "checkconflictagainstohwithdetail"
Prereq "checkConflictAgainstOHWithDetail" passed.
OPatch succeeded.

Note: If you do see any conflicting patches, refer to README.txt.

3. Stop all OMS services

$<ORACLE_HOME>/bin/emctl stop oms -all
Oracle Enterprise Manager 11g Release 1 Grid Control
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
Stopping WebTier…
WebTier Successfully Stopped
Stopping Oracle Management Server…
Oracle Management Server Successfully Stopped
Oracle Management Server is Down

4. Download and unzip p16572176_111010_Generic.zip, cd to 16572176 and apply the patch

$ cd 16572176
$ opatch apply

Invoking OPatch 11.2.0.1.1
Oracle Interim Patch Installer version 11.2.0.1.1
Copyright (c) 2009, Oracle Corporation. All rights reserved.
OPatch succeeded.

5. Connect to rcuJDBCEngine as SYSMAN and run the apply.sql script as follows:

$ /u01/app/oracle/Middleware/oms11g/bin/rcuJDBCEngine sysman@Host:1521:GCREPO JDBC_SCRIPT apply.sql $PWD $ORACLE_HOME

###### SQL Patching operation has started. The Pre-requisites check ######
###### may take upto 3 minutes. Please do not cancel the operation. ######
###### Refer to My Oracle Support note 1326515.1 for more information ######
----------------
Completed SQL script execution normally.
41 scripts were processed

6. Start OMS

$ emctl start oms
Oracle Enterprise Manager 11g Release 1 Grid Control
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
Starting WebTier…
WebTier Successfully Started
Starting Oracle Management Server…
Oracle Management Server Successfully Started
Oracle Management Server is Up


If you have installed emcli, then run the following command on all the emcli installations:
$ emcli sync

Once the PSU is applied to the OMS home, you may need to patch the Agent home as well. At the time of writing, PSU 9 for the Agent home is not yet available. The latest patch available for the Agent home is PSU 7.

Patch 9346282 - 11.1.0.1.7 Patch Set Update for Oracle Management Agent. Please refer to the following table (as published in Doc ID 1358092.1).



11.1.0.1               OMS        AGENT Unix   AGENT Windows
PSU1  (11.1.0.1.1)     10065631   -            -
PSU2  (11.1.0.1.2)     10270073   10273607     10273607
PSU3  (11.1.0.1.3)     11727299   9345906      11778791
PSU4  (11.1.0.1.4)     12423703   9345913      12423714
PSU5  (11.1.0.1.5)     12833678   9345921      12833724
PSU6  (11.1.0.1.6)     13248190   9346243      13248202
PSU7  (11.1.0.1.7)     13711705   9346282      13711732
PSU8  (11.1.0.1.8)     14766609   -            -
PSU9  (11.1.0.1.9)     16572176   -            -
PSU10 (11.1.0.1.10)    17154155   -            -
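For scripted patch downloads, the table above can be kept as a small lookup. The following is a sketch with the patch numbers copied from that table; the function name and the component keywords (oms, agent_unix, agent_windows) are made up for illustration, and "-" means the table lists no patch for that combination.

```shell
#!/bin/sh
# Sketch only. Usage: psu_patch <psu-number> <oms|agent_unix|agent_windows>
# Prints the patch number from the 11.1.0.1 PSU table, or "-" if none is listed.
psu_patch() {
  awk -v psu="$1" -v comp="$2" '
    BEGIN { col["oms"] = 2; col["agent_unix"] = 3; col["agent_windows"] = 4 }
    $1 == psu && (comp in col) { print $(col[comp]) }
  ' <<'EOF'
1  10065631 -        -
2  10270073 10273607 10273607
3  11727299 9345906  11778791
4  12423703 9345913  12423714
5  12833678 9345921  12833724
6  13248190 9346243  13248202
7  13711705 9346282  13711732
8  14766609 -        -
9  16572176 -        -
10 17154155 -        -
EOF
}

# Demo: the OMS patch for PSU 9 and the Unix agent patch for PSU 7.
psu_patch 9 oms
psu_patch 7 agent_unix
```

An unknown PSU number or component prints nothing, so callers can treat empty output as "not in the table".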

Wednesday, November 27, 2013

Troubleshooting ASM Startup Issue in 11g Grid Infrastructure



In 11g, when the ASM spfile, OCR or voting disk is placed on an ASM diskgroup, a tight link exists between these components. ASM plays a very important role when starting the Grid Infrastructure stack, be it single node or multi node (RAC).
With single-node Grid Infrastructure, a database using the ASM instance has to register itself with CSS.

So when ASM cannot be started for some reason, the question is how to troubleshoot the issue, since everything is tightly integrated. First, let's look at the ASM startup sequence.

When Grid Infrastructure starts, Oracle tries to locate the ASM parameter file, in order to start CSSDAgent, in the following sequence.


  • First it looks in the GPnP profile for the parameter named "asmdiskstring"
    • The profile is usually located under <GRID_HOME>/gpnp/profiles/peer
    • File - profile.xml
<orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+DATA/<host>/asmparameterfile/registry.253.768413123"/>

The issue here is that if the file named in the profile is not found, ASM will fail to start. So make sure your profile reflects the correct value.
  • If the profile does not give a value, the next lookup is done in the GRID_HOME/dbs folder, and if a pfile is found there, ASM starts using it.
Again, the file has to be present; if it is not, ASM will fail to start because it cannot locate a parameter file. It is therefore advisable to keep both an spfile and a pfile to save some pain later.
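To see quickly which of the two lookups applies on a node, one can pull the SPFile attribute out of profile.xml. The sketch below uses sed on the ASM-Profile element shown above; the function name is hypothetical and the host name in the demo is a placeholder.

```shell
#!/bin/sh
# Sketch only. Reads a GPnP profile on stdin and prints the value of the
# SPFile attribute of the orcl:ASM-Profile element (empty if it is unset).
asm_spfile_from_profile() {
  sed -n 's/.*<orcl:ASM-Profile[^>]*SPFile="\([^"]*\)".*/\1/p'
}

# Demo with a placeholder host name in the spfile path.
profile_line='<orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+DATA/myhost/asmparameterfile/registry.253.768413123"/>'
spfile=$(printf '%s\n' "$profile_line" | asm_spfile_from_profile)
if [ -n "$spfile" ]; then
  echo "ASM will look for spfile: $spfile"
else
  echo "No SPFile in profile: a pfile under GRID_HOME/dbs is needed"
fi
```

Against a real node the input would be the profile.xml file from the directory mentioned above, redirected into the function.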

There is a caveat here. What if the GPnP profile points to the proper file, which exists on ASM but cannot be opened due to corruption? In this case too, ASM will fail to start.
The solution to these issues is to start ASM with a transient parameter file, as follows (this is for a 2-node RAC).

Use case - On a 2-node RAC only one node is healthy and the other node has a problem with CRS stack startup.

1- Create a new ASM pfile

ora_+ASM1.ora
+ASM1.asm_diskgroups='DATA','FRA'#Manual Mount
+ASM2.asm_diskgroups='DATA','FRA'#Manual Mount
*.asm_diskstring='/dev/oracleasm/disks/*'
*.asm_power_limit=5
*.diagnostic_dest='/u01/app/11.2.0.3/grid/log'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'

2- Start up the ASM instance

On the first node:

$ export ORACLE_SID=<asm instance name>
$ export ORACLE_HOME=<full path of the asm home>

$ sqlplus / as sysdba
SQL> startup pfile=<the full pathname of ora_+ASM1.ora>

This will start the ASM instance on node 1.

3- Recreate the spfile


SQL> create spfile='+DATA' from pfile='/u01/app/11.2.0.3/grid/dbs/init+ASM1.ora';
File created.
SQL> sho parameter spfile;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DATA/<host>/asmparameterfile
                                                      /registry.253.768413123                             

Note - Please make sure that your GPnP profile reflects the new value.

4- Restart the OHAS/CRS stack on the node

Connect as the root user:


# /u01/app/11.2.0.3/grid/bin/crsctl stop crs

Make sure that all the processes have exited, and then start CRS:

# /u01/app/11.2.0.3/grid/bin/crsctl start crs

After some time, run a health check to confirm that the CRS stack has started successfully.

# /u01/app/11.2.0.3/grid/bin/crsctl check cluster -all

Thursday, November 21, 2013


11g Grid Start-up Issue Due to Missing Permissions



Recently one of our 11g clusters went down on multiple nodes. Upon checking, we found that the issue was with permissions: the owner of the Grid home had changed from the "grid" user to the "oracle" user.

Diagnosis - 


grid@/u01/app/11.2.0.3/grid/cdata/>ls -ltr

total 2888

drwxr-xr-x 2 oracle oinstall      4096 Mar 11  2012 localhost
drwxr-xr-x 2 oracle oinstall      4096 Mar 11  2012 hostxxx
drwxrwxr-x 2 oracle oinstall      4096 Nov 15 21:56 devorclrac
-rw------- 1 oracle oinstall     272756736 Nov 21 05:30 hostxxx.olr
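On a large home it is easier to let a filter pick out the entries with the wrong owner than to read the listing by eye. The following sketch reads ls -l output and prints the owner and name of every entry not owned by the expected user; the function name is made up, and it assumes file names without spaces.

```shell
#!/bin/sh
# Sketch only. Usage: ls -l <dir> | wrong_owner <expected-user>
# Prints "owner name" for each entry whose owner differs from the expected
# user. The "total" line is skipped because it has fewer than 9 fields.
wrong_owner() {
  awk -v want="$1" 'NF >= 9 && $3 != want { print $3, $NF }'
}

# Demo with a listing shaped like the one above (second row is invented).
printf '%s\n' \
  'total 2888' \
  'drwxr-xr-x 2 oracle oinstall      4096 Mar 11  2012 localhost' \
  'drwxr-xr-x 2 grid   oinstall      4096 Mar 11  2012 hostxxx' |
  wrong_owner grid
```

A recursive alternative is find <grid home> ! -user grid, though some entries in a Grid home are expected to belong to root, so treat the output as a list to review rather than a list to fix blindly.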

The fix for this issue is to re-link the Grid home binaries.
The process is as follows.

grid@/u01/app/11.2.0.3/grid/crs/install/>./rootcrs.pl -unlock -crshome /u01/app/11.2.0.3/grid
You must be logged in as root to run this script.
Log in as root and rerun this script.
2013-11-21 05:48:27: Not running as authorized user
Insufficient privileges to execute this script.
root or administrative privileges needed to run the script.

[root@hostxxx~]# cd /u01/app/11.2.0.3/grid/crs/install/
[root@hostxxxinstall]# ./rootcrs.pl -unlock -crshome /u01/app/11.2.0.3/grid
Using configuration parameter file: ./crsconfig_params
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'hostxxx'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'hostxxx'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'hostxxx'
CRS-2677: Stop of 'ora.cssdmonitor' on 'hostxxx' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'hostxxx' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'hostxxx' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully unlock /u01/app/11.2.0.3/grid

However, during the relink as the grid user, we hit the following error -

grid@/u01/app/11.2.0.3/grid/bin/>./relink
./relink: line 164: /u01/app/11.2.0.3/grid/install/current_makeorder.xml: Permission denied
writing relink log to: /u01/app/11.2.0.3/grid/install/relink.log
./relink: line 181: /u01/app/11.2.0.3/grid/install/relink.log: Permission denied
grid@/u01/app/11.2.0.3/grid/bin/>ls -ltr /u01/app/11.2.0.3/grid/install/current_makeorder.xml
ls: /u01/app/11.2.0.3/grid/install/current_makeorder.xml: No such file or directory

So the relink failed again with a permissions issue. The reason is that many binaries/executables under the Grid home were still owned by the oracle user, so the ownership has to be changed.

Relink Log -
oracle.xml.parser.v2.XMLParseException: Start of root element expected.

The above error in the relink log is completely misleading, so ignore it.

The steps to fix the issue are as follows.

Step 1 - Make sure no Grid processes are running / force stop
/u01/app/11.2.0.3/grid/bin/crsctl stop crs -f

Step 2 - Change the ownership of the Grid home to grid:oinstall (just to make the relink work)

[root@hostxxx11.2.0.3]# chown -R grid:oinstall grid

Step 3 - Relink the Grid home binaries (make sure ORACLE_HOME is set to the Grid home and you are running this as the grid unix user)

As the Oracle Grid Infrastructure for a Cluster owner: 

grid@/home/grid/>/u01/app/11.2.0.3/grid/bin/relink
writing relink log to: /u01/app/11.2.0.3/grid/install/relink.log

As root again: 

# cd $Grid_home/rdbms/install/
# ./rootadd_rdbms.sh 


Step 4 - Load the ASMLib driver ( basically start the init service if not already started )

/etc/init.d/oracleasm status

If down,

[root@hostxxx~]# /etc/init.d/oracleasm start
Initializing the Oracle ASMLib driver:                     [  OK  ]
Scanning the system for Oracle ASMLib disks:     [  OK  ]

Make sure ASM can see the devices -

[root@hostxxx~]# /etc/init.d/oracleasm listdisks

Step 5 - Make sure no Grid processes are running / force stop

/u01/app/11.2.0.3/grid/bin/crsctl stop crs -f


Step 6 - Lock Grid Home...and that will also start the CRS stack

[root@hostxxx~]# /u01/app/11.2.0.3/grid/crs/install/rootcrs.pl -patch
Using configuration parameter file: /u01/app/11.2.0.3/grid/crs/install/crsconfig_params
CRS-4123: Oracle High Availability Services has been started.

Step 7 – Perform the Health Check on entire cluster

crsctl status resource -t -init
crsctl status resource -t

crsctl check has
crsctl check cluster
crsctl check crs
crsctl check cluster -all

If you check now, the cluster should have started without problems.

Tuesday, September 17, 2013

ADRCI Purge Command Fails With DIA-48322




When we ran the purge command in adrci from the Grid home to remove old data, we hit the following errors.

adrci> purge
DIA-48322: Relation [INCIDENT] of ADR V[2] incompatible with V[2] tool
DIA-48210: Relation Not Found
DIA-48166: error with opening ADR block file because file does not exist 


This happened when trying to purge the listener home.

The reason is that this home does not contain anything to be purged:

adrci> show incident

0 rows fetched

However, this applies equally to other homes and is not restricted to the listener home.


These errors are raised when you run a purge command in a listener (or other) ADR home that has no incidents to purge. It may look like a bug, but it is expected behaviour in this situation, as there is nothing to purge.
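In a purge script that loops over several ADR homes, one way to avoid the noise is to look at the "rows fetched" line of show incident first and skip the purge when it reports zero. This is a sketch under that assumption; the function name is made up, and it relies only on the "N rows fetched" line shown above.

```shell
#!/bin/sh
# Sketch only. Reads "show incident" output on stdin and succeeds only when
# the "N rows fetched" line reports at least one incident.
has_incidents() {
  awk '/rows* fetched/ { n = $1 } END { if (n + 0 > 0) exit 0; exit 1 }'
}

# Demo with the output seen above for the listener home.
if printf '\n0 rows fetched\n' | has_incidents; then
  echo "incidents found: purge can run"
else
  echo "no incidents: skipping purge for this home"
fi
```

The input would normally be the show incident output captured from adrci for the home in question, with the purge run only when the function succeeds.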