Thursday, November 21, 2013


11g Grid Start-up Issue Due to Missing Permissions



Recently one of our 11g clusters went down on multiple nodes. On checking, we found the issue was with permissions: the owner of the Grid home had changed from the "grid" user to the "oracle" user.

Diagnosis - 


grid@/u01/app/11.2.0.3/grid/cdata/>ls -ltr

total 2888

drwxr-xr-x 2 oracle oinstall      4096 Mar 11  2012 localhost
drwxr-xr-x 2 oracle oinstall      4096 Mar 11  2012 hostxxx
drwxrwxr-x 2 oracle oinstall      4096 Nov 15 21:56 devorclrac
-rw------- 1 oracle oinstall     272756736 Nov 21 05:30 hostxxx.olr
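
The listing above covers only the cdata directory. To see how far the ownership change reaches, a recursive search can help. This is a minimal sketch, assuming the expected owner is "grid" and the Grid home path used in this post; the function name list_not_owned_by is made up for illustration. Some files in a Grid home normally belong to root, so the output needs a manual review rather than a blind fix.

```shell
#!/bin/sh
# Hypothetical helper: print every path under DIR whose owner is not OWNER.
# OWNER may be a user name or a numeric UID (find -user accepts both).
list_not_owned_by() {
    dir=$1
    owner=$2
    find "$dir" ! -user "$owner" -print
}

# Example (path and owner taken from this post; adjust for your install):
#   list_not_owned_by /u01/app/11.2.0.3/grid grid | head -20
```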

The fix for this issue is to relink the Grid home binaries.
The process starts with unlocking the Grid home, which must be done as root.

grid@/u01/app/11.2.0.3/grid/crs/install/>./rootcrs.pl -unlock -crshome /u01/app/11.2.0.3/grid
You must be logged in as root to run this script.
Log in as root and rerun this script.
2013-11-21 05:48:27: Not running as authorized user
Insufficient privileges to execute this script.
root or administrative privileges needed to run the script.

[root@hostxxx~]# cd /u01/app/11.2.0.3/grid/crs/install/
[root@hostxxxinstall]# ./rootcrs.pl -unlock -crshome /u01/app/11.2.0.3/grid
Using configuration parameter file: ./crsconfig_params
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'hostxxx'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'hostxxx'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'hostxxx'
CRS-2677: Stop of 'ora.cssdmonitor' on 'hostxxx' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'hostxxx' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'hostxxx' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully unlock /u01/app/11.2.0.3/grid

However, the relink as the grid user then hit the following errors -

grid@/u01/app/11.2.0.3/grid/bin/>./relink
./relink: line 164: /u01/app/11.2.0.3/grid/install/current_makeorder.xml: Permission denied
writing relink log to: /u01/app/11.2.0.3/grid/install/relink.log
./relink: line 181: /u01/app/11.2.0.3/grid/install/relink.log: Permission denied
grid@/u01/app/11.2.0.3/grid/bin/>ls -ltr /u01/app/11.2.0.3/grid/install/current_makeorder.xml
ls: /u01/app/11.2.0.3/grid/install/current_makeorder.xml: No such file or directory

So the relink failed, again with a permissions issue. The reason is that many binaries and executables under the Grid home were still owned by the oracle user, so their ownership has to be changed first.

Relink Log -
oracle.xml.parser.v2.XMLParseException: Start of root element expected.

The above error in the relink log is completely misleading, so ignore it.

The following steps fix the issue.

Step 1 - Make sure no Grid processes are running / force stop
/u01/app/11.2.0.3/grid/bin/crsctl stop crs -f
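
To confirm that nothing is left running after the forced stop, the process list can be filtered for Grid daemons. This is only a sketch: the helper name filter_grid_daemons and the list of daemon names are assumptions and may need extending for your environment.

```shell
#!/bin/sh
# Hypothetical helper: read process names (one per line) on stdin and print
# those that look like Grid Infrastructure daemons.
# The name list is an assumption, not an exhaustive inventory.
filter_grid_daemons() {
    grep -E '^(ohasd|crsd|ocssd|evmd|gipcd|gpnpd|mdnsd)(\.bin)?$'
}

# Example: list any Grid daemons still running (no output means none found).
#   ps -eo comm= | filter_grid_daemons
```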

Step 2 - Change the ownership of the Grid home to grid:oinstall (just to make relink work)

[root@hostxxx11.2.0.3]# chown -R grid:oinstall grid
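
Before running a recursive chown like the one above, it may be worth recording the current owner, group and mode of every file so the state can be compared afterwards. A minimal sketch, assuming GNU stat is available on the host; the helper name snapshot_ownership and the output file path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write "owner:group mode path" for everything under DIR
# into OUTFILE. Relies on GNU stat (-c), which is an assumption about the host.
# OUTFILE should live outside DIR.
snapshot_ownership() {
    dir=$1
    outfile=$2
    find "$dir" -exec stat -c '%U:%G %a %n' {} + > "$outfile"
}

# Example, run as root before the chown (output path is illustrative):
#   snapshot_ownership /u01/app/11.2.0.3/grid /root/grid_home_ownership.before
```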

Step 3 - Relink the Grid home binaries (make sure ORACLE_HOME is set to the Grid home and that you run this as the grid Unix user)

As the Oracle Grid Infrastructure for a Cluster owner: 

grid@/home/grid/>/u01/app/11.2.0.3/grid/bin/relink
writing relink log to: /u01/app/11.2.0.3/grid/install/relink.log

As root again: 

# cd $Grid_home/rdbms/install/
# ./rootadd_rdbms.sh 
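
The two conditions for the relink in Step 3 (the right user and the right ORACLE_HOME) can be checked up front. The sketch below is one way to do that; the function relink_preflight is hypothetical and not part of any Oracle tooling, and the optional third argument exists only so the check can be exercised without switching users.

```shell
#!/bin/sh
# Hypothetical pre-flight check for relink.
# Usage: relink_preflight EXPECTED_USER EXPECTED_HOME [ACTUAL_USER]
# ACTUAL_USER defaults to the current user; the override is for testing only.
relink_preflight() {
    expected_user=$1
    expected_home=$2
    actual_user=${3:-$(id -un)}

    if [ "$actual_user" != "$expected_user" ]; then
        echo "running as '$actual_user', expected '$expected_user'" >&2
        return 1
    fi
    if [ "${ORACLE_HOME:-}" != "$expected_home" ]; then
        echo "ORACLE_HOME is '${ORACLE_HOME:-}', expected '$expected_home'" >&2
        return 1
    fi
    return 0
}

# Example (values from this post):
#   relink_preflight grid /u01/app/11.2.0.3/grid && "$ORACLE_HOME/bin/relink"
```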


Step 4 - Load the ASMLib driver (basically, start the init service if it is not already started)

/etc/init.d/oracleasm status

If down,

[root@hostxxx~]# /etc/init.d/oracleasm start
Initializing the Oracle ASMLib driver:                     [  OK  ]
Scanning the system for Oracle ASMLib disks:     [  OK  ]

Make sure ASM can see the devices -

[root@hostxxx~]# /etc/init.d/oracleasm listdisks
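
If this check is scripted, the number of disks reported can be counted and an empty result flagged. A small sketch, assuming listdisks prints one disk label per line; the helper name count_labels is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count the non-blank lines read from stdin.
# Assumes "oracleasm listdisks" prints one disk label per line.
count_labels() {
    grep -c -v '^[[:space:]]*$'
}

# Example, as root:
#   n=$(/etc/init.d/oracleasm listdisks | count_labels)
#   [ "$n" -gt 0 ] || echo "ASMLib reports no disks" >&2
```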

Step 5 - Make sure no Grid processes are running / force stop

/u01/app/11.2.0.3/grid/bin/crsctl stop crs -f


Step 6 - Lock the Grid home, which will also start the CRS stack

[root@hostxxx~]# /u01/app/11.2.0.3/grid/crs/install/rootcrs.pl -patch
Using configuration parameter file: /u01/app/11.2.0.3/grid/crs/install/crsconfig_params
CRS-4123: Oracle High Availability Services has been started.

Step 7 - Perform a health check on the entire cluster

crsctl status resource -t -init
crsctl status resource -t

crsctl check has
crsctl check cluster
crsctl check crs
crsctl check cluster -all
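
The output of the check commands can also be scanned automatically instead of read by eye. The sketch below assumes that a healthy component is reported on a CRS- message line containing the words "is online"; the helper report_not_online is hypothetical, and it deliberately fails when it sees no CRS- lines at all, since that usually means the command itself did not run.

```shell
#!/bin/sh
# Hypothetical helper: read "crsctl check ..." output on stdin and print every
# CRS- message line that does not contain "is online".
# Returns 0 if all CRS- lines are online, 1 if some are not,
# and 2 if the input holds no CRS- lines at all.
report_not_online() {
    msgs=$(grep '^CRS-')
    if [ -z "$msgs" ]; then
        echo "no CRS- messages found in input" >&2
        return 2
    fi
    bad=$(printf '%s\n' "$msgs" | grep -v 'is online')
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    return 0
}

# Example:
#   crsctl check cluster -all | report_not_online && echo "all components online"
```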

If all of these checks come back clean, the cluster has started okay.
