11g Grid Start-up Issue Due to Missing Permissions
Recently one of our 11g clusters went down on multiple nodes. On checking, we
found a permissions problem: the owner of the Grid home had changed from the
"grid" user to the "oracle" user.
Diagnosis -
grid@/u01/app/11.2.0.3/grid/cdata/>ls -ltr
total 2888
drwxr-xr-x 2 oracle oinstall      4096 Mar 11 2012 localhost
drwxr-xr-x 2 oracle oinstall      4096 Mar 11 2012 hostxxx
drwxrwxr-x 2 oracle oinstall      4096 Nov 15 21:56 devorclrac
-rw------- 1 oracle oinstall 272756736 Nov 21 05:30 hostxxx.olr
The fix is to relink the Grid home binaries, which first requires unlocking the
Grid home. The process, including the errors we hit along the way, is as follows.
grid@/u01/app/11.2.0.3/grid/crs/install/>./rootcrs.pl -unlock -crshome /u01/app/11.2.0.3/grid
You must be logged in as root to run this script.
Log in as root and rerun this script.
2013-11-21 05:48:27: Not running as authorized user
Insufficient privileges to execute this script.
root or administrative privileges needed to run the script.
[root@hostxxx ~]# cd /u01/app/11.2.0.3/grid/crs/install/
[root@hostxxx install]# ./rootcrs.pl -unlock -crshome /u01/app/11.2.0.3/grid
Using configuration parameter file: ./crsconfig_params
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'hostxxx'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'hostxxx'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'hostxxx'
CRS-2677: Stop of 'ora.cssdmonitor' on 'hostxxx' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'hostxxx' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'hostxxx' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully unlock /u01/app/11.2.0.3/grid
However, the relink as the grid user then failed with the following error -
grid@/u01/app/11.2.0.3/grid/bin/>./relink
./relink: line 164: /u01/app/11.2.0.3/grid/install/current_makeorder.xml: Permission denied
writing relink log to: /u01/app/11.2.0.3/grid/install/relink.log
./relink: line 181: /u01/app/11.2.0.3/grid/install/relink.log: Permission denied
grid@/u01/app/11.2.0.3/grid/bin/>ls -ltr /u01/app/11.2.0.3/grid/install/current_makeorder.xml
ls: /u01/app/11.2.0.3/grid/install/current_makeorder.xml: No such file or directory
So the relink failed, again because of permissions. The reason is that many
binaries and executables under the Grid home were still owned by the oracle
user, so the ownership has to be corrected first.
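To get a feel for how much of the Grid home is still owned by the wrong user, a small helper along these lines can be run before changing anything. This is only a sketch: the function name list_owned_by is made up for illustration, and the path and user in the usage comment are the ones from this post.

```shell
#!/bin/sh
# list_owned_by DIR USER
# Print every file and directory under DIR that is owned by USER.
# USER may be a user name or a numeric user ID.
# (Hypothetical helper name, not part of any Oracle tooling.)
list_owned_by() {
    dir=$1
    owner=$2
    find "$dir" -user "$owner" -print
}

# Example, using the Grid home and the wrong owner from this post:
#   list_owned_by /u01/app/11.2.0.3/grid oracle | head
```

If the listing is long, as it was in our case, fixing files one by one is not practical and a recursive ownership change is the quicker route.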
Relink Log -
oracle.xml.parser.v2.XMLParseException: Start of root element expected.
The above error in the relink log is misleading, so ignore it.
The following steps fix the issue.
Step 1 - Make sure no Grid processes are running / force stop
/u01/app/11.2.0.3/grid/bin/crsctl stop crs -f
Step 2 - Change the ownership of the Grid home to grid:oinstall (just so that the relink can work)
[root@hostxxx 11.2.0.3]# chown -R grid:oinstall grid
Step 3 - Relink the Grid home binaries (make sure ORACLE_HOME is set to the Grid home and you are running this as the grid Unix user)
As the Oracle Grid Infrastructure for a Cluster owner:
grid@/home/grid/>/u01/app/11.2.0.3/grid/bin/relink
writing relink log to: /u01/app/11.2.0.3/grid/install/relink.log
As root again ($Grid_home stands for the Grid home, here /u01/app/11.2.0.3/grid):
# cd $Grid_home/rdbms/install/
# ./rootadd_rdbms.sh
Step 4 - Load the ASMLib driver (basically, start the init service if it is not already started)
/etc/init.d/oracleasm status
If down,
[root@hostxxx ~]# /etc/init.d/oracleasm start
Initializing the Oracle ASMLib driver: [ OK ]
Scanning the system for Oracle ASMLib disks: [ OK ]
Make sure ASM can see the devices -
[root@hostxxx ~]# /etc/init.d/oracleasm listdisks
Step 5 - Make sure no Grid processes are running / force stop
/u01/app/11.2.0.3/grid/bin/crsctl stop crs -f
Step 6 - Lock the Grid home (this will also start the CRS stack)
[root@hostxxx ~]# /u01/app/11.2.0.3/grid/crs/install/rootcrs.pl -patch
Using configuration parameter file: /u01/app/11.2.0.3/grid/crs/install/crsconfig_params
CRS-4123: Oracle High Availability Services has been started.
Step 7 - Perform the health check on the entire cluster
crsctl status resource -t -init
crsctl status resource -t
crsctl check has
crsctl check cluster
crsctl check crs
crsctl check cluster -all
If you check now, the cluster should be up and running.
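The check commands from Step 7 can also be wrapped in a small script that prints one PASS or FAIL line per check. This is a rough sketch under two assumptions: that crsctl lives in the Grid home used in this post, and that it exits with a non-zero status when a check fails, which is worth confirming on your own version. The CRSCTL variable and the run_health_checks name are made up for illustration.

```shell
#!/bin/sh
# Path to crsctl. Defaults to the Grid home used in this post and can be
# overridden from the environment (CRSCTL is an illustrative variable name).
CRSCTL=${CRSCTL:-/u01/app/11.2.0.3/grid/bin/crsctl}

# Run each check in turn, print PASS or FAIL for it, and return
# success only if every check succeeded.
# Assumption: crsctl exits non-zero when a check fails.
run_health_checks() {
    failures=0
    for args in "check has" "check crs" "check cluster -all"; do
        # $args is left unquoted on purpose so it splits into separate words.
        if "$CRSCTL" $args >/dev/null 2>&1; then
            echo "PASS: crsctl $args"
        else
            echo "FAIL: crsctl $args"
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}

# Example:
#   run_health_checks || echo "at least one check failed"
```

The output of crsctl itself is discarded here to keep the summary short, so rerun any failing check by hand to see the actual CRS messages.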