Sunday, September 9, 2012

11.2.0.2 RootUpgrade.sh fails with ASM Bug 

After installing the 11.2.0.2 stack for upgrade on the RAC nodes when I tried running the rootupgrade.sh on first node, I hit the following error 

CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0CRS-2674: Start of 'ora.asm' on 'apps_rac01' failed

[root@apps_rac01 grid]# ./rootupgrade.sh
Running Oracle 11g root script...
The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u03/app/11.2.0.2/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u03/app/11.2.0.2/grid/crs/install/crsconfig_params
ASM upgrade has started on first node.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'apps_rac01'
CRS-2673: Attempting to stop 'ora.crsd' on 'apps_rac01'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'apps_rac01'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN2.lsnr' on 'apps_rac01'
CRS-2673: Attempting to stop 'ora.registry.acfs' on 'apps_rac01'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'apps_rac01'
...........................
CRS-2677: Stop of 'ora.diskmon' on 'apps_rac01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'apps_rac01' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Successfully deleted 1 keys from OCR.

Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
Start of resource "ora.asm" failed
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'apps_rac01'
CRS-2676: Start of 'ora.drivers.acfs' on 'apps_rac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'apps_rac01'
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0CRS-2674: Start of 'ora.asm' on 'apps_rac01' failed
CRS-2679: Attempting to clean 'ora.asm' on 'apps_rac01'
CRS-2681: Clean of 'ora.asm' on 'apps_rac01' succeeded
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'apps_rac01'
CRS-2677: Stop of 'ora.drivers.acfs' on 'apps_rac01' succeeded
CRS-4000: Command Start failed, or completed with errors.
Failed to start Oracle Clusterware stack
Failed to start ASM at /u03/app/11.2.0.2/grid/crs/install/crsconfig_lib.pm line 1051.
/u03/app/11.2.0.2/grid/perl/bin/perl -I/u03/app/11.2.0.2/grid/perl/lib -I/u03/app/11.2.0.2/grid/crs/install /u03/app/11.2.0.2/grid/crs/install/rootcrs.pl execution failed

Errors from ASM logfile - 

Exception [type: SIGIOT, unknown code] [ADDR:0x1D34] [PC:0x71F402, __kernel_vsyscall()+2] [exception issued by pid: 7476, uid: 500] [flags: 0x0, count: 1]
Errors in file /u02/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lmd0_7476.trc  (incident=6481):
ORA-07445: exception encountered: core dump [__kernel_vsyscall()+2] [SIGIOT] [ADDR:0x1D34] [PC:0x71F402] [unknown code] []
Incident details in: /u02/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_6481/+ASM1_lmd0_7476_i6481.trc
PMON (ospid: 7458): terminating the instance due to error 482
Mon Aug 27 14:34:42 2012
ORA-1092 : opitsk aborting process

This happens due the bug with OCR when placed on ASM. The workaround is simple as follows. Remember one has to implement this on both nodes after running rootuprade.sh on the respective nodes.

Fix -

[root@apps_rac02 ~]# sudo su - oracle
[oracle@apps_rac02 ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM2
The Oracle base for ORACLE_HOME=/u03/app/11.2.0.2/grid is /u01/app/oracle
[oracle@apps_rac02 ~]$ sqlplus / as sysasm
Connected to an idle instance.
SQL> startup
ASM instance started
Total System Global Area  284565504 bytes
Fixed Size                  1343692 bytes
Variable Size             258055988 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
SQL> select * from v$instance;
INSTANCE_NUMBER INSTANCE_NAME
--------------- ----------------
HOST_NAME
----------------------------------------------------------------
VERSION           STARTUP_T STATUS       PAR    THREAD# ARCHIVE LOG_SWITCH_WAIT
----------------- --------- ------------ --- ---------- ------- ---------------
LOGINS     SHU DATABASE_STATUS   INSTANCE_ROLE      ACTIVE_ST BLO
---------- --- ----------------- ------------------ --------- ---
              2 +ASM2
apps_rac02.localdomain
11.2.0.2.0        28-AUG-12 STARTED      YES          0 STOPPED
ALLOWED    NO  ACTIVE            UNKNOWN            NORMAL    NO

SQL> select name,state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
[root@apps_rac02 grid]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'apps_rac02'
CRS-4133: Oracle High Availability Services has been stopped.
[root@apps_rac02 grid]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

[root@apps_rac02 grid]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Ref - 
Clustered ASM Instances Failed to Start During 'rootupgrade.sh' While Upgrading from 11.2.0.1 to 11.2.0.2 [ID 1437325.1]

Node 2 - 
[root@apps_rac02 ~]# cd /u03/app/11.2.0.2/grid/
[root@apps_rac02 grid]# ./rootupgrade.sh 
Running Oracle 11g root script...
The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u03/app/11.2.0.2/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]: 
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n) 
[n]: 
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n) 
[n]: 
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u03/app/11.2.0.2/grid/crs/install/crsconfig_params
Creating trace directory
Failed to add (property/value):('OLD_OCR_ID/'-1') for checkpoint:ROOTCRS_OLDHOMEINFO.Error code is 256
// This is not an error but mere a warning //
ASM upgrade has started on first node.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'apps_rac02'
CRS-2673: Attempting to stop 'ora.crsd' on 'apps_rac02'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'apps_rac02'
......................................
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'apps_rac02' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deleted 1 keys from OCR.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.

Once this is done finish the rest of the upgrade process by clicking OK on OUI prompt box and wait for the message to complete the upgrade test.
Check the cluster status at the end of the upgrade on all nodes.

[oracle@apps_rac02 bin]$ crsctl check cluster -all
**************************************************************
apps_rac01:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
apps_rac02:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************