Monday, November 5, 2012

How To Restore From Backed-up Grid Binaries After Failed Upgrade

During one of our patching runs (from 11.2.0.1 to 11.2.0.2) for an 11g R2 DB, the upgrade of the Grid binaries failed for an unknown reason. Since we were running out of time, we didn't have much choice but to fall back to our backed-up binaries. The good thing, I guess, is that it failed on the first node, so the other nodes were untouched and we had to back out only the first node. It was also a good chance to get our recovery process tested.

Error - 
CRS-2675: Stop of 'ora.oc4j' on 'appsractest' failed
CRS-2679: Attempting to clean 'ora.oc4j' on 'appsractest'
CRS-2678: 'ora.oc4j' on 'appsractest' has experienced an unrecoverable failure
CRS-2677: Stop of 'ora.iedge.db' on 'appsractest' succeeded
CRS-2673: Attempting to stop 'ora.DATA1.dg' on 'appsractest'
ORA-15154: cluster rolling upgrade incomplete
ORA-15154: cluster rolling upgrade incomplete
CRS-2675: Stop of 'ora.DATA1.dg' on 'appsractest' failed
CRS-2679: Attempting to clean 'ora.DATA1.dg' on 'appsractest'
ORA-15154: cluster rolling upgrade incomplete
ORA-15154: cluster rolling upgrade incomplete
CRS-2678: 'ora.DATA1.dg' on 'appsractest' has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability.
CRS-2673: Attempting to stop 'ora.asm' on 'appsractest'
CRS-2677: Stop of 'ora.asm' on 'appsractest' succeeded
CRS-2794: Shutdown of Cluster Ready Services-managed resources on 'appsractest' has failed
CRS-2675: Stop of 'ora.crsd' on 'appsractest' failed
CRS-2795: Shutdown of Oracle High Availability Services-managed resources on 'appsractest' has failed
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.
################################################################
#You must kill processes or reboot the system to properly      #
#cleanup the processes started by Oracle Grid Infrastructure   #
################################################################
Failed to stop old Grid Infrastructure stack at /u01/app/11.2.0.3/grid/crs/install/crsconfig_lib.pm line 14109.
/u01/app/11.2.0.3/grid/perl/bin/perl -I/u01/app/11.2.0.3/grid/perl/lib -I/u01/app/11.2.0.3/grid/crs/install /u01/app/11.2.0.3/grid/crs/install/rootcrs.pl execution failed


-- Before the upgrade, the following pieces were backed up.
1. Grid Home - /u01/app/11.2.0/grid
2. RAC Home - /u01/app/oracle/product/11.2.0/db_1
3. System Files - /etc/oracle, /etc/oratab & /etc/inittab
4. Init Script - /etc/init.d/init*
5. OCR Backup - /u01/app/11.2.0/grid/cdata/appsrac/OCR_Prior_backup

Backup Commands - 
1. Grid Home - 
[root@appsractest oracle]# tar -cvpf node1_grid.tar /u01/app/11.2.0/grid
2. RAC Home -  
[root@appsractest oracle]# tar -cvpf node1_rac.tar /u01/app/oracle/product/11.2.0/db_1
3. System Files  - 
[root@appsractest oracle]#  tar -cvpf etc_oracle.tar /etc/oracle/*; tar -cvpf etc_inittab.tar /etc/inittab; tar -cvpf etc_oratab.tar /etc/oratab
4. Init Script  - 
[root@appsractest oracle]#  tar -cvpf etc_initd.tar /etc/init.d/init*
5. OCR Backup  - 
[root@appsractest oracle]# ocrconfig -manualbackup
-- This writes the backup to the $GRID_HOME/cdata/cluster-name directory; rename it to a name of your choice.
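
A backup is only useful if it can be read back, so it is worth listing each archive right after creating it. The following is a minimal sketch, assuming GNU tar (which strips the leading "/" from member names when the archive is created). verify_backup is a hypothetical helper of my own, not an Oracle tool.

```shell
# Sketch: confirm a tar archive is readable and contains an expected path.
# verify_backup is a hypothetical helper name, not part of any Oracle tooling.
verify_backup() {
    local archive="$1" expected="$2"
    local listing
    # tar -tf lists members without extracting; it fails on a missing or damaged archive
    listing=$(tar -tf "$archive" 2>/dev/null) || { echo "UNREADABLE: $archive"; return 1; }
    # Assumption: GNU tar stored the members without the leading "/"
    if printf '%s\n' "$listing" | grep -qF -- "${expected#/}"; then
        echo "OK: $archive"
    else
        echo "MISSING ${expected}: $archive"
        return 1
    fi
}

# Example usage (from the directory holding the archives):
# verify_backup node1_grid.tar /u01/app/11.2.0/grid
# verify_backup node1_rac.tar  /u01/app/oracle/product/11.2.0/db_1
```

A non-zero return code from the helper means the archive should be re-created before the upgrade starts.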

Since the upgrade has failed, you need to stop any processes from the clusterware stack that are still running. Find those processes and kill them if you can't stop them cleanly.
Once all the processes have exited, it's time to restore the binaries. The following commands restore them.
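
To find leftover clusterware processes, filtering the process list by daemon name is usually enough. This is a rough sketch only: the list of names in GI_PATTERN is my assumption of the typical 11.2 Grid Infrastructure daemons and may not be complete for your system, and filter_gi_procs is a hypothetical helper.

```shell
# Sketch: pick Grid Infrastructure daemons out of a process listing.
# Assumption: these names cover the usual 11.2 daemons; extend the list as needed.
GI_PATTERN='ohasd|crsd|ocssd|octssd|evmd|gipcd|gpnpd|mdnsd|cssdagent|cssdmonitor|oraagent|orarootagent'

# Reads "PID COMMAND..." lines on stdin and prints only the matching ones.
filter_gi_procs() {
    grep -E "(${GI_PATTERN})" | grep -v grep
}

# Example usage (as root):
# ps -eo pid,args | filter_gi_procs
# kill <pid>        # try a normal kill first
# kill -9 <pid>     # last resort for processes that refuse to exit
```

When the filter prints nothing, no matching process is left and the restore can begin.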

1. Grid Home - The following command untars the directory structure into your current working directory. Once it completes, move the files to the original location. Don't forget to back up the failed Grid Home directory first.
[root@appsractest oracle]# tar -xvpf node1_grid.tar

2. RAC Home - The following command untars the directory structure into your current working directory. Once it completes, move the files to the original location. Don't forget to back up the failed RAC Home directory first.
[root@appsractest oracle]# tar -xvpf node1_rac.tar

3. System Files  - 
[root@appsractest oracle]#  tar -xvpf etc_oracle.tar ; tar -xvpf etc_inittab.tar ; tar -xvpf etc_oratab.tar 
4. Init Script  - 
[root@appsractest oracle]#  tar -xvpf etc_initd.tar
5. OCR Backup  - If you also need to restore the OCR from backup, first perform all four steps above and make sure your binaries are restored properly. Once that is done, check the integrity of the OCR, and if a restore is needed, see the following link for the OCR restore process.
<http://handsonoracle.blogspot.in/2012/11/how-to-restore-asm-based-ocr-after-loss.html>
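
As an alternative to untarring into the working directory and moving the files by hand, the move-aside and extract steps can be combined. This is a sketch of that variation, not the procedure we actually ran. It assumes GNU tar, and that the archive members were stored without the leading "/" (which is what GNU tar does for the backup commands above). restore_home is a hypothetical helper.

```shell
# Sketch: restore_home ARCHIVE HOME [ROOT]
# Moves the failed HOME aside, then extracts ARCHIVE relative to ROOT (default "/").
# restore_home is a hypothetical helper, not an Oracle tool.
restore_home() {
    local archive="$1" home="$2" root="${3:-/}"
    local target="${root%/}/${home#/}"
    # Keep the failed home for later diagnosis instead of overwriting it
    if [ -d "$target" ]; then
        mv "$target" "${target}.failed.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    # -p preserves permissions; -C makes the stored relative paths land under ROOT
    tar -xpf "$archive" -C "$root"
}

# Example usage (as root, after all clusterware processes have exited):
# restore_home node1_grid.tar /u01/app/11.2.0/grid
# restore_home node1_rac.tar  /u01/app/oracle/product/11.2.0/db_1
```

The optional third argument lets you rehearse the restore under a scratch directory before pointing it at "/".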

Now you can either reboot or restart the HAS stack as follows.
 [root@appsractest oracle]# crsctl start has 
OR 
[root@appsractest oracle]# reboot

Now, if the restore was done correctly, your stack will come up. Once it is up, check that the active version and the software version are reported correctly. Both of the following commands should report the same binaries version.
[root@appsractest oracle]# crsctl query crs activeversion
[root@appsractest oracle]# crsctl query crs softwareversion
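
The comparison can also be scripted. The sketch below assumes that "crsctl query crs activeversion" and "crsctl query crs softwareversion" each print a single line with the version as the last value in square brackets; verify that against the output on your own system. last_bracket and compare_versions are hypothetical helpers.

```shell
# Sketch: compare the versions reported by two crsctl output lines.
# Assumption: each line ends with the version in square brackets, e.g. "... is [11.2.0.1.0]".

# Prints the contents of the last [ ... ] pair found on each input line.
last_bracket() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# compare_versions ACTIVE_LINE SOFTWARE_LINE
compare_versions() {
    local active software
    active=$(printf '%s\n' "$1" | last_bracket)
    software=$(printf '%s\n' "$2" | last_bracket)
    if [ -n "$active" ] && [ "$active" = "$software" ]; then
        echo "MATCH: $active"
    else
        echo "MISMATCH: active=$active software=$software"
        return 1
    fi
}

# Example usage (as root, once the stack is up):
# compare_versions "$(crsctl query crs activeversion)" "$(crsctl query crs softwareversion)"
```

A MISMATCH result after a restore suggests the binaries and the cluster state disagree, and the OCR should be looked at next.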