12.1.0.2 Bug When Starting Grid Infra Management Repository DB
During start up 12.1.0.2 Grid Infrastructure Management Repository, we came across the following error.
Startup failed with following error..
[oracle@dbrac1 bin]$ srvctl start mgmtdb
PRCR-1079 : Failed to start resource ora.mgmtdb
CRS-5017: The resource action "ora.mgmtdb start" encountered the following error:
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/dbrac2/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.mgmtdb' on 'dbrac2' failed
CRS-5017: The resource action "ora.mgmtdb start" encountered the following error:
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/dbrac1/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.mgmtdb' on 'dbrac1' failed
CRS-2632: There are no more servers to try to place resource 'ora.mgmtdb' on that would satisfy its placement policy
Upon checking alert logs we noticed the corruption, albeit logical...
Beginning crash recovery of 1 threads
Tue Jan 13 21:22:21 2015
Started redo scan
Hex dump of (file 4, block 3301) in trace file /u01/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/trace/-MGMTDB_ora_22748.trc
Reading datafile '+DATA/_MGMTDB/DATAFILE/undotbs1.259.860525535' for corruption at rdba: 0x01000ce5 (file 4, block 3301)
Reread (file 4, block 3301) found same corrupt data (logically corrupt)
Write verification failed for File 4 Block 3301 (rdba 0x1000ce5)
Tue Jan 13 21:22:36 2015
Errors in file /u01/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/trace/-MGMTDB_m000_22997.trc:
ORA-00338: log 2 of thread 1 is more recent than control file
ORA-00312: online log 2 thread 1: '+DATA/_MGMTDB/ONLINELOG/group_2.262.860525589'
Errors in file /u01/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/trace/-MGMTDB_ora_22748.trc (incident=41020) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/incident/incdir_41020/-MGMTDB_ora_22748_i41020.trc
Tue Jan 13 21:22:49 2015
Dumping diagnostic data in directory=[cdmp_20150113212249], requested by (instance=1, osid=22748), summary=[incident=41020].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue Jan 13 21:22:51 2015
Aborting crash recovery due to error 600
Tue Jan 13 21:22:51 2015
Errors in file /u01/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/trace/-MGMTDB_ora_22748.trc:
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
However, our management listener was running fine.
Listener MGMTLSNR is enabled
Listener MGMTLSNR is running on node(s): dbrac1
Upon checking MOS, we figure this is a partially published bug which caused this behaviour. The solution to overcome to this is as follows.
export ORACLE_SID= -MGMTDB
SQL> startup mount
SQL> recover database;
SQL> alter database open ;
Upon startup we hit the following error
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 29109
Session ID: 1 Serial number: 24261
Alert Logs shows...
ORA-00312: online log 2 thread 1: '+DATA/_MGMTDB/ONLINELOG/group_2.262.860525589'
This means the SCN of control file is older than SCN of logfile. This happens when you do recovery of old controlfile, which was not the case here. So something does not seem right. I also tried to start the DB on second node of cluster and again it failed.
I also tried to start MGMTCA tool, to reconfigure the management repository but it failed since my GIRM DB was not running.
There may be possibility of the wrong control file being used, but again that was ruled out. So ultimately, what I have to do is to simulate the cancel based recovery and then try to start the db with resetlogs option.
SQL> startup mount
ORACLE instance started.
Total System Global Area 788529152 bytes
Fixed Size 2929352 bytes
Variable Size 314576184 bytes
Database Buffers 465567744 bytes
Redo Buffers 5455872 bytes
Database mounted.
SQL> recover database until cancel;
Media recovery complete.
SQL> alter database open resetlogs;
Database altered.
And as you can this approach worked.
oracle 3320 1 0 20:30 ? 00:00:00 asm_pmon_+ASM1
oracle 3508 1 0 22:10 ? 00:00:00 mdb_pmon_-MGMTDB
oracle 4712 18509 0 22:13 pts/1 00:00:00 grep pmon
oracle 6378 1 0 20:34 ? 00:00:00 ora_pmon_orcl1
[oracle@dbrac1 bin]$ srvctl start mgmtdb
[oracle@dbrac1 bin]$ srvctl status mgmtdb
Database is enabled
Instance -MGMTDB is running on node dbrac1
Hope this helps...