Node Reboot or Shutdown Skips Stopping 11gR2 Grid Infrastructure
During maintenance in one of our RAC environments, a node was
rebooted without bringing down the grid manually (GI is expected to stop all
its processes automatically when it detects the node shutting down).
On startup, the ASM instance failed to start. Because of this, CSS
and then CRS did not come up, which left the grid inoperable.
Research showed that this was caused by unpublished bug 8740030.
Because of this bug, when a node is rebooted, the K19ohasd script
under /etc, which is supposed to stop Grid Infrastructure, is skipped because
/var/lock/subsys/ohasd* does not exist:
# ls -l /var/lock/subsys/ohasd* | wc -l
output: 0
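Why a missing lock file causes the skip can be sketched as follows. This is a minimal illustration, assuming a Red Hat style SysV init, where the rc script only calls a K (kill) script if a matching lock file exists under /var/lock/subsys. The function name should_run_kill_script and the overridable LOCK_DIR variable are hypothetical, introduced only so the check can be tried in a scratch directory.

```shell
#!/bin/sh
# Sketch of the lock-file test applied before a K script is called
# at shutdown (assumption: Red Hat style /etc/rc.d/rc behaviour).
# LOCK_DIR is a hypothetical override for testing; the real path is fixed.
LOCK_DIR="${LOCK_DIR:-/var/lock/subsys}"

# should_run_kill_script NAME
# Returns 0 if a lock file for NAME exists (the K script would run),
# 1 otherwise (the K script would be skipped).
should_run_kill_script() {
    subsys="$1"
    if [ -f "$LOCK_DIR/$subsys" ] || [ -f "$LOCK_DIR/$subsys.init" ]; then
        return 0
    fi
    return 1
}
```

Calling should_run_kill_script ohasd on an affected node would return 1, which matches the count of 0 from the ls command above.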
The logs showed the following:
CSS Logs -
[ CSSD][4105858816]clssscProcessKillShutdown: Initiating shutdown due to process kill
[ CSSD][4105858816]###################################
[ CSSD][1145833792]clssgmSendShutdown: Aborting client (0x2aaaac01c850) proc (0x90f66c0), iocapables 1.
ASM Logs -
ORA-29746: Cluster Synchronization Service is being shut down.
ORA-29702: error occurred in Cluster Group Service operation
GMON (ospid: 6595): terminating the instance due to error 29746
Instance terminated by GMON, pid = 6595
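To look for the same signatures on another node, a small helper along these lines may be useful. It is only a sketch: the function name scan_gi_shutdown_signatures is hypothetical, and the log files are passed in as arguments because their location depends on the Grid home and is not assumed here.

```shell
#!/bin/sh
# scan_gi_shutdown_signatures FILE...
# Prints, with line numbers, every line in the given log files that
# contains one of the messages quoted in the CSS and ASM excerpts above.
# grep exits non-zero when nothing matches.
scan_gi_shutdown_signatures() {
    grep -n -E 'clssscProcessKillShutdown|clssgmSendShutdown|ORA-29746|ORA-29702' "$@"
}
```

Pass it the CSS daemon log and the ASM alert log for the node in question; no output means none of these messages were found.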
Fix -
To fix the issue, modify /etc/init.d/ohasd as follows:
1. From:
Linux)
..
LOGMSG="$LOGGER -puser.err"
LOGERR="$LOGGER -puser.alert"
;;
To:
Linux)
..
LOGMSG="$LOGGER -puser.err"
LOGERR="$LOGGER -puser.alert"
SUBSYSFILE="/var/lock/subsys/ohasd"
;;
2. From:
start()
{
$ECHO -n $"Starting $PROG: "
To:
start()
{
case `/bin/uname` in
Linux)
/bin/touch $SUBSYSFILE
;;
*)
;;
esac
$ECHO -n $"Starting $PROG: "
3. From:
stop()
{
$ECHO -n "Stopping Oracle Clusterware stack"
..
}
To:
stop()
{
case `/bin/uname` in
Linux)
$RMF $SUBSYSFILE
;;
*)
;;
esac
$ECHO -n "Stopping Oracle Clusterware stack"
..
}
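Before rebooting, it may be worth confirming that all three edits actually made it into the script. The check below is a rough sketch: the function name check_ohasd_patch is hypothetical, and it only looks for the literal strings shown in the steps above, so it says nothing about whether they sit in the right place in the file.

```shell
#!/bin/sh
# check_ohasd_patch FILE
# Returns 0 only if FILE contains the three additions from the steps above:
# the SUBSYSFILE definition, the touch in start() and the remove in stop().
# grep -F treats each pattern as a fixed string, so "$" is matched literally.
check_ohasd_patch() {
    script="$1"
    grep -qF 'SUBSYSFILE="/var/lock/subsys/ohasd"' "$script" || return 1
    grep -qF '/bin/touch $SUBSYSFILE' "$script" || return 1
    grep -qF '$RMF $SUBSYSFILE' "$script" || return 1
    return 0
}
```

For example, check_ohasd_patch /etc/init.d/ohasd && echo "patch present" prints the message only when all three strings are found.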
Once this is done, reboot the node again without shutting down GI first and check whether it stops gracefully. I tested this and it worked fine.