Friday, May 24, 2013



Changing CRS/Database timezone in 11.2.0.2 post install


Recently we needed to change our production RAC cluster's timezone from CST to EST. This is an 8-node cluster, so we had to make sure the change was reflected properly on all nodes.

The first step is to make sure the OS reflects the correct timezone for your needs; this is usually taken care of by the SAs.
So one needs confirmation from the SAs that the OS is ready with the new timezone values. After that, the DBAs need to perform the following changes on the Grid side.

The timezone info in the Grid home is stored in the following file:
$GRID_HOME/crs/install/s_crsconfig_<hostname>.txt

# cd /u01/app/grid/11.2.0.2/crs/install

The file contents look like this, where xxxxxxxx is my host name:

# cat s_crsconfig_xxxxxxxx.txt
TZ=UTC
NLS_LANG=AMERICAN_AMERICA.AL32UTF8
TNS_ADMIN=
ORACLE_BASE=

To make the change, we need to set TZ to the EST zone on all nodes and restart the clusterware. The entry changes like this:
===
From TZ=UTC ==> TZ=America/New_York

On restarting the clusterware, both the clusterware and the database come up with the correct timezone.
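The edit itself is simple to script. Below is a minimal sketch; it runs against a scratch copy of the file so the commands are safe to try, but on a real node CFG would point at the actual file under $GRID_HOME/crs/install, and the clusterware restart must be done as root on every node:

```shell
# Minimal sketch: demo uses a scratch copy of the config file; on a real
# node, CFG would be $GRID_HOME/crs/install/s_crsconfig_$(hostname -s).txt
CFG=/tmp/s_crsconfig_demo.txt
printf 'TZ=UTC\nNLS_LANG=AMERICAN_AMERICA.AL32UTF8\nTNS_ADMIN=\nORACLE_BASE=\n' > "$CFG"

cp "$CFG" "$CFG.bak"                              # always keep a backup
sed -i 's|^TZ=.*|TZ=America/New_York|' "$CFG"     # swap UTC for the EST zone
grep '^TZ=' "$CFG"                                # should show TZ=America/New_York

# On each real node, restart clusterware afterwards (as root):
#   $GRID_HOME/bin/crsctl stop crs
#   $GRID_HOME/bin/crsctl start crs
```

Repeating this on all 8 nodes (and restarting CRS on each) keeps the cluster consistent.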


Remember - The time zone of the system is defined by the contents of /etc/localtime
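A quick way to confirm what the OS itself thinks, assuming a Linux host with standard tzdata installed (the zoneinfo path below is an assumption; adjust for your distro):

```shell
# Check the zone the OS is actually using
date +%Z                 # zone abbreviation, e.g. EST or EDT
ls -l /etc/localtime 2>/dev/null || echo "/etc/localtime not present"

# Compare against the expected zoneinfo entry; path assumes standard tzdata
cmp -s /etc/localtime /usr/share/zoneinfo/America/New_York \
  && echo "system zone matches America/New_York" \
  || echo "system zone differs from America/New_York"
```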

Monday, May 20, 2013


ORA-00020 on DB Instance / HUNDREDS OF ORAAGENT.BIN@HOSTNAME SESSSIONS IN 11.2.0.2 DATABASE


The issue happened on one of our production environments running an 8-node cluster. The database in question was running on two nodes of the cluster. On one node, say node A, it ran fine. However, on the other node, node B, it was not accepting connections to the DB. We eventually figured out that the PROCESSES limit had been exhausted, so we increased the "PROCESSES" parameter from 100 to 500.

Time went by, and a few weeks later we started seeing the same problem again. Something was not right, as an ASM instance does not consume that many processes; Oracle best practice for the ASM instance is to keep PROCESSES around 100 even for medium- to high-load databases.

Checking went on, and one surprising thing I saw during the event was the large number of ORAAGENT processes that had been spawned, around 700+ in total, for no obvious reason. So we decided to clear everything and start the instance afresh. We shut the instance down, and on startup a surprise was waiting for us: the ASM instance came up, but the oraagent.bin processes were still in place.
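An easy OS-side way to spot this is to count the agent processes directly; the `[o]` bracket trick keeps grep from matching its own command line. On a healthy node the count should be small, normally one oraagent.bin per Grid user:

```shell
# Count oraagent.bin processes on this node
ps -ef | grep '[o]raagent.bin' | wc -l

# To see the sessions they hold inside the instance (as sysdba),
# something like:
#   SELECT program, COUNT(*) FROM v$session GROUP BY program ORDER BY 2 DESC;
```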

SQL> startup
ORA-03113: end-of-file on communication channel

Once this happens, the other instance may also fail to restart, since no new connections can be made to the instance hitting ORA-00020.

To double-check my suspicion I went through the log and trace files, and this is what I found:
[grid@xxxx cdmp_20130405075132]$ cat /u01/app/grid/diag/asm/+asm/+ASM7/trace/+ASM7_diag_30046.trc | grep oraagent.bin@tryprorarac2g.intra.searshc.com | wc -l
734


That matches the 700-odd connections I had seen. Eventually ORA-00020 hits the DB instance because oraagent.bin keeps making new connections without closing them. Unsure why there were so many of them, I checked MOS and came across the following two bug notes, which fit the bill. I also confirmed that we had not applied PSU 3, in which the issue is reportedly fixed.

Bug 11877079 : HUNDREDS OF ORAAGENT.BIN@HOSTNAME SESSSIONS IN 11.2.0.2 DATABASE
Bug 10299006 : AFTER 11.2.0.2 UPGRADE, ORAAGENT.BIN CONNECTS TO DATABASE WITH TOO MANY SESSIONS

So we ultimately decided to reboot the box to kill all the oraagent.bin processes. Post reboot, ASM came up clean and the database started successfully.

Hope this helps you troubleshoot this ASM bug.

On a side note, this is also fixed in 11.2.0.3, and we recently applied the 11.2.0.3 Grid upgrade to address it. The databases are still running version 11.2.0.2.