OEM Targets are in Pending State, Not Monitored
This was a really weird problem that we faced. Usually, when we configure a new target in Grid Monitoring Console, its status is detected automatically after the first upload from the agent. Sometimes it takes longer for the correct status to be reflected. It may also happen that, after a host or agent crash or a communication link failure, one has to clear the previous state of the agent and discard the old data that has not yet been uploaded to the OEM repository. To do that, shut down the agent and issue the following commands to get it back in working order.
#<AGENT_HOME>/bin/emctl stop agent
#<AGENT_HOME>/bin/emctl clearstate agent
#<AGENT_HOME>/bin/emctl start agent
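If this sequence has to be repeated on several hosts, it can be wrapped in a small script. The following is only a minimal sketch in Python: the helper names `clearstate_commands` and `run_sequence` are made up for illustration, and the agent home path has to be supplied by the caller. By default it only prints the commands (dry run) and does not touch the agent.

```python
import os
import subprocess


def clearstate_commands(agent_home):
    """Return the stop / clearstate / start sequence as argument lists."""
    emctl = os.path.join(agent_home, "bin", "emctl")
    return [
        [emctl, "stop", "agent"],
        [emctl, "clearstate", "agent"],
        [emctl, "start", "agent"],
    ]


def run_sequence(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts the sequence at the first failing step
            subprocess.run(cmd, check=True)
```

Calling `run_sequence(clearstate_commands(agent_home))` with the real agent home prints the three commands in order; passing `dry_run=False` actually runs them.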
However, in this particular case it gave us all sorts of headaches. We tried the following solutions to get the targets into a working state, and none of them worked.
1. Removing the targets and adding them back
2. Agent resync
3. Agent clearstate
4. Adding the targets with a new name
5. Stopping the agent, waiting for 10 minutes and starting it again (this is a workaround suggested for a bug in 11g OEM)
One solution could be to completely remove the agent, reinstall it and reconfigure the targets, which is obviously a long and tedious process. There is also a risk of leaving the production environment without monitoring, which change management and review teams find difficult to digest.
This happened on one node of an 8-node cluster. The instances running on the other nodes were detected and showed the correct state, but on this one node they always showed as pending. This node also hosts some pre-prod targets, which were under blackout to avoid false alarms.
To identify the root cause, we first needed to understand why the targets were stuck.
We checked the following table under the sysman schema in the repository database.
select target_name,owner,discovered_name,display_name,dynamic_property_status,broken_reason,broken_str from mgmt_targets where target_type='oracle_database' and target_name='xxxx';
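When this lookup is scripted rather than typed into SQL*Plus, it is safer to pass the target name as a bind variable than to paste it between quotes. The sketch below only builds the statement and the bind values in Python; the function names are hypothetical, and executing the statement with an Oracle driver is deliberately left out.

```python
# Columns taken from the query above (mgmt_targets in the sysman schema).
COLUMNS = [
    "target_name", "owner", "discovered_name", "display_name",
    "dynamic_property_status", "broken_reason", "broken_str",
]


def target_status_query(columns=COLUMNS):
    """Build the mgmt_targets lookup with named bind placeholders."""
    return (
        "select " + ", ".join(columns)
        + " from mgmt_targets"
        + " where target_type = :target_type"
        + " and target_name = :target_name"
    )


def target_status_binds(target_name, target_type="oracle_database"):
    """Bind values to pass alongside the statement."""
    return {"target_type": target_type, "target_name": target_name}
```

The statement and the dictionary would then be handed together to whatever database driver is in use, so no quoting of the target name is needed.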
If you don't have access to the OEM repository database, you can use the following workaround to get the data:
http(s)://host.domain:port/emd/browser/main
Another check one can use is to ask the agent for the target status directly:
[oracle@/oracle/ent11gr2/agent11g/bin]: ./emctl status agent target xxxPRD,oracle_database
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Target Name : xxxPRD
Target Type : oracle_database
Current severity state
----------------------
Metric         Column name        Key   State   Timestamp
--------------------------------------------------------------------------------
DeferredTrans  errortrans_count   n/a   CLEAR   2014-01-21 03:30:04
DeferredTrans  deftrans_count     n/a   CLEAR   2014-01-21 03:30:04
UserLocks      maxBlockedSess     UL    CLEAR   2014-01-21 03:29:51
UserLocks      maxBlockedDBTime   UL    CLEAR   2014-01-21 03:29:51
UserLocks      maxBlockedSess     TX    CLEAR   2014-01-21 03:29:51
UserLocks      maxBlockedDBTime   TX    CLEAR   2014-01-21 03:29:51
UserLocks      maxBlockedSess     TM    CLEAR   2014-01-21 03:29:51
UserLocks      maxBlockedDBTime   TM    CLEAR   2014-01-21 03:29:51
Response       Status             n/a   CLEAR   2014-01-21 03:29:39
Response       State              n/a   CLEAR   2014-01-21 03:29:39
health_check   Unmounted          n/a   CLEAR   2014-01-21 03:32:12
health_check   Status             n/a   CLEAR   2014-01-21 03:32:12
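With many targets it gets tedious to read this listing by eye. The following Python sketch pulls the severity rows out of the captured output and lists any that are not CLEAR. It assumes that each data row has exactly six whitespace-separated fields (metric, column, key, state, date, time) with no embedded spaces, which holds for the output shown here but may not hold for every metric; the function names are made up for illustration.

```python
def parse_severity_rows(output):
    """Extract severity rows from 'emctl status agent target' output."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # A data row has six fields: metric, column, key, state, date, time
        if len(parts) != 6:
            continue
        metric, column, key, state, date, time = parts
        # Skip lines (such as the header) whose tail is not a timestamp
        if len(date) != 10 or date[4] != "-" or ":" not in time:
            continue
        rows.append({
            "metric": metric,
            "column": column,
            "key": key,
            "state": state,
            "timestamp": date + " " + time,
        })
    return rows


def not_clear(rows):
    """Return only the rows whose state is something other than CLEAR."""
    return [row for row in rows if row["state"] != "CLEAR"]
```

Feeding it the saved output of the emctl command above and printing `not_clear(rows)` gives a quick list of metrics that need attention.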
Once we had the data about the target status, we figured out that the targets were in a broken state because the agent could not compute their dynamic properties.
Hence, one needs to execute the commands below from the Agent home on the host where the target database resides.
#<AGENT_HOME>/bin/emctl reload agent dynamicproperties -upload_timeout 240 <Name of the problematic database>:rac_database
[oracle@ bin]$ ./emctl reload agent dynamicproperties -upload_timeout 240 xxxPRD:rac_database
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Skipping -upload_timeout, not in required <tgtName>:<tgtType> form
Skipping 240, not in required <tgtName>:<tgtType> form
EMD recompute dynprops completed successfully
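One can see that it completed successfully, although the agent skipped -upload_timeout and 240 because they are not in <tgtName>:<tgtType> form. One can also try the following, without that option: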
[oracle@ bin]$ ./emctl reload agent dynamicproperties XXXPRD:rac_database
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD recompute dynprops completed successfully
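Because emctl reports success even when it silently ignores arguments, a script that wraps this command should check for both. Here is a minimal Python sketch, based only on the two message formats seen in the output above; the function names are hypothetical.

```python
import re


def skipped_arguments(output):
    """Return the arguments that emctl reported as skipped."""
    # Matches messages of the form "Skipping <arg>, not in required ... form",
    # whether or not they are separated by newlines.
    return re.findall(r"Skipping (\S+), not in required", output)


def recompute_succeeded(output):
    """True if the success message seen in the sessions above is present."""
    return "EMD recompute dynprops completed successfully" in output
```

For the first run above, `skipped_arguments` would return the option and its value, which is the hint that this emctl version did not accept them.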
Now try to collect the data for the Response metric.
[oracle@tryprorarac2f bin]$ ./emctl control agent runCollection xxxPRD:rac_database Response
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD runCollection error:
Response::Couldn't find collection item
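When running collections for several targets from a script, the error text can be picked out of the output in the same way. The Python sketch below relies only on the error layout shown above (an "EMD runCollection error:" line followed by an item::message line); other layouts are not handled, and the function name is made up.

```python
def parse_runcollection_error(output):
    """Return (item, message) if the output reports a runCollection error,
    otherwise None."""
    lines = [line.strip() for line in output.splitlines()]
    for index, line in enumerate(lines):
        if line.startswith("EMD runCollection error"):
            # The detail is expected on the next non-empty line
            for detail in lines[index + 1:]:
                if detail:
                    item, sep, message = detail.partition("::")
                    return (item, message) if sep else ("", detail)
            return ("", "")
    return None
```

A return value of None only means that no error line of this form was found; it does not prove that the collection ran.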
So there was an error. I decided to go ahead anyway and see whether the remaining steps resolved the issue.
Next, follow the steps below from the console.
a) Make sure that the test connection on the target database page works fine:
Click on "Targets/Databases"
Select the database which is showing 'status pending'
Click on "Configure"
Click on "Test Connection"
Make sure that the message "The connection test was successful" is received.
b) Make sure that all repository jobs are working fine:
Click on "Setup"
Click on "Management Services and Repository"
Click on "Repository Operations"
Check that the "Repository Scheduler Jobs Status" jobs are all running on schedule.
Even after doing this, the targets were not in a monitored state. Meanwhile, while researching on MOS, I came across a note suggesting that targets in blackout on the same node may also cause issues during data collection.
Taking a hint from this, I ended the blackout on all the targets. Next, I ran a clearstate followed by an upload from the agent, and bingo: the targets are now green, i.e. monitored.
Note - If the issue still persists even after this, try removing the targets and adding them back manually. It should work after that.