Thursday, January 9, 2014


OEM Targets are in Pending State, Not Monitored

This was a really weird problem. Usually, when we configure a new target in the Grid Control console, its status is detected automatically after the first upload from the agent, although sometimes it takes a while for the correct status to be reflected.
Sometimes, after a host or agent crash or a communication link failure, you also have to clear the agent's previous state and discard old data that has not yet been uploaded to the OEM repository. To do that, stop the agent, clear its state and start it again with the following commands:

#<AGENT_HOME>/bin/emctl stop agent
#<AGENT_HOME>/bin/emctl clearstate agent
#<AGENT_HOME>/bin/emctl start agent
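
The same three steps can be wrapped in a small shell function that stops at the first failing step, so you never clear the state of an agent that did not actually stop. This is only a sketch: the function name reset_agent_state is hypothetical, and AGENT_HOME is assumed to be exported with the path of the agent installation.

```shell
#!/usr/bin/env bash
# Sketch only: reset_agent_state is a hypothetical helper, and AGENT_HOME is
# assumed to hold the agent installation directory.
# WARNING: clearstate discards collected data that has not been uploaded yet.

reset_agent_state() {
  if [ -z "${AGENT_HOME:-}" ]; then
    echo "reset_agent_state: AGENT_HOME is not set" >&2
    return 1
  fi
  local emctl="$AGENT_HOME/bin/emctl"

  # Run the steps in order and stop at the first one that fails.
  "$emctl" stop agent       || return 1
  "$emctl" clearstate agent || return 1
  "$emctl" start agent      || return 1
}
```

For example, with the agent home from the prompt shown later in this post: export AGENT_HOME=/oracle/ent11gr2/agent11g, then run reset_agent_state.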

In this particular case, however, it gave us all sorts of headaches. We tried the following to get it working, and none of it worked:

1. Removing the targets and adding them back
2. Resyncing the agent
3. Clearing the agent state (clearstate)
4. Adding the targets under new names
5. Stopping the agent, waiting 10 minutes and starting it again (the workaround suggested for a bug in 11g OEM)

One solution would be to remove the agent completely, reinstall it and reconfigure the targets, which is obviously a long and tedious process. It also risks leaving the production environment without monitoring, which change management and review teams find hard to accept.
The problem affected only one node of an 8-node cluster: the instances on the other nodes were detected and showed the correct state, but the instance on this node always showed as pending. This node also hosted some pre-production targets, which were under blackout to avoid false alarms.

To find the root cause, we first needed to understand why the targets were stuck in pending state.
We checked the following table under the SYSMAN schema in the repository database:

select target_name, owner, discovered_name, display_name, dynamic_property_status, broken_reason, broken_str from mgmt_targets where target_type = 'oracle_database' and target_name = 'xxxx';

If you don't have access to the OEM repository database, you can use the following workaround to get the data:
http(s)://host.domain:port/emd/browser/main

Another way is to query the agent itself:

[oracle@/oracle/ent11gr2/agent11g/bin]: ./emctl status agent target xxxPRD,oracle_database

Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Target Name : xxxPRD
Target Type : oracle_database
Current severity state
----------------------
Metric          Column name             Key             State           Timestamp
--------------------------------------------------------------------------------
DeferredTrans   errortrans_count        n/a             CLEAR           2014-01-21 03:30:04
DeferredTrans   deftrans_count          n/a             CLEAR           2014-01-21 03:30:04
UserLocks       maxBlockedSess          UL              CLEAR           2014-01-21 03:29:51
UserLocks       maxBlockedDBTime        UL              CLEAR           2014-01-21 03:29:51
UserLocks       maxBlockedSess          TX              CLEAR           2014-01-21 03:29:51
UserLocks       maxBlockedDBTime        TX              CLEAR           2014-01-21 03:29:51
UserLocks       maxBlockedSess          TM              CLEAR           2014-01-21 03:29:51
UserLocks       maxBlockedDBTime        TM              CLEAR           2014-01-21 03:29:51
Response        Status                  n/a             CLEAR           2014-01-21 03:29:39
Response        State                   n/a             CLEAR           2014-01-21 03:29:39
health_check    Status                  n/a             CLEAR           2014-01-21 03:32:12
health_check    Unmounted               n/a             CLEAR           2014-01-21 03:32:12
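
On a node with many targets it helps to filter this output down to the rows that are not CLEAR. The awk filter below is a sketch that assumes the column layout shown above (whitespace-separated columns, no spaces inside a value, State in the fourth column); non_clear_metrics is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: read "emctl status agent target <name>,<type>" output on stdin and
# print Metric, Column name, Key and State for every row that is not CLEAR.
# Assumes the layout shown in the post: whitespace-separated columns with no
# spaces inside a value, and State in the fourth column.

non_clear_metrics() {
  awk '
    # The table starts after the "Metric  Column name ..." header line.
    /^Metric[ \t]+Column name/ { in_table = 1; next }
    # Skip the dashed separator line under the header.
    in_table && /^-+$/ { next }
    # Data rows: Metric, Column name, Key, State, date, time.
    in_table && NF >= 5 && $4 != "CLEAR" { print $1, $2, $3, $4 }
  '
}
```

Usage: "$AGENT_HOME"/bin/emctl status agent target xxxPRD,oracle_database | non_clear_metrics. For the sample above it prints nothing, since every row is CLEAR.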


With the target status data in hand, we found that the targets were in a broken state because their properties could not be computed.
In that case, run the following command from the agent home on the host where the target database resides:

#<AGENT_HOME>/bin/emctl reload agent dynamicproperties -upload_timeout 240 <Name of the problematic database>:rac_database

[oracle@ bin]$ ./emctl reload agent dynamicproperties -upload_timeout 240 xxxPRD:rac_database
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Skipping -upload_timeout, not in required <tgtName>:<tgtType> formSkipping 240, not in required <tgtName>:<tgtType> formEMD recompute dynprops completed successfully

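As you can see, the recompute completed successfully. Note, however, that emctl skipped -upload_timeout and 240 because they are not in <tgtName>:<tgtType> form, so the timeout option had no effect here. You can also run the command without it: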

[oracle@ bin]$ ./emctl reload agent dynamicproperties  XXXPRD:rac_database
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
EMD recompute dynprops completed successfully
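
Because emctl reports skipped arguments only as a warning and still ends with "completed successfully", it is easy to miss that part of the command was ignored. The sketch below runs the reload for one or more <tgtName>:<tgtType> arguments and treats any "Skipping" message as a failure. The function name reload_dynprops is hypothetical, AGENT_HOME is assumed to be set, and the checks rely only on the message text seen in the output above.

```shell
#!/usr/bin/env bash
# Sketch: recompute dynamic properties for one or more targets given as
# <tgtName>:<tgtType> (e.g. xxxPRD:rac_database). reload_dynprops is a
# hypothetical helper; AGENT_HOME is assumed to point at the agent home.
# The checks match on message text, since emctl's exit status here is unverified.

reload_dynprops() {
  local out
  if [ -z "${AGENT_HOME:-}" ] || [ "$#" -eq 0 ]; then
    echo "usage: AGENT_HOME=... reload_dynprops <tgtName>:<tgtType>..." >&2
    return 2
  fi

  out=$("$AGENT_HOME/bin/emctl" reload agent dynamicproperties "$@" 2>&1) || return 1
  printf '%s\n' "$out"

  case "$out" in
    # emctl ignores any argument that is not in <tgtName>:<tgtType> form.
    *Skipping*)
      echo "reload_dynprops: emctl skipped arguments not in <tgtName>:<tgtType> form" >&2
      return 1 ;;
    *"EMD recompute dynprops completed successfully"*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

With this check, the first command above (with -upload_timeout 240) would be reported as a failure, while the plain form passes.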

Next, try to collect data for the Response metric:

[oracle@tryprorarac2f bin]$ ./emctl control agent runCollection xxxPRD:rac_database Response
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD runCollection error:
Response::Couldn't find collection item

That returned an error, but I decided to go ahead with the remaining checks and see whether they resolved the issue.
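
When repeating this check after each change, a small wrapper that flags the "runCollection error" text saves reading the output by eye. This is a sketch: run_collection is a hypothetical name, AGENT_HOME is assumed to be set, and because I have not verified what exit status emctl returns in this case, it checks both the exit status and the message shown above.

```shell
#!/usr/bin/env bash
# Sketch: run one metric collection on demand and report whether it worked.
# run_collection is a hypothetical helper; AGENT_HOME is assumed to be set.
# Usage: run_collection xxxPRD:rac_database Response

run_collection() {
  local target="$1" metric="$2" out rc
  if [ -z "${AGENT_HOME:-}" ] || [ -z "$target" ] || [ -z "$metric" ]; then
    echo "usage: AGENT_HOME=... run_collection <tgtName>:<tgtType> <metric>" >&2
    return 2
  fi

  out=$("$AGENT_HOME/bin/emctl" control agent runCollection "$target" "$metric" 2>&1)
  rc=$?
  printf '%s\n' "$out"

  # Treat a non-zero exit status or the "runCollection error" text as failure.
  if [ "$rc" -ne 0 ]; then
    return 1
  fi
  case "$out" in
    *"runCollection error"*) return 1 ;;
  esac
  return 0
}
```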

3) Follow the steps below from the console.

a) Make sure Test Connection on the target database page works.

Click "Targets/Databases".
Select the database that shows 'status pending'.
Click "Configure".
Click "Test Connection".
Make sure you get the message "The connection test was successful".

b) Make sure all repository jobs are working.
Click "Setup".
Click "Management Services and Repository".
Go to "Repository Operations".
Check that the "Repository Scheduler Jobs Status" jobs are all running on schedule.

Even after all this, the targets were still not monitored. Meanwhile, researching on My Oracle Support (MOS), I came across a note suggesting that targets under blackout on the same node can also cause problems during data collection.
Taking the hint, I ended the blackout on all the targets, cleared the agent state and uploaded from the agent. Bingo! The targets turned green, i.e. they are now monitored.
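
The sequence that finally worked can be sketched as a script. This assumes the blackouts were created with emctl on this agent (blackouts created from the console may have to be ended from the console instead), that their names are passed as arguments, and that AGENT_HOME is set; the function name end_blackouts_and_upload is hypothetical. It reuses the stop, clearstate, start sequence from the top of this post and then forces an upload.

```shell
#!/usr/bin/env bash
# Sketch: end the named agent-side blackouts, reset the agent state and force
# an upload. end_blackouts_and_upload is a hypothetical helper; AGENT_HOME is
# assumed to be set, and the blackout names are passed as arguments.
# Usage: end_blackouts_and_upload <blackout_name> [<blackout_name>...]

end_blackouts_and_upload() {
  if [ -z "${AGENT_HOME:-}" ]; then
    echo "end_blackouts_and_upload: AGENT_HOME is not set" >&2
    return 2
  fi
  local emctl="$AGENT_HOME/bin/emctl" name

  # 1. End each blackout while the agent is still running.
  for name in "$@"; do
    "$emctl" stop blackout "$name" || return 1
  done

  # 2. Show what, if anything, is still blacked out (informational only).
  "$emctl" status blackout

  # 3. Discard the stale state, restart the agent and push fresh data.
  "$emctl" stop agent       || return 1
  "$emctl" clearstate agent || return 1
  "$emctl" start agent      || return 1
  "$emctl" upload agent     || return 1
}
```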

Note: if the issue still persists after this, remove the targets and add them back manually. It will work after that.