Wednesday, February 6, 2013

OEM Agent Not Starting Up On Oracle RAC After Restart

I am not sure what went wrong, but after installing the agent on the RAC cluster, the agent kept crashing.

Log shows...
emagent.trc 
FATAL_ERROR::500|ORA-20603: The timezone of the multiagent target (ODB,rac_database)is not consistent with the timezone (America/Detroit) reported by other agents.
2013-02-04 19:01:49,073 Thread-1099983168 ERROR upload: number of fatal error exceeds the limit 3
2013-02-04 19:01:49,073 Thread-1099983168 ERROR upload: agent will shutdown now

emagent.log 
Unsuccessful Upload attempts for XML file exceeds specified limit=3, Agent will shutdown (00851)
2013-02-04 18:47:57,213 Thread-1383669120 EMAgent abnormal terminating (00704)

So far this looks like a time zone mismatch between the OMS and the agent. But after reviewing the following file, that thought went away immediately.

emdctl.log
2013-02-05 04:18:11,186 Thread-2402549120 WARN  http: nmehl_connect_internal: connect failed to (hostb:3872): Connection refused (error = 111)
2013-02-05 04:18:11,186 Thread-2402549120 ERROR main: nmectla_agentctl: Error connecting to https://hostb:3872/emd/main. Returning status code 1
2013-02-05 04:22:05,803 Thread-3880828288 WARN  http: nmehl_connect_internal: connect failed to (hostb:3872): Connection refused (error = 111)
2013-02-05 04:22:05,803 Thread-3880828288 ERROR main: nmectla_agentctl: Error connecting to https://hostb:3872/emd/main. Returning status code 1
2013-02-05 04:23:06,873 Thread-1536675200 WARN  http: nmehl_connect_internal: connect failed to (hostb:3872): Connection refused (error = 111)
2013-02-05 04:23:06,874 Thread-1536675200 ERROR main: nmectla_agentctl: Error connecting to https://hostb:3872/emd/main. Returning status code 1

So it looks like root.sh was never run on hostb. Time to run it and see how it goes.

Latest log after running root.sh:
2013-02-05 06:29:19,293 Thread-1116105024 ERROR pingManager: nmepm_pingReposURL: Cannot connect to https://oms:7799/em/upload/: retStatus=-1
2013-02-05 06:29:19,298 Thread-1116105024 WARN  http: nmehl_readAgentKey: File access failure
2013-02-05 06:29:19,298 Thread-1116105024 ERROR ssl: Open wallet failed, ret = 28759
2013-02-05 06:29:19,298 Thread-1116105024 ERROR ssl: nmehlenv_openWallet failed
2013-02-05 06:29:19,298 Thread-1116105024 ERROR ssl: Error initializing SSL
2013-02-05 06:29:19,298 Thread-1116105024 ERROR http: 11: Unable to initialize ssl connection with server, aborting connection attempt: ret -1

On careful review, I realized the agent was trying to upload data on port 7799, which is my management port; the upload port is 4900. So that is the problem: I need to point the upload at port 4900.

On HostA
Agent URL         : https://hosta:3872/emd/main
Repository URL    : https://oms:4900/em/upload

On HostB
Agent URL         : http://hostb:3872/emd/main
Repository URL    : https://oms:7799/em/upload/
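The comparison above can be scripted so the mismatch stands out on a cluster with more nodes. This is only a sketch: it assumes the two blocks above are output of `emctl status agent`, and the function name `repo_url_from_status` is made up for illustration.

```shell
# Print the value of the "Repository URL" line from agent status text on stdin.
# Hypothetical helper, not part of the agent software.
repo_url_from_status() {
    # Match the label, then strip it together with the padding around the colon.
    sed -n 's/^Repository URL[[:space:]]*:[[:space:]]*//p'
}

# Possible use on each node (assumes emctl prints the layout shown above):
#   $AGENT_HOME/bin/emctl status agent | repo_url_from_status
```

Running it on hosta and hostb and comparing the two lines should show the 4900 versus 7799 difference directly.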

So the fix is to edit $AGENT_HOME/sysman/config/emd.properties.

It has the entry
REPOSITORY_URL=https://oms:7799/em/upload

I have to change the port as follows:
REPOSITORY_URL=https://oms:4900/em/upload
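The same edit can be done from the command line instead of an editor. A minimal sketch, assuming the REPOSITORY_URL entry sits on a single line as shown; the function name `set_repo_port` and the `.bak` backup suffix are my own choices, not anything the agent requires.

```shell
# Rewrite the port in the REPOSITORY_URL line of a properties file.
# Hypothetical helper. Usage: set_repo_port /path/to/emd.properties 4900
set_repo_port() {
    props=$1
    new_port=$2
    # Keep a copy of the original file so the change can be rolled back.
    cp -p "$props" "$props.bak" || return 1
    # Replace only the digits after "host:" on the REPOSITORY_URL line;
    # every other line is copied through unchanged.
    sed "s|^\(REPOSITORY_URL=[a-z]*://[^:/]*:\)[0-9][0-9]*|\1$new_port|" "$props.bak" > "$props"
}

# Possible use:
#   set_repo_port "$AGENT_HOME/sysman/config/emd.properties" 4900
```

Reading from the backup and writing back to the original path avoids relying on `sed -i`, which is not available in every sed.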
Once that is done, I have to clear out the pending upload files.
Files to remove:
$AGENT_HOME/sysman/emd/state/*
$AGENT_HOME/sysman/emd/upload/*
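Removing those files can be wrapped in a small guard so an unset AGENT_HOME never turns the globs into /sysman/emd/... paths. A sketch only; `clear_pending_uploads` is a hypothetical name, and it deletes exactly the two globs listed above.

```shell
# Delete the files directly under the agent's state and upload directories.
# Hypothetical helper. Usage: clear_pending_uploads "$AGENT_HOME"
clear_pending_uploads() {
    agent_home=$1
    # Refuse to run without an agent home, so the globs never start at "/".
    if [ -z "$agent_home" ]; then
        echo "usage: clear_pending_uploads AGENT_HOME" >&2
        return 1
    fi
    # -f keeps rm quiet when a directory is already empty; the directories stay.
    rm -f "$agent_home"/sysman/emd/state/* "$agent_home"/sysman/emd/upload/*
}
```

Since `rm` is called without `-r`, any subdirectory under state or upload is left alone and reported as an error rather than removed.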

Then clear the agent state:
$AGENT_HOME/bin/emctl clearstate agent

Now try to start the agent:
$AGENT_HOME/bin/emctl start agent
And now the agent comes back up fine.