About Me

The best place to start learning Oracle, Linux/Unix scripting, and Oracle administration skills. Currently, as Oracle DBA with 11g/10g/9i OCP certified, I want to help you with my background and knowledge. Hopefully, this info would help you as a job aid.

Friday, November 18, 2011

DB console does not find the listener error in 10g Cluster

The DBconsole does not work and with the error of error receive: ORA-12541: TNS:no listener (DBD ERROR: OCIServerAttach). On the home page of the DBconsole, it does not showing the information of the instance nor database. The snapshot is below:

The first thing we check is the listener status on all nodes.
(Since this is a three node RAC cluster, we check the lsnrctl status on all nodes in the cluster)
The output is like below ( the example is only one node, since it's the same on all other nodes without different node names):

racstress6:/opt/oracle> lsnrctl status listener_racstr23
LSNRCTL for Linux: Version 10.2.0.4.0 - Production on 18-NOV-2011 14:28:05
Copyright (c) 1991, 2007, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=racstress6-vip.dm.cdm.com)(PORT=1526)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_RACSTR23
Version                   TNSLSNR for Linux: Version 10.2.0.4.0 - Production
Start Date                06-JUL-2011 13:08:30
Uptime                    135 days 2 hr. 19 min. 34 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/product/db/10.2.0.4/network/admin/listener.ora
Listener Log File         /opt/oracle/product/db/10.2.0.4/network/log/listener_racstr23.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=146.18.118.206)(PORT=1526)))
Services Summary...
Service "RACSTR2" has 1 instance(s).
  Instance "RACSTR23", status READY, has 26 handler(s) for this service...
Service "RACSTR23" has 1 instance(s).
  Instance "RACSTR23", status UNKNOWN, has 1 handler(s) for this service...
Service "RACSTR2_XPT" has 1 instance(s).
  Instance "RACSTR23", status READY, has 26 handler(s) for this service...
The command completed successfully
racstress6:/opt/oracle>

And we also check the connection strings using the hostname and hostname-vip. Ex:
$ sqlplus dbsnmp@'(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=racstress4-vip)(PORT=1526))(ADDRESS=(PROTOCOL=TCP)(HOST=racstress5-vip)(PORT=1526))(ADDRESS=(PROTOCOL=TCP)(HOST=racstress6-vip)(PORT=1526))(LOAD_BALANCE=yes))(CONNECT_DATA=(SERVICE_NAME=RACSTR2)))'

And again with hostname without -vip.
It turns out the hostname with -vip works correctly and the one without host does not work.

Next thing we try is the modify the host name information and clean out the old logfile generated by EM start, and then restart the EM.

Stop the dbconsole on all nodes.

cd $O_H/<host_sid>/sysman/config
cp emoms.properties emoms.properties.ORIG
vi emoms.properties
Locate the connect descrptior at the bottom and change the physical hostname to vip in the connect descriptor.
cd $ORACLE_HOME/<Host_SID>/sysman/emd
rm lastupld.xml
rm agntstmp.txt
rm protocol.ini
rm –fr state/* Dont Remove the Directory only data inside the state directory
rm –fr upload/* Dont Remove the Directory only data inside the state directory
cd $ORACLE_HOME/<Host_SID>/sysman/log
rm *
cd $ORACLE_HOME/bin
./emctl clearstate agent
./emctl start dbconsole
./emctl upload agent
./emctl status agent
./emctl status dbconsole.
 
But it still gets the same error the dbconsole webpage still does not shows up. So, we suspect it can be agent's problem on one of the node.
And the next action item is:
Step1:
Modify the hostname in the descriptor string in emoms.properties file.
cd to the $ORACLE_HOME/hostname_sid/sysman/config
vi emoms.properties
 
In the emoms.properties, find the following line and modify the connection string,
and change the hostname from hostname.domain to hostname-vip.domain:
Ex:
oracle.sysman.eml.mntr.emdRepConnectDescriptor=(DESCRIPTION\=(ADDRESS_LIST\=(ADDRESS\=(PROTOCOL\=TCP)(HOST\=hostname-vip.dm.cdm.com)(PORT\=port#)))(CONNECT_DATA\=(SERVICE_NAME\=SID)))
 
Step2:
Modify the emd.properties file as follows:
Change the parameter
AgentListenOnAllNICs=TRUE
to
AgentListenOnAllNICs=FALSE

Step 3:
cd $ORACLE_HOME/<Host_SID>/sysman/emd
rm lastupld.xml
rm agntstmp.txt
rm protocol.ini
rm –fr state/* Dont Remove the Directory only data inside the state directory
rm –fr upload/* Dont Remove the Directory only data inside the state directory
cd $ORACLE_HOME/<Host_SID>/sysman/log
rm *
Step 3:
Kill the agent process.
ps -ef |grep emagent  --> and find out the PID of the emagent process
$ kill pid
And try to start the dbconsole again, and the db console started successfully.
















Summary:
This error can be categorized into two points.
1. The hostname specified is not correct, due to the cluster environment using vip name instead of physicall host name.
2. The agent process has not been cleaned up.

The solution to fix the first problem is change the host name specified in the emoms.properties file. Change the hostname to be the virtual ip hostname. This phenomenon is pretty common happening in the cluster environment. And change the parameter 'AgentListenOnAllNICs' from TRUE to FALSE.
Clean up all the .log files to make sure the clean start.

The solution to fix the second problem is to find out the process of the agent. Due to the dbconsole was not started successfully, the emagent process has been started and corrupted. However, it's not cleaned up by itself cleanly. So, you need to manually find out the PID of that process and kill the process.

After implement the above two steps, then restart the dbconsole and emagent.
In my case, my dbconsole started successfully afterward.

No comments:

Post a Comment