weishuai1020 posted on 2014-10-28 18:59:25

RAC node 2: 'ora.diskmon' fails with "Command Start failed" after running root.sh

Last edited by weishuai1020 on 2014-10-30 09:31

# /u01/app/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...
The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=/u01/app/11.2.0/grid
Enter the full pathname of the local bin directory: :
The file "dbhome" already exists in /usr/local/bin.Overwrite it? (y/n)
: y
   Copying dbhome to /usr/local/bin ...
The file "oraenv" already exists in /usr/local/bin.Overwrite it? (y/n)
: y
   Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin.Overwrite it? (y/n)
: y
   Copying coraenv to /usr/local/bin ...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2014-10-28 17:32:33: Parsing the host name
2014-10-28 17:32:33: Checking for super user privileges
2014-10-28 17:32:33: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
acfsroot: ACFS-9301: ADVM/ACFS installation can not proceed:
acfsroot: ACFS-9302: No installation files found at /u01/app/11.2.0/grid/install/usm/EL5/x86_64/2.6.18-8/2.6.18-8.x86_64-x86_64/bin.
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node jcsjdb01, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
CRS-2672: Attempting to start 'ora.mdnsd' on 'jcsjdb02'
CRS-2676: Start of 'ora.mdnsd' on 'jcsjdb02' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'jcsjdb02'
CRS-2676: Start of 'ora.gipcd' on 'jcsjdb02' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'jcsjdb02'
CRS-2676: Start of 'ora.gpnpd' on 'jcsjdb02' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'jcsjdb02'
CRS-2676: Start of 'ora.cssdmonitor' on 'jcsjdb02' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'jcsjdb02'
CRS-2672: Attempting to start 'ora.diskmon' on 'jcsjdb02'
CRS-2676: Start of 'ora.diskmon' on 'jcsjdb02' succeeded
CRS-2674: Start of 'ora.cssd' on 'jcsjdb02' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'jcsjdb02'
CRS-2681: Clean of 'ora.cssd' on 'jcsjdb02' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'jcsjdb02'
CRS-2677: Stop of 'ora.diskmon' on 'jcsjdb02' succeeded
CRS-4000: Command Start failed, or completed with errors.
CRS-2672: Attempting to start 'ora.cssd' on 'jcsjdb02'
CRS-2672: Attempting to start 'ora.diskmon' on 'jcsjdb02'
CRS-2674: Start of 'ora.diskmon' on 'jcsjdb02' failed
CRS-2679: Attempting to clean 'ora.diskmon' on 'jcsjdb02'
CRS-5016: Process "/u01/app/11.2.0/grid/bin/diskmon" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "clean" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/jcsjdb02/agent/ohasd/orarootagent_root/orarootagent_root.log"
CRS-2681: Clean of 'ora.diskmon' on 'jcsjdb02' succeeded
CRS-2674: Start of 'ora.cssd' on 'jcsjdb02' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'jcsjdb02'
CRS-2681: Clean of 'ora.cssd' on 'jcsjdb02' succeeded
CRS-4000: Command Start failed, or completed with errors.
Command return code of 1 (256) from command: /u01/app/11.2.0/grid/bin/crsctl start resource ora.ctssd -init -env USR_ORA_ENV=CTSS_REBOOT=TRUE
Start of resource "ora.ctssd -init -env USR_ORA_ENV=CTSS_REBOOT=TRUE" failed
Failed to start CTSS
Failed to start Oracle Clusterware stack

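Environment: Red Hat 6.5 + 11g RAC; iptables and SELinux are both disabled. Posts I found online attribute this kind of problem to the firewall not being turned off. I disabled the firewall and SELinux and rebooted the machines, but the problem persists.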

xifenfei posted on 2014-10-30 09:45:19

2014-10-29 16:47:59.324: [    CSSD]clssgmWaitOnEventValue: after CmInfo Stateval 3, eval 1 waited 0
2014-10-29 16:47:59.699: [    CSSD]clssnmvDHBValidateNCopy: node 1, jcsjdb01, has a disk HB, but no network HB, DHB has rcfg 309979712, wrtcnt, 83843, LATS 85264714, lastSeqNo 83843, uniqueness 1414488577, timestamp 1414572479/85245244
2014-10-29 16:47:59.700: [    CSSD]clssnmconnect: connecting to addr gipc://jcsjdb01:nm_jcsjdb-cluster#192.168.88.180#34365
2014-10-29 16:47:59.700: [ GIPCNET]gipcmodNetworkProcessConnect: failed connect attempt endp 0x7f19c8008560 { gipcEndpoint : localAddr 'gipc://jcsjdb02:e527-402e-7f6b-b01a#127.0.0.1#27111', remoteAddr 'gipc://jcsjdb01:nm_jcsjdb-cluster#192.168.88.180#34365', numPend 0, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x80612, usrFlags 0x0 }, req 0x7f19c8009990 { gipcConnectRequest : addr 'gipc://jcsjdb01:nm_jcsjdb-cluster#192.168.88.180#34365', parentEndp 0x7f19
2014-10-29 16:47:59.700: [ GIPCNET]gipcmodNetworkProcessConnect: slos op:sgipcnTcpConnect
2014-10-29 16:47:59.700: [ GIPCNET]gipcmodNetworkProcessConnect: slos dep :Invalid argument (22)
2014-10-29 16:47:59.700: [ GIPCNET]gipcmodNetworkProcessConnect: slos loc :connect
2014-10-29 16:47:59.700: [ GIPCNET]gipcmodNetworkProcessConnect: slos info:addr '192.168.88.180:34365'
2014-10-29 16:47:59.700: [    CSSD]clssscConnect: endp 0x1dc3 - cookie 0x189a190 - addr gipc://jcsjdb01:nm_jcsjdb-cluster#192.168.88.180#34365
2014-10-29 16:47:59.700: [    CSSD]clssnmconnect: connecting to node(1), endp(0x1dc3), flags 0x10002
2014-10-29 16:47:59.700: [    CSSD]clssscSelect: conn complete ctx 0x189a190 endp 0x1dc3
2014-10-29 16:47:59.700: [    CSSD]clssnmeventhndlr: node(1), endp(0x1dc3) failed, probe((nil)) ninf->endp (0x100001dc3) CONNCOMPLETE
2014-10-29 16:47:59.700: [    CSSD]clssnmDiscHelper: jcsjdb01, node(1) connection failed, endp (0x1dc3), probe(0x100000000), ninf->endp 0x7f1900001dc3
2014-10-29 16:47:59.700: [    CSSD]clssnmDiscHelper: node 1 clean up, endp (0x1dc3), init state 0, cur state 0
2014-10-29 16:47:59.701: gipcInternalDissociate: obj 0x7f19c8008560 { gipcEndpoint : localAddr 'gipc://jcsjdb02:e527-402e-7f6b-b01a#127.0.0.1#27111', remoteAddr 'gipc://jcsjdb01:nm_jcsjdb-cluster#192.168.88.180#34365', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
2014-10-29 16:47:59.701: gipcDissociateF : EXCEPTION[ ret gipcretFail (1) ]failed to dissociate obj 0x7f19c8008560 { gipcEndpoint : localAddr 'gipc://jcsjdb02:e527-402e-7f6b-b01a#127.0.0.1#27111', remoteAddr 'gipc://jcsjdb01:nm_jcsjdb-cluster#192.168.88.180#34365', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 }, flags 0x0
2014-10-29 16:47:59.701: [    CSSD]clssnmDiscEndp: gipcDestroy 0x1dc3
2014-10-29 16:48:00.325: [    CSSD]clssgmWaitOnEventValue: after CmInfo Stateval 3, eval 1 waited 0
2014-10-29 16:48:00.700: [    CSSD]clssnmvDHBValidateNCopy: node 1, jcsjdb01, has a disk HB, but no network HB, DHB has rcfg 309979712, wrtcnt, 83844, LATS 85265714, lastSeqNo 83844, uniqueness 1414488577, timestamp 1414572480/85246244
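Judging from this log (node 1 has a disk heartbeat but no network heartbeat, and the connect attempts to 192.168.88.180 fail), the problem is very likely in the private network.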
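To make the same reading of a longer log less manual, the two tell-tale messages can be pulled out with a short script. This is only a sketch: the function name `summarize_cssd_log` and the regular expressions are my own, written against the exact message wording in the excerpt above, and the messages may be worded differently in other Grid Infrastructure versions.

```python
import re

# Pattern for the CSSD message seen in the excerpt above (wording assumed stable).
HB_PATTERN = re.compile(
    r"clssnmvDHBValidateNCopy: node (?P<num>\d+), (?P<name>[^,]+), "
    r"has a disk HB, but no network HB"
)
# Pattern for the address GIPC reports for a failed connect attempt.
ADDR_PATTERN = re.compile(r"slos info:\s*addr '(?P<addr>[^']+)'")


def summarize_cssd_log(lines):
    """Return (nodes lacking a network heartbeat, addresses that failed to connect)."""
    nodes = set()
    addrs = set()
    for line in lines:
        m = HB_PATTERN.search(line)
        if m:
            nodes.add((int(m.group("num")), m.group("name")))
        m = ADDR_PATTERN.search(line)
        if m:
            addrs.add(m.group("addr"))
    return sorted(nodes), sorted(addrs)


# Two lines copied (shortened) from the log posted above.
sample = [
    "2014-10-29 16:47:59.699: [    CSSD]clssnmvDHBValidateNCopy: node 1, jcsjdb01, "
    "has a disk HB, but no network HB, DHB has rcfg 309979712",
    "2014-10-29 16:47:59.700: [ GIPCNET]gipcmodNetworkProcessConnect: "
    "slos info:addr '192.168.88.180:34365'",
]
print(summarize_cssd_log(sample))
```

In practice the lines would come from the CSSD log on the failing node instead of the `sample` list. If any node shows up in the first list, the voting disk is reachable but the interconnect is not, which points at the private network rather than at storage.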
Suggested fix:

- Short-term: disable the firewall on all nodes. On Linux, run the following commands as the root user on each node of the cluster (on other platforms, engage your system administrator):
    service iptables stop
    service ip6tables stop
  To permanently disable the firewall, use:
    chkconfig iptables off
    chkconfig ip6tables off

- Long-term: exclude all traffic on the private network from the firewall configuration.
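After changing firewall settings, it can be worth confirming from node 2 that a TCP connection to node 1's private address actually opens. The sketch below is a generic reachability probe, not an Oracle tool; `can_connect` is a hypothetical helper name. The address and port in the usage comment are taken from the log above and the port is dynamic, so substitute whatever the current log reports.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example (values from the posted log; the port changes between restarts):
# can_connect("192.168.88.180", 34365)
```

A `False` result with the firewall already stopped on both nodes suggests looking at routing or at the interface configuration of the private network instead.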

xifenfei posted on 2014-10-30 09:46:12

For reference: 11gR2 Grid: root.sh Fails to Start the Clusterware on the Second Node Due to Firewall on Private Network (Doc ID 981357.1)

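weishuai1020 posted on 2014-10-30 09:51:58

xifenfei posted on 2014-10-30 09:46
For reference: 11gR2 Grid: root.sh Fails to Start the Clusterware on the Second Node Due to Firewall on...

Many thanks, Fei. The firewall and SELinux were already disabled before this; I will try again following your suggestions.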

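weishuai1020 posted on 2014-10-31 16:58:38

Thanks to Fei for his patient support while I worked through this. Besides a firewall or SELinux that has not been disabled, this kind of failure can also come from the network interface that carries the private IP. The private interface must have exactly the same name on both nodes. The problem above was caused by the private interfaces on node 1 and node 2 not matching. Lesson learned.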
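The interface-name mismatch described above can be checked by comparing the output of `ip -o -4 addr show` from each node. The following is a minimal sketch under stated assumptions: the helper names are my own, and the interface names and public addresses in the sample data are hypothetical (only 192.168.88.180 appears in this thread).

```python
import ipaddress
import re

# Matches one line of `ip -o -4 addr show`, e.g.
# "3: eth1    inet 192.168.88.180/24 brd 192.168.88.255 scope global eth1"
IP_LINE = re.compile(r"^\d+:\s+(?P<ifname>\S+)\s+inet\s+(?P<cidr>\S+)")


def interface_subnets(ip_output):
    """Map interface name -> set of IPv4 networks configured on it."""
    result = {}
    for line in ip_output.splitlines():
        m = IP_LINE.match(line.strip())
        if m:
            network = ipaddress.ip_interface(m.group("cidr")).network
            result.setdefault(m.group("ifname"), set()).add(network)
    return result


def mismatched_interfaces(node_a_output, node_b_output):
    """Return interface names whose subnets differ between the two nodes."""
    a = interface_subnets(node_a_output)
    b = interface_subnets(node_b_output)
    return sorted(name for name in set(a) | set(b) if a.get(name) != b.get(name))


# Hypothetical outputs: the private subnet sits on eth1 on node 1 but eth2 on node 2.
node1 = """1: lo    inet 127.0.0.1/8 scope host lo
2: eth0    inet 10.0.0.11/24 brd 10.0.0.255 scope global eth0
3: eth1    inet 192.168.88.180/24 brd 192.168.88.255 scope global eth1"""
node2 = """1: lo    inet 127.0.0.1/8 scope host lo
2: eth0    inet 10.0.0.12/24 brd 10.0.0.255 scope global eth0
4: eth2    inet 192.168.88.181/24 brd 192.168.88.255 scope global eth2"""

print(mismatched_interfaces(node1, node2))
```

An empty list means every interface name maps to the same subnet on both nodes. Any name that is returned is worth fixing before rerunning root.sh on the second node.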

baoyintu posted on 2015-2-11 17:46:21

:)..................