ORACLE SOS

RAC node 2: 'ora.diskmon' start fails with "Command Start failed" after running root.sh

Posted on 2014-10-28 18:59:25
Last edited by weishuai1020 on 2014-10-30 09:31

[root@jcsjdb02 ~]# /u01/app/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...
The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
   Copying dbhome to /usr/local/bin ...
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
   Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
   Copying coraenv to /usr/local/bin ...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2014-10-28 17:32:33: Parsing the host name
2014-10-28 17:32:33: Checking for super user privileges
2014-10-28 17:32:33: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
acfsroot: ACFS-9301: ADVM/ACFS installation can not proceed:
acfsroot: ACFS-9302: No installation files found at /u01/app/11.2.0/grid/install/usm/EL5/x86_64/2.6.18-8/2.6.18-8.x86_64-x86_64/bin.
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node jcsjdb01, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
CRS-2672: Attempting to start 'ora.mdnsd' on 'jcsjdb02'
CRS-2676: Start of 'ora.mdnsd' on 'jcsjdb02' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'jcsjdb02'
CRS-2676: Start of 'ora.gipcd' on 'jcsjdb02' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'jcsjdb02'
CRS-2676: Start of 'ora.gpnpd' on 'jcsjdb02' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'jcsjdb02'
CRS-2676: Start of 'ora.cssdmonitor' on 'jcsjdb02' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'jcsjdb02'
CRS-2672: Attempting to start 'ora.diskmon' on 'jcsjdb02'
CRS-2676: Start of 'ora.diskmon' on 'jcsjdb02' succeeded
CRS-2674: Start of 'ora.cssd' on 'jcsjdb02' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'jcsjdb02'
CRS-2681: Clean of 'ora.cssd' on 'jcsjdb02' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'jcsjdb02'
CRS-2677: Stop of 'ora.diskmon' on 'jcsjdb02' succeeded
CRS-4000: Command Start failed, or completed with errors.
CRS-2672: Attempting to start 'ora.cssd' on 'jcsjdb02'
CRS-2672: Attempting to start 'ora.diskmon' on 'jcsjdb02'
CRS-2674: Start of 'ora.diskmon' on 'jcsjdb02' failed
CRS-2679: Attempting to clean 'ora.diskmon' on 'jcsjdb02'
CRS-5016: Process "/u01/app/11.2.0/grid/bin/diskmon" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "clean" failed: details at "(:CLSN00010" in "/u01/app/11.2.0/grid/log/jcsjdb02/agent/ohasd/orarootagent_root/orarootagent_root.log"
CRS-2681: Clean of 'ora.diskmon' on 'jcsjdb02' succeeded
CRS-2674: Start of 'ora.cssd' on 'jcsjdb02' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'jcsjdb02'
CRS-2681: Clean of 'ora.cssd' on 'jcsjdb02' succeeded
CRS-4000: Command Start failed, or completed with errors.
Command return code of 1 (256) from command: /u01/app/11.2.0/grid/bin/crsctl start resource ora.ctssd -init -env USR_ORA_ENV=CTSS_REBOOT=TRUE
Start of resource "ora.ctssd -init -env USR_ORA_ENV=CTSS_REBOOT=TRUE" failed

Failed to start CTSS
Failed to start Oracle Clusterware stack

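Environment: Red Hat 6.5 + 11g RAC, with iptables and SELinux both disabled. What I found online says this kind of problem is caused by a firewall that has not been turned off. I disabled the firewall and SELinux and rebooted the machines, but the problem persists.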

Posted on 2014-10-30 09:45:19
2014-10-29 16:47:59.324: [    CSSD][4174305024]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2014-10-29 16:47:59.699: [    CSSD][4226795264]clssnmvDHBValidateNCopy: node 1, jcsjdb01, has a disk HB, but no network HB, DHB has rcfg 309979712, wrtcnt, 83843, LATS 85264714, lastSeqNo 83843, uniqueness 1414488577, timestamp 1414572479/85245244
2014-10-29 16:47:59.700: [    CSSD][3793733376]clssnmconnect: connecting to addr gipc://jcsjdb01:nm_jcsjdb-cluster#192.168.88.180#34365
2014-10-29 16:47:59.700: [ GIPCNET][3793733376]gipcmodNetworkProcessConnect: [network]  failed connect attempt endp 0x7f19c8008560 [0000000000001dc3] { gipcEndpoint : localAddr 'gipc://jcsjdb02:e527-402e-7f6b-b01a#127.0.0.1#27111', remoteAddr 'gipc://jcsjdb01:nm_jcsjdb-cluster#192.168.88.180#34365', numPend 0, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x80612, usrFlags 0x0 }, req 0x7f19c8009990 [0000000000001dcc] { gipcConnectRequest : addr 'gipc://jcsjdb01:nm_jcsjdb-cluster#192.168.88.180#34365', parentEndp 0x7f19
2014-10-29 16:47:59.700: [ GIPCNET][3793733376]gipcmodNetworkProcessConnect: slos op  :  sgipcnTcpConnect
2014-10-29 16:47:59.700: [ GIPCNET][3793733376]gipcmodNetworkProcessConnect: slos dep :  Invalid argument (22)
2014-10-29 16:47:59.700: [ GIPCNET][3793733376]gipcmodNetworkProcessConnect: slos loc :  connect
2014-10-29 16:47:59.700: [ GIPCNET][3793733376]gipcmodNetworkProcessConnect: slos info:  addr '192.168.88.180:34365'
2014-10-29 16:47:59.700: [    CSSD][3793733376]clssscConnect: endp 0x1dc3 - cookie 0x189a190 - addr gipc://jcsjdb01:nm_jcsjdb-cluster#192.168.88.180#34365
2014-10-29 16:47:59.700: [    CSSD][3793733376]clssnmconnect: connecting to node(1), endp(0x1dc3), flags 0x10002
2014-10-29 16:47:59.700: [    CSSD][3793733376]clssscSelect: conn complete ctx 0x189a190 endp 0x1dc3
2014-10-29 16:47:59.700: [    CSSD][3793733376]clssnmeventhndlr: node(1), endp(0x1dc3) failed, probe((nil)) ninf->endp (0x100001dc3) CONNCOMPLETE
2014-10-29 16:47:59.700: [    CSSD][3793733376]clssnmDiscHelper: jcsjdb01, node(1) connection failed, endp (0x1dc3), probe(0x100000000), ninf->endp 0x7f1900001dc3
2014-10-29 16:47:59.700: [    CSSD][3793733376]clssnmDiscHelper: node 1 clean up, endp (0x1dc3), init state 0, cur state 0
2014-10-29 16:47:59.701: [GIPCXCPT][3793733376]gipcInternalDissociate: obj 0x7f19c8008560 [0000000000001dc3] { gipcEndpoint : localAddr 'gipc://jcsjdb02:e527-402e-7f6b-b01a#127.0.0.1#27111', remoteAddr 'gipc://jcsjdb01:nm_jcsjdb-cluster#192.168.88.180#34365', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
2014-10-29 16:47:59.701: [GIPCXCPT][3793733376]gipcDissociateF [clssnmDiscHelper : clssnm.c : 3215]: EXCEPTION[ ret gipcretFail (1) ]  failed to dissociate obj 0x7f19c8008560 [0000000000001dc3] { gipcEndpoint : localAddr 'gipc://jcsjdb02:e527-402e-7f6b-b01a#127.0.0.1#27111', remoteAddr 'gipc://jcsjdb01:nm_jcsjdb-cluster#192.168.88.180#34365', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 }, flags 0x0
2014-10-29 16:47:59.701: [    CSSD][3793733376]clssnmDiscEndp: gipcDestroy 0x1dc3
2014-10-29 16:48:00.325: [    CSSD][4174305024]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2014-10-29 16:48:00.700: [    CSSD][4226795264]clssnmvDHBValidateNCopy: node 1, jcsjdb01, has a disk HB, but no network HB, DHB has rcfg 309979712, wrtcnt, 83844, LATS 85265714, lastSeqNo 83844, uniqueness 1414488577, timestamp 1414572480/85246244
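Judging from this log (node 1 has a disk heartbeat but no network heartbeat, and the connection to 192.168.88.180 fails), the private network is very likely the problem.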
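The diagnosis above rests on the repeated clssnmvDHBValidateNCopy lines: node 2 sees node 1's disk heartbeat but never its network heartbeat. A minimal sketch for counting those lines per peer node follows. The function name `summarize_missing_network_hb` is made up for illustration, and the log path in the usage note is an assumption based on the usual 11.2 layout under the Grid home shown in the first post.

```shell
# Count CSSD log lines that report a peer with a disk heartbeat but no
# network heartbeat. Reads the files given as arguments, or stdin when
# none are given. Prints one line per peer node: "<node_name> <count>".
summarize_missing_network_hb() {
    awk '/has a disk HB, but no network HB/ {
        # The node name follows "node <number>," in the log line.
        for (i = 1; i + 2 <= NF; i++) {
            if ($i == "node" && $(i + 1) ~ /^[0-9]+,$/) {
                name = $(i + 2)
                sub(/,$/, "", name)   # drop the trailing comma
                count[name]++
                break
            }
        }
    }
    END { for (n in count) print n, count[n] }' "$@" | sort
}
```

Usage, assuming the CSSD log sits in the usual place on node 2: `summarize_missing_network_hb /u01/app/11.2.0/grid/log/jcsjdb02/cssd/ocssd.log`. A count that keeps growing for a peer suggests the shared disks are reachable but the interconnect is not.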
Suggested fix:
  • short-term: disable the firewall on all nodes. For other platforms, engage SA; on Linux this can be done by running the following commands as the root user on each node of the cluster:
    service iptables stop
    service ip6tables stop
    To permanently disable the firewall, use:
    chkconfig iptables off
    chkconfig ip6tables off
  • long-term: exclude all traffic on the private network from the firewall configuration.
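For the long-term option, one possible approach on Linux with iptables is to accept everything that arrives on or leaves through the private interconnect interface. The sketch below only prints the commands so they can be reviewed before anything is run as root. The function name and the interface name `eth1` in the usage note are assumptions, not taken from this thread.

```shell
# Print (do not run) iptables commands that accept all traffic on the
# private interconnect interface passed as the first argument.
print_private_accept_rules() {
    iface=$1
    if [ -z "$iface" ]; then
        echo "usage: print_private_accept_rules <private-interface>" >&2
        return 1
    fi
    # Insert at the top of each chain so later REJECT/DROP rules do not apply.
    echo "iptables -I INPUT -i $iface -j ACCEPT"
    echo "iptables -I OUTPUT -o $iface -j ACCEPT"
    # Persist the running rules (RHEL/CentOS 5 and 6 init script).
    echo "service iptables save"
}
```

Usage: `print_private_accept_rules eth1`, then run the printed commands as root on every node once they look right for the environment.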

Q Q:107644445
Tel:13429648788
Email:dba@xifenfei.com
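Personal blog (惜分飞)
Professional Oracle technical support (data recovery, installation and deployment, upgrade and migration, backup and disaster recovery, fault diagnosis, system tuning, etc.)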
Posted on 2014-10-30 09:46:12
For reference, see: 11gR2 Grid: root.sh Fails to Start the Clusterware on the Second Node Due to Firewall on Private Network (Doc ID 981357.1)

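Original poster | Posted on 2014-10-30 09:51:58
xifenfei wrote on 2014-10-30 09:46:
For reference, see: 11gR2 Grid: root.sh Fails to Start the Clusterware on the Second Node Due to Firewall on  ...

Many thanks, Fei. The firewall and SELinux were already disabled beforehand; I will now try again following your method.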
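Original poster | Posted on 2014-10-31 16:58:38
Thanks to Fei for his patient support while I worked through this problem. Besides a firewall or SELinux that has not been disabled, this kind of failure can also be related to the network interface that carries the private IP. The private interface must have exactly the same name on both nodes. The problem above was caused by the private interfaces on node 1 and node 2 not matching. Lesson learned.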
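The root cause described here (private interface names that differ between nodes) can be checked before running root.sh. The following is a rough sketch, assuming the output of `ip -o -4 addr show` has been saved to a file on each node and that the private subnet is the 192.168.88. network seen in the CSSD log above. The function names and file names are hypothetical.

```shell
# Print the name of every interface whose IPv4 address starts with the
# given prefix. Expects `ip -o -4 addr show` output on stdin, where the
# second field is the interface name and the fourth is address/prefixlen.
iface_for_prefix() {
    awk -v prefix="$1" '$3 == "inet" && index($4, prefix) == 1 { print $2 }'
}

# Compare the private interface names found in two saved dumps.
# Arguments: <address-prefix> <node1-dump> <node2-dump>
# Returns 0 when both nodes use the same interface name, 1 otherwise.
same_private_iface() {
    prefix=$1
    a=$(iface_for_prefix "$prefix" < "$2")
    b=$(iface_for_prefix "$prefix" < "$3")
    if [ -n "$a" ] && [ "$a" = "$b" ]; then
        echo "match: $a"
    else
        echo "mismatch: node1='$a' node2='$b'"
        return 1
    fi
}
```

Usage: run `ip -o -4 addr show > node1.addr` on node 1 and the equivalent on node 2, copy both files to one host, then run `same_private_iface 192.168.88. node1.addr node2.addr`. On a node where Clusterware is already up, `oifcfg getif` run from the Grid home lists the interface it has registered as cluster_interconnect, which is the name the other node is expected to have as well.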