Node 2 of my RAC cluster went down and will not start.
Node 1 is fine:
$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rac1
ora.FRA.dg
               ONLINE  ONLINE       rac1
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1
ora.asm
               ONLINE  ONLINE       rac1                     Started
ora.gsd
               OFFLINE OFFLINE      rac1
ora.net1.network
               ONLINE  ONLINE       rac1
ora.ons
               ONLINE  ONLINE       rac1
ora.registry.acfs
               ONLINE  ONLINE       rac1
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1
ora.cvu
      1        ONLINE  ONLINE       rac1
ora.itsm.db
      1        ONLINE  ONLINE       rac1                     Open
      2        ONLINE  OFFLINE
ora.oc4j
      1        ONLINE  ONLINE       rac1
ora.rac1.vip
      1        ONLINE  ONLINE       rac1
ora.rac2.vip
      1        ONLINE  INTERMEDIATE rac1                     FAILED OVER
ora.scan1.vip
      1        ONLINE  ONLINE       rac1
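In a listing like this, the rows worth a second look are the ones where TARGET is ONLINE but STATE is something else (here `ora.itsm.db` instance 2 and `ora.rac2.vip`). A minimal sketch of picking those rows out of saved output, assuming the usual whitespace-separated `crsctl stat res -t` layout; the function name `find_degraded` and the sample text are illustrative only:

```python
def find_degraded(output):
    """Return (resource, instance, target, state) rows where TARGET is ONLINE
    but STATE is anything other than ONLINE."""
    results = []
    resource = None
    for raw in output.splitlines():
        line = raw.strip()
        # Skip blanks, separator rules, the header row and section titles.
        if (not line or line.startswith("-") or line.startswith("NAME")
                or line.endswith("Resources")):
            continue
        if line.startswith("ora."):
            resource = line
            continue
        fields = line.split()
        # Cluster resources lead with an instance number; local ones do not.
        instance = fields.pop(0) if fields[0].isdigit() else None
        if len(fields) < 2:
            continue
        target, state = fields[0], fields[1]
        if target == "ONLINE" and state != "ONLINE":
            results.append((resource, instance, target, state))
    return results


# Sample rows copied from the listing above (assumed column layout).
SAMPLE = """\
Cluster Resources
--------------------------------------------------------------------------------
ora.itsm.db
      1        ONLINE  ONLINE       rac1                     Open
      2        ONLINE  OFFLINE
ora.rac2.vip
      1        ONLINE  INTERMEDIATE rac1                     FAILED OVER
ora.gsd
               OFFLINE OFFLINE      rac1
"""

for row in find_degraded(SAMPLE):
    print(row)
```

`ora.gsd` is not reported because its TARGET is OFFLINE, so OFFLINE is its intended state.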
Node 2 is where the problem is:
# crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
I tried to start it:
# crsctl start cluster -all
CRS-4404: The following nodes did not reply within the allotted time:
rac1, rac2
CRS-2672: Attempting to start 'ora.cssd' on 'rac2'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac2'
CRS-2676: Start of 'ora.diskmon' on 'rac2' succeeded
CRS-4705: Start of Clusterware failed on node rac2.
CRS-4000: Command Start failed, or completed with errors.
I checked the OCSSD log. The nodes can ping each other:
2015-02-22 18:41:03.029: [ CSSD]clssgmClientConnectMsg: Connect from con(0x28b9) proc(0x2b651d0) pid(3845) version 11:2:1:4, properties: 1,2,3,4,5
2015-02-22 18:41:03.029: [ CSSD]clssgmClientConnectMsg: msg flags 0x0000
2015-02-22 18:41:03.031: [ CSSD]clssscSelect: cookie accept request 0x2b651d0
2015-02-22 18:41:03.031: [ CSSD]clssscevtypSHRCON: getting client with cmproc 0x2b651d0
2015-02-22 18:41:03.031: [ CSSD]clssgmRegisterClient: proc(4/0x2b651d0), client(1/0x2b50a80)
2015-02-22 18:41:03.031: [ CSSD]clssgmJoinGrock: global grock CRF- new client 0x2b50a80 with con 0x7f1e000028e8, requested num -1, flags 0x4000e00
2015-02-22 18:41:03.031: [ CSSD]clssgmJoinGrock: ignoring grock join for client not requiring fencing until group information has been received from the master; group name CRF-, member number -1, flags 0x4000e00
2015-02-22 18:41:03.032: [ CSSD]clssgmDiscEndpcl: gipcDestroy 0x28e8
2015-02-22 18:41:03.509: [ CSSD]clssgmWaitOnEventValue: after CmInfo Stateval 3, eval 1 waited 0
2015-02-22 18:41:03.709: [ CSSD]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 320163447, wrtcnt, 109095, LATS 41143754, lastSeqNo 109094, uniqueness 1424715417, timestamp 1426129157/97505444
2015-02-22 18:41:04.510: [ CSSD]clssgmWaitOnEventValue: after CmInfo Stateval 3, eval 1 waited 0
2015-02-22 18:41:04.712: [ CSSD]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 320163447, wrtcnt, 109096, LATS 41144754, lastSeqNo 109095, uniqueness 1424715417, timestamp 1426129158/97506444
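The repeating `clssnmvDHBValidateNcopy` lines stand out: rac2 sees a disk heartbeat from rac1 but no network heartbeat. That is commonly read as a hint that heartbeat traffic on the private interconnect is not getting through even though shared storage is visible. This is an interpretation, not something the log states outright. A minimal sketch for counting such lines per node in a saved ocssd.log, assuming the message format matches the two lines quoted above (the function name is hypothetical):

```python
import re

# Pattern derived only from the sample lines quoted above; other Clusterware
# versions may word or lay out this message differently.
NO_NET_HB = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*CSSD\]clssnmvDHBValidateNcopy: node (?P<num>\d+), (?P<name>[^,]+), "
    r"has a disk HB, but no network HB"
)


def nodes_missing_network_hb(lines):
    """Return {node_name: count} for nodes seen on disk but not on the network."""
    counts = {}
    for line in lines:
        match = NO_NET_HB.match(line.strip())
        if match:
            name = match.group("name")
            counts[name] = counts.get(name, 0) + 1
    return counts


sample = [
    "2015-02-22 18:41:03.509: [ CSSD]clssgmWaitOnEventValue: after CmInfo Stateval 3, eval 1 waited 0",
    "2015-02-22 18:41:03.709: [ CSSD]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 320163447, wrtcnt, 109095, LATS 41143754, lastSeqNo 109094, uniqueness 1424715417, timestamp 1426129157/97505444",
]
print(nodes_missing_network_hb(sample))
```

To run it against a real file, pass an open file object, for example `nodes_missing_network_hb(open(path))`, where `path` is wherever the ocssd.log lives on that node.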
Then I saw that the database was not running, so I tried to start it:
$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sun Feb 22 18:42:59 2015
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/itsm/spfileitsm.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATA/itsm/spfileitsm.ora
ORA-15077: could not locate ASM instance serving a required diskgroup
SQL>
I also tried to start the ASM instance on node 2. The database and the ASM instance on node 1 are both fine:
$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Sun Feb 22 18:44:04 2015
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-01078: failure in processing system parameters
ORA-29701: unable to connect to Cluster Synchronization Service
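I don't know what is causing this. Any advice would be appreciated.
The logs are attached. Nodes 1 and 2 can ping and SSH to each other without any problem.
There's no attachment?
2015-02-22 07:31:49.082: CS(0x7f1c5c064560)set Properties ( root,0x361e410)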
2015-02-22 07:31:49.094: {2:7263:43} Sending message to PE. ctx= 0x7f1c5c07c890, Client PID: 7246
2015-02-22 07:31:49.094: {2:7263:43} Master is not known. Rejecting the command: 13
2015-02-22 07:31:49.470: gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound (36), host 'rac2', port 'bb5b-e793-981d-7626', hctx 0x2d85bf0 { gipchaContext : host 'rac2', name '84bf-806f-7ec4-8d29', luid '42b44c60-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }, ret gipcretKeyNotFound (36)
2015-02-22 07:31:49.471: gipchaResolveF : EXCEPTION[ ret gipcretKeyNotFound (36) ]failed to resolve ctx 0x2d85bf0 { gipchaContext : host 'rac2', name '84bf-806f-7ec4-8d29', luid '42b44c60-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }, host 'rac2', port 'bb5b-e793-981d-7626', flags 0x0
2015-02-22 07:31:49.473: gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound (36), host 'rac2', port '0da4-309d-2db4-bf1b', hctx 0x2d85bf0 { gipchaContext : host 'rac2', name '84bf-806f-7ec4-8d29', luid '42b44c60-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }, ret gipcretKeyNotFound (36)
2015-02-22 07:31:49.473: gipchaResolveF : EXCEPTION[ ret gipcretKeyNotFound (36) ]failed to resolve ctx 0x2d85bf0 { gipchaContext : host 'rac2', name '84bf-806f-7ec4-8d29', luid '42b44c60-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }, host 'rac2', port '0da4-309d-2db4-bf1b', flags 0x0
2015-02-22 07:31:49.668: CS(0x7f1c5c03e110)set Properties ( grid,0x34f5050)
2015-02-22 07:31:49.680: {2:7263:44} Sending message to PE. ctx= 0x7f1c5c03fdd0, Client PID: 3652
2015-02-22 07:31:49.680: {2:7263:44} Master is not known. Rejecting the command: 14
2015-02-22 07:31:49.795: [ CRSPE]{2:7263:2} Join request has been processed by the Master.
Try pinging rac2, and also post your hosts file.
Below is the output from rac2. rac1 can be pinged as well, and SSH works fine too.
$ ping rac2
PING rac2.localdomain (192.168.0.106) 56(84) bytes of data.
64 bytes from rac2.localdomain (192.168.0.106): icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from rac2.localdomain (192.168.0.106): icmp_seq=2 ttl=64 time=0.037 ms
^C
--- rac2.localdomain ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.037/0.037/0.037/0.000 ms
$ ping rac2-priv
PING rac2-priv.localdomain (192.168.1.106) 56(84) bytes of data.
64 bytes from rac2-priv.localdomain (192.168.1.106): icmp_seq=1 ttl=64 time=0.039 ms
^C
--- rac2-priv.localdomain ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.039/0.039/0.039/0.000 ms
$ ping rac2-vip
PING rac2-vip.localdomain (192.168.0.110) 56(84) bytes of data.
64 bytes from rac2-vip.localdomain (192.168.0.110): icmp_seq=1 ttl=64 time=5.34 ms
^C
--- rac2-vip.localdomain ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.342/5.342/5.342/0.000 ms
$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.0.105 rac1.localdomain rac1
192.168.0.106 rac2.localdomain rac2
# Private
192.168.1.105 rac1-priv.localdomain rac1-priv
192.168.1.106 rac2-priv.localdomain rac2-priv
# Virtual
192.168.0.109 rac1-vip.localdomain rac1-vip
192.168.0.110 rac2-vip.localdomain rac2-vip
# SCAN
192.168.0.11 scan.localdomain scan
$
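A hosts file like this can be sanity-checked mechanically. The sketch below only verifies that each node has public, `-priv` and `-vip` entries, and that the private address sits on a different subnet from the public one. It says nothing about whether traffic actually flows on the interconnect. The `/24` prefix length and the helper names are assumptions for illustration; the addresses are copied from the file above.

```python
import ipaddress

HOSTS_TEXT = """\
127.0.0.1 localhost.localdomain localhost
192.168.0.105 rac1.localdomain rac1
192.168.0.106 rac2.localdomain rac2
# Private
192.168.1.105 rac1-priv.localdomain rac1-priv
192.168.1.106 rac2-priv.localdomain rac2-priv
# Virtual
192.168.0.109 rac1-vip.localdomain rac1-vip
192.168.0.110 rac2-vip.localdomain rac2-vip
"""


def parse_hosts(text):
    """Map every hostname and alias to its IP, skipping comments and blanks."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table[name] = ipaddress.ip_address(ip)
    return table


def check_node(table, node, prefix=24):
    """Return a list of problems with one node's public/priv/vip entries.

    prefix is the assumed netmask length of the public network.
    """
    problems = [f"missing entry: {node}{suffix}"
                for suffix in ("", "-priv", "-vip")
                if node + suffix not in table]
    if problems:
        return problems
    public_net = ipaddress.ip_network(f"{table[node]}/{prefix}", strict=False)
    if table[node + "-vip"] not in public_net:
        problems.append(f"{node}-vip is not on the public subnet {public_net}")
    if table[node + "-priv"] in public_net:
        problems.append(f"{node}-priv shares the public subnet {public_net}")
    return problems


table = parse_hosts(HOSTS_TEXT)
for node in ("rac1", "rac2"):
    print(node, check_node(table, node) or "ok")
```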