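This post was last edited by Fung920 on 2014-8-26 17:04
Environment: 11.2.0.3 + Linux 64-bit, two-node RAC
1. Problem description
Starting on August 22, node 2 reported the following for the first time: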
Fri Aug 22 19:26:27 2014
PMON (ospid: 13912): terminating the instance due to error 481
Instance terminated by PMON, pid = 13912
Node 2 then began restarting, and the following appeared during the restart:
NOTE: client +ASM2:+ASM registered, osid 20296, mbr 0x0
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
WARNING: failed to online diskgroup resource ora.RECOVERY.dg (unable to communicate with CRSD/OHASD)
The error then occurred again:
Fri Aug 22 19:41:42 2014
PMON (ospid: 20218): terminating the instance due to error 481
Instance terminated by PMON, pid = 20218
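After that the instance would not come back up. Yesterday after 17:00 I started the Cluster on node 2 manually, but during today's customer health check we found that the node 2 Cluster had been restarting continuously. After reading the lmon trace file, my initial suspicion was a DRM problem, with the split-brain caused by unsynchronized time. However, the cluster uses ctss for time synchronization and the offset is 0. I also checked the time on both nodes and found it identical. How should this problem be resolved?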
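One way to cross-check the clock question from the logs themselves: the ocssd.log excerpt further down records node 1's disk heartbeat with a timestamp field (1408706775/...). The sketch below assumes the first number is Unix epoch seconds and that both servers run on UTC+8; neither is confirmed by the logs, and the helper name `skew_seconds` is made up for illustration. It compares that value with the local time at which node 2 wrote the log line.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the servers run on China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))

# (time node 2 wrote the ocssd.log line, epoch seconds found in node 1's disk heartbeat)
samples = [
    ("2014-08-22 19:26:16.066", 1408706775),
    ("2014-08-22 19:26:17.068", 1408706776),
]

def skew_seconds(log_time: str, dhb_epoch: int) -> float:
    """Node 2 log time minus node 1 disk-heartbeat time, in seconds."""
    logged = datetime.strptime(log_time, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=CST)
    heartbeat = datetime.fromtimestamp(dhb_epoch, tz=CST)
    return (logged - heartbeat).total_seconds()

for log_time, dhb_epoch in samples:
    print(log_time, round(skew_seconds(log_time, dhb_epoch), 3))
```

Under those assumptions both samples differ by roughly one second, which also includes the delay between node 1 writing the heartbeat and node 2 reading it. That would be consistent with the clocks being in sync, so time drift looks like an unlikely cause.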
Additional information:
Relevant logs from the 22nd:
--alter_db2.log
Fri Aug 22 19:26:25 2014 --the first error occurred on the 22nd at 19:26
NOTE: ASMB terminating
Errors in file /opt/u01/app/oracle/diag/rdbms/dbrac/dbrac2/trace/dbrac2_asmb_26030.trc:
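ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 82 Serial number: 35
Errors in file /opt/u01/app/oracle/diag/rdbms/dbrac/dbrac2/trace/dbrac2_asmb_26030.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 82 Serial number: 35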
ASMB (ospid: 26030): terminating the instance due to error 15064
Instance terminated by ASMB, pid = 26030
Fri Aug 22 19:28:02 2014
Starting ORACLE instance (normal) --the Cluster begins restarting automatically
--alternode2.log
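2014-08-22 19:26:16.065 --timestamp is the 22nd at 19:26
[cssd(13669)]CRS-1612:Network communication with node dbserver_node1 (1) missing for 50% of timeout interval.  Removal of this node from cluster in 14.140 seconds
2014-08-22 19:26:23.079
[cssd(13669)]CRS-1611:Network communication with node dbserver_node1 (1) missing for 75% of timeout interval.  Removal of this node from cluster in 7.130 seconds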
--ocssd.log:
2014-08-22 19:26:16.065: [ CSSD][1113200960]clssnmPollingThread: node dbserver_node1 (1) at 50% heartbeat fatal, removal in 14.140 seconds --the heartbeat appears to be failing
2014-08-22 19:26:16.065: [ CSSD][1113200960]clssnmPollingThread: node dbserver_node1 (1) is impending reconfig, flag 2491406, misstime 15860
2014-08-22 19:26:16.065: [ CSSD][1113200960]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)
2014-08-22 19:26:16.066: [ CSSD][1106893120]clssnmvDHBValidateNCopy: node 1, dbserver_node1, has a disk HB, but no network HB, DHB has rcfg 295030317, wrtcnt, 27424336, LATS 576293252, lastSeqNo 25251022, uniqueness 1399539197, timestamp 1408706775/576374502
2014-08-22 19:26:17.005: [ CSSD][1114777920]clssnmSendingThread: sending status msg to all nodes
2014-08-22 19:26:17.005: [ CSSD][1114777920]clssnmSendingThread: sent 4 status msgs to all nodes
2014-08-22 19:26:17.068: [ CSSD][1106893120]clssnmvDHBValidateNCopy: node 1, dbserver_node1, has a disk HB, but no network HB, DHB has rcfg 295030317, wrtcnt, 27424342, LATS 576294252, lastSeqNo 27424336, uniqueness 1399539197, timestamp 1408706776/576375512 --heartbeat problem: the interconnect (private network) communication is failing
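To make the heartbeat numbers above easier to read, here is a minimal sketch that pulls the missed time and the remaining time out of the clssnmPollingThread lines and adds them up. The log lines are copied from the excerpt above with the timestamp and thread prefix trimmed; the function name `summarize` and the dictionary keys are my own, not anything Oracle provides.

```python
import re

# Lines copied from the ocssd.log excerpt above (timestamp/thread prefix trimmed).
LOG = """\
clssnmPollingThread: node dbserver_node1 (1) at 50% heartbeat fatal, removal in 14.140 seconds
clssnmPollingThread: node dbserver_node1 (1) is impending reconfig, flag 2491406, misstime 15860
clssnmvDHBValidateNCopy: node 1, dbserver_node1, has a disk HB, but no network HB, DHB has rcfg 295030317
"""

def summarize(log: str) -> dict:
    """Extract the network-heartbeat figures from a CSSD log snippet."""
    removal = re.search(r"removal in ([\d.]+) seconds", log)
    miss = re.search(r"misstime (\d+)", log)
    remaining_ms = round(float(removal.group(1)) * 1000)
    missed_ms = int(miss.group(1))
    return {
        "missed_ms": missed_ms,                          # network HB already missing
        "remaining_ms": remaining_ms,                    # time left before eviction
        "implied_timeout_ms": missed_ms + remaining_ms,  # total tolerated gap
        "disk_hb_without_network_hb": "has a disk HB, but no network HB" in log,
    }

print(summarize(LOG))
```

The two figures add up to 30000 ms, which would match a 30-second CSS misscount (as far as I know the default on Linux in 11.2, but worth confirming on the cluster). Combined with "has a disk HB, but no network HB", this suggests node 1 was still alive and writing to the voting disk while only the private interconnect was silent.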
The customer runs this RAC on VMware. I had told them before not to run it on virtual machines, sigh...
Partial logs from the 26th:
--db alter_node2.log
Tue Aug 26 15:15:21 2014 --at around 15:15
NOTE: ASMB terminating
Errors in file /opt/u01/app/oracle/diag/rdbms/dbrac/dbrac2/trace/dbrac2_asmb_27935.trc:
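ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 4 Serial number: 5
Errors in file /opt/u01/app/oracle/diag/rdbms/dbrac/dbrac2/trace/dbrac2_asmb_27935.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 4 Serial number: 5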
ASMB (ospid: 27935): terminating the instance due to error 15064
Instance terminated by ASMB, pid = 27935
Tue Aug 26 15:17:00 2014
Starting ORACLE instance (normal)
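Since the same terminations repeat across several days, a small script can tally which background process killed the instance and with which error. This is only a sketch: the sample lines are the termination messages copied from the excerpts in this post (they come from more than one alert log), and `count_terminations` is a made-up helper name.

```python
import re
from collections import Counter

# Termination messages copied from the alert log excerpts in this post.
ALERT_LINES = """\
PMON (ospid: 13912): terminating the instance due to error 481
PMON (ospid: 20218): terminating the instance due to error 481
ASMB (ospid: 26030): terminating the instance due to error 15064
ASMB (ospid: 27935): terminating the instance due to error 15064
"""

PATTERN = re.compile(
    r"^(\w+) \(ospid: (\d+)\): terminating the instance due to error (\d+)$",
    re.MULTILINE,
)

def count_terminations(text: str) -> Counter:
    """Count terminations keyed by (process name, ORA error number)."""
    return Counter((proc, int(err)) for proc, _pid, err in PATTERN.findall(text))

for (proc, err), n in count_terminations(ALERT_LINES).items():
    print(f"{proc} terminated the instance with error {err}: {n} time(s)")
```

Run against the full alert logs, a tally like this would show whether the pattern is always PMON with error 481 and ASMB with error 15064, or whether other processes are involved as well.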