ORACLE SOS

RAC node evicted due to high load

Posted 2014-3-20 14:58:46
Node 2 CRS log
2014-03-07 11:13:22.381
[crsd(15743)]CRS-1205:Auto-start failed for the CRS resource . Details in oradb2.
2014-03-17 18:49:32.754
[cssd(16353)]CRS-1606:CSSD Insufficient voting files available [0 of 1]. Details in /u01/oracle/app/product/10.2/cluster/log/oradb2/cssd/ocssd.log.
2014-03-17 18:55:42.728
[cssd(16363)]CRS-1605:CSSD voting file is online: /dev/raw/raw4. Details in /u01/oracle/app/product/10.2/cluster/log/oradb2/cssd/ocssd.log.
[cssd(16363)]CRS-1601:CSSD Reconfiguration complete. Active nodes are oradb1 oradb2 .
2014-03-17 18:55:43.947
[crsd(15753)]CRS-1012:The OCR service started on node oradb2.
2014-03-17 18:55:43.955
[evmd(15593)]CRS-1401:EVMD started on node oradb2.
2014-03-17 18:55:45.137
[crsd(15753)]CRS-1201:CRSD started on node oradb2.
2014-03-17 18:55:50.559
[crsd(15753)]CRS-1205:Auto-start failed for the CRS resource . Details in oradb2.
Node 1 CRS log
2014-03-07 11:12:58.537
[crsd(9600)]CRS-1205:Auto-start failed for the CRS resource . Details in oradb1.
2014-03-17 18:50:02.808
[cssd(10215)]CRS-1612:node oradb2 (2) at 50% heartbeat fatal, eviction in 29.210 seconds
2014-03-17 18:50:17.808
[cssd(10215)]CRS-1611:node oradb2 (2) at 75% heartbeat fatal, eviction in 14.210 seconds
2014-03-17 18:50:26.815
[cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 5.200 seconds
2014-03-17 18:50:27.807
[cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 4.210 seconds
2014-03-17 18:50:28.809
[cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 3.210 seconds
2014-03-17 18:50:29.811
[cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 2.210 seconds
2014-03-17 18:50:30.813
[cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 1.210 seconds
2014-03-17 18:50:31.815
[cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 0.210 seconds
2014-03-17 18:50:32.029
[cssd(10215)]CRS-1607:CSSD evicting node oradb2. Details in /u01/oracle/app/product/10.2/cluster/log/oradb1/cssd/ocssd.log.
[cssd(10215)]CRS-1601:CSSD Reconfiguration complete. Active nodes are oradb1 .
2014-03-17 18:50:36.178
[crsd(9600)]CRS-1204:Recovering CRS resources for node oradb2.
[cssd(10215)]CRS-1601:CSSD Reconfiguration complete. Active nodes are oradb1 oradb2 .
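As a rough cross-check of the countdown in the node 1 log, the sketch below adds the "eviction in N seconds" value to each warning's timestamp to get the eviction time each warning projects. The function name `projected_evictions` and the regular expressions are my own illustration, not an Oracle tool, and the sample lines are copied from the log above with whitespace normalised.

```python
import re
from datetime import datetime, timedelta

# Sample lines copied from the node 1 CRS log (whitespace normalised).
SAMPLE = """\
2014-03-17 18:50:02.808
[cssd(10215)]CRS-1612:node oradb2 (2) at 50% heartbeat fatal, eviction in 29.210 seconds
2014-03-17 18:50:17.808
[cssd(10215)]CRS-1611:node oradb2 (2) at 75% heartbeat fatal, eviction in 14.210 seconds
2014-03-17 18:50:26.815
[cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 5.200 seconds
"""

# Timestamp line; \s* tolerates a missing space between date and time.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s*(\d{2}:\d{2}:\d{2}\.\d{3})$")
# CRS-1610/1611/1612 heartbeat warnings.
WARN_RE = re.compile(
    r"CRS-161[012]:\s*node\s*(\S+) \(\d+\) at (\d+)% heartbeat fatal, "
    r"eviction in ([\d.]+) seconds"
)


def projected_evictions(text):
    """Return (node, percent, projected eviction time) for each warning."""
    results = []
    current_ts = None
    for raw in text.splitlines():
        line = raw.strip()
        ts = TS_RE.match(line)
        if ts:
            current_ts = datetime.strptime(
                f"{ts.group(1)} {ts.group(2)}", "%Y-%m-%d %H:%M:%S.%f"
            )
            continue
        warn = WARN_RE.search(line)
        if warn and current_ts is not None:
            remaining = timedelta(seconds=float(warn.group(3)))
            results.append(
                (warn.group(1), int(warn.group(2)), current_ts + remaining)
            )
    return results


for node, pct, when in projected_evictions(SAMPLE):
    print(node, f"{pct}%", when.strftime("%H:%M:%S.%f")[:-3])
```

All three warnings project an eviction at about 18:50:32.02, which lines up with the CRS-1607 eviction message logged at 18:50:32.029.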
ocssd.log
[   CSSD]2014-03-07 11:10:51.935 [1171433792] >TRACE:   clssgmCommonAddMember: clsomon joined (1/0x1000000/#CSS_CLSSOMON)
[   CSSD]2014-03-17 18:50:02.808 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 50 2.317786e-310artbeat fatal, eviction in 29.210 seconds
[   CSSD]2014-03-17 18:50:02.808 [1213393216] >TRACE:   clssnmPollingThread: node oradb2 (2) is impending reconfig, flag 1037, misstime 30790
[   CSSD]2014-03-17 18:50:02.808 [1213393216] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[   CSSD]2014-03-17 18:50:17.808 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 75 2.317786e-310artbeat fatal, eviction in 14.210 seconds
[   CSSD]2014-03-17 18:50:26.815 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 90 2.317786e-310artbeat fatal, eviction in 5.200 seconds
[   CSSD]2014-03-17 18:50:27.807 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 90 2.317786e-310artbeat fatal, eviction in 4.210 seconds
[   CSSD]2014-03-17 18:50:28.809 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 90 1.482197e-323artbeat fatal, eviction in 3.210 seconds
[   CSSD]2014-03-17 18:50:29.811 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 90 1.482197e-323artbeat fatal, eviction in 2.210 seconds
[   CSSD]2014-03-17 18:50:30.813 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 90 1.482197e-323artbeat fatal, eviction in 1.210 seconds
[   CSSD]2014-03-17 18:50:31.815 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 90 1.482197e-323artbeat fatal, eviction in 0.210 seconds
[   CSSD]2014-03-17 18:50:32.028 [1213393216] >TRACE:   clssnmPollingThread: Eviction started for node oradb2 (2), flags 0x040d, state 3, wt4c 0
[   CSSD]2014-03-17 18:50:32.028 [1234372928] >TRACE:   clssnmDoSyncUpdate: Initiating sync 2
[   CSSD]2014-03-17 18:50:32.028 [1234372928] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[   CSSD]2014-03-17 18:50:32.028 [1234372928] >TRACE:   clssnmSetupAckWait: Ack message type (11)
[   CSSD]2014-03-17 18:50:32.028 [1234372928] >TRACE:   clssnmSetupAckWait: node(1) is ALIVE
[   CSSD]2014-03-17 18:50:32.028 [1234372928] >TRACE:   clssnmSendSync: syncSeqNo(2)
[   CSSD]2014-03-17 18:50:32.028 [1234372928] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)
[   CSSD]2014-03-17 18:50:32.028 [1160943936] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms
[   CSSD]2014-03-17 18:50:32.028 [1160943936] >TRACE:   clssnmHandleSync: Acknowledging sync: src[1] srcName[oradb1] seq[5] sync[2]
[   CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE:   clssnmWaitForAcks: done, msg type(11)
[   CSSD]2014-03-17 18:50:32.029 [4126217456] >USER:    NMEVENT_SUSPEND [00][00][00][06]
[   CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE:   clssnmDoSyncUpdate: Terminating node 2, oradb2, misstime(60010) state(5)
[   CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE:   clssnmSetupAckWait: Ack message type (13)
[   CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE:   clssnmSetupAckWait: node(1) is ACTIVE
[   CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)
[   CSSD]2014-03-17 18:50:32.029 [1160943936] >TRACE:   clssnmSendVoteInfo: node(1) syncSeqNo(2)
[   CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE:   clssnmWaitForAcks: done, msg type(13)
[   CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE:   clssnmCheckDskInfo: Checking disk info...
[   CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE:   clssnmEvict: Start
[   CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE:   clssnmEvict: Evicting node 2, oradb2, birth 1, death 2, impendingrcfg 1, stateflags 0x40d
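The numbers in the ocssd.log trace can be sanity-checked with a little arithmetic. At the 50% warning the trace reports a misstime of 30790 ms with 29.210 seconds left before eviction, so the two should add up to the heartbeat threshold in effect. The sketch below only does that addition; the helper name `implied_threshold_ms` is made up for illustration, and the resulting figure is inferred from the log rather than read from the cluster configuration.

```python
# Values taken from the ocssd.log lines above.
MISSTIME_AT_50_PCT_MS = 30790      # "misstime 30790" at 18:50:02.808
REMAINING_AT_50_PCT_S = 29.210     # "eviction in 29.210 seconds"
MISSTIME_AT_TERMINATION_MS = 60010  # "misstime(60010)" at 18:50:32.029


def implied_threshold_ms(misstime_ms, remaining_s):
    """Heartbeat threshold implied by one warning: time missed + time left."""
    return misstime_ms + round(remaining_s * 1000)


threshold_ms = implied_threshold_ms(MISSTIME_AT_50_PCT_MS, REMAINING_AT_50_PCT_S)
print("implied threshold (ms):", threshold_ms)
print("percent elapsed at first warning:",
      round(100 * MISSTIME_AT_50_PCT_MS / threshold_ms, 1))
print("overshoot at termination (ms):",
      MISSTIME_AT_TERMINATION_MS - threshold_ms)
```

The implied threshold comes out at 60000 ms, which would be consistent with a CSS misscount of 60 seconds. The post does not show the configured value; on a live cluster `crsctl get css misscount` reports it.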

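Monitoring shows that the load on node 2 was very high at the time of the failure.
Preliminary conclusion: the high load on node 2 caused RAC voting disk I/O to time out, which in turn caused the node 2 host to reboot.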
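To put the two nodes' logs on one timeline, the sketch below computes the gaps between three events quoted above: CRS-1606 on node 2, CRS-1607 on node 1, and CRS-1605 on node 2 after the restart. It assumes the clocks on oradb1 and oradb2 were reasonably in sync, which the post does not confirm; the dictionary keys and the helper `seconds_between` are names chosen for this example.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the CRS logs of both nodes.
EVENTS = {
    "node2_voting_files_insufficient": "2014-03-17 18:49:32.754",  # CRS-1606 on oradb2
    "node1_evicts_node2": "2014-03-17 18:50:32.029",               # CRS-1607 on oradb1
    "node2_voting_file_online": "2014-03-17 18:55:42.728",         # CRS-1605 on oradb2
}


def seconds_between(first, second):
    """Elapsed seconds from event `first` to event `second`."""
    start = datetime.strptime(EVENTS[first], FMT)
    end = datetime.strptime(EVENTS[second], FMT)
    return (end - start).total_seconds()


print("CRS-1606 on node 2 -> eviction by node 1 (s):",
      seconds_between("node2_voting_files_insufficient", "node1_evicts_node2"))
print("eviction -> voting file online again on node 2 (s):",
      seconds_between("node1_evicts_node2", "node2_voting_file_online"))
```

Under that clock assumption, node 2 reported that no voting file was available roughly a minute before node 1 completed the eviction, and it was back with its voting file online a little over five minutes later.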

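The load was high because a core table was being queried heavily (a large number of SELECTs) on node 1 while the developers were running a large number of INSERTs into the same table on node 2.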

In a RAC environment, it is advisable to run the workload for a frequently modified table on a single node.
