Node 2 CRS alert log:
2014-03-07 11:13:22.381 [crsd(15743)]CRS-1205:Auto-start failed for the CRS resource . Details in oradb2.
2014-03-17 18:49:32.754 [cssd(16353)]CRS-1606:CSSD Insufficient voting files available [0 of 1]. Details in /u01/oracle/app/product/10.2/cluster/log/oradb2/cssd/ocssd.log.
2014-03-17 18:55:42.728 [cssd(16363)]CRS-1605:CSSD voting file is online: /dev/raw/raw4. Details in /u01/oracle/app/product/10.2/cluster/log/oradb2/cssd/ocssd.log.
[cssd(16363)]CRS-1601:CSSD Reconfiguration complete. Active nodes are oradb1 oradb2 .
2014-03-17 18:55:43.947 [crsd(15753)]CRS-1012:The OCR service started on node oradb2.
2014-03-17 18:55:43.955 [evmd(15593)]CRS-1401:EVMD started on node oradb2.
2014-03-17 18:55:45.137 [crsd(15753)]CRS-1201:CRSD started on node oradb2.
2014-03-17 18:55:50.559 [crsd(15753)]CRS-1205:Auto-start failed for the CRS resource . Details in oradb2.

Node 1 CRS alert log:
2014-03-07 11:12:58.537 [crsd(9600)]CRS-1205:Auto-start failed for the CRS resource . Details in oradb1.
2014-03-17 18:50:02.808 [cssd(10215)]CRS-1612:node oradb2 (2) at 50% heartbeat fatal, eviction in 29.210 seconds
2014-03-17 18:50:17.808 [cssd(10215)]CRS-1611:node oradb2 (2) at 75% heartbeat fatal, eviction in 14.210 seconds
2014-03-17 18:50:26.815 [cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 5.200 seconds
2014-03-17 18:50:27.807 [cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 4.210 seconds
2014-03-17 18:50:28.809 [cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 3.210 seconds
2014-03-17 18:50:29.811 [cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 2.210 seconds
2014-03-17 18:50:30.813 [cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 1.210 seconds
2014-03-17 18:50:31.815 [cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 0.210 seconds
2014-03-17 18:50:32.029 [cssd(10215)]CRS-1607:CSSD evicting node oradb2. Details in /u01/oracle/app/product/10.2/cluster/log/oradb1/cssd/ocssd.log.
[cssd(10215)]CRS-1601:CSSD Reconfiguration complete. Active nodes are oradb1 .
2014-03-17 18:50:36.178 [crsd(9600)]CRS-1204:Recovering CRS resources for node oradb2.
[cssd(10215)]CRS-1601:CSSD Reconfiguration complete. Active nodes are oradb1 oradb2 .

ocssd.log:
[ CSSD]2014-03-07 11:10:51.935 [1171433792] >TRACE: clssgmCommonAddMember: clsomon joined (1/0x1000000/#CSS_CLSSOMON)
[ CSSD]2014-03-17 18:50:02.808 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 50 2.317786e-310artbeat fatal, eviction in 29.210 seconds
[ CSSD]2014-03-17 18:50:02.808 [1213393216] >TRACE: clssnmPollingThread: node oradb2 (2) is impending reconfig, flag 1037, misstime30790
[ CSSD]2014-03-17 18:50:02.808 [1213393216] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2014-03-17 18:50:17.808 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 75 2.317786e-310artbeat fatal, eviction in 14.210 seconds
[ CSSD]2014-03-17 18:50:26.815 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 90 2.317786e-310artbeat fatal, eviction in 5.200 seconds
[ CSSD]2014-03-17 18:50:27.807 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 90 2.317786e-310artbeat fatal, eviction in 4.210 seconds
[ CSSD]2014-03-17 18:50:28.809 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 90 1.482197e-323artbeat fatal, eviction in 3.210 seconds
[ CSSD]2014-03-17 18:50:29.811 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 90 1.482197e-323artbeat fatal, eviction in 2.210 seconds
[ CSSD]2014-03-17 18:50:30.813 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 90 1.482197e-323artbeat fatal, eviction in 1.210 seconds
[ CSSD]2014-03-17 18:50:31.815 [1213393216] >WARNING: clssnmPollingThread: node oradb2 (2) at 90 1.482197e-323artbeat fatal, eviction in 0.210 seconds
[ CSSD]2014-03-17 18:50:32.028 [1213393216] >TRACE: clssnmPollingThread: Eviction started for node oradb2 (2), flags 0x040d, state3, wt4c 0
[ CSSD]2014-03-17 18:50:32.028 [1234372928] >TRACE: clssnmDoSyncUpdate: Initiating sync 2
[ CSSD]2014-03-17 18:50:32.028 [1234372928] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[ CSSD]2014-03-17 18:50:32.028 [1234372928] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2014-03-17 18:50:32.028 [1234372928] >TRACE: clssnmSetupAckWait: node(1) is ALIVE
[ CSSD]2014-03-17 18:50:32.028 [1234372928] >TRACE: clssnmSendSync: syncSeqNo(2)
[ CSSD]2014-03-17 18:50:32.028 [1234372928] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(1)
[ CSSD]2014-03-17 18:50:32.028 [1160943936] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2014-03-17 18:50:32.028 [1160943936] >TRACE: clssnmHandleSync: Acknowledging sync: src[1] srcName[oradb1] seq[5] sync[2]
[ CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2014-03-17 18:50:32.029 [4126217456] >USER: NMEVENT_SUSPEND [00][00][00][06]
[ CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE: clssnmDoSyncUpdate: Terminating node 2, oradb2, misstime(60010) state(5)
[ CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE: clssnmSetupAckWait: node(1) is ACTIVE
[ CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(1)
[ CSSD]2014-03-17 18:50:32.029 [1160943936] >TRACE: clssnmSendVoteInfo: node(1) syncSeqNo(2)
[ CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE: clssnmWaitForAcks: done, msg type(13)
[ CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE: clssnmEvict: Start
[ CSSD]2014-03-17 18:50:32.029 [1234372928] >TRACE: clssnmEvict: Evicting node 2, oradb2, birth 1, death 2, impendingrcfg 1, stateflags 0x40d
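The countdown messages on node 1 can be cross-checked against each other. Each "eviction in N seconds" added to its own timestamp should point at the same eviction moment, and in ocssd.log the already-missed time (misstime30790, in milliseconds) plus the remaining 29.210 s gives the total heartbeat tolerance, which comes out at about 60 s. That figure is presumably the CSS misscount in effect on this cluster (it can be confirmed with `crsctl get css misscount`). The following is a minimal Python sketch of that arithmetic. The node 1 countdown lines are copied into the script, and the helper names `projected_evictions` and `implied_misscount` are hypothetical, not Oracle tools.

```python
import re
from datetime import datetime, timedelta

# Node 1 CRS alert log countdown lines, copied from the log above.
ALERT_LINES = """\
2014-03-17 18:50:02.808 [cssd(10215)]CRS-1612:node oradb2 (2) at 50% heartbeat fatal, eviction in 29.210 seconds
2014-03-17 18:50:17.808 [cssd(10215)]CRS-1611:node oradb2 (2) at 75% heartbeat fatal, eviction in 14.210 seconds
2014-03-17 18:50:26.815 [cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 5.200 seconds
2014-03-17 18:50:27.807 [cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 4.210 seconds
2014-03-17 18:50:28.809 [cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 3.210 seconds
2014-03-17 18:50:29.811 [cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 2.210 seconds
2014-03-17 18:50:30.813 [cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 1.210 seconds
2014-03-17 18:50:31.815 [cssd(10215)]CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 0.210 seconds
"""

# Timestamp, heartbeat percentage and remaining seconds of one countdown line.
PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*?"
    r"at (\d+)% heartbeat fatal, eviction in (\d+\.\d+) seconds"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def projected_evictions(text):
    """Return (timestamp, percent, timestamp + remaining) for each countdown line."""
    events = []
    for line in text.splitlines():
        m = PATTERN.match(line)
        if not m:
            continue  # not a heartbeat countdown line
        ts = datetime.strptime(m.group(1), TS_FORMAT)
        remaining = timedelta(seconds=float(m.group(3)))
        events.append((ts, int(m.group(2)), ts + remaining))
    return events


def implied_misscount(misstime_ms, remaining_s):
    """Total tolerance in seconds: time already missed plus time left before eviction."""
    return round(misstime_ms / 1000.0 + remaining_s, 3)


if __name__ == "__main__":
    # CRS-1607 "evicting node oradb2" timestamp from the node 1 alert log.
    actual = datetime.strptime("2014-03-17 18:50:32.029", TS_FORMAT)
    for ts, pct, proj in projected_evictions(ALERT_LINES):
        lag = (actual - proj).total_seconds()
        print(f"{ts.time()} at {pct}%: projected {proj.time()}, eviction logged {lag:.3f} s later")
    # misstime30790 and "eviction in 29.210 seconds" from the 18:50:02.808 ocssd.log lines.
    print("implied tolerance:", implied_misscount(30790, 29.210), "s")
```

Every countdown line projects to roughly 18:50:32.02, a few milliseconds before the CRS-1607 message, so the eviction happened exactly when the 60-second heartbeat budget ran out rather than at some unrelated moment.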
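Monitoring showed that the load on node 2 was very high at the time of the failure. Preliminary conclusion: the high load on node 2 caused the RAC voting-disk I/O to time out, which in turn caused the node 2 host to reboot.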
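The high load itself came from heavy SELECT activity against a core table on node 1 while, at the same time, developers were running a large number of INSERTs into the same table on node 2.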
Recommendation: in a RAC environment, operations on heavily used tables should be performed from a single node.