xifenfei posted on 2014-3-20 14:58:46

RAC: node evicted due to high load

Node 2 CRS log:
2014-03-07 11:13:22.381 CRS-1205:Auto-start failed for the CRS resource . Details in oradb2.
2014-03-17 18:49:32.754 CRS-1606:CSSD Insufficient voting files available . Details in /u01/oracle/app/product/10.2/cluster/log/oradb2/cssd/ocssd.log.
2014-03-17 18:55:42.728 CRS-1605:CSSD voting file is online: /dev/raw/raw4. Details in /u01/oracle/app/product/10.2/cluster/log/oradb2/cssd/ocssd.log.
CRS-1601:CSSD Reconfiguration complete. Active nodes are oradb1 oradb2 .
2014-03-17 18:55:43.947 CRS-1012:The OCR service started on node oradb2.
2014-03-17 18:55:43.955 CRS-1401:EVMD started on node oradb2.
2014-03-17 18:55:45.137 CRS-1201:CRSD started on node oradb2.
2014-03-17 18:55:50.559 CRS-1205:Auto-start failed for the CRS resource . Details in oradb2.

Node 1 CRS log:
2014-03-07 11:12:58.537 CRS-1205:Auto-start failed for the CRS resource . Details in oradb1.
2014-03-17 18:50:02.808 CRS-1612:node oradb2 (2) at 50% heartbeat fatal, eviction in 29.210 seconds
2014-03-17 18:50:17.808 CRS-1611:node oradb2 (2) at 75% heartbeat fatal, eviction in 14.210 seconds
2014-03-17 18:50:26.815 CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 5.200 seconds
2014-03-17 18:50:27.807 CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 4.210 seconds
2014-03-17 18:50:28.809 CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 3.210 seconds
2014-03-17 18:50:29.811 CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 2.210 seconds
2014-03-17 18:50:30.813 CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 1.210 seconds
2014-03-17 18:50:31.815 CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 0.210 seconds
2014-03-17 18:50:32.029 CRS-1607:CSSD evicting node oradb2. Details in /u01/oracle/app/product/10.2/cluster/log/oradb1/cssd/ocssd.log.
CRS-1601:CSSD Reconfiguration complete. Active nodes are oradb1 .
2014-03-17 18:50:36.178 CRS-1204:Recovering CRS resources for node oradb2.
CRS-1601:CSSD Reconfiguration complete. Active nodes are oradb1 oradb2 .
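The heartbeat countdown in the node 1 CRS log above can be pulled out mechanically when there are many such entries to go through. The following is only a rough Python sketch, not an Oracle tool: the function names `parse_countdown` and `projected_eviction` are made up for illustration, and the regular expression assumes the CRS-1610/1611/1612 message layout exactly as quoted in this post (it tolerates the spaces that were lost when the log was pasted).

```python
import re
from datetime import datetime, timedelta

# Matches the CRS-1610/1611/1612 countdown messages as quoted in this post.
# The \s* parts tolerate spacing lost when the log was pasted into the forum.
WARNING = re.compile(
    r"(\d{4}-\d{2}-\d{2})\s*(\d{2}:\d{2}:\d{2}\.\d{3})\s*"
    r"CRS-161[012]:\s*node\s*(\w+) \((\d+)\) at (\d+)% heartbeat fatal, "
    r"eviction in ([\d.]+) seconds"
)

def parse_countdown(text):
    """Return (timestamp, node, percent, seconds_left) for each warning found."""
    rows = []
    for date, clock, node, _number, percent, left in WARNING.findall(text):
        stamp = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S.%f")
        rows.append((stamp, node, int(percent), float(left)))
    return rows

def projected_eviction(row):
    """Warning timestamp plus the remaining countdown it reports."""
    stamp, _node, _percent, left = row
    return stamp + timedelta(seconds=left)

# Three of the node 1 entries from this post, with spacing normalized.
SAMPLE = """\
2014-03-17 18:50:02.808 CRS-1612:node oradb2 (2) at 50% heartbeat fatal, eviction in 29.210 seconds
2014-03-17 18:50:17.808 CRS-1611:node oradb2 (2) at 75% heartbeat fatal, eviction in 14.210 seconds
2014-03-17 18:50:26.815 CRS-1610:node oradb2 (2) at 90% heartbeat fatal, eviction in 5.200 seconds
"""

if __name__ == "__main__":
    for row in parse_countdown(SAMPLE):
        print(row[1], f"{row[2]}%", projected_eviction(row).time())
```

For the three sample lines, each projected time falls within about 15 ms of the 18:50:32.029 eviction reported by CRS-1607, so the countdown and the actual eviction are consistent with each other.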
Ocssd.log:
[   CSSD]2014-03-07 11:10:51.935 >TRACE:   clssgmCommonAddMember: clsomon joined (1/0x1000000/#CSS_CLSSOMON)
[   CSSD]2014-03-17 18:50:02.808 >WARNING: clssnmPollingThread: node oradb2 (2) at 50 2.317786e-310artbeat fatal, eviction in 29.210 seconds
[   CSSD]2014-03-17 18:50:02.808 >TRACE:   clssnmPollingThread: node oradb2 (2) is impending reconfig, flag 1037, misstime 30790
[   CSSD]2014-03-17 18:50:02.808 >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[   CSSD]2014-03-17 18:50:17.808 >WARNING: clssnmPollingThread: node oradb2 (2) at 75 2.317786e-310artbeat fatal, eviction in 14.210 seconds
[   CSSD]2014-03-17 18:50:26.815 >WARNING: clssnmPollingThread: node oradb2 (2) at 90 2.317786e-310artbeat fatal, eviction in 5.200 seconds
[   CSSD]2014-03-17 18:50:27.807 >WARNING: clssnmPollingThread: node oradb2 (2) at 90 2.317786e-310artbeat fatal, eviction in 4.210 seconds
[   CSSD]2014-03-17 18:50:28.809 >WARNING: clssnmPollingThread: node oradb2 (2) at 90 1.482197e-323artbeat fatal, eviction in 3.210 seconds
[   CSSD]2014-03-17 18:50:29.811 >WARNING: clssnmPollingThread: node oradb2 (2) at 90 1.482197e-323artbeat fatal, eviction in 2.210 seconds
[   CSSD]2014-03-17 18:50:30.813 >WARNING: clssnmPollingThread: node oradb2 (2) at 90 1.482197e-323artbeat fatal, eviction in 1.210 seconds
[   CSSD]2014-03-17 18:50:31.815 >WARNING: clssnmPollingThread: node oradb2 (2) at 90 1.482197e-323artbeat fatal, eviction in 0.210 seconds
[   CSSD]2014-03-17 18:50:32.028 >TRACE:   clssnmPollingThread: Eviction started for node oradb2 (2), flags 0x040d, state 3, wt4c 0
[   CSSD]2014-03-17 18:50:32.028 >TRACE:   clssnmDoSyncUpdate: Initiating sync 2
[   CSSD]2014-03-17 18:50:32.028 >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[   CSSD]2014-03-17 18:50:32.028 >TRACE:   clssnmSetupAckWait: Ack message type (11)
[   CSSD]2014-03-17 18:50:32.028 >TRACE:   clssnmSetupAckWait: node(1) is ALIVE
[   CSSD]2014-03-17 18:50:32.028 >TRACE:   clssnmSendSync: syncSeqNo(2)
[   CSSD]2014-03-17 18:50:32.028 >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)
[   CSSD]2014-03-17 18:50:32.028 >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms
[   CSSD]2014-03-17 18:50:32.028 >TRACE:   clssnmHandleSync: Acknowledging sync: src srcName seq sync
[   CSSD]2014-03-17 18:50:32.029 >TRACE:   clssnmWaitForAcks: done, msg type(11)
[   CSSD]2014-03-17 18:50:32.029 >USER:    NMEVENT_SUSPEND
[   CSSD]2014-03-17 18:50:32.029 >TRACE:   clssnmDoSyncUpdate: Terminating node 2, oradb2, misstime(60010) state(5)
[   CSSD]2014-03-17 18:50:32.029 >TRACE:   clssnmSetupAckWait: Ack message type (13)
[   CSSD]2014-03-17 18:50:32.029 >TRACE:   clssnmSetupAckWait: node(1) is ACTIVE
[   CSSD]2014-03-17 18:50:32.029 >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)
[   CSSD]2014-03-17 18:50:32.029 >TRACE:   clssnmSendVoteInfo: node(1) syncSeqNo(2)
[   CSSD]2014-03-17 18:50:32.029 >TRACE:   clssnmWaitForAcks: done, msg type(13)
[   CSSD]2014-03-17 18:50:32.029 >TRACE:   clssnmCheckDskInfo: Checking disk info...
[   CSSD]2014-03-17 18:50:32.029 >TRACE:   clssnmEvict: Start
[   CSSD]2014-03-17 18:50:32.029 >TRACE:   clssnmEvict: Evicting node 2, oradb2, birth 1, death 2, impendingrcfg 1, stateflags 0x40d
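The misstime and countdown figures in the ocssd.log excerpt above can be cross-checked with simple arithmetic. A minimal Python sketch, assuming misstime is reported in milliseconds (the log does not state the unit); the function names are hypothetical and only serve the illustration:

```python
def implied_timeout_ms(misstime_ms, seconds_left):
    """Time already missed plus the remaining countdown, in milliseconds."""
    return misstime_ms + round(seconds_left * 1000)

def percent_elapsed(misstime_ms, timeout_ms):
    """Share of the timeout that had already passed at the time of the warning."""
    return 100.0 * misstime_ms / timeout_ms

# Values from the 18:50:02.808 entries: misstime 30790, eviction in 29.210 seconds.
timeout = implied_timeout_ms(30790, 29.210)

if __name__ == "__main__":
    print(timeout, round(percent_elapsed(30790, timeout), 1))
```

The two figures add up to 60000 ms, and the termination entry reports misstime(60010), which suggests that node 1 evicted oradb2 after roughly 60 seconds without seeing a heartbeat from it.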
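Monitoring shows that the load on this node was very high at the time of the failure. Preliminary conclusion: the high load on node 2 caused RAC voting disk I/O to time out, which in turn led to the node 2 host being rebooted.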
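The high machine load was caused by a large number of SELECTs running against a core table on node 1 while the developers were running a large number of inserts into the same table on node 2.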
For a RAC environment, it is recommended that a frequently accessed table be operated on from a single node.