Platform: AIX 6.1
Database version: 11.2.0.4 RAC
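Problem description:
The CRS service on node 2 aborted, but the listener and the database instance kept running normally and reported no errors, and no split-brain occurred. The CRS disk group turned out to have been dismounted.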
Resolution:
Mounted the CRS disk group manually, then started the CRS service.
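The two recovery steps above can be sketched as a Python dry-run that only builds the command list and runs nothing. It assumes Grid Infrastructure 11.2 with OHASD and the local ASM instance (+ASM2 in the log) still up and only CRSD down. The helper `recovery_plan` is hypothetical. Whether `crsctl start res ora.crsd -init` or a full `crsctl start crs` is the right choice depends on the actual state of the stack.

```python
from typing import List, Tuple


def recovery_plan(diskgroup: str = "CRS") -> List[Tuple[str, str]]:
    """Hypothetical dry-run: return (where to run, command) pairs in order.

    Assumes only CRSD went down; OHASD and the ASM instance are still running.
    Nothing is executed here.
    """
    return [
        # 1. In the local ASM instance: remount the forcibly dismounted group.
        ("sqlplus / as sysasm", f"ALTER DISKGROUP {diskgroup} MOUNT;"),
        # 2. As root: start the CRSD resource of the lower (-init) stack.
        ("root shell", "crsctl start res ora.crsd -init"),
        # 3. Verify the clusterware stack and the disk group resource.
        ("grid shell", "crsctl check crs"),
        ("grid shell", f"crsctl stat res ora.{diskgroup}.dg -t"),
    ]


if __name__ == "__main__":
    for where, cmd in recovery_plan():
        print(f"[{where}] {cmd}")
```

Keeping it as a dry-run makes the order explicit: the OCR/voting-disk group has to be mounted before CRSD can come back.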
Errors in the ASM instance alert log:
2014-03-25 15:32:42.352000 +08:00
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 3 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 4 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 3 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 4 in group 1.
NOTE: process _b000_+asm2 (8716440) initiating offline of disk 0.2901283133 (CRS_0000) with mask 0x7e in group 1
NOTE: process _b000_+asm2 (8716440) initiating offline of disk 1.2901283134 (CRS_0001) with mask 0x7e in group 1
NOTE: process _b000_+asm2 (8716440) initiating offline of disk 2.2901283135 (CRS_0002) with mask 0x7e in group 1
NOTE: process _b000_+asm2 (8716440) initiating offline of disk 3.2901283136 (CRS_0003) with mask 0x7e in group 1
NOTE: process _b000_+asm2 (8716440) initiating offline of disk 4.2901283137 (CRS_0004) with mask 0x7e in group 1
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 21 for pid 33, osid 8716440
ERROR: no read quorum in group: required 3, found 0 disks
NOTE: checking PST for grp 1 done.
NOTE: initiating PST update: grp = 1, dsk = 0/0xacee113d, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 1/0xacee113e, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 2/0xacee113f, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 3/0xacee1140, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 4/0xacee1141, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 22 for pid 33, osid 8716440
ERROR: no read quorum in group: required 3, found 0 disks
NOTE: cache dismounting (not clean) group 1/0x3BBEE1E0 (CRS)
WARNING: Offline for disk CRS_0000 in mode 0x7f failed.
WARNING: Offline for disk CRS_0001 in mode 0x7f failed.
WARNING: Offline for disk CRS_0002 in mode 0x7f failed.
WARNING: Offline for disk CRS_0003 in mode 0x7f failed.
NOTE: messaging CKPT to quiesce pins Unix process pid: 22741528, image: oracle@OSS-JH-DB2 (B001)
WARNING: Offline for disk CRS_0004 in mode 0x7f failed.
NOTE: halting all I/Os to diskgroup 1 (CRS)
NOTE: LGWR doing non-clean dismount of group 1 (CRS)
NOTE: LGWR sync ABA=9.63 last written ABA 9.63
NOTE: No asm libraries found in the system
kjbdomdet send to inst 1
detach from dom 1, sending detach message to inst 1
List of instances:
1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 4)
Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE
520 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0x3BBEE1E0 (CRS)
SQL> alter diskgroup CRS dismount force /* ASM SERVER:1002365408 */
NOTE: cache deleting context for group CRS 1/0x3bbee1e0
GMON dismounting group 1 at 23 for pid 34, osid 22741528
NOTE: Disk CRS_0000 in mode 0x7f marked for de-assignment
NOTE: Disk CRS_0001 in mode 0x7f marked for de-assignment
NOTE: Disk CRS_0002 in mode 0x7f marked for de-assignment
NOTE: Disk CRS_0003 in mode 0x7f marked for de-assignment
NOTE: Disk CRS_0004 in mode 0x7f marked for de-assignment
NOTE: Disk CRS_0005 in mode 0x7f marked for de-assignment
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 1
2014-03-25 15:32:47.666000 +08:00
ASM Health Checker found 1 new failures
2014-03-25 15:33:12.917000 +08:00
SUCCESS: diskgroup CRS was dismounted
SUCCESS: alter diskgroup CRS dismount force /* ASM SERVER:1002365408 */
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group CRS
NOTE: diskgroup resource ora.CRS.dg is offline
Errors in file /u01/db/grid/base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_7143732.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
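The sequence in the log reads as follows. PST heartbeat writes to all five PST disks of group 1 stalled for 15 seconds, so ASM tried to offline them. With none of them readable, the read quorum check failed ("required 3, found 0"), and ASM forced a dismount of CRS. The minimal Python sketch below summarizes such an excerpt. It assumes the PST quorum is a strict majority of PST copies, which agrees with "required 3" for 5 PST disks. The function names are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# Patterns taken from the alert log lines above.
PST_WAIT = re.compile(r"Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)")
QUORUM = re.compile(r"no read quorum in group: required (\d+), found (\d+) disks")


def pst_majority(pst_copies: int) -> int:
    """Assumed quorum rule: strict majority of PST copies (5 -> 3)."""
    return pst_copies // 2 + 1


def summarize(log_text: str) -> Dict[str, object]:
    """Collect stalled PST disks per group, the longest wait, and quorum errors."""
    stalled: Dict[int, Set[int]] = {}
    max_wait = 0
    quorum_errors: List[Tuple[int, int]] = []
    for line in log_text.splitlines():
        m = PST_WAIT.search(line)
        if m:
            secs, disk, group = map(int, m.groups())
            stalled.setdefault(group, set()).add(disk)
            max_wait = max(max_wait, secs)
            continue
        q = QUORUM.search(line)
        if q:
            # (required, found)
            quorum_errors.append((int(q.group(1)), int(q.group(2))))
    return {"stalled_disks": stalled, "max_wait_secs": max_wait,
            "quorum_errors": quorum_errors}


if __name__ == "__main__":
    excerpt = "\n".join(
        f"WARNING: Waited 15 secs for write IO to PST disk {d} in group 1." for d in range(5)
    ) + "\nERROR: no read quorum in group: required 3, found 0 disks"
    result = summarize(excerpt)
    print(result)
    print("majority of 5 PST disks:", pst_majority(5))
```

Against this log, every PST disk in group 1 stalls for the same 15 seconds. That matches the default `_asm_hbeatiowait` and points to a shared I/O path problem, not a single bad disk.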
My Oracle Support (Metalink) recommendation: ASM Disks Offline When Few Paths In The Storage Is Lost (Doc ID 1581684.1)
If possible, please check with multipath vendor if OS level timeout value can be reduced to at least 15 seconds or less. If not, then set parameter "_asm_hbeatiowait" from 15 to 35 secs at ASM level for this kind of issue after consulting oracle support only.
Questions:
1. Where can the OS-level timeout value be checked?
2. Can the hidden parameter _asm_hbeatiowait be changed? Oracle officially recommends changing it only under the guidance of Oracle Support.
Could anyone take a look? Thanks!
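On question 1: on AIX with native MPIO, two common places to look are the disk attributes (`lsattr -El hdiskN`, e.g. `rw_timeout`) and the FC adapter protocol attributes (`lsattr -El fscsiN`, e.g. `fc_err_recov`). Third-party multipath software such as PowerPath or SDDPCM has its own settings, so the multipath vendor should still confirm the effective failover time. The Python sketch below parses `lsattr -El` style output and flags settings that could keep a path failover longer than the 15-second `_asm_hbeatiowait`. The comparison rule is a simplifying assumption: real failover time can be several multiples of `rw_timeout` once retries are counted. The function names and sample values are hypothetical.

```python
from typing import Dict, List

DEFAULT_ASM_HBEATIOWAIT = 15  # seconds, as seen in "Waited 15 secs" above


def parse_lsattr(output: str) -> Dict[str, str]:
    """Map attribute -> value from `lsattr -El <device>` style output.

    Each line is: attribute value description user_settable; only the
    first two whitespace-separated fields are kept.
    """
    attrs: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            attrs[parts[0]] = parts[1]
    return attrs


def timeout_risk(disk_attrs: Dict[str, str], fscsi_attrs: Dict[str, str],
                 hbeatiowait: int = DEFAULT_ASM_HBEATIOWAIT) -> List[str]:
    """Heuristic findings; an empty list means nothing obvious was flagged."""
    findings: List[str] = []
    rw = disk_attrs.get("rw_timeout")
    # Assumption: one I/O stuck on a dead path can block for up to rw_timeout
    # before MPIO retries elsewhere, so it should stay below hbeatiowait.
    if rw is not None and rw.isdigit() and int(rw) >= hbeatiowait:
        findings.append(f"rw_timeout={rw}s is not below _asm_hbeatiowait={hbeatiowait}s")
    # delayed_fail waits longer before failing I/O on a lost link;
    # fast_fail is commonly recommended with multipathing.
    if fscsi_attrs.get("fc_err_recov") == "delayed_fail":
        findings.append("fc_err_recov=delayed_fail on the FC adapter")
    return findings


if __name__ == "__main__":
    disk = parse_lsattr("rw_timeout 30 READ/WRITE time out value True")  # hypothetical
    fscsi = parse_lsattr("fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True")
    for finding in timeout_risk(disk, fscsi):
        print(finding)
```

The same check shows why the note offers two options: either bring the OS-side timeouts below 15 seconds, or (with Oracle Support) raise `_asm_hbeatiowait` above them.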