A friend reported that a RAC on Windows was malfunctioning and that ASM could not mount its disk groups. The ASM alert log showed the following:
Fri Jul 03 03:55:46 2020
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) The system cannot find the file specified.
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) The system cannot find the file specified.
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 267 in group [2.2254399778]
from disk DATA_0000 allocation unit 3502 reason error; if possible, will try another mirror side
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15081: failed to submit an I/O operation to a disk
Fri Jul 03 03:59:46 2020
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) The system cannot find the file specified.
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) The system cannot find the file specified.
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 267 in group [2.2254399778]
from disk DATA_0000 allocation unit 3502 reason error; if possible, will try another mirror side
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15081: failed to submit an I/O operation to a disk
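When the alert log is long, it can help to summarize which disk paths ASM failed to open before reading it line by line. The following is only a minimal sketch: the function name `count_unopenable_disks` and the `sample` text are made up for illustration, and the pattern only covers the ORA-15025 message format shown in the log above.

```python
import re
from collections import Counter

# Matches lines such as: ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
DISK_ERROR = re.compile(r'ORA-15025: could not open disk "([^"]+)"')


def count_unopenable_disks(alert_log_text):
    """Return a Counter mapping each disk path to its number of ORA-15025 hits."""
    return Counter(DISK_ERROR.findall(alert_log_text))


# Illustrative excerpt only; in practice the text would come from the alert log file.
sample = r'''
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-15081: failed to submit an I/O operation to a disk
'''

if __name__ == "__main__":
    for disk, hits in count_unopenable_disks(sample).items():
        print(disk, hits)
```

In this case every ORA-15025 entry points at the same path, which is what narrows the problem down to a single missing disk.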
The errors make it fairly clear that ASM failed because the disk \\.\ORCLDISKDATA1 could not be found. Listing the disks with asmtool:
C:\app\11.2.0\grid>asmtool -list
NTFS \Device\Harddisk0\Partition3 81920M
NTFS \Device\Harddisk0\Partition4 200000M
NTFS \Device\Harddisk0\Partition5 4293849M
 \Device\Harddisk1\Partition2 4062M
 \Device\Harddisk2\Partition2 2097022M
ORCLDISKFRA0 \Device\Harddisk3\Partition2 511870M
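The same check can be scripted by comparing the labels in the listing against the labels ASM is expected to see. This is a rough sketch under assumptions: `stamped_labels` is a hypothetical helper, the `expected` set is filled in by hand from the alert log and the listing rather than read from any ASM configuration, and the parsing assumes the column layout shown above.

```python
def stamped_labels(asmtool_output):
    """Return the set of ORCLDISK* labels found in `asmtool -list` output."""
    labels = set()
    for line in asmtool_output.splitlines():
        fields = line.split()
        # A stamped disk carries its label in the first column; unstamped
        # partitions start with a file system name or the device path.
        if fields and fields[0].startswith("ORCLDISK"):
            labels.add(fields[0])
    return labels


listing = r'''
NTFS \Device\Harddisk0\Partition3 81920M
 \Device\Harddisk1\Partition2 4062M
 \Device\Harddisk2\Partition2 2097022M
ORCLDISKFRA0 \Device\Harddisk3\Partition2 511870M
'''

# Assumption: the labels we expect, taken from the alert log and the listing.
expected = {"ORCLDISKDATA1", "ORCLDISKFRA0"}
missing = expected - stamped_labels(listing)

if __name__ == "__main__":
    print(sorted(missing))
```

A label that is expected but absent from the listing suggests the stamp in the disk header is no longer readable, not that the partition itself has disappeared.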
The header of a dd image taken from one of the disks was then dumped with kfed:
C:\Users\Administrator>kfed read F:\temp\disk3\1\disk2.dd
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
006B38C00 00000000 00000000 00000000 00000000 [................]
 Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
C:\Users\Administrator>kfed read F:\temp\disk3\1\disk2.dd blkn=2
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt: 2 ; 0x003: 0x02
kfbh.block.blk: 2 ; 0x004: blk=2
kfbh.block.obj: 2147483648 ; 0x008: disk=0
kfbh.check: 2349305287 ; 0x00c: 0x8c078dc7
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdatb.aunum: 0 ; 0x000: 0x00000000
kfdatb.shrink: 448 ; 0x004: 0x01c0
kfdatb.ub2pad: 0 ; 0x006: 0x0000
kfdatb.auinfo[0].link.next: 8 ; 0x008: 0x0008
kfdatb.auinfo[0].link.prev: 8 ; 0x00a: 0x0008
kfdatb.auinfo[1].link.next: 12 ; 0x00c: 0x000c
kfdatb.auinfo[1].link.prev: 12 ; 0x00e: 0x000c
kfdatb.auinfo[2].link.next: 456 ; 0x010: 0x01c8
kfdatb.auinfo[2].link.prev: 456 ; 0x012: 0x01c8
kfdatb.auinfo[3].link.next: 488 ; 0x014: 0x01e8
kfdatb.auinfo[3].link.prev: 488 ; 0x016: 0x01e8
kfdatb.auinfo[4].link.next: 24 ; 0x018: 0x0018
kfdatb.auinfo[4].link.prev: 24 ; 0x01a: 0x0018
kfdatb.auinfo[5].link.next: 28 ; 0x01c: 0x001c
kfdatb.auinfo[5].link.prev: 28 ; 0x01e: 0x001c
kfdatb.auinfo[6].link.next: 552 ; 0x020: 0x0228
kfdatb.auinfo[6].link.prev: 3112 ; 0x022: 0x0c28
kfdatb.spare: 0 ; 0x024: 0x00000000
kfdate[0].discriminator: 1 ; 0x028: 0x00000001
kfdate[0].allo.lo: 0 ; 0x028: XNUM=0x0
kfdate[0].allo.hi: 8388608 ; 0x02c: V=1 I=0 H=0 FNUM=0x0
kfdate[1].discriminator: 1 ; 0x030: 0x00000001
kfdate[1].allo.lo: 0 ; 0x030: XNUM=0x0
kfdate[1].allo.hi: 8388608 ; 0x034: V=1 I=0 H=0 FNUM=0x0
kfdate[2].discriminator: 1 ; 0x038: 0x00000001
kfdate[2].allo.lo: 0 ; 0x038: XNUM=0x0
kfdate[2].allo.hi: 8388609 ; 0x03c: V=1 I=0 H=0 FNUM=0x1
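The ORCLDISKDATA1 disk is clearly missing from the asmtool listing. Analyzing local dd copies of the disks showed that the ASM disk header is damaged: block 0 is empty, while later metadata blocks such as the allocation table in block 2 are still intact. On the FRA disk the ASM label is still present, but the rest of its header is damaged as well. In every case, only the disk header is affected.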
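To triage several dd images quickly without running kfed on each one, the first bytes of a block can be decoded directly. The sketch below is not a replacement for kfed. It only reads the kfbh fields at the offsets kfed prints above (0x000 to 0x00f), assumes a 4096-byte metadata block, lists only a few type codes, and builds the two example blocks in memory instead of opening a real image. The helper names and the `image_path` parameter are hypothetical.

```python
import struct

BLOCK_SIZE = 4096  # assumption: default ASM metadata block size

# Type codes as printed by kfed; deliberately incomplete.
KFBTYP_NAMES = {0: "KFBTYP_INVALID", 1: "KFBTYP_DISKHEAD", 3: "KFBTYP_ALLOCTBL"}


def parse_block_header(block):
    """Decode the kfbh fields at offsets 0x000-0x00f of one metadata block."""
    if len(block) < 16:
        raise ValueError("need at least 16 bytes of block data")
    endian, hard, btype, datfmt = block[0], block[1], block[2], block[3]
    # kfbh.endian is 1 on little-endian platforms such as x86 Windows.
    order = "<" if endian == 1 else ">"
    blk, obj, check = struct.unpack_from(order + "III", block, 4)
    return {
        "endian": endian,
        "hard": hard,
        "type": KFBTYP_NAMES.get(btype, "unknown (%d)" % btype),
        "datfmt": datfmt,
        "blk": blk,
        "obj": obj,
        "check": check,
        "all_zero": not any(block),  # True when the whole block was wiped
    }


def read_block(image_path, block_number, block_size=BLOCK_SIZE):
    """Read one block from a dd image (image_path is a hypothetical local file)."""
    with open(image_path, "rb") as image:
        image.seek(block_number * block_size)
        return image.read(block_size)


# In-memory stand-ins for the two blocks dumped by kfed, so the sketch runs
# without an image file.
zeroed_block = bytes(BLOCK_SIZE)
alloc_block = struct.pack("<BBBBIII", 1, 0x82, 3, 2, 2, 2147483648, 2349305287)
alloc_block += bytes(BLOCK_SIZE - len(alloc_block))

if __name__ == "__main__":
    print(parse_block_header(zeroed_block)["type"])
    print(parse_block_header(alloc_block)["type"])
```

With a real image, `parse_block_header(read_block(path, 0))` would show whether block 0 still looks like a disk header, and the `all_zero` flag distinguishes a wiped header from one that is only partially damaged.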
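From the on-site analysis we can be fairly sure that something damaged the disk headers of all the Windows ASM disks: two headers were zeroed out and the third was largely corrupted. The root cause is still unknown.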
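The customer had an RMAN backup from the previous day and also needed the environment preserved for further root cause analysis, so we did not repair anything in place. Instead, without modifying the production environment at all, we extracted the redo and archived logs directly from the FRA disk and combined them with the backup to restore the database at another location. This achieved zero data loss while leaving the original environment untouched.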
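I have handled similar ASM disk header cases on other operating system platforms before:
Recovering a lost ASM disk partition
pvid=yes prevents ASM from mounting
Recovery with zero data loss after all ASM disk headers were damaged
Unrecognized partition prevents the ASM diskgroup from mounting
Recovery after a PVID mistakenly set on an ASM disk prevents the ASM diskgroup from mounting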