xifenfei posted on 2015-3-31 23:55:29

11.2 Data Guard: MRP process does not apply logs

A friend reported a problem with a Data Guard standby database. The relevant chat log and database logs follow.

千色 23:25:56
The MRP process just stopped applying logs for no apparent reason.

千色 23:31:19
When I issue the command to stop the MRP process, it hangs.

千色 23:31:49
Only after killing the MRP process's ospid at the OS level and then restarting MRP does apply resume.

千色 23:35:26
Version: 11.2.0.4.2
Platform: AIX 6.1
Configuration: RAC-to-RAC Active Data Guard
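
For context, a minimal sketch of the sequence being described, run on the standby (standard SQL*Plus statements; the v$managed_standby query only illustrates how the MRP0 ospid can be located and is not taken from the original session):

-- The cancel that hangs on the standby
alter database recover managed standby database cancel;

-- From another session: locate MRP0 and its OS process id
-- (on most platforms the PID column of v$managed_standby is the ospid)
select process, pid, status
  from v$managed_standby
 where process = 'MRP0';

Once the ospid is known, the process is killed from the OS (kill -9 on AIX) and redo apply is restarted, which is the workaround described in the chat.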




xifenfei posted on 2015-3-31 23:56:33

Analysis of the log contents
alert log:
Tue Mar 31 20:06:39 2015
alter database recover managed standby database cancel
Tue Mar 31 20:06:52 2015
RFS: Selected log 11 for thread 2 sequence 32058 dbid -1936265204 branch 840710863
Tue Mar 31 20:06:54 2015
Archived Log entry 58059 added for thread 2 sequence 32057 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:08:56 2015


***********************************************************************

Fatal NI connect error 12170.

VERSION INFORMATION:
      TNS for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production
      TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production
      Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production
Time: 31-MAR-2015 20:08:56
Tracing not turned on.
Tns error struct:
    ns main err code: 12535

TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out
    nt secondary err code: 78
    nt OS err code: 0
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.213.14.161)(PORT=54784))
Tue Mar 31 20:10:07 2015
System State dumped to trace file /u01/oracle/diag/rdbms/xff_p780/xff1/trace/xff1_ora_5046602.trc
Tue Mar 31 20:10:25 2015
RFS: Selected log 10 for thread 2 sequence 32059 dbid -1936265204 branch 840710863
Tue Mar 31 20:10:27 2015
RFS: Selected log 7 for thread 1 sequence 27134 dbid -1936265204 branch 840710863
Tue Mar 31 20:10:28 2015
Archived Log entry 58060 added for thread 1 sequence 27133 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:10:30 2015
Archived Log entry 58061 added for thread 2 sequence 32058 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:10:56 2015
RFS: Selected log 11 for thread 2 sequence 32060 dbid -1936265204 branch 840710863
Tue Mar 31 20:10:57 2015
Archived Log entry 58062 added for thread 2 sequence 32059 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:11:12 2015
RFS: Selected log 10 for thread 2 sequence 32061 dbid -1936265204 branch 840710863
Tue Mar 31 20:11:13 2015
Archived Log entry 58063 added for thread 2 sequence 32060 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:11:30 2015
RFS: Selected log 11 for thread 2 sequence 32062 dbid -1936265204 branch 840710863
Tue Mar 31 20:11:31 2015
Archived Log entry 58064 added for thread 2 sequence 32061 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:11:31 2015
RFS: Selected log 8 for thread 1 sequence 27135 dbid -1936265204 branch 840710863
Tue Mar 31 20:11:32 2015
Archived Log entry 58065 added for thread 1 sequence 27134 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:11:45 2015
System State dumped to trace file /u01/oracle/diag/rdbms/xff_p780/xff1/trace/xff1_ora_7209498.trc
Tue Mar 31 20:12:04 2015
RFS: Selected log 10 for thread 2 sequence 32063 dbid -1936265204 branch 840710863
Tue Mar 31 20:12:06 2015
Archived Log entry 58066 added for thread 2 sequence 32062 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:12:16 2015
ORA-1013 signalled during: alter database recover managed standby database cancel...
Tue Mar 31 20:12:18 2015
RFS: Selected log 11 for thread 2 sequence 32064 dbid -1936265204 branch 840710863
Tue Mar 31 20:12:19 2015
Archived Log entry 58067 added for thread 2 sequence 32063 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:12:35 2015
RFS: Selected log 10 for thread 2 sequence 32065 dbid -1936265204 branch 840710863
Tue Mar 31 20:12:37 2015
Archived Log entry 58068 added for thread 2 sequence 32064 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:12:40 2015
RFS: Selected log 7 for thread 1 sequence 27136 dbid -1936265204 branch 840710863
Tue Mar 31 20:12:40 2015
Archived Log entry 58069 added for thread 1 sequence 27135 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:12:53 2015
Recovery coordinator died, shutting down parallel recovery
Tue Mar 31 20:12:53 2015
MRP0: Background Media Recovery terminated with error 448
Errors in file /u01/oracle/diag/rdbms/xff_p780/xff1/trace/xff1_pr00_5505150.trc:
ORA-00448: normal completion of background process
Tue Mar 31 20:12:53 2015
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Recovered data files to a consistent state at change 12837635141298
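
Note that the cancel issued at 20:06:39 did not complete normally: ORA-1013 was signalled at 20:12:16, and parallel recovery only shut down once the recovery coordinator died, consistent with a recovery process having been killed from the OS as described in the chat. To tell whether redo apply is actually progressing on a standby in this state, two standard queries can be run a few minutes apart and compared (a sketch; exact columns vary slightly by version):

-- Redo received vs. applied per thread on the standby
select thread#,
       max(sequence#) as last_received,
       max(case when applied != 'NO' then sequence# end) as last_applied
  from v$archived_log
 group by thread#
 order by thread#;

-- Apply lag as computed by the standby
select name, value, time_computed
  from v$dataguard_stats
 where name = 'apply lag';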



10046 trace:
PARSING IN CURSOR #4573847864 len=54 dep=0 uid=0 oct=35 lid=0 tim=40803842779657 hv=1340289429 ad='7000103f159fe70' sqlid='g2d5pw97y6ccp'
alter database recover managed standby database cancel
END OF STMT
PARSE #4573847864:c=690,e=1178,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=40803842779656
WAIT #4573847864: nam='Disk file operations I/O' ela= 1329 FileOperation=2 fileno=0 filetype=1 obj#=-1 tim=40803842787228
WAIT #4573847864: nam='Disk file operations I/O' ela= 3 FileOperation=2 fileno=1 filetype=1 obj#=-1 tim=40803842787302
WAIT #4573847864: nam='Disk file operations I/O' ela= 49 FileOperation=2 fileno=0 filetype=15 obj#=-1 tim=40803842789536
WAIT #4573847864: nam='control file sequential read' ela= 17827 file#=0 block#=1 blocks=1 obj#=-1 tim=40803842807490
WAIT #4573847864: nam='Disk file operations I/O' ela= 61 FileOperation=2 fileno=0 filetype=15 obj#=-1 tim=40803842807824
WAIT #4573847864: nam='control file sequential read' ela= 30395 file#=0 block#=40 blocks=1 obj#=-1 tim=40803842838252
WAIT #4573847864: nam='control file sequential read' ela= 6924 file#=0 block#=42 blocks=1 obj#=-1 tim=40803842845354
Managed Recovery: Cancel posted.

*** 2015-03-31 20:06:40.206
WAIT #4573847864: nam='MRP inactivation' ela= 1000037 p1=0 p2=0 p3=0 obj#=-1 tim=40803843845510

*** 2015-03-31 20:06:41.206
WAIT #4573847864: nam='MRP inactivation' ela= 1000016 p1=0 p2=0 p3=0 obj#=-1 tim=40803844845647

*** 2015-03-31 20:06:42.207
WAIT #4573847864: nam='MRP inactivation' ela= 1000016 p1=0 p2=0 p3=0 obj#=-1 tim=40803845845816

*** 2015-03-31 20:06:43.207
WAIT #4573847864: nam='MRP inactivation' ela= 1000015 p1=0 p2=0 p3=0 obj#=-1 tim=40803846845941

*** 2015-03-31 20:06:44.207
WAIT #4573847864: nam='MRP inactivation' ela= 1000021 p1=0 p2=0 p3=0 obj#=-1 tim=40803847846122

*** 2015-03-31 20:06:45.207
WAIT #4573847864: nam='MRP inactivation' ela= 1000016 p1=0 p2=0 p3=0 obj#=-1 tim=40803848846278

*** 2015-03-31 20:06:46.208
WAIT #4573847864: nam='MRP inactivation' ela= 1000721 p1=0 p2=0 p3=0 obj#=-1 tim=40803849847160

*** 2015-03-31 20:06:47.208
WAIT #4573847864: nam='MRP inactivation' ela= 1000020 p1=0 p2=0 p3=0 obj#=-1 tim=40803850847294

*** 2015-03-31 20:06:48.208
WAIT #4573847864: nam='MRP inactivation' ela= 1000019 p1=0 p2=0 p3=0 obj#=-1 tim=40803851847459

*** 2015-03-31 20:06:49.208
WAIT #4573847864: nam='MRP inactivation' ela= 1000021 p1=0 p2=0 p3=0 obj#=-1 tim=40803852847608

*** 2015-03-31 20:06:50.209
WAIT #4573847864: nam='MRP inactivation' ela= 1000015 p1=0 p2=0 p3=0 obj#=-1 tim=40803853847794

*** 2015-03-31 20:06:51.209
WAIT #4573847864: nam='MRP inactivation' ela= 1000016 p1=0 p2=0 p3=0 obj#=-1 tim=40803854847926

*** 2015-03-31 20:06:52.209
WAIT #4573847864: nam='MRP inactivation' ela= 1000017 p1=0 p2=0 p3=0 obj#=-1 tim=40803855848053

*** 2015-03-31 20:06:53.209
WAIT #4573847864: nam='MRP inactivation' ela= 1000017 p1=0 p2=0 p3=0 obj#=-1 tim=40803856848259
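
The 10046 trace shows the session that issued the cancel looping on one-second 'MRP inactivation' waits: it has posted the cancel ("Managed Recovery: Cancel posted.") and is simply waiting for MRP0 to go away, which never happens. If the problem recurs, the stuck canceller and the recovery processes can be checked directly from the wait interface (a sketch using standard views; the event name is the one seen in the trace above):

-- Sessions waiting for MRP to shut down
select sid, event, state, seconds_in_wait
  from v$session_wait
 where event = 'MRP inactivation';

-- What the recovery-related processes are currently waiting on
select s.sid, s.program, w.event, w.seconds_in_wait
  from v$session s, v$session_wait w
 where s.sid = w.sid
   and (s.program like '%MRP%' or s.program like '%PR0%');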

xifenfei posted on 2015-3-31 23:57:44

A search on MOS shows this is very likely Bug 15892456.
Hdr: 15892456 11.2.0.2.5 RDBMS 11.2.0.2.5 DATAGUARD_PSBY PRODID-5 PORTID-226
Abstract: MRP IS NOT APPLYING REDO (MRP0, PR00 AND PRXX SEEMS IDLE FOR A LONG TIME)

*** BPADMWW 11/19/12 10:06 am *** (ADD: Impact/Symptom->OTHER - FEATURE UNUS...)
*** CGREY 11/19/12 10:06 am ***
@ TAR:
@ ----
@ SR 3-6327164354


BUG TYPE CHOSEN
===============
Code

Component: RDBMS
================
DETAILED PROBLEM DESCRIPTION
============================
11.2.0.5 primary 3 node RAC and standby 2 node RAC

1. edit database provprr set state = 'APPLY-OFF'   from DGMGRL hangs for
many hours

2. after instance 1 of the primary crashes the standby does not apply logs
already shipped from the primary.

Oct 16 MRP hang customer turned on log_archive_trace for 3 minutes and it
did not have output.
System state dump happened in the PROVPRR2_mrp0_10613.trc

@ EXPLAIN WHY YOU BELIEVE THIS TO BE A PRODUCT DEFECT
@ ===================================================
@ MRP should not stop applying and hang
@
DIAGNOSTIC ANALYSIS
===================
alert_PROVPRD1_primary.logprimary instance 1 alert log show crash.
Current sequences are 31 and 32


Thu Oct 11 14:08:18 2012
LGWR: Standby redo logfile selected to archive thread 1 sequence 32
LGWR: Standby redo logfile selected for thread 1 sequence 32 for destination
LOG_ARCHIVE_DEST_1
Thread 1 advanced to log sequence 32 (LGWR switch)
   Current log# 4 seq# 32 mem# 0: +DATA_RAID1_PROVPRD/provprd/redo04a.log
   Current log# 4 seq# 32 mem# 1: +DATA_RAID1_PROVPRD/provprd/redo04b.log
Thu Oct 11 14:08:36 2012
Archived Log entry 2188 added for thread 1 sequence 31 ID 0x302ac18 dest 2:
Thu Oct 11 14:09:47 2012
ARC4: Standby redo logfile selected for thread 1 sequence 31 for destination
LOG_ARCHIVE_DEST_1
Thu Oct 11 14:10:09 2012
ORA-16198: LGWR received timedout error from KSR
LGWR: Attempting destination LOG_ARCHIVE_DEST_1 network reconnect (16198)
LGWR: Destination LOG_ARCHIVE_DEST_1 network reconnect abandoned
Error 16198 for archive log file 4 to 'provprr'
Thu Oct 11 14:11:24 2012
...
Thu Oct 11 14:19:16 2012
Timed out trying to start process m001.
Thu Oct 11 14:19:19 2012
Errors in file
/app/oracle/diag/rdbms/provprd/PROVPRD1/trace/PROVPRD1_arc4_27890.trc
(incident=5045993):
ORA-240: control file enqueue held for more than 120 seconds
Incident details in:
/app/oracle/diag/rdbms/provprd/PROVPRD1/incident/incdir_5045993/PROVPRD1_arc4
_27890_i5045993.trc
Thu Oct 11 14:19:22 2012
...
opiodr aborting process unknown ospid (30237) as a result of ORA-609
Thu Oct 11 14:20:26 2012
PMON failed to acquire latch, see PMON dump
Thu Oct 11 14:21:29 2012
PMON failed to acquire latch, see PMON dump
Thu Oct 11 14:22:21 2012
...
opiodr aborting process unknown ospid (30316) as a result of ORA-609
Thu Oct 11 14:18:51 2012
Errors in file
/app/oracle/diag/rdbms/provprd/PROVPRD1/trace/PROVPRD1_arc0_27871.trc
(incident=5045945):
ORA-240: control file enqueue held for more than 120 seconds
Incident details in:
/app/oracle/diag/rdbms/provprd/PROVPRD1/incident/incdir_5045945/PROVPRD1_arc0
_27871_i5045945.trc
Thu Oct 11 14:40:34 2012
Starting ORACLE instance (normal)



alert_PROVPRR2.log Standby instance 2 is the apply instance. Looks like MRP
was hung and finally not applying


Thu Oct 11 00:18:01 2012
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASETHROUGH ALL
SWITCHOVER DISCONNECTUSING CURRENT LOGFILE
Thu Oct 11 00:18:01 2012
Media Recovery Waiting for thread 2 sequence 17 (in transit)
Recovery of Online Redo Log: Thread 2 Group 18 Seq 17 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby06a.log
Media Recovery Waiting for thread 1 sequence 29 (in transit)
Recovery of Online Redo Log: Thread 1 Group 13 Seq 29 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby01a.log
Media Recovery Waiting for thread 3 sequence 25 (in transit)
Recovery of Online Redo Log: Thread 3 Group 23 Seq 25 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby11a.log
Thu Oct 11 11:44:16 2012
Standby controlfile consistent with primary
RFS: Selected log 14 for thread 1 sequence 30 dbid 32324255 branch
795712034
Thu Oct 11 11:44:16 2012
Media Recovery Waiting for thread 1 sequence 30 (in transit)
Recovery of Online Redo Log: Thread 1 Group 14 Seq 30 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby02a.log
Media Recovery Waiting for thread 3 sequence 26 (in transit)
Recovery of Online Redo Log: Thread 3 Group 24 Seq 26 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby12a.log
Thu Oct 11 11:44:31 2012
Archived Log entry 4717 added for thread 1 sequence 29 ID 0x302ac18 dest 2:
Thu Oct 11 14:08:36 2012
RFS: Assigned to RFS process 27430
RFS: Selected log 25 for thread 3 sequence 27 dbid 32324255 branch
795712034
Thu Oct 11 14:08:51 2012
Media Recovery Log
+ARCH_PROVPRD/provprr/archivelog/2012_10_11/thread_2_seq_17.4933.796399699
Media Recovery Waiting for thread 3 sequence 27 (in transit)
Recovery of Online Redo Log: Thread 3 Group 25 Seq 27 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby13a.log


alert_PROVPRR2.log Standby apply instance 2 not applying for a day or so and
is cancelled and hangs.

Mon Oct 15 17:07:02 2012
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
Mon Oct 15 18:00:50 2012
System State dumped to trace file
/app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_ora_23483.trc
Mon Oct 15 18:10:34 2012
System State dumped to trace file
/app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_mrp0_10613.trc
Tue Oct 16 10:21:32 2012
ALTER SYSTEM SET log_archive_trace=8192 SCOPE=BOTH SID='*';      
<-------------- ran tracing for about 3 minutes here
Tue Oct 16 10:24:17 2012
ALTER SYSTEM SET log_archive_trace=0 SCOPE=BOTH SID='*';
Tue Oct 16 10:58:25 2012
System State dumped to trace file
/app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_mrp0_10613.trc
Tue Oct 16 12:10:43 2012
Errors in file
/app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_mrp0_10613.trc:
Tue Oct 16 12:10:44 2012
Dumping diagnostic data in directory=, requested by
(instance=2, osid=10613 (MRP0)), summary=.
Tue Oct 16 13:33:59 2012
Standby controlfile consistent with primary
RFS: Selected log 23 for thread 3 sequence 71 dbid 32324255 branch
795712034
Tue Oct 16 13:34:12 2012
Archived Log entry 4808 added for thread 3 sequence 70 ID 0x302ac18 dest 2:
Tue Oct 16 15:04:02 2012
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
Tue Oct 16 15:05:19 2012
Managed Standby Recovery Canceled (PROVPRR2)
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
Tue Oct 16 15:05:20 2012
Recovery coordinator died, shutting down parallel recovery                  
                            <---------------   you must have killed it here
Tue Oct 16 15:05:29 2012
ORA-1013 signalled during: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
CANCEL...
Tue Oct 16 15:06:39 2012
Logmerger died, shutting down parallel recovery slaves
Recovery interrupted!
Recovery interrupted!
Tue Oct 16 15:06:43 2012
Reconfiguration started (old inc 1226, new inc 1228)
List of instances:
1 2 3 (myinst: 2)
Global Resource Directory frozen
Communication channels reestablished
Tue Oct 16 15:06:43 2012
* domain 0 valid = 1 according to instance 1
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Tue Oct 16 15:06:43 2012
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Oct 16 15:06:43 2012
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Oct 16 15:06:43 2012
LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Oct 16 15:06:43 2012
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Tue Oct 16 15:30:34 2012
ALTER DATABASE RECOVER MANAGED STANDBY DATABASETHROUGH ALL SWITCHOVER
DISCONNECTUSING CURRENT LOGFILE
Attempt to start background Managed Standby Recovery process (PROVPRR2)
Tue Oct 16 15:30:34 2012
....

after the MRP restart it is applying again.

Tue Oct 16 15:30:46 2012
Parallel Media Recovery started with 64 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Tue Oct 16 15:30:47 2012
Media Recovery Log
+ARCH_PROVPRD/provprr/archivelog/2012_10_11/thread_1_seq_32.4942.796401567
Media Recovery Log
+ARCH_PROVPRD/provprr/archivelog/2012_10_11/thread_2_seq_18.4934.796399717
Tue Oct 16 15:30:47 2012
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASETHROUGH ALL
SWITCHOVER DISCONNECTUSING CURRENT LOGFILE
Media Recovery Log
+ARCH_PROVPRD/provprr/archivelog/2012_10_11/thread_3_seq_28.4938.796401515
Thu Oct 11 14:09:05 2012
Media Recovery Waiting for thread 1 sequence 31
Thu Oct 11 14:09:12 2012
Archived Log entry 4722 added for thread 3 sequence 27 ID 0x302ac18 dest 2:
Thu Oct 11 14:11:33 2012
Recovery of Online Redo Log: Thread 1 Group 14 Seq 31 Reading mem 0
   Mem# 0: +DID1_PROVPRD/provprr/standby02a.log
Media Recovery Log
+ARCH_PROVPRD/provprr/archivelog/2012_10_11/thread_2_seq_18.4934.796399717
Thu Oct 11 14:11:49 2012
Media Recovery Waiting for thread 3 sequence 28 (in transit)
Recovery of Online Redo Log: Thread 3 Group 23 Seq 28 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby11a.log
Recovery of Online Redo Log: Thread 1 Group 13 Seq 32 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby01a.log
Thu Oct 11 14:34:34 2012
RFS: Possible network disconnect with primary database
Thu Oct 11 14:39:27 2012
RFS: Assigned to RFS process 22717
RFS: Opened log for thread 1 sequence 34 dbid 32324255 branch 795712034
Archived Log entry 4726 added for thread 1 sequence 34 rlc 795712034 ID
0x302ac18 dest 1:
Thu Oct 11 14:43:47 2012
Primary database is in MAXIMUM AVAILABILITY mode
Standby controlfile consistent with primary
Standby controlfile consistent with primary
RFS: Assigned to RFS process 1735
RFS: Selected log 23 for thread 3 sequence 31 dbid 32324255 branch
795712034
Standby controlfile consistent with primary
RFS: Selected log 24 for thread 3 sequence 32 dbid 32324255 branch
795712034
Thu Oct 11 14:43:53 2012
Archived Log entry 4732 added for thread 3 sequence 31 ID 0x302ac18 dest 2:
Thu Oct 11 16:49:06 2012
Standby controlfile consistent with primary
RFS: Selected log 23 for thread 3 sequence 33 dbid 32324255 branch
795712034
Thu Oct 11 16:49:18 2012

WORKAROUND?
===========
Yes

WORKAROUND INFORMATION
======================
Restart MRP. The interruption is not acceptable for HA.

TECHNICAL IMPACT
================
Manual intervention - no notification to the DBA so did not apply for hours

RELATED ISSUES (bugs, forums, RFAs)
===================================
Customer believed it was Bug 13470324 But
doesn't seem to apply

HOW OFTEN DOES THE ISSUE REPRODUCE AT CUSTOMER SITE?
====================================================
Intermittent

DOES THE ISSUE REPRODUCE INTERNALLY?
====================================
No

EXPLAIN WHY THE ISSUE WAS NOT TESTED INTERNALLY.
================================================
Complex setup

IS A TESTCASE AVAILABLE?
========================
No

@ DID THIS WORK PREVIOUSLY?
@ =========================
@ No
@
@ DID YOU TEST THIS ON THE LATEST RELEASE/PATCHSET?
@ =================================================
@ No
@
@ EXPLAIN WHY YOU DID NOT NOT TEST THIS ON THE LATEST RELEASE/PATCHSET.
@ =====================================================================
@ 11.2.0.2.5 only reproduced at customer site
@
@ STACK TRACE/FILE Information
@ ============================
@ alert logs, mrp trace with system state dump, opatch lsinventory -detail
@
Link to IPS Package:
====================
not available

@ P1 Defect?
@ ==========
@ No
@
*** JBARLOW 11/21/12 02:14 am *** (CHG: Sta->10)
*** JBARLOW 11/21/12 02:14 am ***
@ No evidence on bugsftp. Please upload.
*** CGREY 11/21/12 03:05 pm ***
@ Bug Attachment:alert_PROVPRD1.log.zip:alert log primary
@ Bug Attachment:alert_PROVPRD2.log.zip:alert log primary
@ Bug Attachment:alert_PROVPRD3.log.zip:alert log primary
@ Bug Attachment:alert_PROVPRD1.log.zip:standby
@ Bug Attachment:alert_PROVPRD2.log.zip:standby
@ Bug Attachment:alert_PROVPRD3.log.zip:standby
@ Bug Attachment:alert_PROVPRR1.log.zip:primary
@ Bug Attachment:alert_PROVPRR2.log.zip:primary
@ Bug Attachment:alert_PROVPRR3.log.zip:primary
@ Bug Attachment:Broker_Conf.txt:broker
@ Bug Attachment:drcPROVPRD1.zip:primary
@ Bug Attachment:drcPROVPRR1.zip:standby
@ Bug Attachment:lsinventory_Primary.txt:opatch
@ Bug Attachment:lsinventory_Standby.txt:opatch
*** CGREY 11/21/12 03:11 pm *** (CHG: Sta->16)
*** JBARLOW 11/22/12 06:48 am *** (CHG: Sta->10)
*** JBARLOW 11/22/12 06:48 am ***
@ Where is the MRP trace and systemstate? Please upload these to bugftp.
*** CGREY 11/26/12 09:59 am ***
@ Bug Attachment:PROVPRR2_mrp0_10613.trc:state dump in mrp trace file
@ PROVPRR2_mrp0_10613.trc
*** CGREY 11/26/12 09:59 am ***
@ Sorry, attaching system state dump in mrp trace file PROVPRR2_mrp0_10613.trc
@ Thanks
*** CGREY 11/26/12 09:59 am *** (CHG: Sta->16)
@ *** JBARLOW 12/06/12 04:52 am *** (CHG: Support Contact->CGREY->JBARLOW)
*** JBARLOW 12/06/12 04:52 am *** (CHG: Asg->JBARLOW)
*** JBARLOW 12/07/12 03:41 am *** (CHG: Sta->10)
*** JBARLOW 12/07/12 03:41 am ***
@ alert_PROVPRR2.log :
@ .
@Mon Oct 15 17:07:02 2012
@ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
@Mon Oct 15 18:00:50 2012
@System State dumped to trace file
@ /app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_ora_23483.trc
@Mon Oct 15 18:10:34 2012
@System State dumped to trace file
@ /app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_mrp0_10613.trc
@Tue Oct 16 10:21:32 2012
@ALTER SYSTEM SET log_archive_trace=8192 SCOPE=BOTH SID='*';      
@Tue Oct 16 10:24:17 2012
@ALTER SYSTEM SET log_archive_trace=0 SCOPE=BOTH SID='*';
@Tue Oct 16 10:58:25 2012
@System State dumped to trace file
@ /app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_mrp0_10613.trc
@Tue Oct 16 12:10:43 2012
@Errors in file
@ /app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_mrp0_10613.trc:
@Tue Oct 16 12:10:44 2012
@Dumping diagnostic data in directory=, requested by
@ (instance=2, osid=10613 (MRP0)), summary=.
@Tue Oct 16 13:33:59 2012
@Standby controlfile consistent with primary
@RFS: Selected log 23 for thread 3 sequence 71 dbid 32324255 branch
@ 795712034
@Tue Oct 16 13:34:12 2012
@Archived Log entry 4808 added for thread 3 sequence 70 ID 0x302ac18 dest 2:
@Tue Oct 16 15:04:02 2012
@ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
@Tue Oct 16 15:05:19 2012
@Managed Standby Recovery Canceled (PROVPRR2)
@Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
@ .
@ From this we can see that 3 systemstates were dumped while the 'ALTER
@ DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL' was in progress.
@ These are in tracefiles PROVPRR2_ora_23483.trc and PROVPRR2_mrp0_10613.trc.
@ .
@ However, only PROVPRR2_mrp0_10613.trc has been uploaded to bugsftp. This file
@ is missing the first systemstate because the file was removed and recreated.
@ Furthermore, the single systemstate that we have was a level 2, so we don't
@ have the short stacks.
@ .
@ We do have the wait stacks in the systemstate. MRP (process 55) is in an idle
@ cycle of short waits for 'parallel recovery control message reply' and 'class
@ slave wait'. I would expect to see these waits in an idle MRP process. There
@ is no indication that this is the culprit.
@ .
@ Other sessions in a current long wait are VKTM (this is idle), process 112
@ and process 131.
@ .
@ Process 112:
@ .
@OS pid 32273
@ .
@Current Wait Stack:
@   0: waiting for 'SQL*Net message from client'
@      driver id=0x54435000, #bytes=0x1, =0x0
@      wait_id=12197 seq_num=12247 snap_id=1
@      wait times: snap=1071 min 26 sec, exc=1071 min 26 sec, total=1071 min 26
@ sec
@      wait times: max=infinite, heur=1071 min 26 sec
@ .
@This wait started on Oct 15 at 17:06:59. I think that this is a user session
@ and is in an idle wait.
@ .
@ Process 131:
@ .
@OS pid 26919
@ .
@This is RFS (unless the OS process table has wrapped)
@ .
@Current Wait Stack:
@   0: waiting for 'SQL*Net message from client'
@      driver id=0x54435000, #bytes=0x1, =0x0
@      wait_id=247 seq_num=474 snap_id=1
@      wait times: snap=18295 min 41 sec, exc=18295 min 41 sec, total=18295 min
@ 41 sec
@      wait times: max=infinite, heur=18295 min 41 sec
@ .
@This session has been in wait since the RFS process was created. This
@ indicates an idle RFS process.
@ .
@ There is not a great deal extra to be gleaned from the systemstate. All
@ recovery slaves are in idle wait states.
@ .
@ We really needed a series of level 266 systemstates (so we got the short
@ stacks) and/or level 3 hanganalyze dumps.
@ .
@ I would like to see what the other evidence dumped tells us so can you ask
@ the customer to upload the initial systemstate dump (PROVPRR2_ora_23483.trc)
@ and the contents of the diagnostic data dump directory cdmp_20121016121044.
@ .
@ Also has this problem occurred before or since?
*** CGREY 12/07/12 09:38 am ***
@ Asking customer for requested output.
@ Thanks
*** CGREY 12/12/12 01:08 pm ***
@ GTCRLK:3-6327164354:PROVPRR2_ora_23483.trc.zip:Attachment uploaded From
@ Internal Portal
@ GTCRLK:3-6327164354:cdmp_20121016121044.zip:Attachment uploaded From Internal
@ Portal
*** CGREY 12/12/12 01:09 pm *** (CHG: Sta->16)
*** JBARLOW 12/14/12 12:49 am *** (CHG: Sta->10)
*** JBARLOW 12/14/12 12:49 am ***
@ The systemstate in PROVPRR2_ora_23483.trc is essentially the same as that in
@ PROVPRR2_mrp0_10613.trc i.e. the process which issued the command to shutdown
@ standby recovery is waiting for MRP to terminate (waiting on "MRP
@ inactivation"), but MRP just appears to be in an idle state.
@ .
@ The systemstates show that although PR00 is not in a wait state, it's last
@ wait ended 5973 min 45 sec ago i.e 4 days 3 hrs 33 mins. The systemstate was
@ taken at 2012-10-15 18:00:50, so PR00's last wait ended around Oct 11 14:27.
@ .
@ The standby alert logs shows that this is around the time of the last ORL
@ recovery:
@ .
@Thu Oct 11 14:11:49 2012
@Media Recovery Waiting for thread 3 sequence 28 (in transit)
@Recovery of Online Redo Log: Thread 3 Group 23 Seq 28 Reading mem 0
@    Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby11a.log
@Recovery of Online Redo Log: Thread 1 Group 13 Seq 32 Reading mem 0
@    Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby01a.log
@Thu Oct 11 14:34:34 2012
@RFS: Possible network disconnect with primary database
@
@ The next recovered log was after managed recovery was restarted:
@ .
@Tue Oct 16 16:06:51 2012
@Media Recovery Waiting for thread 1 sequence 42 (in transit)
@Recovery of Online Redo Log: Thread 1 Group 13 Seq 42 Reading mem 0
@    Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby01a.log
@ .
@ So it appears that PR00 got itself stuck around Oct 11 14:27.
@ .
@ We would have required multiple stack traces from PR00 to help understand why
@ the process is stuck.
@ .
@ Has this problem happened before and/or since this occurrence? The current
@ evidence is insufficient to make any further progress in RCA.
*** CGREY 12/14/12 08:50 am ***
@ I updated the customer asking if they can reproduce it or if it happens often
@ enough.Can you give an action plan of what we need in case they can
@ reproduce it?
@ Thanks
*** CGREY 12/30/12 08:50 am ***
@ This is not reproducing and customer does not update.
@ Therefore putting in Auto Close Status.
@ Thanks
*** JBARLOW 02/05/13 05:50 am *** (CHG: Sta->31)
*** JBARLOW 02/05/13 05:50 am *** (ADD: Impact/Symptom->HANG - DATABASE INST...)
*** CGREY 02/05/13 07:50 am ***
@ SR is now closed
@ Thanks
*** JPINTO 05/06/13 10:21 pm *** (CHG: Sta->91)
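
Summary: the symptoms here match Bug 15892456 closely (MRP0/PR00 go idle and stop applying redo, and a subsequent cancel hangs), and as noted in the bug, the practical workaround is to kill the stuck MRP0 at the OS level and restart redo apply. A minimal sketch of that workaround on the standby (the kill command is shown as a comment; the ospid placeholder must be filled in from the first query and is not taken from the original logs):

-- 1. Find the MRP0 OS process id on the apply instance
select process, pid, status from v$managed_standby where process = 'MRP0';

-- 2. Kill it from the OS, e.g. on AIX:
--      kill -9 <ospid>

-- 3. Restart real-time apply (the Active Data Guard standby can stay open read only)
alter database recover managed standby database using current logfile disconnect from session;

-- 4. Confirm that apply has resumed and sequences are advancing
select process, status, thread#, sequence#
  from v$managed_standby
 where process in ('MRP0', 'RFS');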







