11.2 Data Guard: MRP process not applying logs

Posted on 2015-3-31 23:55:29
A friend reported a problem with a Data Guard standby. Related chat excerpts and logs:

千色 23:25:56
The MRP process just stopped applying logs, for no apparent reason.

千色 23:31:19
When I try to stop the MRP process with the command, it hangs there.

千色 23:31:49
Only by killing the MRP ospid at the OS level and then restarting MRP does it recover.

千色 23:35:26
Version: 11.2.0.4.2
Platform: AIX 6.1
Configuration: RAC-to-RAC Active Data Guard
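
For context, a minimal way to confirm that MRP is stuck and to find its ospid (a generic sketch against the standard 11.2 v$ views, not output taken from this particular system):

-- Is MRP0 present, and is its sequence# advancing?
SELECT process, status, thread#, sequence#, block#
  FROM v$managed_standby
 WHERE process LIKE 'MRP%';

-- OS pid of MRP0, in case it has to be killed at the OS level
-- (background process names usually appear in v$process.program)
SELECT spid, program
  FROM v$process
 WHERE program LIKE '%MRP0%';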




Posted on 2015-3-31 23:56:33 (original poster)

Analysis of the logs.
Alert log:
Tue Mar 31 20:06:39 2015
alter database recover managed standby database cancel
Tue Mar 31 20:06:52 2015
RFS[11]: Selected log 11 for thread 2 sequence 32058 dbid -1936265204 branch 840710863
Tue Mar 31 20:06:54 2015
Archived Log entry 58059 added for thread 2 sequence 32057 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:08:56 2015


***********************************************************************

Fatal NI connect error 12170.

  VERSION INFORMATION:
        TNS for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production
        TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production
        Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production
  Time: 31-MAR-2015 20:08:56
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535

TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out
    nt secondary err code: 78
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.213.14.161)(PORT=54784))
Tue Mar 31 20:10:07 2015
System State dumped to trace file /u01/oracle/diag/rdbms/xff_p780/xff1/trace/xff1_ora_5046602.trc
Tue Mar 31 20:10:25 2015
RFS[11]: Selected log 10 for thread 2 sequence 32059 dbid -1936265204 branch 840710863
Tue Mar 31 20:10:27 2015
RFS[6]: Selected log 7 for thread 1 sequence 27134 dbid -1936265204 branch 840710863
Tue Mar 31 20:10:28 2015
Archived Log entry 58060 added for thread 1 sequence 27133 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:10:30 2015
Archived Log entry 58061 added for thread 2 sequence 32058 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:10:56 2015
RFS[11]: Selected log 11 for thread 2 sequence 32060 dbid -1936265204 branch 840710863
Tue Mar 31 20:10:57 2015
Archived Log entry 58062 added for thread 2 sequence 32059 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:11:12 2015
RFS[11]: Selected log 10 for thread 2 sequence 32061 dbid -1936265204 branch 840710863
Tue Mar 31 20:11:13 2015
Archived Log entry 58063 added for thread 2 sequence 32060 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:11:30 2015
RFS[11]: Selected log 11 for thread 2 sequence 32062 dbid -1936265204 branch 840710863
Tue Mar 31 20:11:31 2015
Archived Log entry 58064 added for thread 2 sequence 32061 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:11:31 2015
RFS[6]: Selected log 8 for thread 1 sequence 27135 dbid -1936265204 branch 840710863
Tue Mar 31 20:11:32 2015
Archived Log entry 58065 added for thread 1 sequence 27134 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:11:45 2015
System State dumped to trace file /u01/oracle/diag/rdbms/xff_p780/xff1/trace/xff1_ora_7209498.trc
Tue Mar 31 20:12:04 2015
RFS[11]: Selected log 10 for thread 2 sequence 32063 dbid -1936265204 branch 840710863
Tue Mar 31 20:12:06 2015
Archived Log entry 58066 added for thread 2 sequence 32062 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:12:16 2015
ORA-1013 signalled during: alter database recover managed standby database cancel...
Tue Mar 31 20:12:18 2015
RFS[11]: Selected log 11 for thread 2 sequence 32064 dbid -1936265204 branch 840710863
Tue Mar 31 20:12:19 2015
Archived Log entry 58067 added for thread 2 sequence 32063 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:12:35 2015
RFS[11]: Selected log 10 for thread 2 sequence 32065 dbid -1936265204 branch 840710863
Tue Mar 31 20:12:37 2015
Archived Log entry 58068 added for thread 2 sequence 32064 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:12:40 2015
RFS[6]: Selected log 7 for thread 1 sequence 27136 dbid -1936265204 branch 840710863
Tue Mar 31 20:12:40 2015
Archived Log entry 58069 added for thread 1 sequence 27135 ID 0xffffffff8db108ef dest 1:
Tue Mar 31 20:12:53 2015
Recovery coordinator died, shutting down parallel recovery
Tue Mar 31 20:12:53 2015
MRP0: Background Media Recovery terminated with error 448
Errors in file /u01/oracle/diag/rdbms/xff_p780/xff1/trace/xff1_pr00_5505150.trc:
ORA-00448: normal completion of background process
Tue Mar 31 20:12:53 2015
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Recovered data files to a consistent state at change 12837635141298
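
As a side note, when MRP stops like this the apply gap can be estimated from the standard views (a generic sketch for a physical standby, not queries taken from this system):

-- Highest sequence received vs. highest sequence applied, per thread
SELECT thread#, MAX(sequence#) AS last_received
  FROM v$archived_log
 GROUP BY thread#;

SELECT thread#, MAX(sequence#) AS last_applied
  FROM v$archived_log
 WHERE applied = 'YES'
 GROUP BY thread#;

-- Or the summarized lag on the standby
SELECT name, value
  FROM v$dataguard_stats
 WHERE name IN ('transport lag', 'apply lag');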



10046 trace:
PARSING IN CURSOR #4573847864 len=54 dep=0 uid=0 oct=35 lid=0 tim=40803842779657 hv=1340289429 ad='7000103f159fe70' sqlid='g2d5pw97y6ccp'
alter database recover managed standby database cancel
END OF STMT
PARSE #4573847864:c=690,e=1178,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=40803842779656
WAIT #4573847864: nam='Disk file operations I/O' ela= 1329 FileOperation=2 fileno=0 filetype=1 obj#=-1 tim=40803842787228
WAIT #4573847864: nam='Disk file operations I/O' ela= 3 FileOperation=2 fileno=1 filetype=1 obj#=-1 tim=40803842787302
WAIT #4573847864: nam='Disk file operations I/O' ela= 49 FileOperation=2 fileno=0 filetype=15 obj#=-1 tim=40803842789536
WAIT #4573847864: nam='control file sequential read' ela= 17827 file#=0 block#=1 blocks=1 obj#=-1 tim=40803842807490
WAIT #4573847864: nam='Disk file operations I/O' ela= 61 FileOperation=2 fileno=0 filetype=15 obj#=-1 tim=40803842807824
WAIT #4573847864: nam='control file sequential read' ela= 30395 file#=0 block#=40 blocks=1 obj#=-1 tim=40803842838252
WAIT #4573847864: nam='control file sequential read' ela= 6924 file#=0 block#=42 blocks=1 obj#=-1 tim=40803842845354
Managed Recovery: Cancel posted.

*** 2015-03-31 20:06:40.206
WAIT #4573847864: nam='MRP inactivation' ela= 1000037 p1=0 p2=0 p3=0 obj#=-1 tim=40803843845510

*** 2015-03-31 20:06:41.206
WAIT #4573847864: nam='MRP inactivation' ela= 1000016 p1=0 p2=0 p3=0 obj#=-1 tim=40803844845647

*** 2015-03-31 20:06:42.207
WAIT #4573847864: nam='MRP inactivation' ela= 1000016 p1=0 p2=0 p3=0 obj#=-1 tim=40803845845816

*** 2015-03-31 20:06:43.207
WAIT #4573847864: nam='MRP inactivation' ela= 1000015 p1=0 p2=0 p3=0 obj#=-1 tim=40803846845941

*** 2015-03-31 20:06:44.207
WAIT #4573847864: nam='MRP inactivation' ela= 1000021 p1=0 p2=0 p3=0 obj#=-1 tim=40803847846122

*** 2015-03-31 20:06:45.207
WAIT #4573847864: nam='MRP inactivation' ela= 1000016 p1=0 p2=0 p3=0 obj#=-1 tim=40803848846278

*** 2015-03-31 20:06:46.208
WAIT #4573847864: nam='MRP inactivation' ela= 1000721 p1=0 p2=0 p3=0 obj#=-1 tim=40803849847160

*** 2015-03-31 20:06:47.208
WAIT #4573847864: nam='MRP inactivation' ela= 1000020 p1=0 p2=0 p3=0 obj#=-1 tim=40803850847294

*** 2015-03-31 20:06:48.208
WAIT #4573847864: nam='MRP inactivation' ela= 1000019 p1=0 p2=0 p3=0 obj#=-1 tim=40803851847459

*** 2015-03-31 20:06:49.208
WAIT #4573847864: nam='MRP inactivation' ela= 1000021 p1=0 p2=0 p3=0 obj#=-1 tim=40803852847608

*** 2015-03-31 20:06:50.209
WAIT #4573847864: nam='MRP inactivation' ela= 1000015 p1=0 p2=0 p3=0 obj#=-1 tim=40803853847794

*** 2015-03-31 20:06:51.209
WAIT #4573847864: nam='MRP inactivation' ela= 1000016 p1=0 p2=0 p3=0 obj#=-1 tim=40803854847926

*** 2015-03-31 20:06:52.209
WAIT #4573847864: nam='MRP inactivation' ela= 1000017 p1=0 p2=0 p3=0 obj#=-1 tim=40803855848053

*** 2015-03-31 20:06:53.209
WAIT #4573847864: nam='MRP inactivation' ela= 1000017 p1=0 p2=0 p3=0 obj#=-1 tim=40803856848259
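
For reference, a 10046 trace like the one above is normally captured in the session that issues the cancel, roughly as follows (a sketch of the usual approach, not the exact commands used here):

ALTER SESSION SET tracefile_identifier = 'mrp_cancel';
-- level 12 = include wait events and bind values
ALTER SESSION SET events '10046 trace name context forever, level 12';

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

-- once the statement returns (or the session is interrupted)
ALTER SESSION SET events '10046 trace name context off';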


Posted on 2015-3-31 23:57:44 (original poster)

A search on MOS suggests this is very likely Bug 15892456:

Hdr: 15892456 11.2.0.2.5 RDBMS 11.2.0.2.5 DATAGUARD_PSBY PRODID-5 PORTID-226
Abstract: MRP IS NOT APPLYING REDO (MRP0, PR00 AND PRXX SEEMS IDLE FOR A LONG TIME)

*** BPADMWW 11/19/12 10:06 am *** (ADD: Impact/Symptom->OTHER - FEATURE UNUS...)
*** CGREY 11/19/12 10:06 am ***
@ TAR:
@ ----
@ SR 3-6327164354


  BUG TYPE CHOSEN
  ===============
  Code
  
  Component: RDBMS
  ================
  DETAILED PROBLEM DESCRIPTION
  ============================
  11.2.0.2.5 primary 3 node RAC and standby 2 node RAC
  
  1. edit database provprr set state = 'APPLY-OFF'   from DGMGRL hangs for
  many hours
  
  2. after instance 1 of the primary crashes the standby does not apply logs
  already shipped from the primary.
  
  Oct 16 MRP hang customer turned on log_archive_trace for 3 minutes and it
  did not have output.
  System state dump happened in the PROVPRR2_mrp0_10613.trc
  
@ EXPLAIN WHY YOU BELIEVE THIS TO BE A PRODUCT DEFECT
@ ===================================================
@ MRP should not stop applying and hang
@
  DIAGNOSTIC ANALYSIS
  ===================
  alert_PROVPRD1_primary.log  primary instance 1 alert log show crash.
  Current sequences are 31 and 32
  
  
  Thu Oct 11 14:08:18 2012
  LGWR: Standby redo logfile selected to archive thread 1 sequence 32
  LGWR: Standby redo logfile selected for thread 1 sequence 32 for destination
  LOG_ARCHIVE_DEST_1
  Thread 1 advanced to log sequence 32 (LGWR switch)
   Current log# 4 seq# 32 mem# 0: +DATA_RAID1_PROVPRD/provprd/redo04a.log
   Current log# 4 seq# 32 mem# 1: +DATA_RAID1_PROVPRD/provprd/redo04b.log
  Thu Oct 11 14:08:36 2012
  Archived Log entry 2188 added for thread 1 sequence 31 ID 0x302ac18 dest 2:
  Thu Oct 11 14:09:47 2012
  ARC4: Standby redo logfile selected for thread 1 sequence 31 for destination
  LOG_ARCHIVE_DEST_1
  Thu Oct 11 14:10:09 2012
  ORA-16198: LGWR received timedout error from KSR
  LGWR: Attempting destination LOG_ARCHIVE_DEST_1 network reconnect (16198)
  LGWR: Destination LOG_ARCHIVE_DEST_1 network reconnect abandoned
  Error 16198 for archive log file 4 to 'provprr'
  Thu Oct 11 14:11:24 2012
  ...
  Thu Oct 11 14:19:16 2012
  Timed out trying to start process m001.
  Thu Oct 11 14:19:19 2012
  Errors in file
  /app/oracle/diag/rdbms/provprd/PROVPRD1/trace/PROVPRD1_arc4_27890.trc
  (incident=5045993):
  ORA-240: control file enqueue held for more than 120 seconds
  Incident details in:
  /app/oracle/diag/rdbms/provprd/PROVPRD1/incident/incdir_5045993/PROVPRD1_arc4
  _27890_i5045993.trc
  Thu Oct 11 14:19:22 2012
  ...
  opiodr aborting process unknown ospid (30237) as a result of ORA-609
  Thu Oct 11 14:20:26 2012
  PMON failed to acquire latch, see PMON dump
  Thu Oct 11 14:21:29 2012
  PMON failed to acquire latch, see PMON dump
  Thu Oct 11 14:22:21 2012
  ...
  opiodr aborting process unknown ospid (30316) as a result of ORA-609
  Thu Oct 11 14:18:51 2012
  Errors in file
  /app/oracle/diag/rdbms/provprd/PROVPRD1/trace/PROVPRD1_arc0_27871.trc
  (incident=5045945):
  ORA-240: control file enqueue held for more than 120 seconds
  Incident details in:
  /app/oracle/diag/rdbms/provprd/PROVPRD1/incident/incdir_5045945/PROVPRD1_arc0
  _27871_i5045945.trc
  Thu Oct 11 14:40:34 2012
  Starting ORACLE instance (normal)
  
  
  
  alert_PROVPRR2.log Standby instance 2 is the apply instance. Looks like MRP
  was hung and finally not applying
  
  
  Thu Oct 11 00:18:01 2012
  Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL
  SWITCHOVER DISCONNECT  USING CURRENT LOGFILE
  Thu Oct 11 00:18:01 2012
  Media Recovery Waiting for thread 2 sequence 17 (in transit)
  Recovery of Online Redo Log: Thread 2 Group 18 Seq 17 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby06a.log
  Media Recovery Waiting for thread 1 sequence 29 (in transit)
  Recovery of Online Redo Log: Thread 1 Group 13 Seq 29 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby01a.log
  Media Recovery Waiting for thread 3 sequence 25 (in transit)
  Recovery of Online Redo Log: Thread 3 Group 23 Seq 25 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby11a.log
  Thu Oct 11 11:44:16 2012
  Standby controlfile consistent with primary
  RFS[4]: Selected log 14 for thread 1 sequence 30 dbid 32324255 branch
  795712034
  Thu Oct 11 11:44:16 2012
  Media Recovery Waiting for thread 1 sequence 30 (in transit)
  Recovery of Online Redo Log: Thread 1 Group 14 Seq 30 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby02a.log
  Media Recovery Waiting for thread 3 sequence 26 (in transit)
  Recovery of Online Redo Log: Thread 3 Group 24 Seq 26 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby12a.log
  Thu Oct 11 11:44:31 2012
  Archived Log entry 4717 added for thread 1 sequence 29 ID 0x302ac18 dest 2:
  Thu Oct 11 14:08:36 2012
  RFS[6]: Assigned to RFS process 27430
  RFS[6]: Selected log 25 for thread 3 sequence 27 dbid 32324255 branch
  795712034
  Thu Oct 11 14:08:51 2012
  Media Recovery Log
  +ARCH_PROVPRD/provprr/archivelog/2012_10_11/thread_2_seq_17.4933.796399699
  Media Recovery Waiting for thread 3 sequence 27 (in transit)
  Recovery of Online Redo Log: Thread 3 Group 25 Seq 27 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby13a.log
  
  
  alert_PROVPRR2.log Standby apply instance 2 not applying for a day or so and
  is cancelled and hangs.
  
  Mon Oct 15 17:07:02 2012
  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
  Mon Oct 15 18:00:50 2012
  System State dumped to trace file
  /app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_ora_23483.trc
  Mon Oct 15 18:10:34 2012
  System State dumped to trace file
  /app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_mrp0_10613.trc
  Tue Oct 16 10:21:32 2012
  ALTER SYSTEM SET log_archive_trace=8192 SCOPE=BOTH SID='*';        
  <-------------- ran tracing for about 3 minutes here
  Tue Oct 16 10:24:17 2012
  ALTER SYSTEM SET log_archive_trace=0 SCOPE=BOTH SID='*';
  Tue Oct 16 10:58:25 2012
  System State dumped to trace file
  /app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_mrp0_10613.trc
  Tue Oct 16 12:10:43 2012
  Errors in file
  /app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_mrp0_10613.trc:
  Tue Oct 16 12:10:44 2012
  Dumping diagnostic data in directory=[cdmp_20121016121044], requested by
  (instance=2, osid=10613 (MRP0)), summary=[abnormal process termination].
  Tue Oct 16 13:33:59 2012
  Standby controlfile consistent with primary
  RFS[8]: Selected log 23 for thread 3 sequence 71 dbid 32324255 branch
  795712034
  Tue Oct 16 13:34:12 2012
  Archived Log entry 4808 added for thread 3 sequence 70 ID 0x302ac18 dest 2:
  Tue Oct 16 15:04:02 2012
  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
  Tue Oct 16 15:05:19 2012
  Managed Standby Recovery Canceled (PROVPRR2)
  Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
  Tue Oct 16 15:05:20 2012
  Recovery coordinator died, shutting down parallel recovery                  
                            <---------------   you must have killed it here
  Tue Oct 16 15:05:29 2012
  ORA-1013 signalled during: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  CANCEL...
  Tue Oct 16 15:06:39 2012
  Logmerger died, shutting down parallel recovery slaves
  Recovery interrupted!
  Recovery interrupted!
  Tue Oct 16 15:06:43 2012
  Reconfiguration started (old inc 1226, new inc 1228)
  List of instances:
  1 2 3 (myinst: 2)
  Global Resource Directory frozen
  Communication channels reestablished
  Tue Oct 16 15:06:43 2012
  * domain 0 valid = 1 according to instance 1
  Master broadcasted resource hash value bitmaps
  Non-local Process blocks cleaned out
  Tue Oct 16 15:06:43 2012
  LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  Tue Oct 16 15:06:43 2012
  LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  Tue Oct 16 15:06:43 2012
  LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  Tue Oct 16 15:06:43 2012
  LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  Set master node info
  Submitted all remote-enqueue requests
  Dwn-cvts replayed, VALBLKs dubious
  All grantable enqueues granted
  Submitted all GCS remote-cache requests
  Fix write in gcs resources
  Reconfiguration complete
  Tue Oct 16 15:30:34 2012
  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER
  DISCONNECT  USING CURRENT LOGFILE
  Attempt to start background Managed Standby Recovery process (PROVPRR2)
  Tue Oct 16 15:30:34 2012
  ....
  
  after the MRP restart it is applying again.  
  
  Tue Oct 16 15:30:46 2012
  Parallel Media Recovery started with 64 slaves
  Waiting for all non-current ORLs to be archived...
  All non-current ORLs have been archived.
  Tue Oct 16 15:30:47 2012
  Media Recovery Log
  +ARCH_PROVPRD/provprr/archivelog/2012_10_11/thread_1_seq_32.4942.796401567
  Media Recovery Log
  +ARCH_PROVPRD/provprr/archivelog/2012_10_11/thread_2_seq_18.4934.796399717
  Tue Oct 16 15:30:47 2012
  Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL
  SWITCHOVER DISCONNECT  USING CURRENT LOGFILE
  Media Recovery Log
  +ARCH_PROVPRD/provprr/archivelog/2012_10_11/thread_3_seq_28.4938.796401515
  Thu Oct 11 14:09:05 2012
  Media Recovery Waiting for thread 1 sequence 31
  Thu Oct 11 14:09:12 2012
  Archived Log entry 4722 added for thread 3 sequence 27 ID 0x302ac18 dest 2:
  Thu Oct 11 14:11:33 2012
  Recovery of Online Redo Log: Thread 1 Group 14 Seq 31 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby02a.log
  Media Recovery Log
  +ARCH_PROVPRD/provprr/archivelog/2012_10_11/thread_2_seq_18.4934.796399717
  Thu Oct 11 14:11:49 2012
  Media Recovery Waiting for thread 3 sequence 28 (in transit)
  Recovery of Online Redo Log: Thread 3 Group 23 Seq 28 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby11a.log
  Recovery of Online Redo Log: Thread 1 Group 13 Seq 32 Reading mem 0
   Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby01a.log
  Thu Oct 11 14:34:34 2012
  RFS[5]: Possible network disconnect with primary database
  Thu Oct 11 14:39:27 2012
  RFS[7]: Assigned to RFS process 22717
  RFS[7]: Opened log for thread 1 sequence 34 dbid 32324255 branch 795712034
  Archived Log entry 4726 added for thread 1 sequence 34 rlc 795712034 ID
  0x302ac18 dest 1:
  Thu Oct 11 14:43:47 2012
  Primary database is in MAXIMUM AVAILABILITY mode
  Standby controlfile consistent with primary
  Standby controlfile consistent with primary
  RFS[8]: Assigned to RFS process 1735
  RFS[8]: Selected log 23 for thread 3 sequence 31 dbid 32324255 branch
  795712034
  Standby controlfile consistent with primary
  RFS[8]: Selected log 24 for thread 3 sequence 32 dbid 32324255 branch
  795712034
  Thu Oct 11 14:43:53 2012
  Archived Log entry 4732 added for thread 3 sequence 31 ID 0x302ac18 dest 2:
  Thu Oct 11 16:49:06 2012
  Standby controlfile consistent with primary
  RFS[8]: Selected log 23 for thread 3 sequence 33 dbid 32324255 branch
  795712034
  Thu Oct 11 16:49:18 2012
  
  WORKAROUND?
  ===========
  Yes
  
  WORKAROUND INFORMATION
  ======================
  Restart MRP. The interruption is not acceptable for HA.
  
  TECHNICAL IMPACT
  ================
  Manual intervention - no notification to the DBA so did not apply for hours
  
  RELATED ISSUES (bugs, forums, RFAs)
  ===================================
  Customer believed it was Bug 13470324 But
  doesn't seem to apply
  
  HOW OFTEN DOES THE ISSUE REPRODUCE AT CUSTOMER SITE?
  ====================================================
  Intermittent
  
  DOES THE ISSUE REPRODUCE INTERNALLY?
  ====================================
  No
  
  EXPLAIN WHY THE ISSUE WAS NOT TESTED INTERNALLY.
  ================================================
  Complex setup
  
  IS A TESTCASE AVAILABLE?
  ========================
  No
  
@ DID THIS WORK PREVIOUSLY?
@ =========================
@ No
@
@ DID YOU TEST THIS ON THE LATEST RELEASE/PATCHSET?
@ =================================================
@ No
@
@ EXPLAIN WHY YOU DID NOT TEST THIS ON THE LATEST RELEASE/PATCHSET.
@ =====================================================================
@ 11.2.0.2.5 only reproduced at customer site
@
@ STACK TRACE/FILE Information
@ ============================
@ alert logs, mrp trace with system state dump, opatch lsinventory -detail
@
  Link to IPS Package:
  ====================
  not available
  
@ P1 Defect?
@ ==========
@ No
@
*** JBARLOW 11/21/12 02:14 am *** (CHG: Sta->10)
*** JBARLOW 11/21/12 02:14 am ***
@ No evidence on bugsftp. Please upload.
*** CGREY 11/21/12 03:05 pm ***
@ Bug Attachment:alert_PROVPRD1.log.zip:alert log primary
@ Bug Attachment:alert_PROVPRD2.log.zip:alert log primary
@ Bug Attachment:alert_PROVPRD3.log.zip:alert log primary
@ Bug Attachment:alert_PROVPRD1.log.zip:standby
@ Bug Attachment:alert_PROVPRD2.log.zip:standby
@ Bug Attachment:alert_PROVPRD3.log.zip:standby
@ Bug Attachment:alert_PROVPRR1.log.zip:primary
@ Bug Attachment:alert_PROVPRR2.log.zip:primary
@ Bug Attachment:alert_PROVPRR3.log.zip:primary
@ Bug Attachment:Broker_Conf.txt:broker
@ Bug Attachment:drcPROVPRD1.zip:primary
@ Bug Attachment:drcPROVPRR1.zip:standby
@ Bug Attachment:lsinventory_Primary.txt:patch
@ Bug Attachment:lsinventory_Standby.txt:patch
*** CGREY 11/21/12 03:11 pm *** (CHG: Sta->16)
*** JBARLOW 11/22/12 06:48 am *** (CHG: Sta->10)
*** JBARLOW 11/22/12 06:48 am ***
@ Where is the MRP trace and systemstate? Please upload these to bugftp.
*** CGREY 11/26/12 09:59 am ***
@ Bug Attachment:PROVPRR2_mrp0_10613.trc:state dump in mrp trace file
@ PROVPRR2_mrp0_10613.trc
*** CGREY 11/26/12 09:59 am ***
@ Sorry, attaching system state dump in mrp trace file PROVPRR2_mrp0_10613.trc
@ Thanks
*** CGREY 11/26/12 09:59 am *** (CHG: Sta->16)
@ *** JBARLOW 12/06/12 04:52 am *** (CHG: Support Contact->CGREY->JBARLOW)
*** JBARLOW 12/06/12 04:52 am *** (CHG: Asg->JBARLOW)
*** JBARLOW 12/07/12 03:41 am *** (CHG: Sta->10)
*** JBARLOW 12/07/12 03:41 am ***
@ alert_PROVPRR2.log :
@ .
@  Mon Oct 15 17:07:02 2012
@  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
@  Mon Oct 15 18:00:50 2012
@  System State dumped to trace file
@ /app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_ora_23483.trc
@  Mon Oct 15 18:10:34 2012
@  System State dumped to trace file
@ /app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_mrp0_10613.trc
@  Tue Oct 16 10:21:32 2012
@  ALTER SYSTEM SET log_archive_trace=8192 SCOPE=BOTH SID='*';        
@  Tue Oct 16 10:24:17 2012
@  ALTER SYSTEM SET log_archive_trace=0 SCOPE=BOTH SID='*';
@  Tue Oct 16 10:58:25 2012
@  System State dumped to trace file
@ /app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_mrp0_10613.trc
@  Tue Oct 16 12:10:43 2012
@  Errors in file
@ /app/oracle/diag/rdbms/provprr/PROVPRR2/trace/PROVPRR2_mrp0_10613.trc:
@  Tue Oct 16 12:10:44 2012
@  Dumping diagnostic data in directory=[cdmp_20121016121044], requested by
@ (instance=2, osid=10613 (MRP0)), summary=[abnormal process termination].
@  Tue Oct 16 13:33:59 2012
@  Standby controlfile consistent with primary
@  RFS[8]: Selected log 23 for thread 3 sequence 71 dbid 32324255 branch
@ 795712034
@  Tue Oct 16 13:34:12 2012
@  Archived Log entry 4808 added for thread 3 sequence 70 ID 0x302ac18 dest 2:
@  Tue Oct 16 15:04:02 2012
@  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
@  Tue Oct 16 15:05:19 2012
@  Managed Standby Recovery Canceled (PROVPRR2)
@  Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
@ .
@ From this we can see that 3 systemstates were dumped while the 'ALTER
@ DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL' was in progress.
@ These are in tracefiles PROVPRR2_ora_23483.trc and PROVPRR2_mrp0_10613.trc.
@ .
@ However, only PROVPRR2_mrp0_10613.trc has been uploaded to bugsftp. This file
@ is missing the first systemstate because the file was removed and recreated.
@ Furthermore, the single systemstate that we have was a level 2, so we don't
@ have the short stacks.
@ .
@ We do have the wait stacks in the systemstate. MRP (process 55) is in an idle
@ cycle of short waits for 'parallel recovery control message reply' and 'class
@ slave wait'. I would expect to see these waits in an idle MRP process. There
@ is no indication that this is the culprit.
@ .
@ Other sessions in a current long wait are VKTM (this is idle), process 112
@ and process 131.
@ .
@ Process 112:
@ .
@  OS pid 32273
@ .
@  Current Wait Stack:
@   0: waiting for 'SQL*Net message from client'
@      driver id=0x54435000, #bytes=0x1, =0x0
@      wait_id=12197 seq_num=12247 snap_id=1
@      wait times: snap=1071 min 26 sec, exc=1071 min 26 sec, total=1071 min 26
@ sec
@      wait times: max=infinite, heur=1071 min 26 sec
@ .
@  This wait started on Oct 15 at 17:06:59. I think that this is a user session
@ and is in an idle wait.
@ .
@ Process 131:
@ .
@  OS pid 26919
@ .
@  This is RFS[3] (unless the OS process table has wrapped)
@ .
@  Current Wait Stack:
@   0: waiting for 'SQL*Net message from client'
@      driver id=0x54435000, #bytes=0x1, =0x0
@      wait_id=247 seq_num=474 snap_id=1
@      wait times: snap=18295 min 41 sec, exc=18295 min 41 sec, total=18295 min
@ 41 sec
@      wait times: max=infinite, heur=18295 min 41 sec
@ .
@  This session has been in wait since the RFS process was created. This
@ indicates an idle RFS process.
@ .
@ There is not a great deal extra to be gleaned from the systemstate. All
@ recovery slaves are in idle wait states.
@ .
@ We really needed a series of level 266 systemstates (so we got the short
@ stacks) and/or level 3 hanganalyze dumps.
@ .
@ I would like to see what the other evidence dumped tells us so can you ask
@ the customer to upload the initial systemstate dump (PROVPRR2_ora_23483.trc)
@ and the contents of the diagnostic data dump directory cdmp_20121016121044.
@ .
@ Also has this problem occurred before or since?
*** CGREY 12/07/12 09:38 am ***
@ Asking customer for requested output.
@ Thanks
*** CGREY 12/12/12 01:08 pm ***
@ GTCRLK:3-6327164354:PROVPRR2_ora_23483.trc.zip:Attachment uploaded From
@ Internal Portal
@ GTCRLK:3-6327164354:cdmp_20121016121044.zip:Attachment uploaded From Internal
@ Portal
*** CGREY 12/12/12 01:09 pm *** (CHG: Sta->16)
*** JBARLOW 12/14/12 12:49 am *** (CHG: Sta->10)
*** JBARLOW 12/14/12 12:49 am ***
@ The systemstate in PROVPRR2_ora_23483.trc is essentially the same as that in
@ PROVPRR2_mrp0_10613.trc i.e. the process which issued the command to shutdown
@ standby recovery is waiting for MRP to terminate (waiting on "MRP
@ inactivation"), but MRP just appears to be in an idle state.
@ .
@ The systemstates show that although PR00 is not in a wait state, it's last
@ wait ended 5973 min 45 sec ago i.e 4 days 3 hrs 33 mins. The systemstate was
@ taken at 2012-10-15 18:00:50, so PR00's last wait ended around Oct 11 14:27.
@ .
@ The standby alert logs shows that this is around the time of the last ORL
@ recovery:
@ .
@  Thu Oct 11 14:11:49 2012
@  Media Recovery Waiting for thread 3 sequence 28 (in transit)
@  Recovery of Online Redo Log: Thread 3 Group 23 Seq 28 Reading mem 0
@    Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby11a.log
@  Recovery of Online Redo Log: Thread 1 Group 13 Seq 32 Reading mem 0
@    Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby01a.log
@  Thu Oct 11 14:34:34 2012
@  RFS[5]: Possible network disconnect with primary database
@  
@ The next recovered log was after managed recovery was restarted:
@ .
@  Tue Oct 16 16:06:51 2012
@  Media Recovery Waiting for thread 1 sequence 42 (in transit)
@  Recovery of Online Redo Log: Thread 1 Group 13 Seq 42 Reading mem 0
@    Mem# 0: +DATA_RAID1_PROVPRD/provprr/standby01a.log
@ .
@ So it appears that PR00 got itself stuck around Oct 11 14:27.
@ .
@ We would have required multiple stack traces from PR00 to help understand why
@ the process is stuck.
@ .
@ Has this problem happened before and/or since this occurrence? The current
@ evidence is insufficient to make any further progress in RCA.
*** CGREY 12/14/12 08:50 am ***
@ I updated the customer asking if they can reproduce it or if it happens often
@ enough.  Can you give an action plan of what we need in case they can
@ reproduce it?
@ Thanks
*** CGREY 12/30/12 08:50 am ***
@ This is not reproducing and customer does not update.
@ Therefore putting in Auto Close Status.
@ Thanks
*** JBARLOW 02/05/13 05:50 am *** (CHG: Sta->31)
*** JBARLOW 02/05/13 05:50 am *** (ADD: Impact/Symptom->HANG - DATABASE INST...)
*** CGREY 02/05/13 07:50 am ***
@ SR is now closed
@ Thanks
*** JPINTO 05/06/13 10:21 pm *** (CHG: Sta->91)
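
The practical workaround, both in the bug notes and in what the poster did, is simply to restart MRP: kill the MRP0 ospid at the OS level (e.g. kill -9 on AIX, since the CANCEL command hangs) and then restart apply. A sketch, assuming an Active Data Guard standby with real-time apply as in this case:

-- after MRP0 has been killed at the OS level, restart real-time apply
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  USING CURRENT LOGFILE DISCONNECT FROM SESSION;

-- and confirm that it is applying again
SELECT process, status, thread#, sequence#
  FROM v$managed_standby
 WHERE process LIKE 'MRP%';

The root cause was never established in the bug (it was closed for lack of evidence and customer updates), so no fix is documented there; restarting MRP is the recorded workaround.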








