ORACLE SOS

RAC: failover fails for already-connected sessions

Posted on 2015-3-24 23:58:10
Last edited by Johnliu on 2015-3-25 00:00

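I have a question for everyone.

I have a 2-node RAC, 11.2.0.3 on CentOS x86_64.
To avoid instability caused by cache fusion, I created 2 services so that each workload connects to its own preferred instance.
1. Create 2 services, each with a preferred instance and an available instance, and with the TAF policy set to PRECONNECT: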
srvctl add service -d order1db -s order1dbsrv1 -r order1db1 -a order1db2 -P preconnect -e select -m basic -w 5 -z 2
srvctl add service -d order1db -s order1dbsrv2 -r order1db2 -a order1db1 -P preconnect -e select -m basic -w 5 -z 2


[grid@order1db02 ~]$ srvctl  config service -d order1db
Service name: order1dbsrv1
Service is enabled
Server pool: order1db_order1dbsrv1
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: SELECT
Failover method: BASIC
TAF failover retries: 2
TAF failover delay: 5
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: PRECONNECT
Edition:
Preferred instances: order1db1
Available instances: order1db2

Service name: order1dbsrv2
Service is enabled
Server pool: order1db_order1dbsrv2
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: SELECT
Failover method: BASIC
TAF failover retries: 2
TAF failover delay: 5
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: PRECONNECT
Edition:
Preferred instances: order1db2
Available instances: order1db1


2. The client TNS configuration is as follows.
192.168.1.65 and 192.168.1.66 are the two SCAN VIPs.
ord11=
  (DESCRIPTION=
    (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.65)(PORT=1521))
    (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.66)(PORT=1521))
    (LOAD_BALANCE=off)
    (FAILOVER=on)
    (CONNECT_DATA=(SERVICE_NAME=order1dbsrv1)
     (FAILOVER_MODE=(BACKUP=ord12)(TYPE=select)(METHOD=basic)(RETRIES=2)(DELAY=5))))


ord12=
  (DESCRIPTION=
    (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.66)(PORT=1521))
    (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.65)(PORT=1521))
    (LOAD_BALANCE=off)
    (FAILOVER=on)
    (CONNECT_DATA=(SERVICE_NAME=order1dbsrv2)
     (FAILOVER_MODE=(BACKUP=ord11)(TYPE=select)(METHOD=basic)(RETRIES=2)(DELAY=5))))


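3. Failover tests
1) Disconnect the interconnect (heartbeat) cable: both already-connected sessions and new sessions fail over to the other instance normally.
2) ifdown the public NIC on node 2: the VIP and the SCAN IP both move to node 1, but sessions that were connected to node 2 before the "failure" hang, and after about 15 minutes they report the error below. New connections take about 30 seconds to fail over to node 1.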
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 2399
Session ID: 216 Serial number: 577
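
A side note on the roughly 30-second delay for new connections in test 2): one thing that may be worth checking is how long the client waits on each ADDRESS before moving to the next one. The sketch below only assembles a connect descriptor string with explicit connect-time limits. It assumes the client supports the Oracle Net descriptor parameters CONNECT_TIMEOUT, TRANSPORT_CONNECT_TIMEOUT and RETRY_COUNT (documented for 11.2 clients). The function name `build_descriptor` and the timeout values are illustrative, not taken from this thread, and the FAILOVER_MODE clause is left out for brevity.

```python
def build_descriptor(hosts, service_name, port=1521,
                     connect_timeout=5, transport_connect_timeout=3,
                     retry_count=2):
    """Assemble a tnsnames.ora-style connect descriptor as a single string.

    The timeout values are placeholders to experiment with, not recommendations.
    """
    # One ADDRESS entry per host, kept in the order given (LOAD_BALANCE=off).
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=tcp)(HOST={host})(PORT={port}))" for host in hosts
    )
    return (
        "(DESCRIPTION="
        f"(CONNECT_TIMEOUT={connect_timeout})"
        f"(TRANSPORT_CONNECT_TIMEOUT={transport_connect_timeout})"
        f"(RETRY_COUNT={retry_count})"
        f"{addresses}"
        "(LOAD_BALANCE=off)(FAILOVER=on)"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


if __name__ == "__main__":
    # Same hosts and service as the ord11 alias above.
    print("ord11=" + build_descriptor(["192.168.1.65", "192.168.1.66"],
                                      "order1dbsrv1"))
```

Whether these limits actually shorten the 30 seconds would need to be tested against the cluster. They only affect new connection attempts, not sessions that are already established.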



Checking the listeners from node 2: because the public NIC has been brought down with ifdown, the network is unreachable and tnsping reports TNS-12543.
SQL> show parameter listener

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
listener_networks                    string
local_listener                       string      ORDER1DB2_LISTENER
remote_listener                      string      REMOTE_LISTENERS_SCAN
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
[oracle@order1db02 ~]$ tnsping ORDER1DB2_LISTENER

TNS Ping Utility for Linux: Version 11.2.0.3.0 - Production on 24-MAR-2015 23:49:37

Copyright (c) 1997, 2011, Oracle.  All rights reserved.

Used parameter files:


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.66)(PORT = 1521)))
TNS-12543: TNS:destination host unreachable
[oracle@order1db02 ~]$ tnsping REMOTE_LISTENERS_SCAN

TNS Ping Utility for Linux: Version 11.2.0.3.0 - Production on 24-MAR-2015 23:49:45

Copyright (c) 1997, 2011, Oracle.  All rights reserved.

Used parameter files:


Used TNSNAMES adapter to resolve the alias
Attempting to contact (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.65)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.66)(PORT = 1521)))
TNS-12543: TNS:destination host unreachable


"You should make the listeners aware of adjacent nodes' load to do server-side load balancing.
To make PMON notify the load information to adjacent nodes, you should set the REMOTE_LISTENER parameter."
My question: is it because the network is down and the instance cannot reach the remote_listener that server-side failover fails?


Reply (administrator), posted on 2015-3-25 14:32:11
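1. When you disconnect the private network, either CRS restarts or the host reboots. Either way the database instance on that node restarts, so the sessions on it fail over straight away.
2. When you disconnect the public network, already-connected sessions only report an error once a network timeout is detected, especially if the session is idle.
3. If you want a cleaner separation of workloads by node, consider setting REMOTE_LISTENER to an empty value.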
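
On point 2, one plausible but unconfirmed explanation for the roughly 15-minute hang is plain TCP retransmission on Linux: a peer that keeps retransmitting an unacknowledged segment only gives up after `net.ipv4.tcp_retries2` attempts. The sketch below sums the back-off intervals under assumptions that are not stated in this thread: the kernel default of 15 retries, a minimum retransmission timeout of 200 ms, and a cap of 120 s. The function name is illustrative.

```python
def tcp_retransmit_timeout(retries=15, rto_min=0.2, rto_max=120.0):
    """Approximate seconds before Linux gives up on an unacknowledged TCP segment.

    Assumes the retransmission timeout (RTO) starts at rto_min, doubles after
    every retransmission and is capped at rto_max. Real RTO values depend on
    the measured round-trip time, so treat the result as a lower bound.
    """
    total = 0.0
    rto = rto_min
    # One wait before the first retransmission, then one wait after each retry.
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    seconds = tcp_retransmit_timeout()
    print(f"{seconds:.1f} s, about {seconds / 60:.1f} minutes")
```

With the assumed defaults this comes to about 924.6 seconds, which is in the same range as the 15 minutes observed. If that is indeed the cause, lowering `net.ipv4.tcp_retries2` or setting a client-side limit such as SQLNET.RECV_TIMEOUT in sqlnet.ora would be things to test rather than guaranteed fixes.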
