Johnliu posted on 2015-3-24 23:58:10

RAC: failover fails for already-connected sessions

Last edited by Johnliu on 2015-3-25 00:00

I have a question for everyone.

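I have a 2-node RAC, 11.2.0.3 on CentOS x86_64.
To avoid instability caused by Cache Fusion traffic, I created 2 services so that each workload connects to its own preferred instance.
1. Create 2 services, each with a preferred instance and an available instance, and a TAF policy of preconnect: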
srvctl add service -d order1db -s order1dbsrv1 -r order1db1 -a order1db2 -P preconnect -e select -m basic -w 5 -z 2
srvctl add service -d order1db -s order1dbsrv2 -r order1db2 -a order1db1 -P preconnect -e select -m basic -w 5 -z 2


$ srvctl config service -d order1db
Service name: order1dbsrv1
Service is enabled
Server pool: order1db_order1dbsrv1
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: SELECT
Failover method: BASIC
TAF failover retries: 2
TAF failover delay: 5
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: PRECONNECT
Edition:
Preferred instances: order1db1
Available instances: order1db2

Service name: order1dbsrv2
Service is enabled
Server pool: order1db_order1dbsrv2
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: SELECT
Failover method: BASIC
TAF failover retries: 2
TAF failover delay: 5
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: PRECONNECT
Edition:
Preferred instances: order1db2
Available instances: order1db1


2. The client TNS configuration is as follows.
192.168.1.65 and 192.168.1.66 are the 2 SCAN VIPs.
ord11=
(DESCRIPTION=
    (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.65)(PORT=1521))
    (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.66)(PORT=1521))
    (LOAD_BALANCE=off)
    (FAILOVER=on)
    (CONNECT_DATA=(SERVICE_NAME=order1dbsrv1)
   (FAILOVER_MODE=(BACKUP=ord12)(TYPE=select)(METHOD=basic)(RETRIES=2)(DELAY=5))))


ord12=
(DESCRIPTION=
    (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.66)(PORT=1521))
    (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.65)(PORT=1521))
    (LOAD_BALANCE=off)
    (FAILOVER=on)
    (CONNECT_DATA=(SERVICE_NAME=order1dbsrv2)
   (FAILOVER_MODE=(BACKUP=ord11)(TYPE=select)(METHOD=basic)(RETRIES=2)(DELAY=5))))
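For comparison, here is a minimal sketch of what a client entry for a PRECONNECT TAF policy conventionally looks like. It rests on assumptions the thread does not confirm: that the clusterware has created a shadow service named `<service>_PRECONNECT` on the available instance (the documented behaviour for `-P preconnect`), and that the client uses `METHOD=preconnect` with `BACKUP` pointing at an alias for that shadow service. The function name `print_preconnect_aliases` and the alias suffix `_pre` are hypothetical; the addresses are copied from the entries above.

```shell
#!/bin/sh
# Print a pair of tnsnames.ora entries for a PRECONNECT TAF setup.
# Sketch only: the *_PRECONNECT shadow service name and the "_pre"
# alias suffix are assumptions, not taken from the original post.
print_preconnect_aliases() {
    svc=$1      # primary service name, e.g. order1dbsrv1
    alias=$2    # client alias, e.g. ord11
    cat <<EOF
${alias}=
(DESCRIPTION=
    (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.65)(PORT=1521))
    (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.66)(PORT=1521))
    (LOAD_BALANCE=off)
    (FAILOVER=on)
    (CONNECT_DATA=(SERVICE_NAME=${svc})
    (FAILOVER_MODE=(BACKUP=${alias}_pre)(TYPE=select)(METHOD=preconnect))))

${alias}_pre=
(DESCRIPTION=
    (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.65)(PORT=1521))
    (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.66)(PORT=1521))
    (CONNECT_DATA=(SERVICE_NAME=${svc}_PRECONNECT)))
EOF
}

# Example: print the entries for the first service.
print_preconnect_aliases order1dbsrv1 ord11
```

The output can be reviewed and then appended to the client's tnsnames.ora by hand. Whether this layout behaves better than the BASIC entries above in the ifdown test is not something the thread establishes.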


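3. Failover tests
1) Unplug the interconnect (heartbeat) cable: both already-connected sessions and new sessions fail over to the other instance normally.
2) Run ifdown on node 2's public NIC: the VIP and the SCAN IP both move to node 1, but sessions that were connected to node 2 before the "failure" hang, and after about 15 minutes report the error below. New connections take about 30 seconds to fail over to node 1.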
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 2399
Session ID: 216 Serial number: 577



Checking the listener from node 2: because the public NIC has been taken down with ifdown, the network is unreachable and tnsping reports TNS-12543.
SQL> show parameter listener

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
listener_networks                    string
local_listener                       string      ORDER1DB2_LISTENER
remote_listener                      string      REMOTE_LISTENERS_SCAN
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
$ tnsping ORDER1DB2_LISTENER

TNS Ping Utility for Linux: Version 11.2.0.3.0 - Production on 24-MAR-2015 23:49:37

Copyright (c) 1997, 2011, Oracle.  All rights reserved.

Used parameter files:


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.66)(PORT = 1521)))
TNS-12543: TNS:destination host unreachable
$ tnsping REMOTE_LISTENERS_SCAN

TNS Ping Utility for Linux: Version 11.2.0.3.0 - Production on 24-MAR-2015 23:49:45

Copyright (c) 1997, 2011, Oracle.  All rights reserved.

Used parameter files:


Used TNSNAMES adapter to resolve the alias
Attempting to contact (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.65)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.66)(PORT = 1521)))
TNS-12543: TNS:destination host unreachable


"You should make the listeners aware of adjacent nodes load to do the server side load balance.
To make PMON to notify the load information to adjacent nodes, you should set the REMOTE_LISTENER parameter."
My question: is server-side failover failing because the network is down and the instance can no longer reach the remote_listener?


xifenfei posted on 2015-3-25 14:32:11

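1. When you disconnect the private network, either CRS restarts or the host reboots. Either way the database instance on that node restarts, so the sessions on it fail over right away.
2. When you disconnect the public network, an already-connected session only reports an error once it detects a network timeout, especially if the session is idle.
3. If you want a cleaner split of workloads across nodes, consider setting REMOTE_LISTENER to an empty value.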
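On point 2, one plausible reading of the roughly 15-minute hang (an assumption, the thread does not confirm it) is the TCP retransmission timeout on a Linux client: unacknowledged data is retransmitted with exponential backoff until `net.ipv4.tcp_retries2` retries are exhausted. The sketch below estimates that duration, assuming an initial retransmission timeout of 200 ms, a cap of 120 s, and the usual default of 15 retries. The function name `retx_timeout_ms` is made up for illustration.

```shell
#!/bin/sh
# Estimate how long TCP keeps retransmitting unacknowledged data before
# the socket returns an error to the application.
# Assumptions (not from the original post): initial RTO 200 ms,
# RTO capped at 120 s, doubling after every retransmission.
retx_timeout_ms() {
    retries=$1         # value of net.ipv4.tcp_retries2
    rto=200            # assumed initial retransmission timeout, in ms
    rto_max=120000     # assumed RTO cap, in ms
    total=0
    i=0
    # retries + 1 waiting intervals: one before each retransmission,
    # plus the final wait after the last one.
    while [ "$i" -le "$retries" ]; do
        total=$((total + rto))
        rto=$((rto * 2))
        if [ "$rto" -gt "$rto_max" ]; then
            rto=$rto_max
        fi
        i=$((i + 1))
    done
    echo "$total"
}

ms=$(retx_timeout_ms 15)
echo "tcp_retries2=15 -> ${ms} ms (~$((ms / 60000)) minutes)"
```

Under these assumptions the result is 924600 ms, a little over 15 minutes, which is in line with the delay reported in the question. The value in effect on a given client can be read with `sysctl net.ipv4.tcp_retries2`.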