Transparent Application Failover -TAF in Oracle 10g

Oracle 10g

It is not uncommon for businesses to demand 99.99% (or even 99.999%) availability for their enterprise applications. Think about what it would take to ensure a downtime of no more than .5 hours or even no downtime during the year. To answer many of these high-availability requirements, businesses are investing in mechanisms that provide for automatic failover when one participating system fails. When considering the availability of the Oracle database, Oracle RAC 10g provides a superior solution with its advanced failover mechanisms. Oracle RAC 10g includes the required components that all work within a clustered configuration responsible for providing continuous availability; when one of the participating systems fail within the cluster, the users are automatically migrated to the other available systems.

A major component of Oracle RAC 10g that is responsible for failover processing is the Transparent Application Failover (TAF) option. All database connections (and processes) that lose connections are reconnected to another node within the cluster. The failover is completely transparent to the user.

This final section provides a short demonstration on how TAF works in Oracle RAC 10g. Please note that a complete discussion of failover in Oracle RAC 10g would require an article in itself; my intention here is to present only a brief overview.

One important note is that TAF happens automatically within the OCI libraries. Thus your application (client) code does not need to change in order to take advantage of TAF. Certain configuration steps, however, will need to be done on the Oracle TNS file tnsnames.ora. (Keep in mind that as of this writing, the Java thin client will not be able to participate in TAF because it never reads tnsnames.ora.)

Setup the tnsnames.ora File

Before demonstrating TAF, we need to verify that a valid entry exists in the tnsnames.ora file on a non-RAC client machine (if you have a Windows machine lying around). Ensure that you have the Oracle RDBMS software installed. (Actually, you only need a client install of the Oracle software.)

During the creation of the clustered database in this guide, we created a new service that will be used for testing TAF named RACDB_TAF. It provides all the necessary configuration parameters for load balancing and failover. You can copy the contents of this entry to the %ORACLE_HOME%\network\admin\tnsnames.ora file on the client machine (my Windows laptop is being used in this example) in order to connect to the new Oracle clustered database:

Ö

RACDB_TAF =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = linux1-vip)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = linux2-vip)(PORT = 1521))
(LOAD_BALANCE = yes)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = racdb_taf.idevelopment.info)
(FAILOVER_MODE =
(TYPE = SELECT)
(METHOD = BASIC)
(RETRIES = 180)
(DELAY = 5)
)
)
)
SQL Query to Check the Sessionís Failover Information

The following SQL query can be used to check a sessionís failover type, failover method, and if a failover has occurred. We will be using this query throughout this example.

COLUMN instance_name FORMAT a13
COLUMN host_name FORMAT a9
COLUMN failover_method FORMAT a15
COLUMN failed_over FORMAT a11

SELECT
instance_name
, host_name
, NULL AS failover_type
, NULL AS failover_method
, NULL AS failed_over
FROM v$instance
UNION
SELECT
NULL
, NULL
, failover_type
, failover_method
, failed_over
FROM v$session
WHERE username = ‘SYSTEM’;
TAF Demo
From a Windows machine (or other non-RAC client machine), login to the clustered database using the racdb_taf service as the SYSTEM user:
C:> sqlplus system/[email protected]_taf
COLUMN instance_name FORMAT a13
COLUMN host_name FORMAT a9
COLUMN failover_method FORMAT a15
COLUMN failed_over FORMAT a11

SELECT
instance_name
, host_name
, NULL AS failover_type
, NULL AS failover_method
, NULL AS failed_over
FROM v$instance
UNION
SELECT
NULL
, NULL
, failover_type
, failover_method
, failed_over
FROM v$session
WHERE username = ‘SYSTEM’;

INSTANCE_NAME HOST_NAME FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER


racdb1 linux1 SELECT BASIC NO
DO NOT log out of the above SQL*Plus session!

Now that we have run the query (above), we should now shutdown the instance racdb1 on linux1 using the abort option. To perform this operation, you can use the srvctl command-line utility as follows:

su – oracle

$ srvctl status database -d racdb
Instance racdb1 is running on node linux1
Instance racdb2 is running on node linux2

$ srvctl stop instance -d racdb -i racdb1 -o abort

$ srvctl status database -d racdb
Instance racdb1 is not running on node linux1
Instance racdb2 is running on node linux2
Now letís go back to our SQL session and rerun the SQL statement in the buffer:

COLUMN instance_name FORMAT a13
COLUMN host_name FORMAT a9
COLUMN failover_method FORMAT a15
COLUMN failed_over FORMAT a11

SELECT
instance_name
, host_name
, NULL AS failover_type
, NULL AS failover_method
, NULL AS failed_over
FROM v$instance
UNION
SELECT
NULL
, NULL
, failover_type
, failover_method
, failed_over
FROM v$session
WHERE username = ‘SYSTEM’;

INSTANCE_NAME HOST_NAME FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER

racdb2 linux2 SELECT BASIC YES
SQL> exit
From the above demonstration, we can see that the above session has now been failed over to instance racdb2 on linux2.