朝气蓬勃的黑框眼镜 · 他们直播带货,想带遇害六年的女儿回家_新闻频 ...· 1 年前 · |
刚毅的眼镜 · 三个旧金山湾区最热门城市推荐,附最新房价 - 知乎· 1 年前 · |
胡子拉碴的椰子 · 一次出格的社会实验:DAU背后的伦理之争 ...· 1 年前 · |
冷静的萝卜 · 2023上海车展:魏牌高山正式亮相-新浪汽车· 1 年前 · |
孤独求败 · java - JSONObject - ...· 4 年前 · |
We suggest you try the following to help find what you’re looking for:
This document will guide you through configuring Oracle Data Guard Fast-Start Failover (FSFO) using a physical standby database. FSFO can provide substantial gains in high availability and disaster recovery preparedness for all environments, from inexpensive Cloud-based systems to global distributed data centers.
The information in this guide is based on practical experience gained from deploying FSFO in a global corporate production environment. The guide makes few assumptions about your existing environment and includes examples for creating a physical standby database and Data Guard Broker configuration. To get started, all you'll need is Oracle Database Enterprise Edition Release 10.2 or later, a database, and three hosts: two for the databases and a small host for the FSFO observer. The guide attempts to be operating system agnostic; however, some examples may contain platform specific elements such as path and file naming conventions.
FSFO builds upon a number of other Oracle technologies and features such as Data Guard, Flashback Database , and Data Guard Broker.
The foundation of FSFO is Data Guard - a primary and at least one standby. The standby can be physical or logical and there can be multiple standbys, but only one of the standbys can be the failover target at any given time. The following paragraphs describe the supported availability modes.
In Maximum Availability mode, FSFO guarantees that no transaction that has received a commit acknowledgment will be lost during a failover. The price for this guarantee is increased commit latency ( log file sync waits). Maximum Availability mode uses synchronous redo transfer and FSFO imposes the additional requirement that the redo is recorded in the standby redo log (SRL) of the target standby (AFFIRM option of log_archive_dest_ n ). Overall commit latency is increased by the round-trip network latency. With increased latency comes decreased throughput; however, in some cases the difference in throughput may be made up by increasing parallelism.
Although redo transfer is synchronous, Maximum Availability mode allows the primary to remain available if the standby database becomes unavailable for any reason (e.g. standby database, host, or network failure, etc.). If the primary is unable to contact the standby after a user specified period of time (NET_TIMEOUT option of log_archive_dest_ n), it drops out of synchronous transfer mode and begins operating as though it were in Maximum Performance mode. When the standby becomes available again, the primary and standby re-synchronize and resume synchronous redo transfer.
Oracle Database 11 g FSFO adds support for Maximum Performance mode (async redo transfer), providing the flexibility to trade durability for performance. Commit latency is not affected by redo transfer, but committed transactions whose redo has not been received by the standby will be lost during failover. FSFO configurations in Maximum Performance mode may limit potential data loss by specifying the maximum allowable age of transactions that are lost during a failover. For example, if the limit specified is 30 seconds (the default), FSFO guarantees that all transactions that committed prior to 30 seconds ago are preserved during failover. The minimum allowable limit is 10 seconds.
Broker is a Data Guard management utility that maintains state information about a primary and its standby databases. It automatically sets Data Guard related database initialization parameters on instance start and role transitions, starts apply services for standbys, and automates many of the administrative tasks associated with maintaining a Data Guard configuration. FSFO is a feature of Broker which records information about the failover target, how long to wait after a failure before triggering a failover, and other FSFO specific properties.
Flashback Database is a continuous data protection (CDP) solution integrated with the Oracle Database. It provides a way to quickly restore a database to a previous point in time or SCN using on-disk data structures called flashback logs. Flashing back a database is much faster and more seamless (one simple DDL statement) than traditional point-in-time or SCN-based recovery. FSFO uses Flashback Database as part of the process of reinstating a failed primary as a standby.
Problems with automatic reinstatement are frequently due to misconfiguration, so let's look at this in a bit more detail.
Flashback Database records the before-image of changed blocks. To avoid the overhead of recording every change to every block, Flashback Database takes a "fuzzy" snapshot every 30 minutes and only records the before-image block upon its first change since the last snapshot. Subsequent changes to the same block during the same snapshot are not recorded.
Flashing back a database occurs in two stages:
For FSFO environments, set db_flashback_retention_target = 60 or higher to provide sufficient Flashback Database history for automatic standby reinstatement. Metadata for the fuzzy snapshot is stored in the flashback log itself. If that metadata is pushed out, Oracle can no longer find a fuzzy snapshot so it will not be able to flash back. To avoid problems due to timing variations, values less than 60 minutes are not recommended and values of 30 or less virtually guarantee Flashback Database failure.
Flashback Database stores its logs in the Flash Recovery Area (FRA), so the FRA must be large enough to store at least 60 minutes of Flashback Database history. The total storage requirement is proportional to the number of distinct blocks changed during snapshots - e.g. 1,000,000 block changes on a small set of blocks generates less Flashback Database history than 1,000,000 changes on a larger set of blocks. A good method to determine Flashback Database storage requirements is to enable Flashback Database and observe the amount of storage it uses during several peak loads. There is little risk in enabling Flashback Database to determine its storage requirements - it can be disabled while the primary is open if necessary. However, re-enabling Flashback Database will require a bounce since the database must be mounted and not open.
The observer is the third party in an otherwise typical primary/standby Data Guard configuration. It is actually a low-footprint OCI client built into the DGMGRL CLI (Data Guard Broker Command Line Interface) and, like any other client, may be run on a different hardware platform than the database servers. Its primary job is to perform a failover when conditions permit it to do so without violating the data durability constraints set by the DBA. Only the observer can initiate FSFO failover. It's secondary job is to automatically reinstate a failed primary as a standby if that feature is enabled (the default). The observer is the key element that separates Data Guard failover from its pre-FSFO role as the plan of last resort to its leading role in a robust high availability solution.
Note: the FSFO observer version must match the database version. Oracle Database 11 g observers are incompatible with 10 g databases and vice-versa.
By default, the observer will initiate failover to the target standby if and only if ALL of the following are true:
Oracle Database 11 g Rel 1 introduced user configurable failover conditions that can trigger the observer to initiate failover immediately.
Broker can be configured to initiate failover on any of the following conditions. Conditions shown in blue are enabled by default.
You can also specify a list of ORA- errors that will initiate FSFO failover. The list is empty by default.
Applications can initiate FSFO failover directly using the DBMS_DG.INITIATE_FS_FAILOVER procedure with an optional message text that will be displayed in the observer log and the primary's alert log.
The walkthrough begins with a single database that will become the primary of a Data Guard configuration. For this build, we will use a single physical standby database. FSFO can also be used with logical standbys and an FSFO-enabled configuration may have multiple standbys with a mix of physical and logical, but only one standby can be the failover target at any given time.
The major steps in the walkthrough are:
Database hosts are referred to as "a" and "b" hosts and the databases themselves are referred to as the "a" and "b" databases. The observer host is 'observer.demo.org'.
Names used in the examples:
Input commands are shown in shaded boxes in normal text. Expected output is shown in blue text.
Data Guard uses Oracle Net (SQL*Net) for communication between the primary and standby databases and the FSFO observer. Getting the Oracle Net configuration right is one of the key factors in a successful FSFO deployment. Improper Oracle Net configuration is a leading cause of reported FSFO issues.
Note: Data Guard requires dedicated server connections for proper operation. Do not use Shared Server (formerly MTS) for Data Guard.
It's good practice to use separate listeners for application connections and Data Guard connections. This allows Data Guard to remain functional during maintenance periods when the application listeners are down. Be sure to include the Data Guard listener in the local_listeners database parameter.
Most of the network services used in a FSFO environment may use dynamic registration, but to enable Broker to restart instances during role transitions or during reinstatement after a failover, you must define a static service named db_unique_name _dgmgrl. db_domain . (Note: 11.1.0.7 adds the StaticConnectIdentifier Broker database property to allow you to specify a different service name.) If you will be using RMAN to create the standby database, it also needs a static service to restart the database being created. In order to maintain separation of Broker and non-Broker activity, a second static service is recommended.
listener.ora configuration for host "a":
LISTENER_DG = (DESCRIPTION= (ADDRESS_LIST= (ADDRESS=(PROTOCOL=tcp)(HOST=dbhost-a)(PORT=1522)) ) ) SID_LIST_LISTENER_DG= (SID_LIST= (SID_DESC= (SID_NAME=db1) (SDU=32767) (ORACLE_HOME=/u01/app/oracle/product/11.1.0/db_1) (GLOBAL_DBNAME=db1_a_static.demo.org) ) (SID_DESC= (SID_NAME=db1) (SDU=32767) (ORACLE_HOME=/u01/app/oracle/product/11.1.0/db_1) (GLOBAL_DBNAME=db1_a_dgmgrl.demo.org) ) )
listener.ora configuration for host "b":
LISTENER_DG = (DESCRIPTION= (ADDRESS_LIST= (ADDRESS=(PROTOCOL=tcp)(HOST=dbhost-b)(PORT=1522)) ) ) SID_LIST_LISTENER_DG= (SID_LIST= (SID_DESC= (SID_NAME=db1) (SDU=32767) (ORACLE_HOME=/u01/app/oracle/product/11.1.0/db_1) (GLOBAL_DBNAME=db1_b_static.demo.org) ) (SID_DESC= (SID_NAME=db1) (SDU=32767) (ORACLE_HOME=/u01/app/oracle/product/11.1.0/db_1) (GLOBAL_DBNAME=db1_b_dgmgrl.demo.org) ) )
Create a unique connect alias for each database. Using the db_unique_name of each database as the Oracle Net alias is simple and intuitive.
db1_a: Alias to connect to the dynamic Data Guard service on database "a"
db1_b: Alias to connect to the dynamic Data Guard service on database "b"
db1_a_static: Alias to connect to the static Data Guard service on database "a"
db1_b_static: Alias to connect to the static Data Guard service on database "b"
Example tnsnames.ora entries:
db1_a= (description= (SDU=32767) (address_list= (address=(protocol=tcp)(host=dbhost-a)(port=1522)) ) (connect_data= (service_name=db1_a.demo.org) (server=dedicated) ) ) db1_b= (description= (SDU=32767) (address_list= (address=(protocol=tcp)(host=dbhost-b)(port=1522)) ) (connect_data= (service_name=db1_b.demo.org) (server=dedicated) ) ) db1_a_static= (description= (SDU=32767) (address_list= (address=(protocol=tcp)(host=dbhost-a)(port=1522)) ) (connect_data= (service_name=db1_a_static.demo.org) (server=dedicated) ) ) db1_b_static= (description= (SDU=32767) (address_list= (address=(protocol=tcp)(host=dbhost-b)(port=1522)) ) (connect_data= (service_name=db1_b_static.demo.org) (server=dedicated) ) )
Start the Data Guard listener on both "a" and "b" hosts.
lsnrctl start LISTENER_DG
Verify the configuration from both hosts.
tnsping db1_a Attempting to contact (description= (SDU=32767) (address_list= (address=(protocol=tcp)(host=dbhost-a)(port=1522))) (connect_data= (service_name=db1_a.demo.org) (server=dedicated))) OK (0 msec)
tnsping db1_b Attempting to contact (description= (SDU=32767) (address_list= (address=(protocol=tcp)(host=dbhost-b)(port=1522))) (connect_data= (service_name=db1_b.demo.org) (server=dedicated))) OK (0 msec)
A number of prerequisites must be met on the primary in order to use Fast-Start Failover. This section describes how to configure and verify each prerequisite. To see if your primary has already met a prerequisite, follow the instructions in the Verify section.
Steps that require the primary to be in a mounted (not open) state are grouped together in the section below entitled Steps Requiring a Bounce of the Primary .
To allow the database to register with the Data Guard listener, the listener endpoint must be added to the database's local_listener parameter. If local_listener is already in use, add the Data Guard listener to the list. After setting local_listener, register the database with the listener and verify the services have been registered.
Note: You can also use TNS aliases defined in the tnsnames.ora file when setting the local_listener parameter. This is particularly useful when registering with multiple listeners where the parameter value would otherwise exceed the 255 character limit.
alter system set local_listener='(address=(host=dbhost-a)(port=1522)(protocol=tcp))'; alter system register;
show parameter local_listener NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ local_listener string (address=(host=dbhost-a)(port= 1522)(protocol=tcp))
lsnrctl status listener_dg ... Services Summary... Service "DB1_A" has 1 instance(s). Instance "db1", status READY, has 1 handler(s) for this service... Service "DB1_A_XPT" has 1 instance(s). Instance "db1", status READY, has 1 handler(s) for this service... Service "db1XDB" has 1 instance(s). Instance "db1", status READY, has 1 handler(s) for this service... Service "db1_a_dgmgrl.demo.org" has 1 instance(s). Instance "db1", status UNKNOWN, has 1 handler(s) for this service... Service "db1_a_static.demo.org" has 1 instance(s). Instance "db1", status UNKNOWN, has 1 handler(s) for this service... The command completed successfully
All Data Guard environments should enable force logging at the database level in order to guard against nologging tablespaces from being added.
alter database force logging;
select force_logging from v$database; FOR --- YES
Broker changes database parameters during startup and role transitions via ALTER SYSTEM commands. An spfile is required to persist these changes.
create spfile='?/dbs/spfile${ORACLE_SID}.ora' from pfile='?/dbs/init${ORACLE_SID}.ora'; alter system set spfile='?/dbs/spfiledb1.ora';
show parameter spfile;
NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ spfile string ?/dbs/spfiledb1.ora
Create a password file
All Data Guard environments require the use of a password file in order to allow the databases to connect to each other.
orapwd file=$ORACLE_HOME/dbs/orapw$ORACLE_SID
select * from v$pwfile_users; USERNAME SYSDB SYSOP SYSAS ------------------------------ ----- ----- ----- SYS TRUE TRUE FALSE
Remote login is required, along with a password file, to allow the databases in a Data Guard configuration to connect to each other.
alter system set remote_login_passwordfile=exclusive scope=spfile;
show parameter remote_login_passwordfile NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ remote_login_passwordfile string EXCLUSIVE
Each database in a Data Guard configuration must have a unique name. This guide uses the naming convention of appending an underscore followed by a letter to the db_name to create the db_unique_name.
alter system set db_unique_name = db1_a scope=spfile; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ db_unique_name string db1_a
show parameter db_unique_name NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ db_unique_name string db1_a
If you don't already have a Flash Recovery Area (FRA), you will need to create one for Flashback Database. If you already have an FRA, you may need to increase its size in order to accommodate the Flashback Database files. See the Flashback Database section above for information on storage requirements.
alter system set db_recovery_file_dest_size = 20g scope=both; alter system set db_recovery_file_dest = '/u01/fra' scope=both;
show parameter db_recovery_file NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ db_recovery_file_dest string /u02/flash_recovery_area db_recovery_file_dest_size big integer 2G
Not a hard requirement, but recommended.
Enable
alter system set standby_file_management=auto;
Verify
show parameter standby_file_management NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ standby_file_management string AUTO
This parameter must be set before the primary can be opened in Maximum Availability mode. The remaining Data Guard-related parameters will be set by Broker later in the walkthrough.
alter system set log_archive_config='DG_CONFIG=(db1_b)' scope=both;
show parameter log_archive_config NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ log_archive_config string DG_CONFIG=(db1_b)
In order to accommodate all load conditions, Oracle recommends having at least one more SRL group than the number of ORL groups of the same size. This walkthrough assumes that all ORLs and SRLs on the primary and standby databases are the same size.
Determine the number and size of the Online Redo Log files (ORLs)
select bytes, count(group#) from v$log group by bytes; BYTES COUNT(GROUP#) ---------- ------------- 52428800 3
Find the highest group#
select max(group#) from v$log; MAX(GROUP#) ----------- 3
Add the SRLs. Unlike ORLs, SRLs should be created with only one member per group. There is no need to multiplex SRLs in order to protect redo as with ORLs (the redo is already protected in the ORLs of the primary). Multiplexing SRLs merely adds unnecessary IO and can increase commit latency. For systems with multiple RAID controllers, consider creating SRLs such that their IO is balanced across the controllers.
In this example, there are 3 ORLs with a max group# of 3. We will create 4 SRLs starting with group# 11. Starting with 11 is purely cosmetic - it allows new ORL groups to be added later while keeping their group# in the same sequence as the existing ORLs.
alter database add standby logfile group 11 '/u02/oradata/db1/stby-t01-g11-m1.log' size 52428800; alter database add standby logfile group 12 '/u02/oradata/db1/stby-t01-g12-m1.log' size 52428800; alter database add standby logfile group 13 '/u02/oradata/db1/stby-t01-g13-m1.log' size 52428800; alter database add standby logfile group 14 '/u02/oradata/db1/stby-t01-g14-m1.log' size 52428800;
select group#, type, member from v$logfile where type = 'STANDBY'; GROUP# TYPE MEMBER ---------- ------- ---------------------------------------- 11 STANDBY /u02/oradata/db1/stby-t01-g11-m1.log 12 STANDBY /u02/oradata/db1/stby-t01-g12-m1.log 13 STANDBY /u02/oradata/db1/stby-t01-g13-m1.log 14 STANDBY /u02/oradata/db1/stby-t01-g14-m1.log
The following steps all require the database to be in a mounted (not open) state. They can all be done at the same time in a single bounce.
alter database archivelog;
select log_mode from v$database; LOG_MODE ------------ ARCHIVELOG
alter database flashback on; alter system set db_flashback_retention_target = 60 scope=both;
select flashback_on from v$database; FLASHBACK_ON ------------------ YES show parameter db_flashback_retention_target NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ db_flashback_retention_target integer 60
As mentioned above, Maximum Availability mode is mandatory for Oracle Database 10 g and optional for Oracle Database 11 g .
alter database set standby database to maximize availability;
select protection_mode from v$database; PROTECTION_MODE -------------------- MAXIMUM AVAILABILITY
If you don't already have a standby database, use your favorite method to create one. The example below takes advantage of the 11 g RMAN Active Database Duplication feature. This feature enables RMAN to duplicate an existing database over the network without requiring a backup to disk or tape. The following assumes that the standby host has been setup according to Oracle's recommendations and that the operating system, accounts, security, resource limits, directory structure, etc. are configured correctly.
In Oracle Database 11 g , the password file on the standby must be a physical copy of the password file on the primary due to security enhancements introduced in Oracle Database 11 g . Oracle Database 10 g allows a different password file to be used as long as the SYS passwords are the same on the primary and standby.
Add an entry to the oratab file for the standby
db1:/u01/app/oracle/product/11.1.0/db_1:Y
For the RMAN duplicate active database method, the init.ora file (initdb1.ora in the example) requires only one parameter: db_name (it doesn't even have to be the real name of the database - any name will do). RMAN will copy the spfile from the primary, so this init.ora file is only needed during the first phase of the duplication.
db_name = db1
Make sure that your OS environment on the standby is setup. Use the oraenv script provided by Oracle.
startup nomount
Run the RMAN utility and connect to the target (primary) and auxiliary (new standby).
rman target sys/password@db1_a_static auxiliary sys/password@db1_b_static connected to target database: DB1 (DBID=1234567890) connected to auxiliary database: X (not mounted)
CONFIGURE DEFAULT DEVICE TYPE TO DISK;
The procedure for using RMAN to create a standby database is fully explained in Appendix F of Oracle Oracle Data Guard Concepts and Administration document (10 g Rel 2 and 11 g Rel 1). The example uses the FROM ACTIVE DATABASE clause introduced in 11g that allows RMAN to create a standby database by copying the primary across the network without the need to store the backup files on disk or tape. RMAN also copies the spfile and password files and you can change the values for individual parameters. At a minimum, you must set db_unique_name. The example assumes that the standby uses the same directory structure as the primary.
Note: If you have just enabled archivelog mode, force an archive log creation ( alter system archive log current) to ensure that at least one archive log exists. Otherwise, the DUPLICATE TARGET DATABASE command will fail with "RMAN-20208: UNTIL CHANGE is before RESETLOGS change".
DUPLICATE TARGET DATABASE FOR STANDBY FROM ACTIVE DATABASE DORECOVER SPFILE SET db_unique_name='db1_b' SET log_archive_config='' SET log_file_name_convert= ' ',' ' SET local_listener='(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dbhost-b)(PORT=1522)))' NOFILENAMECHECK;
alter database flashback on; alter system set db_flashback_retention_target = 60 scope=both;
select flashback_on from v$database; FLASHBACK_ON ------------------ YES show parameter db_flashback_retention_target NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ db_flashback_retention_target integer 60
Broker stores it configuration information in a mirrored set of files outside the database. By default, both files are stored in $ORACLE_HOME/dbs. To protect the files, it's good practice to store them in separate filesystems.
alter system set dg_broker_config_file1='/u01/app/oracle/admin/db1/dgbroker/dg1db1.dat'; alter system set dg_broker_config_file2='/u02/app/oracle/admin/db1/dgbroker/dg2db1.dat';
show parameter dg_broker_config_file NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ dg_broker_config_file1 string /u01/app/oracle/admin/db1/dgbr oker/dg1db1.dat dg_broker_config_file2 string /u02/app/oracle/admin/db1/dgbr oker/dg2db1.dat
alter system set dg_broker_start=true;
show parameter dg_broker_start NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ dg_broker_start boolean TRUE
dgmgrl sys/password@db1_a
create configuration 'FSF' as primary database is db1_a connect identifier is db1_a; Configuration "FSF" created with primary database "db1_a"
add database db1_b as connect identifier is db1_b maintained as physical; Database "db1_b" added
show configuration Configuration Name: FSF Enabled: NO Protection Mode: MaxAvailability Databases: db1_a - Primary database db1_b - Physical standby database Fast-Start Failover: DISABLED Current status for "FSF": DISABLED
Broker will set the primary to use asynchronous log transport by default. For Maximum Availability environments, change this to synchronous.
The NetTimeout property specifies the number of seconds LGWR will block waiting for acknowledgment from the standby in synchronous mode before considering the connection lost (corresponds to the NET_TIMEOUT option of log_archive_dest_n). The default value is 30 seconds. When using Maximum Availability mode, consider lowering this to reduce the time commits block when the standby becomes unavailable. Choose a value high enough to avoid false disconnects from intermittent network trouble. The example uses 10 seconds.
Oracle Database 11 g adds the ObserverConnectIdentifier database property to the Broker configuration, allowing you to specify a connect identifier for the observer to use for monitoring the primary and failover target. By default, the observer uses the same connect identifiers used by Data Guard for redo transfer and information exchange between the primary and standby ( DGConnectIdentifier in Oracle Database 11 g , InitialConnectIdentifier in Oracle Database 10 g ). ObserverConnectIdentifier allows you to specify different connect identifiers for the observer to use. You might, for instance, use this to allow the observer to monitor the databases using the same connect identifiers as the client applications.
We'll leave the other properties at their default values for the walkthrough, but you should become familiar with all of the Broker config and database properties. Chapter 9 of the Data Guard Broker documentation (10 g and 11 g ) contains a description of each property. Some properties have changed between those releases.
Note: Many of the Broker database properties correspond to database spfile parameters. Broker maintains these parameters by issuing ALTER SYSTEM commands as appropriate during role transitions, database startup/shutdown, and other events. If these parameters are modified outside of Broker, it raises a warning. To see the specific parameter, use the "show database ... StatusReport" command.
edit database db1_a set property LogXptMode='SYNC'; edit database db1_a set property NetTimeout=10; edit database db1_b set property NetTimeout=10;
Broker will validate the configuration, set parameters on both databases, and start managed recovery. This may take a few minutes. Tailing the alert logs on the primary and standby is a good way to watch Broker in action and get familiar with how it performs various tasks.
enable configuration;
Make sure everything is working before moving on.
show configuration Configuration Name: FSF Enabled: YES Protection Mode: MaxAvailability Databases: db1_a - Primary database db1_b - Physical standby database Fast-Start Failover: DISABLED Current status for "FSF": SUCCESS
This is a good time to enable FSFO to make sure that all of the prerequisites have been met. Broker will verify that the configuration meets all prerequisites before enabling FSFO and will report any problems it finds. The most common problems are mismatched Data Guard protection modes and LogXptMode properties and forgetting to enable Flashback Database on the primary or standby.
Note that enabling FSFO does not make the configuration ready for automatic failover - that requires an observer, which we'll get to next.
enable fast_start failover; Enabled.
With FSFO enabled, Broker expects to find an observer, which we haven't started yet, so if you verify the at this point with 'show configuration', Broker will report a warning (if it doesn't, give it a minute to discover that the observer isn't there).
show configuration Configuration Name: FSF Enabled: YES Protection Mode: MaxAvailability Databases: db1_a - Primary database db1_b - Physical standby database - Fast-Start Failover target Fast-Start Failover: ENABLED Current status for "FSF": Warning: ORA-16608: one or more databases have warnings
Running a StatusReport on the primary should verify that the error is due to a missing observer.
show database db1_a statusreport STATUS REPORT INSTANCE_NAME SEVERITY ERROR_TEXT * ERROR ORA-16819: fast-start failover observer not started
To maximize the benefits of FSFO, the observer should run on a different host than the primary and standby databases. Ideally the primary, standby, and observer will be in geographically separate areas. The observer is very lightweight, requiring few system resources. Unlike the primary / standby interconnect, where bandwidth and latency are determining performance factors, the observer requires very little network bandwidth and is not overly latency sensitive, allowing the it to be placed practically anywhere a reliable connection is available.
Since the observer is a specialized instance of a dgmgrl session, the observer host should be installed with either the Oracle Client Administrator software or the full Oracle Database software stack.
tnsping db1_a Attempting to contact (description= (SDU=32767) (address_list= (address=(protocol=tcp)(host=dbhost-a)(port=1522))) (connect_data= (service_name=db1_a.demo.org) (server=dedicated))) OK (0 msec) tnsping db1_b Attempting to contact (description= (SDU=32767) (address_list= (address=(protocol=tcp)(host=dbhost-b)(port=1522))) (connect_data= (service_name=db1_b.demo.org) (server=dedicated))) OK (0 msec)
Start the observer by running dgmgrl and logging in using SYS credentials. We'll start it interactively for now to verify that everything's working. Notice that the terminal session appears to hang after starting the observer. This is normal. The Appendix provides information on creating a simple wrapper script to start the observer as a background process.
dgmgrl sys/password@db1_a start observer; observer started
The terminal session will appear to hang at this point.
In a separate terminal session, verify the configuration. This example shows the verbose mode of the 'show configuration' command that provides FSFO-specific information. If the status is SUCCESS, you're ready to start testing role transitions.
dgmgrl sys/password@db1_a show configuration verbose Configuration Name: FSF Enabled: YES Protection Mode: MaxAvailability Databases: db1_a - Primary database db1_b - Physical standby database - Fast-Start Failover target Fast-Start Failover: ENABLED Threshold: 30 seconds Target: db1_b Observer: observer.demo.org Lag Limit: 30 seconds (not in use) Shutdown Primary: TRUE Auto-reinstate: TRUE Current status for "FSF": SUCCESS
Use the 'show fast_start failover' command to see which user configurable FSFO failover conditions are in effect.
show fast_start failover; Fast-Start Failover: ENABLED Threshold: 30 seconds Target: db1_b Observer: observer.demo.org Lag Limit: 30 seconds (not in use) Shutdown Primary: TRUE Auto-reinstate: TRUE Configurable Failover Conditions Health Conditions: Corrupted Controlfile YES Corrupted Dictionary YES Inaccessible Logfile NO Stuck Archiver NO Datafile Offline YES Oracle Error Conditions: (none)
The real test of the configuration is a successful role transition in both directions with both switchover and FSFO failover. We'll start with switchovers.
In order to fully automate switchover, Broker needs SYSDBA credentials in order to restart one or both databases. Without the credentials, Broker will complete the role transition, but will leave the databases in need of a manual restart.
dgmgrl sys/password@db1_a switchover to db1_b Performing switchover NOW, please wait... New primary database "db1_b" is opening... Operation requires shutdown of instance "db1" on database "db1_a" Shutting down instance "db1"... ORA-01109: database not open Database dismounted. ORACLE instance shut down. Operation requires startup of instance "db1" on database "db1_a" Starting instance "db1"... ORACLE instance started. Database mounted. Switchover succeeded, new primary is "db1_b"
show configuration Configuration Name: FSF Enabled: YES Protection Mode: MaxAvailability Databases: db1_b - Primary database db1_a - Physical standby database - Fast-Start Failover target Fast-Start Failover: ENABLED Current status for "FSF": SUCCESS
Now let's test switchover in the other direction.
switchover to db1_a Performing switchover NOW, please wait... New primary database "db1_a" is opening... Operation requires shutdown of instance "db1" on database "db1_b" Shutting down instance "db1"... ORA-01109: database not open Database dismounted. ORACLE instance shut down. Operation requires startup of instance "db1" on database "db1_b" Starting instance "db1"... ORACLE instance started. Database mounted. Switchover succeeded, new primary is "db1_a"
show configuration Configuration Name: FSF Enabled: YES Protection Mode: MaxAvailability Databases: db1_a - Primary database db1_b - Physical standby database - Fast-Start Failover target Fast-Start Failover: ENABLED Current status for "FSF": SUCCESS
Now that we know switchovers work, it's time to test failovers. Testing FSFO failover requires simulating loss of the primary. The simplest way to do this is to abort the primary. This will signal the observer to initiate failover after the FSFO threshold timeout has been reached (default is 30 seconds). It is instructive to watch the alert logs on both databases as well as the observer log after aborting the primary to gain insight into what happens during FSFO failover.
It is also possible to initiate failover using the dgmgrl failover command. This exercises the configuration, but triggers failover differently than losing contact with the primary.
We want the observer to be able to automatically reinstate the former primary as a standby after our failover tests, so before each test, make sure that Flashback Database has at least 30 minutes of history. Do this prior to every failover test. If Flashback Database history is insufficient, the observer will not be able to reinstate and you will have to manually reinstate from backup or by primary duplication.
On the primary to be aborted:
select (sysdate - oldest_flashback_time)*24*60 as history from v$flashback_database_log; HISTORY ---------- 140.35
Don't initiate failover unless at least 30 minutes of history is available.
Create some test data
It wouldn't be much of a test if we didn't verify that our durability constraints were being met, so let's make a change on the primary and see if it survives the failover. This walkthrough uses Maximum Availability mode to achieve "zero data loss".
Log in as a test user and make some changes that won't impact other parts of the system.
-- Note that DDL statements automatically commit create table x as select * from all_objects; Table created. select count(*) from x; COUNT(*) ---------- 68855
Abort the primary.
shutdown abort
observer log:
Initiating Fast-Start Failover to database "db1_b"... Performing failover NOW, please wait... Failover succeeded, new primary is "db1_b"
Examine the Broker configuration by logging into dgmgrl on the new primary. Notice that the former primary is now disabled.
dgmgrl sys/password@db1_b show configuration Configuration Name: FSF Enabled: YES Protection Mode: MaxAvailability Databases: db1_b - Primary database db1_a - Physical standby database (disabled) - Fast-Start Failover target Fast-Start Failover: ENABLED Current status for "FSF": Warning: ORA-16608: one or more databases have warnings
Log into the new primary and verify that the changes made it across.
select count(*) from x; COUNT(*) ---------- 68855
Initiate reinstatement by mounting the database. Note that the database will not open at this point. A database in the primary role will not open until it has verified with the observer that it is still the primary. If the observer finds that the database is no longer the primary, it will attempt to reinstate it as the failover target standby.
The first step in reinstatement is to flash the database back to the SCN where the standby became the primary (v$database.standby_became_primary_scn on the new primary). As described in the Flashback Database section, Flashback Database takes place in two stages: a restore stage and a media recovery stage. In the restore stage, Flashback Database restores the database to a point prior to the standby_became_primary_scn using the before-image blocks in the Flashback Database logs. In the media recovery phase, Flashback Database applies redo to bring the database up to the standby_became_primary_scn. In order for Flashback Database to succeed, there must be sufficient history available in the Flashback Database logs and all of the redo generated between the restore point and the standby_became_primary_scn must be available. If Flashback Database fails, automatic reinstatement stops and you will have to perform a manual SCN-based recovery to the standby_became_primary_scn and complete the reinstatement.
Once Flashback Database has succeeded, the observer will convert the database to a standby, bounce it, and begin apply services.
startup mount
observer log:
Initiating reinstatement for database "db1_a"... Reinstating database "db1_a", please wait... Operation requires shutdown of instance "db1" on database "db1_a" Shutting down instance "db1"... ORA-01109: database not open Database dismounted. ORACLE instance shut down. Operation requires startup of instance "db1" on database "db1_a" Starting instance "db1"... ORACLE instance started. Database mounted. Continuing to reinstate database "db1_a" ... Reinstatement of database "db1_a" succeeded
dgmgrl status:
Configuration Name: FSF Enabled: YES Protection Mode: MaxAvailability Databases: db1_b - Primary database db1_a - Physical standby database - Fast-Start Failover target Fast-Start Failover: ENABLED Current status for "FSF": SUCCESS
Now test FSFO failover back to the original primary. Make some new changes and verify that they are preserved after failover. Remember to check Flashback Database history before aborting the primary.
Another good test is to simulate network failures that leave the primary up, but isolated from the failover target standby and the observer. When a primary loses contact with both the failover target and the observer simultaneously, it enters a "stalled" state (v$database.fs_failover_status = 'STALLED') and any sessions still connected to the primary will block on commit. Queries and DML will continue to run - only sessions that commit will block. This prevents a "split brain" condition if a failover occurs since none of the changes made to the isolated primary can be made permanent. Oracle Database 10 g databases running versions prior to 10.2.0.4 will remain in a stalled state until aborted or signaled to remain the primary by the observer once connectivity has been restored. Starting with 10.2.0.4 (including all versions of 11g and later), Oracle provides the FastStartFailoverPmyShutdown Broker property that allows you to specify what the primary should do if it is still in a stalled state when the FSFO threshold timeout has elapsed. Setting this property to 'TRUE' (the default) causes the primary to self-terminate. Setting it to 'FALSE' leaves the database open and stalled until it is terminated or signaled to proceed in the event a failover did not take place (e.g. the observer was killed after the stall began, but before the failover timeout had elapsed).
The simple tests described in this guide are fine for making sure the basics are working, but you'll probably want to develop a more comprehensive set of tests suited to your environment and requirements.
There's a big difference between a system that's FSFO enabled and one that's FSFO ready. Being FSFO ready means that all conditions are met for a successful failover, including having a running observer and sufficient redo transmitted to the failover target to meet durability requirements. This section describes how to stay on top of your FSFO environments.
Use Broker's "show configuration" command to determine FSFO status and the "show database <db_unique_name> statusreport" command to drill down for details if Broker reports a problem.
The v$database view has has columns specifically for monitoring FSFO status. The values that indicate FSFO is ready for failover are listed below. See the Oracle Reference and Data Guard Administrator guides for your release for details.
fs_failover_status = 'SYNCHRONIZED' for MaxAvail 'TARGET UNDER LAG LIMIT' for MaxPerf fs_failover_observer_present = 'YES'
Nothing will ruin your day faster than finding out that the standby the observer just failed over to is 12 hours behind in applying redo. The observer is perfectly satisfied if all of the redo it needs to meet your durability requirements has been received by the failover target. It doesn't consider how much of that redo has been applied. Have a means of notifying someone if standby apply falls too far behind.
Slightly less critical than making sure you've got a good primary is making sure the failed primary can be automatically reinstated. Monitoring flashback database history and reacting when it drops below 30 minutes will save you time and improve availability. It will also alert you to databases that have had Flashback Database disabled at some point after FSFO was enabled. Broker checks to see if Flashback Database is enabled on the primary and failover targets when FSFO is enabled. After FSFO is enabled, Broker will continue to check that Flashback Database is enabled during health checks. If it detects that Flashback Database was disabled, either manually, or automatically because Flashback Database discovered a problem, Broker signals "ORA-16827: Flashback Database is disabled".
Failing over the database won't do much good if applications and other database clients don't know where the primary went. If clients are already configured to automatically time out and reconnect if they don't get a response from the database, a simple but effective approach is to use a network alias (e.g. DNS CNAME) that always resolves to the primary. After a role change, the naming service can be updated with the new primary's address. A trigger on the DB_ROLE_CHANGE system event can be used to update the naming service and, with the proper client cache TTL settings, clients can connect to the new primary very quickly.
Oracle also provides Fast Application Notification (FAN) for OCI clients and Fast Connect Failover for JDBC clients. These facilities allow applications written to take advantage of them to receive asynchronous notification of database events, including role transitions.
This section will help you get started with creating a wrapper script to automatically start and restart the FSFO observer. Use the wrapper script to start the observer process when the observer host boots or to restart it if it dies. Writing the wrapper itself and the means to determine when to execute it are up to you.
Create a wallet
While not strictly required, creating a wallet provides a secure way to store the credentials needed to automatically connect to the primary when starting the observer. The "Configuring Authentication" chapter of the Oracle Database Security Guide provides detailed instructions for creating a wallet.
A simple example for *nix is provided below that will work with both releases. The Oracle Database 10 g FSFO observer is limited to using the default username and password defined in the wallet. In 10g, a single wallet can be used for multiple observers, but they must all use the same SYS password. The Oracle Database 11 g observer can make use of specific credentials, allowing the same wallet to be used for multiple observers with different SYS passwords.
Create a directory to store the wallet.
mkdir -p /u01/app/oracle/admin/wallet
Create a wallet and set the default username and password to the database's SYSDBA credentials (usually SYS).
mkstore -wrl /u01/app/oracle/admin/wallet -createEntry oracle.security.client.default_username SYS mkstore -wrl /u01/app/oracle/admin/wallet -createEntry oracle.security.client.default_password <sys password>
Add the wallet location and override to sqlnet.ora.
WALLET_LOCATION = (SOURCE = (METHOD = FILE) (METHOD_DATA = (DIRECTORY = /u01/app/oracle/admin/wallet) ) ) SQLNET.WALLET_OVERRIDE = TRUE
The observer maintains state information in a file. By default the file is named fsfo.dat and is created in the working directory where the observer is started. The state file is locked when the observer is running to prevent multiple observers from using the same file. It's generally a good idea to store the state file in a directory associated with the database to avoid locking issues when running multiple observers on the same host.
mkdir -p /u01/app/oracle/admin/db1/observer
Here's a one-liner observer startup for *nix. Note the use of "/@<tns_alias>" to login using the wallet.
nohup dgmgrl /@db1 "start observer file='/u01/app/oracle/admin/db1/observer/fsfo.dat'" \ >> /u01/app/oracle/admin/db1/observer/dgmgrl.log &
Create a script to automate FSFO failover initiation and use it as your standard method for standby flips. This not only saves time and minimizes problems by automating an otherwise manual process, it exercises your failover and DR procedures with every flip so that you know the FSFO configuration is sound and, in a real emergency, everyone knows what to do. Failovers become routine. In fact, failovers are so reliable, fast, and simple that switchovers become the exception rather than the rule. Just be sure to include a Flashback Database history check in the script to provide an option to abort if a failover would require a manual reinstate.
The DB_ROLE_CHANGE event will fire whenever a database is opened for the first time after a role transition. Create a trigger on this event to perform actions specific to your environment after a switchover or failover, such as updating the name resolution service to point to the new primary. Keep this trigger as simple and reliable as possible, limiting it to only what is absolutely necessary at the moment of role transition, since any failures at this point may affect availability. If there are many actions that need to take place, put them in a separate script and use the trigger to run the script in an orphan process or thread independent of the database.
All standbys other than the failover target are considered bystanders (v$database.fs_failover_status = 'BYSTANDER'). Bystanders are part of the Data Guard configuration, but not part of the FSFO configuration. Only two databases, the primary and the failover target, can be in the FSFO configuration at any given time.
During failover, bystanders "follow" the primary by default, flashing back and reapplying redo from the new primary as necessary. (Yes, bystanders need Flashback Database too).
After a failover, a bystander will not automatically become the new failover target. Unless action is taken to change the failover target to one of the bystanders, the new primary will be without a failover target until the former primary is reinstated as a standby. If the Broker configuration is changed to make a bystander the new failover target (probably a good idea if the failed database will be down for a while), the observer will not automatically reinstate the former primary because it is no longer part of the FSFO configuration. Reinstatement will have to be accomplished by other means (manual or scripted Broker commands).
FSFO enabled configurations having multiple standbys cannot switchover to a standby that is not the failover target. To switchover to a standby that is not the current failover target:
John Smiley [jrsmiley@gmail.com] is a persistent storage architect for a major online retailer.