Oracle® Database High Availability Architecture and Best Practices
10g Release 1 (10.1)
Part Number B10726-02
This chapter describes Oracle Database high availability features. It includes the following topics:
Oracle Real Application Clusters (RAC) enables multiple instances that are linked by an interconnect to share access to an Oracle database. This enables RAC to provide high availability, scalability, and redundancy during failures. RAC provides scalability without requiring application code changes.
RAC accommodates all system types, from read-only data warehouse (DSS) systems to update-intensive online transaction processing (OLTP) systems as well as systems that combine both DSS and OLTP. Typical RAC environments are configured with symmetric multi-processors.
Node and instance failover in seconds
Integrated and intelligent connection and service failover across various instances
Planned node, instance, and service switchover and switchback
Rolling patch upgrades
Multiple active instance availability and scalability across multiple nodes
Comprehensive manageability integrating database and cluster features
Oracle Data Guard provides a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases to enable production Oracle databases to survive failures, disasters, errors, and data corruptions. Data Guard maintains these standby databases as transactionally consistent copies of the production database. Then, if the production database becomes unavailable because of a planned or an unplanned outage, Data Guard can switch any standby database to the production role, thus minimizing the downtime associated with the outage. Data Guard can be used with traditional backup, restoration, and cluster technology to provide a high level of data protection and data availability.
A Data Guard configuration consists of one production database and one or more physical or logical standby databases. The databases in a Data Guard configuration are connected by Oracle Net and may be dispersed geographically. There are no restrictions on where the databases are located if they can communicate with each other. For example, you can have a standby database in the same building as your primary database to help manage planned downtime and two or more standby databases in other locations for use in disaster recovery.
Data Guard provides the following benefits:
Disaster recovery, data protection, and high availability
Data Guard provides an efficient and comprehensive disaster recovery and high availability solution. Easy-to-manage switchover and failover capabilities allow role reversals between primary and standby databases, minimizing the downtime of the primary database for planned and unplanned outages.
Complete data protection
With standby databases, Data Guard can ensure no data loss, even in the face of unforeseen disasters, when it is configured in maximum protection mode. A standby database provides a safeguard against data corruption and user errors. Storage level physical corruptions on the primary database do not propagate to the standby database. Similarly, logical corruptions or user errors that cause the primary database to be permanently damaged can be resolved. Finally, the redo data is validated when it is applied to the standby database.
Efficient use of system resources
The standby database tables that are updated with redo data received from the primary database can be used for other tasks such as backups, reporting, summations, and queries, thereby reducing the primary database workload necessary to perform these tasks, saving valuable CPU and I/O cycles. With a logical standby database, users can perform normal data manipulation on tables in schemas that are not updated from the primary database. A logical standby database can remain open while the tables are updated from the primary database, and the tables are simultaneously available for read-only access. Finally, additional indexes and materialized views can be created on the maintained tables for better query performance and to suit specific business requirements.
Flexibility in data protection to balance availability against performance requirements
Data Guard offers maximum protection, maximum availability, and maximum performance modes to help enterprises balance data availability against system performance requirements.
Automatic gap detection and resolution
If connectivity is lost between the primary and one or more standby databases (for example, due to network problems), redo data being generated on the primary database cannot be sent to those standby databases. After connectivity is reestablished, the missing log files (referred to as a gap) are automatically detected by Data Guard, which then automatically transmits the missing log files to the standby databases. The standby databases are resynchronized with the primary database, without manual intervention by the DBA.
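The gap detection described above can be pictured with a small model. The following Python sketch is only an illustration under stated assumptions: log files are identified by consecutive sequence numbers starting at 1, and the names `find_gaps` and `resynchronize` are hypothetical. It does not represent the Data Guard implementation.

```python
def find_gaps(received, highest_on_primary):
    """Return the log sequence numbers that the standby is missing.

    received: sequence numbers already present on the standby
    highest_on_primary: latest sequence number available on the primary
    """
    have = set(received)
    return [seq for seq in range(1, highest_on_primary + 1) if seq not in have]


def resynchronize(standby_logs, primary_logs):
    """Copy every missing log from the primary to the standby, in order."""
    gap = find_gaps(standby_logs, max(primary_logs))
    for seq in gap:
        # A dictionary assignment stands in for the network transfer.
        standby_logs[seq] = primary_logs[seq]
    return gap


# Hypothetical data: logs 3 through 5 were never shipped during an outage.
primary = {seq: f"redo-{seq}" for seq in range(1, 8)}
standby = {1: "redo-1", 2: "redo-2", 6: "redo-6", 7: "redo-7"}

missing = resynchronize(standby, primary)
print(missing)  # [3, 4, 5]
```

After the call, the standby holds the same set of log files as the primary, which mirrors the resynchronization step that occurs without DBA intervention.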
Centralized and simple management
The Data Guard broker provides a graphical user interface and a command-line interface to automate management and operational tasks across multiple databases in a Data Guard configuration. The broker also monitors all of the systems within a single Data Guard configuration.
Integration with the Oracle database
Oracle Streams enables the propagation and management of data, transactions, and events in a data stream, either within a database or from one database to another. Streams provides a set of elements that allow users to control what information is put into a data stream, how the stream is routed from node to node, what happens to events in the stream as they flow into each node, and how the stream terminates.
Streams can be used to replicate a database or a subset of a database. This enables users and applications to simultaneously update data at multiple locations. If a failure occurs at one of the locations, then users and applications at the surviving sites can continue to access and update data.
Streams can be used to build distributed applications that replicate changes at the application level using message queuing. If an application fails, then the surviving applications can continue to operate and provide access to data through locally maintained copies.
Streams provides granularity and control over what is replicated and how it is replicated. It supports bidirectional replication, data transformations, subsetting, custom apply functions, and heterogeneous platforms. It also gives users complete control over the routing of change records from the primary database to a replica database.
Database administrators can perform a variety of online operations on table definitions, including online reorganization of heap-organized tables. This makes it possible to reorganize a table while users have full access to it.
This online architecture provides the following benefits:
Any physical attribute of the table can be changed online. The table can be moved to a new location. The table can be partitioned. The table can be converted from one type of organization (such as heap-organized) to another (such as index-organized).
Many logical attributes can also be changed. Column names, types, and sizes can be changed. Columns can be added, deleted, or merged. One restriction is that the primary key of the table cannot be modified.
Secondary indexes on index-organized tables can be created and rebuilt online. Secondary indexes support efficient use of block hints (physical guesses). Invalid physical guesses can be repaired online.
Indexes can be created online and analyzed at the same time. The physical guess component of logical rowids (used in secondary indexes and in the mapping table for index-organized tables) can also be repaired online.
The physical guess component of logical rowids stored in secondary indexes on index-organized tables can be repaired. This enables online repair of invalid physical guesses.
You can reorganize an index-organized table or an index-organized table partition without rebuilding its secondary indexes. This results in a short reorganization maintenance window.
You can reorganize an index-organized table online. Along with online reorganization of secondary indexes, this capability eliminates the reorganization maintenance window.
Moving data using transportable tablespaces can be much faster than performing either an export/import or unload/load of the same data. This is because transporting a tablespace requires only the copying of datafiles and integrating the tablespace structural information. You can also use transportable tablespaces to move index data, thereby avoiding the index rebuilds you would have to perform when importing or loading table data.
You can transport tablespaces across platforms. This functionality can be used to:
Provide an easier and more efficient means for content providers to publish structured data and distribute it to customers running Oracle on a different platform
Simplify the distribution of data from a data warehouse environment to data marts which are often running on smaller systems with a different platform
Enable the sharing of read-only tablespaces across a heterogeneous cluster
Allow a database to be migrated from one platform to another
Most platforms are supported for cross-platform tablespace transport. You can query the V$TRANSPORTABLE_PLATFORM view to see the platforms that are supported and to determine their platform IDs and their endian format (byte ordering).
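Endian format matters because the same value is laid out differently in storage on big-endian and little-endian platforms. The following Python sketch illustrates byte ordering in general. It does not read Oracle datafiles, and the sample value is arbitrary.

```python
import sys

value = 0x0A0B0C0D  # arbitrary 4-byte integer used for illustration

# The same integer serialized in each byte order.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a

# Interpreting big-endian bytes as little-endian yields a different number.
misread = int.from_bytes(big, byteorder="little")
print(hex(misread))  # 0xd0c0b0a

# Byte order of the machine running this script ("little" or "big").
print(sys.byteorder)
```

Data written on a platform with one byte ordering cannot be read unchanged on a platform with the other, which is why the view reports the endian format for each supported platform.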
Automatic storage management (ASM) automates and simplifies the optimal layout of datafiles, control files, redo log files, and flash recovery area files for both single instance and RAC databases. ASM is designed to work with any type of storage, from unmanaged disks to a SAN-based, intelligent storage array.
ASM maximizes performance by automatically distributing database files across all available disks. Database storage is automatically rebalanced whenever the storage configuration changes while the database remains online. You never need to manually relocate data to reclaim space because this approach eliminates storage fragmentation.
ASM provides data protection by maintaining redundant copies, or mirrors, of data. The protection and striping policy can be defined for each file to allow varying degrees of protection and striping within the same set of disks.
ASM disk groups, which are comprised of disks and the files that reside on them, simplify storage administration by allowing a collection of disks to be managed as a single unit. ASM failure groups allow the disks in a disk group to be subdivided into sets of disks that share a common resource whose failure needs to be tolerated. An example of a failure group is a string of SCSI disks connected to a common SCSI controller.
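The purpose of failure groups can be shown with a toy placement routine that never puts a piece of data and its mirror in the same failure group. This Python sketch is an assumption-laden illustration: the function `place_mirrored_extent`, the round-robin policy, and the disk names are hypothetical, and the code is not the allocation algorithm that ASM uses.

```python
def place_mirrored_extent(extent_id, failure_groups):
    """Choose a primary disk and a mirror disk in two different failure groups.

    failure_groups: dict mapping a group name to its list of disk names.
    At least two groups are required; otherwise one shared failure
    could destroy both copies.
    """
    names = sorted(failure_groups)
    if len(names) < 2:
        raise ValueError("mirroring needs at least two failure groups")

    # Rotate through the groups; the mirror always goes to the next group.
    primary_group = names[extent_id % len(names)]
    mirror_group = names[(extent_id + 1) % len(names)]

    # Spread successive extents across the disks inside each group.
    slot = extent_id // len(names)
    primary_disks = failure_groups[primary_group]
    mirror_disks = failure_groups[mirror_group]
    primary_disk = primary_disks[slot % len(primary_disks)]
    mirror_disk = mirror_disks[slot % len(mirror_disks)]
    return (primary_group, primary_disk), (mirror_group, mirror_disk)


# Hypothetical disk group: two SCSI controllers, each a failure group.
groups = {
    "controller_a": ["disk1", "disk2"],
    "controller_b": ["disk3", "disk4"],
}

for extent in range(4):
    primary, mirror = place_mirrored_extent(extent, groups)
    print(extent, primary, mirror)
```

Because the two copies always sit behind different controllers in this model, the loss of one controller leaves a usable copy of every extent.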
Human errors are difficult to avoid and can be particularly difficult to recover from without advance planning and the right technology. Such errors can result in logical data corruption or cause downtime of one or more components of the IT infrastructure. While it is relatively simple to rectify the failure of an individual component, detection and repair of logical data corruption, such as accidental deletion of valuable data, is a time-consuming operation that causes enormous loss of business productivity.
Flashback technology provides a set of features to view and rewind data back and forth in time. The flashback features offer the capability to query past versions of schema objects, query historical data, perform change analysis, and perform self-service repair to recover from logical corruptions while the database is online.
Flashback technology provides a SQL interface to quickly analyze and repair human errors. Flashback provides fine-grained analysis and repair for localized damage such as deleting the wrong customer order. Flashback technology also enables correction of more widespread damage, yet does it quickly to avoid long downtime. Flashback technology is unique to the Oracle Database and supports recovery at all levels including the row, transaction, table, tablespace, and database.
Oracle Flashback Query enables you to specify a target time and then run queries against the database, viewing results as they would have appeared at that time. To recover from an unwanted change like an erroneous update to a table, you can choose a target time before the error and run a query to retrieve the contents of the lost rows.
Oracle Flashback Version Query retrieves metadata and historical data for a specific time interval. You can view all the rows of a table that ever existed during a specific time interval. Metadata about the different versions of rows includes start and end time, type of change operation, and identity of the transaction that created the row version.
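The ideas behind querying data as of a target time and listing row versions over an interval can be modeled in a few lines. The following Python sketch is a conceptual illustration only. The class `VersionedTable`, its methods, and the integer timestamps are hypothetical, and the code does not reflect how Oracle stores or retrieves historical data.

```python
class VersionedTable:
    """Keeps every version of each row with the time range in which it was current."""

    def __init__(self):
        # Each entry: (key, value, start, end); end is None while current.
        self.versions = []

    def _close_current(self, key, at):
        for i, (k, value, start, end) in enumerate(self.versions):
            if k == key and end is None:
                self.versions[i] = (k, value, start, at)

    def put(self, key, value, at):
        """Insert or update a row at time `at`."""
        self._close_current(key, at)
        self.versions.append((key, value, at, None))

    def delete(self, key, at):
        """Delete a row at time `at`; earlier versions remain queryable."""
        self._close_current(key, at)

    def as_of(self, at):
        """Return the rows as they appeared at time `at`."""
        return {k: v for k, v, start, end in self.versions
                if start <= at and (end is None or at < end)}

    def versions_between(self, key, low, high):
        """Return every version of `key` that existed during [low, high]."""
        return [(v, start, end) for k, v, start, end in self.versions
                if k == key and start <= high and (end is None or end > low)]


orders = VersionedTable()
orders.put(101, "pending", at=1)
orders.put(101, "shipped", at=5)
orders.delete(101, at=8)  # the erroneous change

print(orders.as_of(9))  # {}
print(orders.as_of(7))  # {101: 'shipped'}
print(orders.versions_between(101, 0, 10))
```

Choosing a target time before the erroneous delete (time 7 in this model) returns the lost row, and the version listing shows the start and end time of each row version.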
Oracle Flashback Transaction Query retrieves metadata and historical data for a specific transaction or for all transactions within a specific time interval. You can also obtain the SQL code to undo the changes to particular rows affected by a transaction. You typically use Flashback Transaction Query with Flashback Version Query, which provides the transaction IDs for the rows of interest.
Oracle Flashback Table recovers a table to its state at a previous point in time. You can restore table data while the database is online, undoing changes to only the specified table.
Oracle Flashback Drop recovers a dropped table. This reverses the effects of a DROP TABLE statement.
Oracle Flashback Database provides a more efficient alternative to database point-in-time recovery. When you use Flashback Database, your current datafiles revert to their contents at a past time. The result is much like the result of a point-in-time recovery using datafile backups and redo logs, but you do not have to restore datafiles from backup, and you do not have to re-apply as many individual changes in the redo logs as you would have to do in conventional media recovery.
The Oracle database includes several features that enable changes to be made to the instance configuration dynamically. For example, the dynamic SGA infrastructure can be used to alter an instance's memory usage. It enables the size of the buffer cache, the shared pool, the large pool, and the process-private memory to be changed without shutting down the instance. Oracle also provides transparent management of working memory for SQL execution by self-tuning the initialization runtime parameters that control allocation of private memory.
Another type of dynamic reconfiguration occurs when Oracle polls the operating system to detect changes in the number of available CPUs and reallocates internal resources.
In addition, some initialization parameters can be changed without shutting down the instance. You can use the ALTER SESSION statement to change the value of a parameter during a session. You can use the ALTER SYSTEM statement to change the value of a parameter in all sessions of an instance for the duration of the instance.
Oracle Fail Safe is a software option that works with Microsoft Cluster Server (MSCS) to provide highly available business solutions on Microsoft clusters. A Microsoft cluster is a configuration of two or more Windows systems that appears to network users as a single, highly available system.
Oracle Fail Safe works with MSCS cluster software to provide high availability for applications and single-instance databases running on a cluster. When a node fails, the cluster software fails over to the surviving node based on parameters that you configure using Oracle Fail Safe.
With Oracle Fail Safe, you can reduce downtime for single-instance Oracle databases, Oracle HTTP servers, and almost any application that can be configured as a Windows service.
See Also: Oracle Fail Safe documentation at
Database backup, restoration, and recovery are essential processes underlying any high availability system. Imagine the potential for lost revenue, customer dissatisfaction, and unrecoverable information caused by a disk failure or human error. A well-designed and well-implemented backup and recovery strategy is a cornerstone for every database deployment, making it possible to restore and recover all or part of a database without data loss.
Recovery Manager (RMAN) is Oracle's utility to manage the backup and, more importantly, the recovery of the database. It eliminates operational complexity while providing superior performance and availability of the database.
Recovery Manager determines the most efficient method of executing the requested backup, restoration, or recovery operation and then executes these operations with the Oracle database server. Recovery Manager and the server automatically identify modifications to the structure of the database and dynamically adjust the required operation to adapt to the changes.
RMAN provides the following benefits:
Block media recovery enables the datafile to remain online while fixing the block corruptions.
Persistent RMAN configurations simplify backup and recovery operations.
Retention policy ensures that relevant backups are retained.
Resumable backup backs up files that were not backed up in a previous, failed backup operation.
Resumable restore decreases recovery time by restoring only files that require recovery.
Backup optimization backs up only files that require a backup and have never been backed up.
Restore optimization takes the guesswork out of restoring datafiles and archive logs. RMAN restores only if it is required.
Automatic backup of the control file and the server parameter file means that backup metadata is available in times of database structural changes as well as media failure and disasters.
Reporting features answer the question, "Is my database recoverable?"
Multiple block size backup support
Online backup does not require the database to be placed into hot backup mode.
You can perform fast incremental backups.
You can reorganize and manage all recovery-related files such as backups, archived redo logs, and flashback logs.
You can merge RMAN incremental backups and image copies in the backup to provide up-to-date recoverability.
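The effect of a retention policy can be illustrated with a simple redundancy rule: keep the most recent N backups of each datafile and treat older ones as obsolete. The following Python sketch is a toy model under that assumption. The function `obsolete_backups`, the catalog layout, and the file names are hypothetical, and the code is not how RMAN evaluates its retention policy.

```python
from collections import defaultdict


def obsolete_backups(backups, redundancy):
    """Return backups that exceed the redundancy count for their datafile.

    backups: list of (datafile, backup_time) tuples; backup_time is any
             sortable value.
    redundancy: number of most recent backups to keep for each datafile.
    """
    by_file = defaultdict(list)
    for datafile, taken in backups:
        by_file[datafile].append(taken)

    obsolete = []
    for datafile, times in by_file.items():
        newest_first = sorted(times, reverse=True)
        # Everything after the first `redundancy` entries is no longer needed.
        for taken in newest_first[redundancy:]:
            obsolete.append((datafile, taken))
    return sorted(obsolete)


# Hypothetical backup catalog: (datafile, day on which the backup was taken).
catalog = [
    ("users01.dbf", 1), ("users01.dbf", 2), ("users01.dbf", 3),
    ("system01.dbf", 1), ("system01.dbf", 3),
]

print(obsolete_backups(catalog, redundancy=2))  # [('users01.dbf', 1)]
```

With a redundancy of 2, only the oldest backup of the file that has three backups is reported as obsolete; every datafile still has its two most recent backups available for recovery.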
The flash recovery area is a unified storage location for all recovery-related files and activities in an Oracle database. By defining one initialization parameter, all RMAN backups, archive logs, control file autobackups, and datafile copies are automatically written to a specified file system or automatic storage management disk group.
Making a backup to disk is faster because using the flash recovery area eliminates the bottleneck of writing to tape. More importantly, if database media recovery is required, then datafile backups are readily available. Restoration and recovery time is reduced because you do not need to find a tape and a free tape device to restore the needed datafiles and archive logs.
The flash recovery area provides:
Unified storage location of related recovery files
Management of the disk space allocated for recovery files
Simplified database administration tasks
Reliability because of the inherent reliability of disks
Oracle has introduced the Hardware Assisted Resilient Data (HARD) Initiative, which is a program designed to prevent data corruptions before they happen. Data corruptions are very rare, but when they happen, they can have a catastrophic effect on a database, and therefore a business.
Under the HARD Initiative, Oracle continues to work with selected system and storage vendors to build operating system and storage components that can detect corruptions early and prevent corrupted data from being written to disk. The key approach is block checking, where the storage subsystem validates the Oracle block contents. Implementation of this feature is transparent to the end user or DBA, regardless of the hardware vendor.
See Also: Appendix A, "Hardware Assisted Resilient Data (HARD) Initiative"