|Oracle® Data Cartridge Developer's Guide
10g Release 1 (10.1)
Part Number B10800-01
In addition to the efficient and secure management of data ordered under the relational model, Oracle provides support for data organized under the object model. Object types and other features such as large objects (
LOBs), external procedures, extensible indexing, and query optimization can be used to build powerful, reusable server-based components called data cartridges.
This chapter contains these topics:
Data cartridges extend the capabilities of the Oracle server. To do this, they take advantage of the framework of the Oracle Extensibility Architecture.
The framework lets you capture business logic and processes associated with specialized, or domain-specific, data in user-defined datatypes. Data cartridges that provide new behavior without needing new attributes have the option of using packages rather than user-defined types. Either way, you determine how the server interprets, stores, retrieves, and indexes the data. Data cartridges package this functionality, resulting in software components that plug into a server, extending its capabilities into a new domain.
In other words, the database is extensible. You can customize the indexing and query optimization mechanisms of the database management system, providing specialized services or more efficient processing for user-defined business objects and rich types. You use the extensibility interfaces to register your implementations with the server. Registering them causes the server to use your implementations instead its own when doing your specialized processing.
The extensibility interfaces consist of functions that the server calls to execute the custom indexing or optimizing behavior implemented for a data cartridge. The interfaces are defined by Oracle; as the cartridge developer, you must implement the functions (also frequently called interfaces) to embody the specialized behavior you require. In general, you implement the functions as static methods of an object type. An object type that implements the extensible indexing interface is called an indextype; an object type that implements the extensible optimizing interface is called a statistics type.
The key characteristics of data cartridges are:
Data cartridges are server-based. Their constituents reside on the server or are accessed from the server. The processing for data cartridges occurs at the server, or is dispatched from the server in the form of an external procedure.
Data cartridges extend the server. They define new types and behavior, providing capabilities in component form that were previously unavailable in the server. Users of data cartridges can use the new types in their application to get the new behavior. For example, after loading an Image data cartridge, users can define a table
Person with a column
Photo of type
Data cartridges are integrated with the server. The Oracle Extensibility Framework defines a set of interfaces that integrate data cartridges with the components of the server engine. For example, the interface to the indexing engine allows for domain-specific indexing. Optimizer interfaces similarly allow data cartridges to define domain-specific ways of assessing the CPU and I/O cost of accessing cartridge data.
Data cartridges are packaged. A data cartridge is installed as a unit. Once installed, the data cartridge handles all access issues, such as users being in different schemas, having different privileges, and so on.
Most industries have evolved sophisticated models to handle complex data objects that make up the essence of their business. By data objects, we mean both the structures that relate different units of information and the operations that are performed on them.
The simple names given to these data objects often conceal considerable complexity. For example, the banking industry has many different types of bank accounts. Each bank account has customer demographic information, balance information, transaction information, and rules that embody its behavior (deposit, withdrawal, interest accrual, and so forth).
Data cartridges let you encapsulate this business logic in software components that integrate with the Oracle server. For some time it has been possible to add logic to a database using stored procedures. With the addition of the object-relational extension, application programmers and independent software vendors can enhance the Oracle server to support data types, processes, and logic to model business objects.
While business models have developed increasingly complex data objects, information technology has made it necessary to work with new kinds of data, such as satellite images, X-rays, animal sounds, seismic vibrations, chemical models. Complex and multimedia datatypes are now frequently stored and retrieved, queried and analyzed.
Web-based applications routinely include many different kinds of complex data. Including application-specific data types and the associated business logic requires a new class of networked, content-rich, multitiered, distributed applications. Data cartridges help you meet this need by combining scalar and unstructured datatypes in domain-specific components.
Data cartridges are typically domain-specific. Domains are characterized by content and scope.
In terms of content, a data cartridge can accommodate scalar, complex, and multimedia data. Scalar data can be modeled using native SQL types such as
CHAR. Complex data includes matrices, temperature and magnetic grids, and compound documents. Unstructured multimedia data includes such information as video, voice, and image data.
In terms of scope, a data cartridge can have broad horizontal (cross-industry) coverage or it can be specialized for a specific type of business. For example, a data cartridge for general storage and retrieval of textual data is cross-industry in scope; a data cartridge for the storage and retrieval of legal documents for litigation support is industry-specific.
Table 1-1 shows a way of classifying data cartridge domains according to their content (type of data) and scope (cross-industry or industry-specific), with some examples.
Table 1-1 Data Cartridge Domains by Content and Scope
|Content||Scope: Cross-Industry Uses||Scope: Industry-Specific Extensions|
|Scalar Data||Statistical conversion||Financial and Petroleum|
|Multimedia and Complex Unstructured Data||Text||Image|
Oracle lets you use built-in scalar datatypes to construct more complex user-defined types. The Object-Relational Database Management System provides foundational cartridges that package multimedia and complex data, which can be used as bases for applications across many different industries:
Table 1-2 Oracle Cartridges as Bases for Development
|Text||Tokenized serial byte stream||Display, Compress, Reformat, Index...|
|Image||Structured large object||Compress, Crop, Scale, Rotate, Reformat...|
|Spatial||Geometric objects such as points, lines, polygons||Project, Rotate, Transform, Map...|
|Video||Structured large object of serial (dynamic) image data||Compress, Play, Rewind, Pause...|
Another way of viewing the relationship of cartridges to domains is to view basic multimedia datatypes as forming a foundation that can be extended in specific ways by specific industries. For example, Table 1-3 shows cartridges that could be built for medical applications:
Table 1-3 Medicine-Specific Extensions to Basic Cartridges
A cartridge providing basic services can be deployed across many industries; for example, a text cartridge can be utilized within both law and medicine. A cartridge can also leverage domain expertise across an industry; for example, an image cartridge can provide basic functionality for both X-rays and Sonar within medicine. These cartridges can be further extended for more specialized vertical applications. For instance, medical cartridges could be extended by other cartridges:
The Oracle server provides services for basic data storage, query processing, optimization, and indexing. Various applications use these services to access database capabilities. However, data cartridges have specialized needs because they incorporate domain-specific data. To accommodate these specialized applications, the basic services have been made extensible.
That is, where standard Oracle services are not adequate for the processing a data cartridge requires, you as the data cartridge developer can provide services that are specially tuned to your cartridge. Every data cartridge can provide its own implementations of these services.
For example, suppose you want to build a spatial data cartridge for geographic information systems (GIS) applications. In this case, you might need to implement routines that create a spatial index, insert an entry into the index, update the index, delete from the index, and perform other required operations. Thus, you extend the indexing service of the server.
Figure 1-1 shows the standard services implemented by the Oracle server. This section describes some of these services, highlighting major Oracle capabilities as they relate to data cartridge development.
Figure 1-1 Oracle Services
The Oracle universal data server provides both native and extensible type system services. Historically, most applications have focused on accessing and modifying corporate data that is stored in tables composed of native SQL datatypes, such as
CHAR. Oracle adds support for new types, including:
User-defined object types
VARRAY (varying length array)
Internal large object types:
BLOB (binary large object)
CLOB (character large object)
BFILE (external file)
This section discusses these types.
it is defined by a user, usually a cartridge developer
it specifies both the underlying persistent data, attributes, and the related behaviors, methods,
User-defined types extend the modeling capabilities of the native datatypes. With user-defined types, you can make better models of complex entities in the real world by binding data attributes to semantic behaviors.
A user-defined type can have one or more attributes. Each attribute has a name and a type. The type of an attribute can be a native SQL type, a
LOB, a collection, another object type, or a
REF type. The syntax for defining such types is discussed in Chapter 3, " Defining Object Types ".
See Also:Oracle Database Application Developer's Guide - Object-Relational Features for more information on user-defined types
A method is a procedure or a function that is part of a user-defined type definition. Methods can access and manipulate attributes of the related type. Methods can run within the execution environment of the Oracle server. Methods can also be dispatched outside the server as part of the extensible server execution environment.
Collections are SQL datatypes that contain multiple elements. Elements, or values, of a collection are all from the same type hierarchy. In Oracle, collections of complex types can be
VARRAYs or nested tables.
A VARRAY contains a variable number of ordered elements. The
VARRAY datatype can be used for a column of a table or an attribute of an object type. The element type of a
VARRAY can be either a native datatype, such as
NUMBER, or a user-defined type.
A nested table can be created using Oracle SQL to provide the semantics of an unordered collection. As with a
VARRAY, a nested table can define a column of a table or an attribute of a user-defined type.
If you create an object table in Oracle, you can obtain a reference that acts as a database pointer to an associated row object. References are important for navigating among object instances.
REF operator obtains a reference to a row object. Because
REFs rely on the underlying object identity, you can only use
REF with an object stored as a row in an object table or objects composed from an object view.
See Also:PL/SQL User's Guide and Reference for details of the
Oracle provides large object (
LOB) types to handle the storage demands of images, video clips, documents, and other forms of unstructured data. Large objects are stored in a way that optimizes space utilization and provides efficient access. Large objects are composed of locators and the related binary or character data. The
LOB locators are stored in-line with other table columns and, for internal
NCLOB), the data can be in a separate database storage area. For external
BFILE), the data is stored outside the database tablespaces in operating system files. A table can contain multiple
LOB columns (in contrast to the limit of one
RAW column for each table). Each
LOB column can be stored in a separate tablespace, and even on different secondary storage devices.
Oracle SQL data definition language (DDL) extensions let you create, modify, and delete tables and object types that contain
LOBs. The Oracle SQL data manipulation language (DML) includes statements to insert and delete complete
LOBs. There is also an extensive set of statements for piece-wise reading, writing, and manipulating of
LOBs with Java, PL/SQL, and the Oracle Call Interface (OCI) software.
LOB types, both the locators and related data participate fully in the transactional model of the Oracle server. The data for
BFILEs does not participate in transactions; however,
BFILE locators are fully supported by Oracle server transactions.
Unlike scalar quantities, a
LOB value cannot be indexed using built-in indexing schemes. However, you can use the various
LOB APIs to build modules, including methods of user-defined types, to access and manipulate
LOB content. The extensible indexing framework lets you define the semantics of data residing in
LOBs and manipulate the data using these semantics.
Oracle provides a variety of interfaces and environments to access and manipulate
LOBs. The use of
LOBs to store and manipulate binary and character data to represent your domain is discussed Chapter 6, " Working with Multimedia Datatypes".
See Also:Oracle Database Application Developer's Guide - Large Objects for detailed discussions of large objects
The Oracle type system decouples the implementation of a member method for a user-defined type from the specification of the method. Components of an Oracle data cartridge can be implemented using any of the popular programming languages. In Oracle, methods, functions, and procedures can be developed using PL/SQL, external C/C++ language routines, or Java. Thus, the database server runtime environment can be extended by user-defined methods, functions, and procedures.
In Oracle, Java offers data cartridge developers a powerful implementation choice for data cartridge behavior. In addition, PL/SQL offers a data cartridge developer a powerful procedural language that supports all the object extensions for SQL. With PL/SQL, program logic can execute on the server and perform traditional procedural language operations such as loops, if-then-else clauses, and array access.
While PL/SQL and Java are powerful, certain computation-intensive operations such as a Fast Fourier Transform or an image format conversion are handled more efficiently by C programs. With the Oracle Server, you can call C language programs from the server. Such programs are executed in a separate address space from the server. Thus, the database server is insulated from any program failures that might occur in external procedures, preventing the Oracle database from being corrupted by such failures.
With certain reasonable restrictions, external procedures can call back to the Oracle Server using OCI. Callbacks are particularly useful for processing
LOBs. For example, an external procedure can use callbacks to perform piece-wise reads or writes of
LOBs stored in the database. External procedures can also use callbacks to manipulate domain indexes stored as Index-Organized Tables in the database.
Figure 1-2 External Program Executing in Separate Address Space
Basic database management systems support a few types of access methods, such as B+trees and hash indexes, on a limited set of data types, such as numbers and strings. For simple data types such as integers and small strings, all aspects of indexing can easily be handled by the database system. However, as data becomes more complex, including information such as text, spatial, image, video, and audio, it requires complex data types and specialized indexing techniques.
Complex data types have application-specific formats, indexing requirements, and selection predicates. For example, there are many different means of document encoding (ODA, XML, plain text) and information retrieval techniques (keyword, full-text boolean, similarity, probabilistic, and so on). Similarly, R-trees are an efficient method of indexing spatial data. No database server can be built with support for all possible kinds of complex data and indexing. Oracle's solution is to build an extensible server that lets you define the index types you require.
Such user-defined indexes are called domain indexes because they index data in an application-specific domain. The cartridge is responsible for defining the index structure, maintaining the index content during load and update operations, and searching the index during query processing. The physical index can be stored in the Oracle database as tables or externally as a file.
A domain index is a schema object. It is created, managed, and accessed by routines implemented as methods of a user-defined type called an indextype. The routines that an indextype must implement, and the operations the routines must perform, are described in Chapter 8, " Building Domain Indexes". Implementation of the routines is specific to an application and must be done by the cartridge developer.
With extensible indexing, the application
Defines the structure of the domain index
Stores the index data, either inside or outside the Oracle database
Manages, retrieves, and uses the index data to evaluate user queries
When the database system handles the physical storage of domain indexes, data cartridges
Define the format and content of an index. Cartridges define an index structure that can accommodate a complex data object.
Build, delete, and update a domain index. Cartridges handle building and maintaining the index structures. Because indexes are modeled as collections of tuples, in-place updating is directly supported.
Access and interpret the content of an index. Cartridges become an integral component of query processing: content-related clauses for database queries are handled by the data cartridge.
Typical relational and object-relational database management systems do not support extensible indexing. Consequently, many applications maintain file-based indexes for complex data residing in relational database tables. A considerable amount of code and effort is required to:
maintain consistency between external indexes and the related relational data
support compound queries involving tabular values and external indexes
manage the system, performing backup, recovery, storage allocation, and so on, with multiple forms of persistent storage, such as files and databases
By supporting extensible indexes, the Oracle server significantly reduces the level of effort needed to develop solutions involving high-performance access to complex datatypes.
The extensible optimizer lets user-defined functions and indexes collect statistical information such as selectivity and cost functions. This information is used by the optimizer in choosing a query plan. The cost-based optimizer is thus extended to use the user-supplied information; the rule-based optimizer is unchanged.
The optimizer generates an execution plan for a SQL statement. An execution plan includes an access method for each table in the
FROM clause, and an ordering (called the join order) of the tables in the
FROM clause. System-defined access methods include indexes, hash clusters, and table scans. The optimizer chooses a plan by generating a set of join orders or permutations, computing the cost of each, and selecting the one with the lowest cost.
For each table in the join order, the optimizer computes the cost of each possible access method and join method and chooses the one with the lowest cost. The cost of the join order is the sum of the access method and join method costs. The costs are calculated using algorithms that together compose the cost model. A cost model can include varying levels of detail about the physical environment in which the query is executed. The current cost model includes the number of disk accesses and estimates of network costs, with minor adjustments to compensate for the lack of detail.
The optimizer uses statistics about the objects referenced in the query to compute the costs. The statistics are gathered using the
DBMS_STATS package. The optimizer uses these statistics to calculate cost and selectivity. The selectivity of a predicate is the fraction of rows in a table that will be chosen by the predicate. It is a number between 0 and 100 (expressed as percentage).
Extensibility allows users to define new operators, index types, and domain indexes. For such user-defined operators and domain indexes, the extensible optimizer lets users control the three main components used by the optimizer to select an execution plan: statistics, selectivity, and cost.
See Also:PL/SQL Packages and Types Reference for information about
There are three classes of extensibility interfaces:
Cartridge basic service interfaces
Data cartridge interfaces
The DBMS interfaces are the simplest kind of extensibility services. DBMS interfaces are made available through extensions to SQL or to the Oracle Call Interface (OCI). For example, the extensible type manager uses the
TYPE syntax in SQL. Similarly, extensible indexing uses DDL and DML support for specifying and manipulating indexes.
Generic interfaces provide basic services like memory management, context management, internationalization, and cartridge-specific management. These cartridge basic interface services are used by data cartridges to implement behavior for new datatypes in the context of the server's execution environment. These services provide helper routines that make it easy for data cartridge developers to write robust, portable server-side methods.
When processing user-defined indextypes, Oracle calls data cartridge functions to perform operations such as index searches or fetch operations. For user-defined query optimization, the query optimizer calls functions implemented by the data cartridge to compute the cost of user-defined operators or functions.