|Oracle® XML DB Developer's Guide
10g Release 1 (10.1)
Part Number B10790-01
This chapter provides some preliminary design criteria for consideration when planning your Oracle XML DB solution.
This chapter contains these topics:
Oracle XML DB is installed as part of the general purpose database shipped with Oracle Database. For a manual installation or de-installation of Oracle XML DB, see Appendix A, " Installing and Configuring Oracle XML DB".
Oracle XML DB is suited for any application where some or all of the data processed by the application is represented using XML. Oracle XML DB provides for high performance ingestion, storage, processing and retrieval of XML data. Additionally, it also provides the ability to quickly and easily generate XML from existing relational data.
The type of applications that Oracle XML DB is particularly suited to include:
Business-to-Business (B2B) and Application-to-Application (A2A) integration
A typical Oracle XML DB application has one or more of the following requirements and characteristics:
Large numbers of XML documents must be ingested or generated
Large XML documents need to be processed or generated
High performance searching, both within a document and across a large collections of documents
High Levels of security. Fine grained control of security
Data processing must be contained in XML documents and data contained in traditional relational tables
Uses languages such as Java that support open standards such as SQL, XML, XPath, and XSLT
Accesses information using standard Internet protocols such as FTP, HTTP/WebDAV, or JDBC
Full queriability from SQL and integration with analytic capabilities
Validation of XML documents is critical
Oracle XML DB provides you with the ability to fine tune how XML documents will be stored and processed in Oracle Database. Depending on the nature of the application being developed, XML storage must have at least one of the following features
High performance ingestion and retrieval of XML documents
High performance indexing and searching of XML documents
Be able to update sections of an XML document
Manage highly either or both structured and non-structured XML documents
This section discusses the preliminary design criteria you can consider when planning your Oracle XML DB application. Figure 2-1 provides an overview of your main design options for building Oracle XML DB applications.
Will your data be highly structured (mostly XML), semi- structured (pseudo- structured), or mostly non-structured? If highly structured, will your table(s) be XML schema-based or non-schema-based? See "Oracle XML DB Application Design: a. How Structured Is Your Data?" and Chapter 3, " Using Oracle XML DB".
How will other applications and users access your XML and other data? How secure must the access be? Do you need versioning? See "Oracle XML DB Application Design: b. Access Models".
In which language(s) will you be programming your application? See "Oracle XML DB Application Design: c. Application Language".
Will you need to generate XML? See Chapter 15, " Generating XML Data from the Database".
How often will XML documents be accessed, updated, and manipulated? Will you need to update fragments or the whole document?
Will you need to transform the XML to HTML, WML, or other languages, and how will your application transform the XML? See Chapter 8, " Transforming and Validating XMLType Data".
Does your application need to be primarily database resident or work in both database and middle tier?
Is your application data-centric, document- and content-centric, or integrated (is both data- and document-centric). "Oracle XML DB Application Design: d. Processing Models".
Will you be exchanging XML data with other applications, across gateways? Will you need Advanced Queuing (AQ) or SOAP compliance? See Chapter 29, " Exchanging XML Data With Oracle Streams AQ ".
How and where will you store the data, XML data, XML schema, and so on? See "Oracle XML DB Design: e. Storage Models".
Note:Your choice of which models to choose in the preceding four categories, a through d, are typically related to each other. However, the storage model you choose is orthogonal to the choices you make for the other design models. In other words, choices you make for the other design modeling options are not dependent on the storage model option you choose.
Figure 2-1 Oracle XML DB Design Options
Figure 2-2 shows the following data structure categories and associated suggested storage options:
Structured data. Is your data highly structured? In other words, is your data mostly XML data?
Semi/pseudo-structured data. Is your data semi/pseudo-structured? In other words does your data include some XML data?
Unstructured data. Is your data unstructured? In other words, is your data mostly non-XML data?
Also consider the following data modeling questions:
If your application is XML schema-based:
For structured data, you can use either Character Large Object (
CLOB) or structured storage.
For semi- or pseudo-structured data, you can use either
CLOB, structured, or hybrid storage. Here your XML schema can be more loosely coupled. See also "Oracle XML DB Design: e. Storage Models".
For unstructured data, an XML schema design is not applicable.
If your application is non-schema-based. For structured, semi/ pseudo-structured, and unstructured data, you can store your data in either
CLOB values in
XMLType tables or views or in files in repository folders. With this design you have many access options including path- and query-based access through Resource Views.
Figure 2-2 Data Storage Models: How Structured Is Your Data?
Figure 2-3 shows the two main data access modes to consider when designing your Oracle XML DB applications:
Navigation- or path-based access. This is suitable for both content/document and data oriented applications. Oracle XML DB provides the following languages and access APIs:
SQL access through Resource/Path Views. See Chapter 20, " SQL Access Using RESOURCE_VIEW and PATH_VIEW ".
PL/SQL access through
DBMS_XDB. See Chapter 21, " PL/SQL Access and Management of Data Using DBMS_XDB".
Protocol-based access using HTTP/WebDAV or FTP, most suited to content-oriented applications. See Chapter 24, " FTP, HTTP, and WebDAV Access to Repository Data".
Query-based access. This can be most suited to data oriented applications. Oracle XML DB provides access using SQL queries through the following APIs:
These options for accessing repository data are also discussed in Chapter 18, " Accessing Oracle XML DB Repository Data".
You can also consider the following access model criteria:
What level of security do you need? See Chapter 23, " Oracle XML DB Resource Security".
Do you need to version the data? If yes, see Chapter 19, " Managing Oracle XML DB Resource Versions".
Figure 2-3 Data Access Models: How Will Users or Applications Access the Data?
You can program your Oracle XML DB applications in the following languages:
Java (JDBC, Java Servlets)
The following processing options are available and should be considered when designing your Oracle XML DB application:
XSLT. Will you need to transform the XML to HTML, WML, or other languages, and how will your application transform the XML? While storing XML documents in Oracle XML DB you can optionally ensure that their structure complies (is "valid" against) with specific XML Schema. See Chapter 8, " Transforming and Validating XMLType Data".
DOM. See Chapter 10, " PL/SQL API for XMLType ". Use object-relational columns, VARRAYs, nested tables, as well as LOBs to store any element or Element-subtree in your XML Schema, and still maintain DOM fidelity (DOM stored == DOM retrieved).
Note:If you choose the
XPath searching. You can use XPath syntax embedded in a SQL statement or as part of an HTTP request to query XML content in the database. See Chapter 4, " XMLType Operations ", Chapter 9, " Full Text Search Over XML", Chapter 18, " Accessing Oracle XML DB Repository Data", and Chapter 20, " SQL Access Using RESOURCE_VIEW and PATH_VIEW ".
XML Generation and
XMLType views. Will you need to generate or regenerate XML? If yes, see Chapter 15, " Generating XML Data from the Database".
How often will XML documents be accessed, updated, and manipulated? See Chapter 4, " XMLType Operations " and Chapter 20, " SQL Access Using RESOURCE_VIEW and PATH_VIEW ".
Will you need to update fragments or the whole document? You can use XPath to specify individual elements and attributes of your document during updates, without rewriting the entire document. This is more efficient, especially for large XML documents. Chapter 5, " XML Schema Storage and Query: The Basics".
Is your application data-centric, document- and content-centric, or integrated (is both data- and document-centric)? See Chapter 3, " Using Oracle XML DB".
Advanced Queuing (AQ) supports XML and
XMLType applications. You can create queues with payloads that contain
XMLType attributes. These can be used for transmitting and storing messages that contain XML documents. By defining Oracle Database objects with
XMLType attributes, you can do the following:
Store more than one type of XML document in the same queue. The documents are stored internally as CLOBs.
Define transformations to convert Oracle Database objects to
Define rule-based subscribers that query message content using
XMLType operators such as
Figure 2-4 summarizes the Oracle XML DB storage options with regards to using
XMLType tables or views. If you have existing or legacy relational data, use
Regardless of which storage options you choose for your Oracle XML DB application, Oracle XML DB provides the same functionality. However, the option you choose will affect your application's performance and the data fidelity (data accuracy).
Currently, the three main storage options for Oracle XML DB applications are:
LOB-based storage? LOB-based storage assures complete textual fidelity including preservation of whitespace. This means that if you store your XML documents as
CLOB values, when the XML documents are retrieved there will be no data loss. Data integrity is high, and the cost of regeneration is low.
Structured storage? Structured storage loses whitespace information but maintains fidelity to the XML DOM, namely DOM stored = DOM retrieved. This provides:
Better SQL 'queriability' with improved performance
Hybrid or semi-structured storage. Hybrid storage is a special case of structured storage in which a portion of the XML data is broken up into a structured format and the remainder of the data is stored as a
The storage options are totally independent of the following criteria:
Data queryability and updatability, namely, how and how often the data is queried and updated.
How your data is accessed. This is determined by your application processing requirements.
What language(s) your application uses. This is also determined by your application processing requirements.
If you are using
XMLType tables you can store your data in:
CLOB (unstructured) storage
Hybrid or semi-structured storage
XMLType views if you have existing relational data. You can use the following options to define the
SQLX operators. Using these operators you can store the data in relational tables and also generate/regenerate the XML. See Chapter 15, " Generating XML Data from the Database".
Object constructors. You can store the data in relational tables using object constructors.
Figure 2-4 Structured Storage Options
One objection to using XML to represent data is that it generates higher overhead than other representations. Oracle XML DB incorporates a number of features specifically designed to address this issue by significantly improving the performance of XML processing. These are described in the following sections:
Surveys show that data represented in XML and stored in a text file is three times the size of the same data in a Java object or in relational tables. There are two reasons for this:
Tag names (metadata describing the data) and white space (formatting characters) take up a significant amount of space in the document, particularly for highly structured, data-centric XML.
All data in an XML file is represented in human readable (string) format.
The string representation of a numeric value needs about twice as many bytes as the native (binary) representation When XML documents are stored in Oracle XML DB using the structured storage option, the 'shredding' process discards all tags and white space in the document.
The amount of space saved by this optimization depends on the ratio of tag names to data, and the number of collections in the document. For highly-structured, data-centric XML the savings can be significant. When a document is printed, or when node-based operations such as XPath evaluations take place, Oracle XML DB uses the information contained in the associated XML Schema to dynamically reconstruct any necessary tag information.
Document Object Model (DOM) is the dominant programming model for XML documents. DOM APIs are easy to use but the DOM Tree that underpins them is expensive to generate, in terms of memory. A typical DOM implementation maintains approximately 80 to 120 bytes of system overhead for each node in the DOM tree. This means that for highly structured data, the DOM tree can require 10 to 20 times more memory than the document on which it is based.
A conventional DOM implementation requires the entire contents of an XML document to be loaded into the DOM tree before any operations can take place. If an application only needs to process a small percentage of the nodes in the document, this is extremely inefficient in terms of memory and processing overhead. The alternative SAX approach reduces the amount of memory required to process an XML document, but its disadvantage is that it only allows linear processing of nodes in the XML Document.
Oracle XML DB reduces memory overhead associated with DOM programming by managing XML schema-based XML documents using an internal in-memory structure called an XML Object (XOB). A XOB is much smaller than the equivalent DOM since it does not duplicate information like tag names and node types, that can easily be obtained from the associated XML schema. Oracle XML DB automatically uses a XOB whenever an application works with the contents of a schema-based
XMLType. The use of the XOB is transparent to you. It is hidden behind the
XMLType datatype and the C, PL/SQL, and Java APIs.
The XOB can also reduce the amount of memory required to work with an XML document using the Lazily-Loaded Virtual DOM feature. This allows Oracle XML DB to defer loading in-memory representation of nodes that are part of sub-elements or collection until methods attempt to operate on a node in that object. Consequently, if an application only operates on a few nodes in a document, only those nodes and their immediate siblings are loaded into memory.The XOB can only used when an XML document is based on an XML schema. If the contents of the XML document are not based on an XML schema, a traditional DOM is used instead of the XOB.
To populate a DOM tree the application must parse the XML document. The process of creating a DOM tree from an XML file is very CPU- intensive. In a typical DOM-based application, where the XML documents are stored as text, every document has to be parsed and loaded into the DOM tree before the application can work with it. If the contents of the DOM tree are updated the whole tree has to be serialized back into a text format and written out to disk.
Oracle XML DB eliminates the need to keep re-parsing documents. Once an XML document has been stored using structured storage techniques no further parsing is required when the document is loaded from disk into memory. Oracle XML DB is able to map directly between the on disk format and in-memory format using information derived from the associated XML schema. When changes are made to the contents of a schema-based
XMLType, Oracle XML DB is able to write just the updated data back to disk.Again, when the contents of the XMLType are not based on an XML schema a traditional DOM is used instead.
Most DOM implementations use string comparisons when searching for a particular node in the DOM tree. Even a simple search of a DOM tree can require hundreds or thousands of instruction cycles. Searching for a node in a XOB is much more efficient than searching for a node in a DOM. A XOB is based on a computed offset model, similar to a C/C++ object, and uses dynamic hashtables rather than string comparisons to perform node searches.
Making use of the powerful features associated with XML schema in a conventional XML application can generate significant amounts of additional overhead. For example, before an XML document can be validated against an XML schema, the schema itself must be located, parsed, and validated.
Oracle XML DB minimizes the overhead associated with using XML schema. When an XML schema is registered with the database it is loaded in the Oracle XML DB schema cache, along with all of the metadata required to map between the XML, XOB and on disk representations of the data. This means that once the XML schema has been registered with the database, no additional parsing or validation of the XML schema is required before it can be used. The schema cache is shared by all users of the database. Whenever an Oracle XML DB operation requires information contained in the XML schema it can access the required information directly from the cache.
Some operations, such as performing a full schema validation, or serializing an XML document back into text form can still require significant memory and CPU resources. Oracle XML DB allows these operations to be off-loaded to the client or middle tier processor. Oracle Call Interface (OCI) interface and thick JDBC driver both allow the XOB to be managed by the client.The cached representation of the XML schema can also be downloaded to the client. This allows operations such as XML printing, and XML schema validation to be performed using client or middle tier resources, rather than server resources.
Another bottleneck for XML-based Java applications happens when parsing an XML file. Even natively compiled or JIT compiled Java performs XML passing operations twice as slowly compared to using native C language. One of the major performance bottlenecks in implementing XML applications is the cost of transforming data in an XML document between text, Java, and native server representations. The cost of performing these transformations is proportional to the size and complexity of the XML file and becomes severe even in moderately sized files.
Oracle XML DB addresses these issues by implementing all of the Java and PL/SQL interfaces as very thin facades over a native 'C' implementation. This provides for language-neutral XML support (Java, C, PL/SQL, and SQL all use the same underlying implementation), as well as the higher performance XML parsing and DOM processing.
One of the biggest bottlenecks when using Java and XML is with type conversions. Internally Java uses UCS-2 to represent character data. Most XML files and databases do not contain UCS-2 encoded data. This means that all data contained in an XML file has to be converted from 8 Bit or UTF8 encoding to UCS-2 encoding before it can be manipulated in a Java program.
Oracle XML DB addresses these problems with lazy type conversions. With lazy type conversions the content of a node is not converted into the format required by Java until the application attempts to access the contents of the node. Data remains in the internal representation till the last moment. Avoiding unnecessary type conversions can result in significant performance improvements when an application only needs to access a few nodes in an XML document.
Consider a JSP that loads a name from the Oracle Database and prints it out in the generated HTML output. Typical JSP implementations read the name from the database (that probably contains data in the ASCII or ISO8859 character sets) convert the data to UCS-2, and return it to Java as a string. The JSP would not look at the string content, but only print it out after printing the enclosing HTML, probably converting back to the same ASCII or ISO8859 for the client browser. Oracle XML DB provides a write interface on
XMLType so that any element can write itself directly to a stream (such as a
ServletOutputStream) without conversion through Java character sets. Figure 2-5 shows the Oracle XML DB Application Program Interface (API) stack.
Figure 2-5 Oracle XML DB Application Program Interface (API) Stack