Oracle® Database Concepts
10g Release 1 (10.1)
Part Number B10743-01
This chapter provides an overview of Oracle's content management features.
Oracle Database includes datatypes to handle all the types of rich Internet content such as relational data, object-relational data, XML, text, audio, video, image, and spatial. These datatypes appear as native types in the database. They can all be queried using SQL. A single SQL statement can include data belonging to any or all of these datatypes.
As applications evolve to encompass increasingly richer semantics, they encounter the need to deal with the following kinds of data:
Simple structured data
Complex structured data
Semi-structured data
Unstructured data
Traditionally, the relational model has been very successful at dealing with simple structured data, the kind that fits into simple tables. Oracle added object-relational features so that applications can deal with complex structured data, such as collections, references, and user-defined types. Queuing technologies, such as Oracle Streams Advanced Queuing, deal with messages and other semi-structured data. This chapter discusses Oracle's technologies to support unstructured data.
Unstructured data cannot be decomposed into standard components. Data about an employee can be 'structured' into a name (probably a character string), an identification (likely a number), a salary, and so on. But if you are given a photo, you find that the data really consists of a long stream of 0s and 1s. These 0s and 1s are used to switch pixels on or off, so that you see the photo on a display, but it cannot be broken down into any finer structure in terms of database storage.
Unstructured data, such as text, graphic images, still video clips, full-motion video, and sound waveforms, tends to be large. A typical employee record may be a few hundred bytes, but even small amounts of multimedia data can be thousands of times larger. Some multimedia data may reside in operating system files, and it is desirable to access those files from the database.
Extensible Markup Language (XML) is a tag-based markup language that lets developers create their own tags to describe data that's exchanged between applications and systems over the Internet. XML is widely adopted as the common language of information exchange between companies. One reason for its popularity is its ease of use: XML documents and XML-based messages can be sent easily over the Internet using common protocols, such as HTTP or FTP.
Oracle XML DB treats XML as a native datatype in the database. Oracle XML DB is not a separate server. The XML data model encompasses both unstructured content and structured data. Applications can use standard SQL and XML operators to generate complex XML documents from SQL queries and to store XML documents.
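The idea of generating an XML document from a SQL query can be sketched in a few lines. The following is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database as a stand-in, not Oracle XML DB or its SQL/XML operators, and the table, the sample rows, and the helper name rows_to_xml are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, query, root_tag, row_tag):
    """Run a SQL query and wrap each result row in an XML element."""
    cursor = conn.execute(query)
    columns = [desc[0] for desc in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        row_el = ET.SubElement(root, row_tag)
        # One child element per column, named after the column.
        for name, value in zip(columns, row):
            ET.SubElement(row_el, name).text = str(value)
    return root


# Hypothetical sample data held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ada", 5000), (2, "Grace", 6000)],
)

doc = rows_to_xml(
    conn,
    "SELECT id, name, salary FROM employees ORDER BY id",
    "employees",
    "employee",
)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Each row becomes one employee element whose children are named after the selected columns, so the shape of the XML follows the shape of the query.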
Oracle XML DB provides new capabilities for both content-oriented and data-oriented access. For developers who see XML as documents (news stories, articles, and so on), Oracle XML DB provides an XML repository accessible from standard protocols and SQL.
For others, the structured-data aspect of XML (invoices, addresses, and so on) is more important. For these users, Oracle XML DB provides a native XMLType and support for XML Schema, XPath, XSLT, DOM, and so on. Data-oriented access is typically more query-intensive.
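Data-oriented access usually means selecting parts of a document by path expressions. The sketch below illustrates this with the limited XPath subset in Python's standard xml.etree.ElementTree module; it is not XMLType or Oracle's XPath support, and the invoice document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice data in XML form.
INVOICES = """
<invoices>
  <invoice id="1001"><customer>Acme</customer><total>250.00</total></invoice>
  <invoice id="1002"><customer>Globex</customer><total>980.50</total></invoice>
  <invoice id="1003"><customer>Acme</customer><total>75.25</total></invoice>
</invoices>
""".strip()

root = ET.fromstring(INVOICES)

# XPath-style predicate: invoices whose <customer> child has the text "Acme".
acme = root.findall("./invoice[customer='Acme']")
acme_ids = [inv.get("id") for inv in acme]
acme_total = sum(float(inv.findtext("total")) for inv in acme)

print(acme_ids, acme_total)
```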
The Oracle XML developer's kits (XDK) contain the basic building blocks for reading, manipulating, transforming, and viewing XML documents. They are available for Java, JavaBeans, C, C++, and PL/SQL. Unlike many shareware and trial XML components, the production Oracle XDKs are fully supported and come with a commercial redistribution license. Oracle XDKs consist of the following components:
XML Parsers: supporting Java, C, C++, and PL/SQL, the components create and parse XML using industry standard DOM and SAX interfaces.
XSLT Processor: transforms or renders XML into other text-based formats, such as HTML.
XML Schema Processor: supporting Java, C, and C++, allows use of XML simple and complex datatypes.
XML Class Generator: automatically generates Java and C++ classes from DTDs and schemas to send XML data from Web forms or applications.
XML Transviewer Java Beans: visually view and transform XML documents and data with Java components.
XML SQL Utility: supporting Java, generates XML documents, DTDs, and schemas from SQL queries.
XSQL Servlet: combines XML, SQL, and XSLT in the server to deliver dynamic Web content.
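The DOM and SAX interfaces mentioned for the XML parsers differ in how they expose a document: DOM builds a tree in memory, while SAX reports events as the parser streams through the input. The following sketch shows the difference with Python's standard library parsers rather than the Oracle XDK; the sample document and the ItemHandler class are hypothetical.

```python
import xml.sax
from xml.dom import minidom

DOC = "<catalog><item>pen</item><item>ink</item><item>paper</item></catalog>"

# DOM: the whole document is parsed into an in-memory tree that can be navigated.
dom = minidom.parseString(DOC)
dom_items = [node.firstChild.data for node in dom.getElementsByTagName("item")]


class ItemHandler(xml.sax.ContentHandler):
    """SAX: the parser calls these methods as it streams through the document."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect it until the element ends.
        if self._in_item:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._in_item = False


handler = ItemHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_items = handler.items
```

Both approaches recover the same item values. The DOM version is shorter, while the SAX version never holds the whole document in memory, which matters for large inputs.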
The large object (LOB) datatypes BLOB, CLOB, NCLOB, and BFILE enable you to store and manipulate large blocks of unstructured data (such as text, graphic images, video clips, and sound waveforms) in binary or character format. They provide efficient, random, piece-wise access to the data.
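Random, piece-wise access means that an application can read or change a small part of a large object without transferring all of it. The sketch below models that access pattern with an in-memory byte stream in Python; it does not use Oracle's LOB interfaces, and the helper names read_piece and write_piece are hypothetical.

```python
import io


def read_piece(lob, offset, length):
    """Read `length` bytes starting at `offset` without reading the rest."""
    lob.seek(offset)
    return lob.read(length)


def write_piece(lob, offset, data):
    """Overwrite part of the object in place, starting at `offset`."""
    lob.seek(offset)
    lob.write(data)


# Hypothetical stand-in for a large object: 1 MB of zero bytes.
lob = io.BytesIO(bytes(1024 * 1024))
write_piece(lob, 500_000, b"HELLO")
piece = read_piece(lob, 500_000, 5)
```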
With the growth of the Internet and content-rich applications, it has become imperative that the database support a datatype that fulfills the following:
Can store unstructured data
Is optimized for large amounts of such data
Provides a uniform way of accessing large unstructured data within the database or outside
See Also: "Overview of LOB Datatypes"
Oracle Text indexes any document or textual content to add fast, accurate retrieval of information to Internet content management applications, e-Business catalogs, news services, job postings, and so on. It can index content stored in file systems, databases, or on the Web.
Oracle Text allows text searches to be combined with regular database searches in a single SQL statement. It can find documents based on their textual content, metadata, or attributes. The Oracle Text SQL API makes it simple and intuitive to create and maintain Text indexes and run Text searches.
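Combining a text search with structured criteria can be pictured as intersecting the result of an inverted-index lookup with an ordinary filter on column values. The following Python sketch shows that idea only; it is not the Oracle Text SQL API, and the job postings, field names, and functions are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical job postings: structured columns plus a free-text description.
POSTINGS = [
    {"id": 1, "city": "Austin", "salary": 90000,
     "text": "Database administrator for Oracle systems"},
    {"id": 2, "city": "Boston", "salary": 120000,
     "text": "Java developer with database experience"},
    {"id": 3, "city": "Austin", "salary": 110000,
     "text": "Senior database developer"},
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(rows):
    """Inverted index: term -> set of row ids whose text contains the term."""
    index = defaultdict(set)
    for row in rows:
        for term in tokenize(row["text"]):
            index[term].add(row["id"])
    return index


def search(rows, index, terms, city, min_salary):
    """Ids of rows that contain every term AND satisfy the structured criteria."""
    term_sets = [index.get(t.lower(), set()) for t in terms]
    ids = set.intersection(*term_sets) if term_sets else {r["id"] for r in rows}
    return [r["id"] for r in rows
            if r["id"] in ids and r["city"] == city and r["salary"] >= min_salary]


index = build_index(POSTINGS)
hits = search(POSTINGS, index, ["database", "developer"], "Austin", 100000)
```

In this sketch the text condition is always evaluated first. A query optimizer, by contrast, can decide which condition to evaluate first based on how selective each one is.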
Oracle Text is completely integrated with the Oracle database, making it inherently fast and scalable. The Text index is in the database, and Text queries are run in the Oracle process. The Oracle optimizer can choose the best execution plan for any query, giving the best performance for ad hoc queries involving Text and structured criteria. Additional advantages include the following:
Oracle Text supports multilingual querying and indexing.
You can index and define sections for searching in XML documents. Section searching lets you narrow down queries to blocks of text within documents. Oracle Text can automatically create XML sections for you.
A Text index can span many Text columns, giving the best performance for Text queries across more than one column.
Oracle Text has enhanced performance for operations that are common in Text searching, like count hits.
Oracle Text leverages scalability features, such as replication.
Oracle Text supports local partitioned indexes.
There are three Text index types to cover all text search needs.
Standard index type (CONTEXT) for traditional full-text retrieval over documents and Web pages. The CONTEXT index type provides a rich set of text search capabilities for finding the content you need, without returning pages of spurious results.
Catalog index type (CTXCAT), designed specifically for e-Business catalogs. This catalog index provides flexible searching and sorting at Web speed.
Classification index type (CTXRULE) for building classification or routing applications. This index is created on a table of queries, where the queries define the classification or routing criteria.
Oracle Text also provides substring and prefix indexes. Substring indexing improves performance for left-truncated or double-truncated wildcard queries. Prefix indexing improves performance for right-truncated wildcard queries.
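The reason extra indexes help with wildcard queries is that a truncated term would otherwise have to be compared against every indexed term. The sketch below shows one simple way to precompute prefixes and substrings in Python. It is an assumption-laden illustration, not a description of how Oracle Text stores these indexes; the term list, the length limits, and the function names are hypothetical.

```python
from collections import defaultdict

TERMS = ["data", "database", "datatype", "update", "base", "metadata"]


def build_prefix_index(terms, max_len=4):
    """Map each leading prefix (up to max_len characters) to the terms that start with it."""
    index = defaultdict(set)
    for term in terms:
        for n in range(1, min(len(term), max_len) + 1):
            index[term[:n]].add(term)
    return index


def build_substring_index(terms, size=3):
    """Map each `size`-character substring to the terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for i in range(len(term) - size + 1):
            index[term[i:i + size]].add(term)
    return index


prefix_index = build_prefix_index(TERMS)
substring_index = build_substring_index(TERMS)

# Right-truncated query (terms starting with "data"): one lookup in the prefix index.
right_truncated = sorted(prefix_index["data"])

# Left-truncated query (terms ending with "ata"): find candidates by substring,
# then verify the ending.
left_truncated = sorted(t for t in substring_index["ata"] if t.endswith("ata"))
```

Both indexes trade storage for speed: every term is stored several times so that a wildcard query becomes a direct lookup instead of a scan.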
Oracle Text provides a number of utilities to view text, no matter how that text is stored.
Oracle Text supports over 150 document formats through its Inso filtering technology, including all common document formats like XML, PDF, and MS Office. You can also create your own custom filter.
You can view the HTML version of any text, including formatted documents such as PDF, MS Office, and so on.
You can view the HTML version of any text, with search terms highlighted and with navigation to next/previous term in the text.
Oracle Text provides markup information, such as the offset and length of each search term in the text, which a third-party viewer can use, for example.
The CTX_QUERY PL/SQL package can be used to generate query feedback, count hits, and create stored query expressions.
See Also: Oracle Text Reference for detailed information about this package
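The highlighting and markup utilities described above both depend on knowing where each search term occurs. The following Python sketch computes the offset and length of each match and uses them to produce highlighted HTML. It relies only on the standard re and html modules and is not the Oracle Text API; the function names and sample text are hypothetical.

```python
import html
import re


def term_offsets(text, term):
    """Return (offset, length) for each case-insensitive whole-word match of term."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(text)]


def highlight_html(text, term):
    """Render text as HTML with every match wrapped in <b> tags."""
    parts, cursor = [], 0
    for offset, length in term_offsets(text, term):
        parts.append(html.escape(text[cursor:offset]))
        parts.append("<b>" + html.escape(text[offset:offset + length]) + "</b>")
        cursor = offset + length
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)


TEXT = "Oracle Text indexes text & documents."
offsets = term_offsets(TEXT, "text")
marked = highlight_html(TEXT, "text")
```

Returning offsets separately from the rendered HTML lets an external viewer apply its own highlighting style.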
With Oracle Text, you can find, classify, and cluster documents based on their text, metadata, or attributes.
Document classification performs an action based on document content. An action can be assigning a category ID to a document for future lookup or sending the document to a user. The result is a set, or stream, of categorized documents. For example, assume that there is an incoming stream of news articles. You can define a rule to represent the category of Finance. The rule is essentially one or more queries that select documents about the subject of finance. The rule might have the form 'stocks or bonds or earnings.' When a document arrives that satisfies the rules for this category, the application takes an action, such as tagging the document as Finance or emailing one or more users.
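The Finance example can be sketched as a table of rules, where each rule is a set of terms combined with OR, plus an action that runs when a rule matches. This Python sketch is illustrative only and does not use Oracle Text; the Sports rule, the action functions, and the sample documents are hypothetical.

```python
import re

# Hypothetical rules: each category is a set of terms combined with OR.
RULES = {
    "Finance": {"stocks", "bonds", "earnings"},
    "Sports": {"match", "league", "tournament"},
}


def classify(document, rules=RULES):
    """Return the sorted categories whose rule matches at least one word of the document."""
    words = set(re.findall(r"[a-z]+", document.lower()))
    return sorted(cat for cat, terms in rules.items() if words & terms)


def route(document, actions):
    """Apply the action registered for each matching category."""
    for category in classify(document):
        actions[category](document)


# The action here simply tags the document; it could equally send an email.
tagged = []
actions = {
    "Finance": lambda doc: tagged.append(("Finance", doc)),
    "Sports": lambda doc: tagged.append(("Sports", doc)),
}
route("Quarterly earnings lifted tech stocks", actions)
route("Rain delayed the opening ceremony", actions)
```

The second document matches no rule, so no action is taken for it.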
Clustering is the unsupervised division of patterns into groups. The interface lets users select the appropriate clustering algorithm. Each cluster contains a subset of the documents in the collection. A document within a cluster is believed to be more similar to the other documents in that cluster than to documents outside it. Clusters can be used to build features such as presenting similar documents in the collection.
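To make the idea of grouping by similarity concrete, the sketch below clusters a few short documents using Jaccard similarity over their words and a simple single-pass "leader" algorithm. This algorithm was chosen for brevity and is not claimed to be one of the algorithms Oracle Text offers; the threshold, function names, and documents are hypothetical.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Similarity of two token sets: shared tokens divided by all tokens."""
    return len(a & b) / len(a | b) if a | b else 0.0


def leader_cluster(docs, threshold=0.25):
    """Put each document in the first cluster whose leader is similar enough;
    otherwise start a new cluster with the document as its leader."""
    clusters = []  # list of (leader_tokens, [document indexes])
    for i, doc in enumerate(docs):
        doc_tokens = tokens(doc)
        for leader, members in clusters:
            if jaccard(doc_tokens, leader) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((doc_tokens, [i]))
    return [members for _, members in clusters]


DOCS = [
    "stocks and bonds rallied",
    "the team won the match",
    "bonds and stocks fell",
    "the team lost the match",
]
groups = leader_cluster(DOCS)
```

No categories are defined in advance: the groups emerge from the documents themselves, which is what distinguishes clustering from rule-based classification.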
Oracle Ultra Search is built on the Oracle database server and Oracle Text technology. It provides uniform search-and-locate capabilities over multiple repositories: Oracle databases, other ODBC-compliant databases, IMAP mail servers, HTML documents served up by a Web server, files on disk, and more.
Ultra Search uses a 'crawler' to index documents; the documents stay in their own repositories, and the crawled information is used to build an index that stays within your firewall in a designated Oracle database. Ultra Search also provides APIs for building content management solutions.
Ultra Search offers the following:
A complete text query language for text search inside the database
Full integration with the Oracle database server and the SQL query language
Advanced features like concept searching and theme analysis
Indexing of all common file formats (150+)
Full globalization, including support for Chinese, Japanese and Korean (CJK), and Unicode
See Also: Oracle Ultra Search User's Guide
Oracle interMedia provides an array of services to develop and deploy traditional, Web, and wireless applications that include rich media. Multimedia content can be stored and managed directly in Oracle, or Oracle can store and index metadata together with external references that enable efficient access to media content stored outside the database.
Oracle interMedia services include the following:
Parse, index, and store rich content using new or existing database schemas
Develop content-rich Web applications
Deploy rich content on the Web
Use standard Oracle Database features to create scalable, manageable media content repositories
Oracle interMedia provides a number of load mechanisms ranging from low volume graphical user interface load utilities, through programmatic load APIs, to bulk media loaders. At load time, interMedia can extract the rich metadata that accompanies the media and use Oracle Text's text indexing and retrieval capabilities to build indexes for query and retrieval of the rich media content based upon the metadata.
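Extracting metadata at load time generally means reading the header of a media file and recording what it declares about the content. As a rough illustration, the Python sketch below reads the width, height, and bit depth from the IHDR chunk at the start of a PNG image. It is not interMedia's extraction code; the function name png_metadata and the sample header bytes are hypothetical, and the sample is only a header fragment, not a complete image.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_metadata(data):
    """Extract width, height, and bit depth from a PNG's IHDR chunk."""
    # Bytes 0-7: signature; 8-11: chunk length; 12-15: chunk type; 16-24: IHDR fields.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height, bit_depth = struct.unpack(">IIB", data[16:25])
    return {"format": "PNG", "width": width, "height": height, "bit_depth": bit_depth}


# Hypothetical header fragment for a 640 x 480 image (pixel data and CRC omitted).
sample_header = PNG_SIGNATURE + struct.pack(
    ">I4sIIBBBBB", 13, b"IHDR", 640, 480, 8, 2, 0, 0, 0
)
metadata = png_metadata(sample_header)
```

Once extracted, such metadata can be stored in ordinary columns and indexed, so queries can select media by properties like format or dimensions without opening the media itself.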
Oracle interMedia allows for access to image, audio, and video data in most common Internet formats from a variety of sources, both within Oracle Database and from external locations, such as Web URL sites or specialized servers.
interMedia supports delivery of video through streaming servers such as the RealNetworks RealAudio and RealVideo Servers. interMedia supports drag and drop of audio, video, and image data through the interMedia clipboard into Web applications such as Oracle Application Server Portal and popular Web authoring tools. interMedia also supports efficient development of media-rich, Java-based Internet applications through Oracle JDeveloper and dynamic Web page composition through Macromedia's UltraDev.
See Also: Oracle interMedia User's Guide
Oracle Spatial makes spatial data management easier and more natural to users of location-enabled applications and geographic information system (GIS) applications. When this data is stored in an Oracle database, it can be easily manipulated, retrieved, and related to all the other data stored in the database.
A common example of spatial data can be seen in a road map. A road map is a two-dimensional object that contains points, lines, and polygons that can represent cities, roads, and political boundaries such as states or provinces. A road map is a visualization of geographic information. The locations of cities, roads, and political boundaries that exist on the surface of the Earth are projected onto a two-dimensional display or piece of paper, preserving the relative positions and relative distances of the rendered objects.
The data that indicates the Earth location (latitude and longitude, or height and depth) of these rendered objects is the spatial data. When the map is rendered, this spatial data is used to project the locations of the objects on a two-dimensional piece of paper. A GIS is often used to store, retrieve, and render this Earth-relative spatial data.
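Projecting Earth locations onto a flat surface can be illustrated with the simplest map projection, the equirectangular projection, which maps longitude and latitude linearly to x and y. The Python sketch below is an assumption chosen for clarity, not the projection used by Oracle Spatial or any particular GIS; this projection also distorts distances increasingly toward the poles.

```python
def equirectangular(lat_deg, lon_deg, width, height):
    """Map latitude and longitude in degrees to (x, y) coordinates on a flat map.

    Longitude -180..180 spans the width. Latitude 90..-90 spans the height,
    with y growing downward as on a screen.
    """
    x = (lon_deg + 180.0) / 360.0 * width
    y = (90.0 - lat_deg) / 180.0 * height
    return x, y


# The point at latitude 0, longitude 0 lands at the center of a 360 x 180 map.
center = equirectangular(0.0, 0.0, 360, 180)
top_left = equirectangular(90.0, -180.0, 360, 180)
```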
Types of spatial data that can be stored using Spatial other than GIS data include data from computer-aided design (CAD) and computer-aided manufacturing (CAM) systems. Instead of operating on objects on a geographic scale, CAD/CAM systems work on a smaller scale, such as for an automobile engine or printed circuit boards.
The differences among these systems are only in the relative sizes of the data, not the data's complexity. The systems might all actually involve the same number of data points. On a geographic scale, the location of a bridge can vary by a few tenths of an inch without causing any noticeable problems to the road builders, whereas if the diameter of an engine's pistons is off by a few tenths of an inch, the engine will not run. A printed circuit board is likely to have many thousands of objects etched on its surface that are no bigger than the smallest detail shown on a road builder's blueprints.
These applications all store, retrieve, update, or query some collection of features that have both nonspatial and spatial attributes. Examples of nonspatial attributes are name, soil_type, landuse_classification, and part_number. The spatial attribute is a coordinate geometry, or vector-based representation of the shape of the feature.
Oracle Spatial provides a SQL schema and functions that facilitate the storage, retrieval, update, and query of collections of spatial features in an Oracle Database. Spatial consists of the following components:
A schema (MDSYS) that prescribes the storage, syntax, and semantics of supported geometric datatypes
A spatial indexing mechanism
A set of operators and functions for performing area-of-interest queries, spatial join queries, and other spatial analysis operations
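The interplay of these components can be pictured with a small model: features that carry both nonspatial and spatial attributes, an index that groups them by location, and an area-of-interest query that uses the index. The Python sketch below uses a uniform grid as the index because it is easy to follow; it is not the MDSYS schema or the indexing mechanism of Oracle Spatial, and the Feature and GridIndex classes and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str       # nonspatial attribute
    landuse: str    # nonspatial attribute
    x: float        # spatial attribute: a point geometry
    y: float


class GridIndex:
    """Bucket point features into square cells so a query visits only nearby cells."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, feature):
        self.cells[self._cell(feature.x, feature.y)].append(feature)

    def query(self, min_x, min_y, max_x, max_y):
        """Return the features inside a rectangular area of interest."""
        cx0, cy0 = self._cell(min_x, min_y)
        cx1, cy1 = self._cell(max_x, max_y)
        found = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for f in self.cells.get((cx, cy), []):
                    # A cell may extend beyond the rectangle, so check each point.
                    if min_x <= f.x <= max_x and min_y <= f.y <= max_y:
                        found.append(f)
        return found


index = GridIndex(cell_size=10)
for feature in [
    Feature("park", "recreation", 2.0, 3.0),
    Feature("mill", "industrial", 12.0, 4.0),
    Feature("farm", "agricultural", 4.5, 4.5),
    Feature("dock", "industrial", 25.0, 30.0),
]:
    index.insert(feature)

in_area = sorted(f.name for f in index.query(0, 0, 10, 5))
industrial_nearby = [f.name for f in index.query(0, 0, 15, 10) if f.landuse == "industrial"]
```

The last query combines a spatial condition with a nonspatial one, which is the kind of mixed query that storing both attribute types together makes straightforward.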