|Oracle® Database Data Warehousing Guide
10g Release 1 (10.1)
Part Number B10736-01
This chapter describes how to use the following types of indexes in a data warehousing environment:
See Also:Oracle Database Concepts for general information regarding indexing
Note:Bitmap indexes are available only if you have purchased the Oracle Database Enterprise Edition.
Bitmap indexes are widely used in data warehousing environments. The environments typically have large amounts of data and ad hoc queries, but a low level of concurrent DML transactions. For such applications, bitmap indexing provides:
Reduced response time for large classes of ad hoc queries.
Reduced storage requirements compared to other indexing techniques.
Dramatic performance gains even on hardware with a relatively small number of CPUs or a small amount of memory.
Efficient maintenance during parallel DML and loads.
Fully indexing a large table with a traditional B-tree index can be prohibitively expensive in terms of space because the indexes can be several times larger than the data in the table. Bitmap indexes are typically only a fraction of the size of the indexed data in the table.
An index provides pointers to the rows in a table that contain a given key value. A regular index stores a list of rowids for each key corresponding to the rows with that key value. In a bitmap index, a bitmap for each key value replaces a list of rowids.
Each bit in the bitmap corresponds to a possible rowid, and if the bit is set, it means that the row with the corresponding rowid contains the key value. A mapping function converts the bit position to an actual rowid, so that the bitmap index provides the same functionality as a regular index. Bitmap indexes store the bitmaps in a compressed way. If the number of distinct key values is small, bitmap indexes compress better and the space saving benefit compared to a B-tree index becomes even better.
Bitmap indexes are most effective for queries that contain multiple conditions in the
WHERE clause. Rows that satisfy some, but not all, conditions are filtered out before the table itself is accessed. This improves response time, often dramatically. If you are unsure of which indexes to create, the SQLAccess Advisor can generate recommendations on what to create.
Bitmap indexes are primarily intended for data warehousing applications where users query the data rather than update it. They are not suitable for OLTP applications with large numbers of concurrent transactions modifying the data.
Bitmap indexes are required to take advantage of Oracle's star transformation capabilities.
See Also:Chapter 19, " Schema Modeling Techniques" for further information about using bitmap indexes in data warehousing environments
The advantages of using bitmap indexes are greatest for columns in which the ratio of the number of distinct values to the number of rows in the table is small. We refer to this ratio as the degree of cardinality. A gender column, which has only two distinct values (male and female), is optimal for a bitmap index. However, data warehouse administrators also build bitmap indexes on columns with higher cardinalities.
For example, on a table with one million rows, a column with 10,000 distinct values is a candidate for a bitmap index. A bitmap index on this column can outperform a B-tree index, particularly when this column is often queried in conjunction with other indexed columns. In fact, in a typical data warehouse environments, a bitmap index can be considered for any non-unique column.
B-tree indexes are most effective for high-cardinality data: that is, for data with many possible values, such as
phone_number. In a data warehouse, B-tree indexes should be used only for unique columns or other columns with very high cardinalities (that is, columns that are almost unique). The majority of indexes in a data warehouse should be bitmap indexes.
In ad hoc queries and similar situations, bitmap indexes can dramatically improve query performance.
OR conditions in the
WHERE clause of a query can be resolved quickly by performing the corresponding Boolean operations directly on the bitmaps before converting the resulting bitmap to rowids. If the resulting number of rows is small, the query can be answered quickly without resorting to a full table scan.
Example 6-1 Bitmap Index
The following shows a portion of a company's
SELECT cust_id, cust_gender, cust_marital_status, cust_income_level FROM customers; CUST_ID C CUST_MARITAL_STATUS CUST_INCOME_LEVEL ---------- - -------------------- --------------------- ... 70 F D: 70,000 - 89,999 80 F married H: 150,000 - 169,999 90 M single H: 150,000 - 169,999 100 F I: 170,000 - 189,999 110 F married C: 50,000 - 69,999 120 M single F: 110,000 - 129,999 130 M J: 190,000 - 249,999 140 M married G: 130,000 - 149,999 ...
cust_marital_status, and cust_
income_level are all low-cardinality columns (there are only three possible values for marital status and region, two possible values for gender, and 12 for income level), bitmap indexes are ideal for these columns. Do not create a bitmap index on
cust_id because this is a unique column. Instead, a unique B-tree index on this column provides the most efficient representation and retrieval.
Table 6-1 illustrates the bitmap index for the
cust_gender column in this example. It consists of two separate bitmaps, one for gender.
Table 6-1 Sample Bitmap Index
Each entry (or bit) in the bitmap corresponds to a single row of the
customers table. The value of each bit depends upon the values of the corresponding row in the table. For example, the bitmap
cust_gender='F' contains a one as its first bit because the gender is
F in the first row of the
customers table. The bitmap
cust_gender='F' has a zero for its third bit because the gender of the third row is not
An analyst investigating demographic trends of the company's customers might ask, "How many of our married customers have an income level of G or H?" This corresponds to the following SQL query:
SELECT COUNT(*) FROM customers WHERE cust_marital_status = 'married' AND cust_income_level IN ('H: 150,000 - 169,999', 'G: 130,000 - 149,999');
Bitmap indexes can efficiently process this query by merely counting the number of ones in the bitmap illustrated in Figure 6-1. The result set will be found by using bitmap or merge operations without the necessity of a conversion to rowids. To identify additional specific customer attributes that satisfy the criteria, use the resulting bitmap to access the table after a bitmap to rowid conversion.
Figure 6-1 Executing a Query Using Bitmap Indexes
Unlike most other types of indexes, bitmap indexes include rows that have
NULL values. Indexing of nulls can be useful for some types of SQL statements, such as queries with the aggregate function
Example 6-2 Bitmap Index
SELECT COUNT(*) FROM customers WHERE cust_marital_status IS NULL;
This query uses a bitmap index on
cust_marital_status. Note that this query would not be able to use a B-tree index, because B-tree indexes do not store the
SELECT COUNT(*) FROM customers;
Any bitmap index can be used for this query because all table rows are indexed, including those that have
NULL data. If nulls were not indexed, the optimizer would be able to use indexes only on columns with
NOT NULL constraints.
You can create bitmap indexes on partitioned tables but they must be local to the partitioned table—they cannot be global indexes. A partitioned table can only have global B-tree indexes, partitioned or non-partitioned. See "Index Partitioning" for further information.
In addition to a bitmap index on a single table, you can create a bitmap join index, which is a bitmap index for the join of two or more tables. In a bitmap join index, the bitmap for the table to be indexed is built for values coming from the joined tables. In a data warehousing environment, the join condition is an equi-inner join between the primary key column or columns of the dimension tables and the foreign key column or columns in the fact table.
A bitmap join index can improve the performance by an order of magnitude. By storing the result of a join, the join can be avoided completely for SQL statements using a bitmap join index. Furthermore, since it is most likely to have a much smaller number of distinct values for a bitmap join index compared to a regular bitmap index on the join column, the bitmaps compress better, yielding to less space consumption than a regular bitmap join index on the join column.
Bitmap join indexes are much more efficient in storage than materialized join views, an alternative for materializing joins in advance. This is because the materialized join views do not compress the rowids of the fact tables.
The most common usage of a bitmap join index is in star model environments, where a large table is indexed on columns joined by one or several smaller tables. We will refer to the large table as the fact table and to the smaller tables as dimension tables. The following section describes the four different join models supported by bitmap join indexes. See Chapter 19, " Schema Modeling Techniques" for schema modeling techniques.
Example 6-3 Bitmap Join Index: One Dimension Table Columns Joins One Fact Table
Unlike the example in "Bitmap Index", where a bitmap index on the
cust_gender column on the
customers table was built, we now create a bitmap join index on the fact table
sales for the joined column
cust_id values only:
SELECT time_id, cust_id, amount_sold FROM sales; TIME_ID CUST_ID AMOUNT_SOLD --------- ---------- ----------- 01-JAN-98 29700 2291 01-JAN-98 3380 114 01-JAN-98 67830 553 01-JAN-98 179330 0 01-JAN-98 127520 195 01-JAN-98 33030 280 ...
To create such a bitmap join index, column
customers(cust_gender) has to be joined with table
sales. The join condition is specified as part of the
CREATE statement for the bitmap join index as follows:
CREATE BITMAP INDEX sales_cust_gender_bjix ON sales(customers.cust_gender) FROM sales, customers WHERE sales.cust_id = customers.cust_id LOCAL;
The following query shows illustrates the join result that is used to create the bitmaps that are stored in the bitmap join index:
SELECT sales.time_id, customers.cust_gender, sales.amount_sold FROM sales, customers WHERE sales.cust_id = customers.cust_id; TIME_ID C AMOUNT_SOLD --------- - ----------- 01-JAN-98 M 2291 01-JAN-98 F 114 01-JAN-98 M 553 01-JAN-98 M 0 01-JAN-98 M 195 01-JAN-98 M 280 01-JAN-98 M 32 ...
Table 6-2 illustrates the bitmap representation for the bitmap join index in this example.
Table 6-2 Sample Bitmap Join Index
|sales record 1||1||0|
|sales record 2||0||1|
|sales record 3||1||0|
|sales record 4||1||0|
|sales record 5||1||0|
|sales record 6||1||0|
|sales record 7||1||0|
You can create other bitmap join indexes using more than one column or more than one table, as shown in these examples.
Example 6-4 Bitmap Join Index: Multiple Dimension Columns Join One Fact Table
You can create a bitmap join index on more than one column from a single dimension table, as in the following example, which uses
customers(cust_gender, cust_marital_status) from the
CREATE BITMAP INDEX sales_cust_gender_ms_bjix ON sales(customers.cust_gender, customers.cust_marital_status) FROM sales, customers WHERE sales.cust_id = customers.cust_id LOCAL NOLOGGING;
Example 6-5 Bitmap Join Index: Multiple Dimension Tables Join One Fact Table
You can create a bitmap join index on multiple dimension tables, as in the following, which uses
CREATE BITMAP INDEX sales_c_gender_p_cat_bjix ON sales(customers.cust_gender, products.prod_category) FROM sales, customers, products WHERE sales.cust_id = customers.cust_id AND sales.prod_id = products.prod_id LOCAL NOLOGGING;
Example 6-6 Bitmap Join Index: Snowflake Schema
You can create a bitmap join index on more than one table, in which the indexed column is joined to the indexed table by using another table. For example, we can build an index on
countries.country_name, even though the
countries table is not joined directly to the
sales table. Instead, the
countries table is joined to the
customers table, which is joined to the sales table. This type of schema is commonly called a snowflake schema.
CREATE BITMAP INDEX sales_co_country_name_bjix ON sales(countries.country_name) FROM sales, customers, countries WHERE sales.cust_id = customers.cust_id AND customers.country_id = countries.country_id LOCAL NOLOGGING COMPUTE STATISTICS;
Join results must be stored, therefore, bitmap join indexes have the following restrictions:
Parallel DML is currently only supported on the fact table. Parallel DML on one of the participating dimension tables will mark the index as unusable.
Only one table can be updated concurrently by different transactions when using the bitmap join index.
No table can appear twice in the join.
You cannot create a bitmap join index on an index-organized table or a temporary table.
The columns in the index must all be columns of the dimension tables.
The dimension table join columns must be either primary key columns or have unique constraints.
The dimension table column(s) participating the join with the fact table must be either the primary key column(s) or with the unique constraint.
If a dimension table has composite primary key, each column in the primary key must be part of the join.
The current restrictions for creating a regular bitmap index also apply to a bitmap join index. For example, you cannot create a bitmap index with the
UNIQUE attribute. See Oracle Database SQL Reference for other restrictions.
A B-tree index is organized like an upside-down tree. The bottom level of the index holds the actual data values and pointers to the corresponding rows, much as the index in a book has a page number associated with each index entry.
In general, use B-tree indexes when you know that your typical query refers to the indexed column and retrieves a few rows. In these queries, it is faster to find the rows by looking at the index. However, using the book index analogy, if you plan to look at every single topic in a book, you might not want to look in the index for the topic and then look up the page. It might be faster to read through every chapter in the book. Similarly, if you are retrieving most of the rows in a table, it might not make sense to look up the index to find the table rows. Instead, you might want to read or scan the table.
B-tree indexes are most commonly used in a data warehouse to index unique or near-unique keys. In many cases, it may not be necessary to index these columns in a data warehouse, because unique constraints can be maintained without an index, and because typical data warehouse queries may not work better with such indexes. B-tree indexes are more common in environments using third normal form schemas. In general, bitmap indexes should be more common than B-tree indexes in most data warehouse environments.
Bitmap indexes are always stored in a patented, compressed manner without the need of any user intervention. B-tree indexes, however, can be stored specifically in a compressed manner to enable huge space savings, storing more keys in each index block, which also leads to less I/O and better performance.
Key compression lets you compress a b-tree index, which reduces the storage overhead of repeated values. In the case of a nonunique index, all index columns can be stored in a compressed format, whereas in the case of a unique index, at least one index column has to be stored uncompressed.
Generally, keys in an index have two pieces, a grouping piece and a unique piece. If the key is not defined to have a unique piece, Oracle provides one in the form of a rowid appended to the grouping piece. Key compression is a method of breaking off the grouping piece and storing it so it can be shared by multiple unique pieces. The cardinality of the chosen columns to be compressed determines the compression ratio that can be achieved. So, for example, if a unique index that consists of five columns provides the uniqueness mostly by the last two columns, it is most optimal to choose the three leading columns to be stored compressed. If you choose to compress four columns, the repetitiveness will be almost gone, and the compression ratio will be worse.
Although key compression reduces the storage requirements of an index, it can increase the CPU time required to reconstruct the key column values during an index scan. It also incurs some additional storage overhead, because every prefix entry has an overhead of four bytes associated with it.
B-tree indexes on partitioned tables can be global or local. With Oracle8i and earlier releases, Oracle recommended that global indexes not be used in data warehouse environments because a partition DDL statement (for example,
PARTITION) would invalidate the entire index, and rebuilding the index is expensive. In Oracle Database 10g, global indexes can be maintained without Oracle marking them as unusable after DDL. This enhancement makes global indexes more effective for data warehouse environments.
However, local indexes will be more common than global indexes. Global indexes should be used when there is a specific requirement which cannot be met by local indexes (for example, a unique index on a non-partitioning key, or a performance requirement).
Bitmap indexes on partitioned tables are always local.
See Also:"Types of Partitioning" for further details