Stanford Research Communication Program
  Home   Researchers Professionals  About
 
Archive by Major Area

Engineering
Humanities

Social Science

Natural Science

Archive by Year

Fall 1999 - Spring 2000

Fall 2000 - Summer 2001

Fall 2001 - Spring 2002

Fall 2002 - Summer 2003


 

 

 


How to Make Databases Answer Your Queries Faster

Rada Chirkova
Department of Computer Science
Stanford University
March 2002

A typical database stores data in some structured format (i.e. tables) and a typical use of a database is to ask it questions, or queries. In many cases it is important to get fast answers to queries. When the structure of queries doesn't match well the structure of the database, however, it may take the database too long to produce the answers. The aim of my research is to make databases answer important queries faster, by automatically making the structure of the database resemble the structure of the queries.

My research is in computer science, or, more precisely, in databases. Commercial and scientific databases store large amounts of information, and users ask queries based on the stored information. It may take a database system a long time to answer a query on a large database, while at the same time users would like to get fast answers to important queries. I study how to "speed up" a database when it's answering an important query.

Suppose a database has a stored "Parent" table, with information on parents and children. Suppose we have to repeatedly ask a query about cousins. Because two people are cousins if their parents share a parent, we can compute the answer to the "Cousins" query using the Parent table. At the same time, this way of defining Cousins is quite awkward; we say that the structure of the database does not match the structure of the query. Because defining Cousins in terms of Parent is complicated, it takes the database a long time to compute the answer to the Cousins query. Is there a way to make the database answer the query any faster?

Databases use a well-known technique called "materializing views." While the "real" contents of a database are stored tables, views are virtual tables: a view is defined just like a query, but is not stored. Materializing a view means precomputing the answers to the view and storing them in the database. In our example, we can use the stored Parent table to define a "Grandparent" view that has information on grandparents and grandchildren.

Once we have a view definition, we can use it to redefine important queries. For example, we can redefine the Cousins query using only the Grandparent view: two people are cousins if they share a grandparent. This definition of Cousins in terms of Grandparent is shorter and neater than the definition in terms of Parent. We say that the structure of the Grandparent table matches the structure of the Cousins query.

The definition of Cousins in terms of Grandparent would be fast to compute, if only we had the table for Grandparent stored in our database. We can make it happen by materializing-pre-computing and storing-the answer to the view. In general, when a query is not easy to compute, materializing certain views has the potential of simplifying the query-answering process. What we are doing is restructuring our database to make it resemble our queries.

When we have a query and some view definitions, we can try to redefine (or rewrite) the query using each view, and then materialize the view that gives the fastest rewriting of the query. But how do we know that the view we have chosen is the best for our query? It may happen that there exists a view, whose definition we do not yet know, which would make our query faster than any view whose definition we do know. In other words, we sometimes need to invent new views to make the computation of a query as fast as possible.

This is the problem I am looking at. Inventing and materializing views means restructuring your database, because it's about making your database closer to your querying needs. I look at how to invent all views that might be useful for a particular query, and at how to choose the best among those views. I use mathematical methods for inventing view definitions based on the definition of the query. The problem is very complex: it turns out that an infinite number of views may be useful for rewriting queries, but at the same time, the number of the "best" views is always finite. I am working on an algorithm that searches among these "best" views to find the view that maximally speeds up your query. Materializing that view guarantees helping your database answer important queries as fast as possible.