Database System Implementation
CS346

at 200-305

A major database system implementation project realizes the principles and techniques covered in earlier courses. Students independently build a complete database management system, from file structures through query processing, with a personally designed feature or extension. Lectures on project details and advanced techniques in database system implementation, focusing on query processing and optimization. Guest speakers from industry on commercial DBMS implementation techniques. Prerequisites: CS145, CS245, programming experience in C++.

Schedule & Handouts

Links to more handouts may be added over time. Lectures may change, but the project due dates are set.
I use the slides provided to prepare my blackboard lectures. There is no 1-1 mapping from slides to the crazy things I say in class.

Week  Date  Event  Handouts
1 Class Introduction to course, DBMS review, RedBase overview
RedBase Part 0: PF
RedBase Part 1: RM
Old Lecture Notes: Overview
Class File & buffer review, RedBase PF and RM components Slides: Buffer Management (pdf)
Old Lecture Notes: Buffer
2 Class Buffer Management
Slides: Buffer Manager Extra (pdf)
Class Page Layout and File of Records
Project RedBase Part 1: RM Due
3 Class RedBase IX component, Indexing and B+ tree review (by TA)
Slides: B+/B-Link Trees (pdf and paper)
RedBase Part 2: IX
Old Lecture Notes: Indexing
Class Concurrency in Indexing, B-Link tree
4 Class RedBase SM and QL components, Metadata and Query Processing review (by TA)
RedBase Part 3: SM
RedBase Part 4: QL
Old Lecture Notes: Metadata, QL
Class Query Processing lecture
Slides: Cost Models
Old QP Page
Project RedBase Part 2: IX Due
5 Class Recovery (ARIES)
Slides: ARIES (pdf)
ARIES paper
ARIES examples
Class Guest lecture: Eric Sedlar, Oracle
6 Class Guest lecture: Michalis Petropoulos, Pivotal
Class Database Analytics (DeepDive, Hogwild!), RedBase EX component
RedBase Part 5: EX
Project RedBase Part 3: SM Due
7 Class Guest lecture: Michael Armbrust, Databricks
Class Guest lecture: Mike Cafarella, U of Michigan, Co-founder of Hadoop
Project RedBase Part 5: EX Proposal Due
8 Class Guest lecture: Christian Tinnefeld, SAP
Class Guest lecture: Karthik Ramasamy, Twitter
Project RedBase Part 4: QL Due
9 Class No Class, Memorial Day
Class Guest lecture: TJ Green, LogicBlox
10 Class No class, work on your projects!
Class No class, work on your projects!
Project RedBase Part 5: EX Final Demo

Course Info

Course Staff

Chris Re Instructor

Jaeho Shin Teaching Assistant

Marianne Siroker Administrator

Communication

Course Contents

There will be five aspects to the course:

  1. The basic RedBase project, implemented by each student individually.

  2. An extension to RedBase, individually conceived, designed, and implemented by each student.

  3. Lectures on aspects of the RedBase project.

  4. Lectures on advanced database system implementation techniques, with an emphasis on query processing and optimization.

  5. Guest lecturers from industry describing commercial database system implementation techniques, with an emphasis on query processing and optimization.

An overview and details of the project can be found on the RedBase Project page.

Prerequisites

CS145 (Introduction to Databases) and CS245 (Database System Principles) or equivalent knowledge is essential. We will assume that all students already understand basic database system implementation techniques. In this course you will put your basic knowledge into practice while learning about more advanced implementation techniques including those used in commercial products.

We recommend that all students have prior experience with Unix and with at least the C programming language. C++ experience is preferred but not essential. Students with no C++ experience will need to learn quickly; students with no C/Unix experience probably should not take this course.

Units

Students may enroll in CS346 for 3, 4, or 5 units. All students are expected to do the same amount of work regardless of their number of units. CS346 is a 5-unit course in terms of work; it is offered for fewer units as a courtesy to students who have a unit limit.

Readings and Textbook

A few research papers will be made available on the web as suggested reading. There is no required textbook for the course, but students may wish to own a comprehensive database textbook for reference, for example:

Other textbooks, such as those by Silberschatz, Korth, & Sudarshan; Ramakrishnan & Gehrke; Elmasri & Navathe; O'Neil; or Date, are also sufficient.

Grading

90% of your final grade will be based on the project and 10% on class participation. The complete breakdown is:

Project Part 1 15%
Project Part 2 15%
Project Part 3 15%
Project Part 4 20%
Project Part 5 proposal 5%
Project Part 5 demo 20%
Class participation 10%

Your programs will be graded on correctness and efficiency, as well as on descriptions of key design decisions. Details on program grading criteria and mechanisms are provided in the RedBase Logistics document.

Please note that attendance at all guest lectures is required.

CS346 is not graded on a curve. It's a difficult class, and everyone who performs well (defined very roughly as ~90% of project points, good class participation, and a solid RedBase extension) will get an A.

Past Offerings

Here are websites of some of the past offerings of the course.


Students with Documented Disabilities

Students who may need an academic accommodation based on the impact of a disability must initiate the request through the Office of Accessible Education (OAE). OAE staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty dated in the current quarter in which the request is being made. Students should contact OAE as soon as possible since timely notice is needed to coordinate accommodations:

563 Salvatierra Walk
TTY: (650) 723-1067
Voice: (650) 723-1066

Honor Code

Under the Honor Code at Stanford, each of you is expected to submit your own work in this course: all code submitted must have been written by you. However, on many occasions when working on programs it is useful to talk with others (the instructor, the TA, or other students) about design decisions and programming strategies. Such activity is both acceptable and encouraged, but when you turn in your programs you must indicate any assistance you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code.

The project extension proposal must represent individual ideas and writing, and we discourage excessive collaboration in developing proposals. Quiz answers must be original.

The course staff will aggressively pursue all suspected Honor Code violations, which will be handled through official University channels. The course staff may employ plagiarism-detection software to ensure that programs turned in are the original work of each student.