Current Projects

In my research projects we don't just build prototype systems and write papers; we build production-quality software systems that are used by other people for their daily work. In most cases we release the software in open-source form. This approach is unusual in academia, but it allows us to do much better research: designing for production use forces us to think about important issues that could be ignored otherwise, and measurements of usage allow us to evaluate our ideas more thoroughly. Furthermore, this approach allows students to develop a higher level of system-building maturity than would be possible otherwise.

RAMCloud (2009- )

The RAMCloud project is creating a new class of storage, based entirely in DRAM, that is 2-3 orders of magnitude faster than existing storage systems. If successful, it will enable new applications that manipulate large-scale datasets much more intensively than has ever been possible before. In addition, we think RAMCloud, or something like it, will become the primary storage system for cloud computing environments such as Amazon's AWS and Microsoft's Azure.

The role of DRAM in storage systems has been increasing rapidly in recent years, driven by the needs of large-scale Web applications. These applications manipulate very large datasets with an intensity that cannot be satisfied by disks alone. As a result, applications are keeping more and more of their data in DRAM. For example, large-scale caching systems such as memcached are being widely used (in 2009 Facebook used a total of 150 TB of DRAM in memcached and other caches for a database containing 200 TB of disk storage), and the major Web search engines now keep their search indexes entirely in DRAM.

Although DRAM's role is increasing, it still tends to be used in limited or specialized ways. In most cases DRAM is just a cache for some other storage system such as a database; in other cases (such as search indexes) DRAM is managed in an application-specific fashion. It is difficult for developers to use DRAM effectively in their applications; for example, the application must manage consistency between caches and the backing storage. In addition, cache misses and backing store overheads make it difficult to capture DRAM's full performance potential.
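
To make this burden concrete, here is a minimal sketch of the cache-aside pattern that applications typically implement by hand when DRAM is only a cache; the names cache and db are generic stand-ins (for, say, memcached and a backing database), not any particular API:

    # Cache-aside reads and writes, hand-managed by the application (a sketch, not a real API).
    cache = {}                              # stands in for a DRAM cache such as memcached
    db = {"user:42": {"name": "Alice"}}     # stands in for the backing database on disk

    def read(key):
        value = cache.get(key)
        if value is None:                   # cache miss: pay the full latency of the backing store
            value = db.get(key)
            cache[key] = value
        return value

    def write(key, value):
        db[key] = value                     # update the backing store...
        cache.pop(key, None)                # ...and invalidate the cached copy so readers do not
                                            # see stale data; forgetting or misordering this
                                            # invalidation is a classic consistency bug

    write("user:42", {"name": "Alicia"})
    print(read("user:42"))                  # first read misses; later reads are served from the cache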

Our goal for RAMCloud is to create a general-purpose storage system that makes it easy for developers to harness the full performance potential of large-scale DRAM storage. It keeps all data in DRAM all the time, so there are no cache misses. RAMCloud storage is durable and available, so developers need not manage a separate backing store. RAMCloud is designed to scale to thousands of servers and hundreds of terabytes of data while providing uniform low-latency access to all machines within a large datacenter.
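
As a rough illustration of the kind of interface this implies, here is a hypothetical client session against a RAMCloud-style key-value store; RamCloudClient and its methods are stand-ins invented for this sketch (backed by a local dictionary), not the project's actual client API:

    class RamCloudClient:
        """Toy stand-in for a RAMCloud-style key-value client."""

        def __init__(self, coordinator_locator):
            self.coordinator = coordinator_locator
            self.tables = {}                      # local dict standing in for remote DRAM storage

        def create_table(self, name):
            self.tables.setdefault(name, {})

        def write(self, table, key, value):
            self.tables[table][key] = value       # in the real system: a durable, replicated write

        def read(self, table, key):
            return self.tables[table][key]        # in the real system: a ~5 microsecond remote read

    client = RamCloudClient("coordinator-host:12246")   # locator format invented for this sketch
    client.create_table("profiles")
    client.write("profiles", "user:42", {"name": "Alice"})
    print(client.read("profiles", "user:42"))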

In January 2014 we officially tagged RAMCloud version 1.0, which means the system has reached a point of maturity where it can support real applications (in particular the crash recovery mechanisms now work reliably both for crashes of storage masters and for crashes of the cluster coordinator). On our 80-node test cluster we are able to perform remote reads of 100-byte objects in about 5 microseconds and writes in about 16 microseconds. An individual server can process about 700,000 small read requests per second.

The RAMCloud project is still young, so there are many interesting research issues left to explore, such as the following:

  • Data model: RAMCloud currently supports a very simple data model (a key-value store); we would like to see whether we can provide higher-level features such as secondary indexes and multi-object transactions without sacrificing the scalability or performance of the system (a sketch of the indexing idea appears after this list).
  • Consistency: we believe that RAMCloud can provide strong consistency (serializability) without sacrificing performance, but there are several interesting problems to solve in order to achieve that.
  • Cluster management: what are the right mechanisms and policies for reorganizing RAMCloud data in response to changes in the amount of data and the access patterns?
  • Network protocols: we don't think that TCP is the right protocol for achieving the highest performance within a datacenter, so there is an interesting research project in investigating what the ideal protocol would look like.
  • Multi-tenancy: how can we support multiple independent, perhaps mutually hostile, applications sharing the same RAMCloud storage system within a large datacenter? This raises issues of access control and, potentially, performance isolation.
  • Multiple datacenters: our current design for RAMCloud focuses on a single datacenter, but some applications will require redundancy across datacenters in order to protect against datacenter failures. An interesting question is whether we can provide that level of redundancy without dramatically impacting the performance of the system.
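
For example, the data-model question above includes how a secondary index could be layered on top of the key-value store. The sketch below shows the idea on a single machine (an illustration only, not RAMCloud's design; doing this atomically across many servers is exactly the open problem):

    primary = {}          # primary key -> record
    by_email = {}         # secondary key (email) -> set of primary keys

    def put(key, record):
        old = primary.get(key)
        if old is not None:                          # keep the index consistent on overwrite
            by_email[old["email"]].discard(key)
        primary[key] = record
        by_email.setdefault(record["email"], set()).add(key)

    def lookup_by_email(email):
        return [primary[k] for k in by_email.get(email, ())]

    put("user:1", {"email": "alice@example.com", "name": "Alice"})
    put("user:2", {"email": "bob@example.com", "name": "Bob"})
    print(lookup_by_email("alice@example.com"))
    # At scale the two structures live on different servers and must be updated atomically,
    # which is where the transaction and consistency questions above come in.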

Related links:

  • Log-Structured Memory: describes how RAMCloud manages in-memory storage using an approach similar to that of log-structured file systems, which allows RAMCloud to use memory at 80-90% utilization without major performance degradation (a toy sketch of the idea appears after this list of links). This paper appeared in the USENIX FAST Conference in February 2014.
  • Fast Recovery in RAMCloud: describes RAMCloud's mechanism for recovering crashed servers in 1-2 seconds. This paper appeared in the ACM Symposium on Operating Systems Principles in October 2011.
  • The Case for RAMCloud: a position paper that discusses the motivation for RAMCloud, the new kinds of applications it may enable, and some of the research issues that will have to be addressed to create a working system. This paper appeared in Communications of the ACM in July 2011.
  • An earlier, slightly longer version of this position paper appeared in Operating Systems Review in December 2009.
  • The RAMCloud Wiki: used by project members to share design documents, miscellaneous notes, and links to related materials.
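
As a companion to the log-structured memory paper above, here is a toy sketch of the idea (invented for illustration, not RAMCloud's implementation): every write appends to the head of a log, and a cleaner later copies live objects out of old segments so their space can be reused, which is what allows memory to run at high utilization:

    SEGMENT_SIZE = 4

    log = [[]]                 # list of segments; the last one is the head of the log
    index = {}                 # key -> (segment number, offset) of the live version

    def write(key, value):
        if len(log[-1]) == SEGMENT_SIZE:
            log.append([])                         # start a new head segment
        log[-1].append((key, value))
        index[key] = (len(log) - 1, len(log[-1]) - 1)

    def clean(seg_no):
        """Copy live entries out of a segment so its space can be reused."""
        for off, (key, value) in enumerate(log[seg_no]):
            if index.get(key) == (seg_no, off):    # still the live copy?
                write(key, value)                  # relocate it to the head of the log
        log[seg_no] = None                         # segment space reclaimed

    for i in range(6):
        write("k%d" % (i % 3), i)                  # overwrites leave dead entries behind
    clean(0)
    print(index)                                   # all live data now lives in segment 1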

Previous Projects

The following sections describe some projects on which I have worked in the past. I am no longer involved with these projects, though several of them are still active as open-source projects or commercial products. The years listed for each project represent the period when I was involved.

Fiz (2008-2011)

The Fiz project explored new frameworks for highly interactive Web applications. The goal of the project was to develop higher-level reusable components that simplify application development by hiding inside the components many of the complexities that bedevil Web developers (such as security and the use of Ajax to enhance interactivity).

Related links:

  • Integrating Long Polling with an MVC Web Framework. This paper appeared in the 2011 USENIX Conference on Web Application Development; it shows how the architecture of an MVC Web development framework can be extended to make it easy for Web applications to push updates automatically from servers out to browsers (the sketch after this list illustrates the basic long-polling idea).
  • Managing State for Ajax-Driven Web Components. This paper appeared in the 2010 USENIX Conference on Web Application Development; it discusses the state-management issues that arise when trying to use a component-based approach for Web applications using Ajax, and evaluates two alternative solutions to the problem.
  • Fiz: A Component Framework for Web Applications: a technical report that provides an overview of Fiz.
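
The long-polling idea from the first paper can be sketched in a few lines (a generic illustration, not Fiz's actual mechanism): the server holds each poll request open until new data arrives or a timeout expires, so the browser sees updates almost immediately without constant re-polling:

    import threading, time

    class UpdateChannel:
        def __init__(self):
            self._cond = threading.Condition()
            self._messages = []

        def publish(self, msg):
            with self._cond:
                self._messages.append(msg)
                self._cond.notify_all()            # wake any waiting poll requests

        def long_poll(self, last_seen, timeout=30.0):
            """Block until there are messages newer than last_seen, or the timeout expires."""
            deadline = time.time() + timeout
            with self._cond:
                while len(self._messages) <= last_seen:
                    remaining = deadline - time.time()
                    if remaining <= 0:
                        return []                  # timed out: the client simply polls again
                    self._cond.wait(remaining)
                return self._messages[last_seen:]

    channel = UpdateChannel()
    threading.Timer(1.0, channel.publish, args=["price changed"]).start()
    print(channel.long_poll(last_seen=0, timeout=5))   # blocks about a second, then prints the update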

ElectricCommander (2005-2007, Electric Cloud)

ElectricCommander is the second major product for Electric Cloud. It addresses the problem of managing software development processes such as nightly builds and automated test suites. Most organizations have home-grown software for these tasks, which is hard to manage and scales poorly as the organization grows. ElectricCommander provides a general-purpose Web-based platform for managing these processes. Developers use a Web interface to describe each job as a collection of shell commands that run serially or in parallel on one or more machines. The ElectricCommander server manages the execution of these commands according to schedules defined by the developers. It also provides a variety of reporting tools and manages a collection of server machines to allow concurrent execution of multiple jobs.
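
A hypothetical job description and runner, invented for this sketch (the dictionary format and function names are not ElectricCommander's actual specification), illustrates the execution model of serial and parallel shell-command steps:

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    job = {
        "name": "nightly-build",
        "steps": [
            {"parallel": False, "commands": ["echo checkout sources"]},
            {"parallel": True,  "commands": ["echo build module A",
                                             "echo build module B"]},
            {"parallel": False, "commands": ["echo run test suite"]},
        ],
    }

    def run_job(job):
        for step in job["steps"]:
            cmds = step["commands"]
            if step["parallel"]:
                with ThreadPoolExecutor() as pool:      # commands in this step run concurrently
                    list(pool.map(lambda c: subprocess.run(c, shell=True, check=True), cmds))
            else:
                for c in cmds:                          # commands in this step run one after another
                    subprocess.run(c, shell=True, check=True)

    run_job(job)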

ElectricAccelerator (2002-2005, Electric Cloud)

ElectricAccelerator is Electric Cloud's first product; it accelerates software builds based on the make program by running steps concurrently on a cluster of server machines. Previous attempts at concurrent builds have had limited success because most Makefiles do not have adequate dependency information. As a result, the build tool cannot safely identify steps that can execute concurrently and attempts to use large-scale parallelism produce flaky or broken builds. ElectricAccelerator solves this problem using kernel-level file system drivers to record file accesses during the build. From this information it can construct a perfect view of dependencies, even if the Makefiles are incorrect. As a result, ElectricAccelerator can safely utilize dozens of machines in a single build, producing speedups of 10-20x.
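
The core inference can be sketched as follows (the step names and file lists are invented): given the files each step actually read and wrote, as a kernel-level monitor would record them, two steps conflict if one wrote a file the other touched, and only non-conflicting steps may run concurrently:

    trace = {
        "compile_a": {"reads": {"a.c", "common.h"}, "writes": {"a.o"}},
        "compile_b": {"reads": {"b.c", "common.h"}, "writes": {"b.o"}},
        "link":      {"reads": {"a.o", "b.o"},      "writes": {"app"}},
    }

    def conflicts(s1, s2):
        w1, w2 = trace[s1]["writes"], trace[s2]["writes"]
        touched1 = trace[s1]["reads"] | w1
        touched2 = trace[s2]["reads"] | w2
        return bool((w1 & touched2) | (w2 & touched1))

    steps = list(trace)
    for i, s1 in enumerate(steps):
        for s2 in steps[i + 1:]:
            status = "must be ordered" if conflicts(s1, s2) else "can run in parallel"
            print(f"{s1} / {s2}: {status}")
    # compile_a and compile_b can run in parallel; link conflicts with both, so it must wait
    # for them even if the Makefile never declared those dependencies.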

Tcl/Tk (1988-2000, U.C. Berkeley, Sun, Scriptics)

Tcl is an embeddable scripting language: it is a simple interpreted language implemented as a library package that can be incorporated into a variety of applications. Furthermore, Tcl is extensible, so additional functions (such as those provided by the enclosing application) can easily be added to those already provided by the Tcl interpreter. Tk is a GUI toolkit that is implemented as a Tcl extension. I initially built Tcl and Tk as hobby projects and didn't think that anyone besides me would care about them. However, they were widely adopted because they solved two problems. First, the combination of Tcl and Tk made it much easier to create graphical user interfaces than previous frameworks such as Motif did (Tcl and Tk were eventually ported from Unix to the Macintosh and Windows, making Tcl/Tk applications highly portable). Second, Tcl made it easy to include powerful command languages in a variety of applications ranging from design tools to embedded systems.
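
Tcl's embeddable, extensible nature can be demonstrated with the Tcl interpreter that ships inside Python's tkinter module (the original embedding API is C; this is just a convenient way to show the idea):

    import tkinter

    interp = tkinter.Tcl()                      # create an embedded Tcl interpreter (no GUI needed)
    interp.tk.eval('set greeting "hello from Tcl"')
    print(interp.tk.eval('set greeting'))

    def double(x):                              # a function provided by the host application...
        return int(x) * 2

    interp.tk.createcommand("double", double)   # ...exposed to scripts as a new Tcl command
    print(interp.tk.eval("double 21"))          # prints 42: the script calls back into the host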

Log-Structured File Systems (1988-1994, U.C. Berkeley)

A log-structured file system (LFS) is one where all information is written to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery. The log is the only structure on disk; it contains index information so files can be read back from the log efficiently. LFS was initially motivated by the RAID project, because random writes are expensive in RAID disk arrays. However, the LFS approach has also found use in other settings, such as flash memory where wear-leveling is an important problem. Mendel Rosenblum created the first LFS implementation as part of the Sprite project; Ken Shirriff added RAID support to LFS in the Sawmill project, and John Hartman extended the LFS ideas into the world of cluster file systems with Zebra.
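
A toy version of the idea (illustrative only): every update is appended to a single sequential log, and an in-memory index records where the newest copy of each block lives so reads don't have to scan the log:

    log = []           # the on-disk log, written strictly sequentially
    index = {}         # (file, block) -> position of the live copy in the log

    def write_block(file, block, data):
        log.append((file, block, data))          # sequential write: no seeks
        index[(file, block)] = len(log) - 1      # the new copy supersedes any older one

    def read_block(file, block):
        return log[index[(file, block)]][2]

    write_block("notes.txt", 0, b"draft v1")
    write_block("notes.txt", 0, b"draft v2")     # the old copy becomes dead space in the log
    print(read_block("notes.txt", 0))            # b'draft v2'
    # A real LFS must also checkpoint the index and clean dead space out of old segments,
    # which is where most of the design effort (and the RAID and flash benefits) comes in.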

Sprite (1984-1994, U.C. Berkeley)

Sprite was a Unix-like network operating system built at U.C. Berkeley and used for day-to-day computing by about 100 students and staff for more than five years. The overall goal of the project was to create a single system image, meaning that a collection of workstations would appear to users the same as a single time-shared system, except with much better performance. Among its more notable features were support for diskless workstations, large client-level file caches with guaranteed consistency, and a transparent process migration mechanism. We built a parallel version of make called pmake that took advantage of process migration to run builds across multiple machines, providing 3-5x speedups (pmake was the inspiration for the ElectricAccelerator product at Electric Cloud).

VLSI Design Tools (1980-1986, U.C. Berkeley)

When I arrived at Berkeley in 1980 the Mead-Conway VLSI revolution was in full bloom, enabling university researchers to create large-scale integrated circuits such as the Berkeley RISC chips. However, there were few tools for chip designers to use. In this project my students and I created a series of design tools, starting with a simple layout editor called Caesar. We quickly replaced Caesar with a more powerful layout editor called Magic. Magic made two contributions: first, it implemented a number of novel interactive features, such as incremental design-rule checking, which notified developers of design-rule violations immediately as they edited, rather than waiting for a slow offline batch run. Second, Magic incorporated algorithms for operating on hierarchical layout structures, which were much more efficient than previous approaches that "flattened" the layout into a non-hierarchical form. Magic took advantage of a new data structure called corner stitching, which permitted efficient implementation of a variety of geometric operations. We also created a switch-level timing analysis program called Crystal. We made free source-level releases of these tools and many others in what became known as the Berkeley VLSI Tools Distributions; these represented one of the earliest open-source software releases. Magic continued to evolve after the end of the Berkeley project; as of 2005 it was still widely used.

Cm* and Medusa (1975-1980, Carnegie Mellon University)

As a graduate student at Carnegie Mellon University I worked on the Cm* project, which created the first large-scale NUMA multiprocessor. Cm* consisted of 50 LSI-11 processors connected by a network of special-purpose processors called Kmaps that provided shared virtual memory. I worked initially on the design and implementation of the Kmap hardware, then shifted to the software side and led the Medusa project, which was one of two operating systems created for Cm*. The goal of the Medusa project was to understand how to structure an operating system to run on the NUMA hardware; among other things, Medusa was the first system to implement coscheduling (now called "gang scheduling").