Stanford EE Computer Systems Colloquium

4:15PM, Wednesday, Nov 11, 2009
HP Auditorium, Gates Computer Science Building B01
http://ee380.stanford.edu

Rethinking Time in Distributed Systems
How can we build complex systems simply?

Paul Borrill
REPLICUS Software Corporation

About the talk:

The $200B Information Storage Industry is mired in a complexity crisis, with no apparent end in sight. Its origin may be viewed as rational incrementalism by an industry incentivized to invest near-term in palliatives that address symptoms rather than in true innovation to effect a cure. Complexity is defined in this context as the amount of "work" needed to bring a system up to its intended state of operation, both during initial installation and after a perturbation (failure, disaster or attack). It can be quantified as the integral over the product of attention (hours) and the administrative skill required to achieve specified service levels. This leads directly to its impact on the industry: an increasingly dominant fraction of a CIO's budget is spent on administrative overhead to manage the complexity of the organization's network and data storage infrastructures.

However, the issue with complexity is far more insidious than operational cost: complexity scales disproportionately with system size, connectivity and diversity, and its most harmful effects impact the core resilience of our government, military and enterprise infrastructures. In the aftermath of failures, disasters or attacks, how fast an infrastructure can recover is a direct measure of the harm that will be spared to the organization and its mission, and complexity is the overwhelming impediment to rapid recovery.

Investigation into the nature of this complexity takes us down a path that uncovers deep technical challenges that must be overcome. One of these involves the way computer scientists think about and conceptualize "time". This is the core motivation for this project.

A relationship with time is intrinsic to everything we do in creating, modifying and moving our data. Yet the conceptualization of time and causality in the Computer Science literature appears far behind that of physics and philosophy. This state of affairs is of concern because, if fundamental flaws exist in the axioms underlying the algorithms that govern access to and evolution of our data, then we would experience inexplicable failures and other undesirable behaviors that grow worse as our systems scale. The purpose of this talk is to compare results from physics and other disciplines, and to investigate whether and where hazards to the integrity of our information may exist due to current conceptions of time in Computer Science, especially for distributed systems, where scale, transmission rates and spatial distribution would most readily manifest anomalous behavior arising from such flaws.

Slides:

Download the slides for this presentation in PDF format.

About the speaker:

Paul Borrill is the President and Founder of REPLICUS Software. He was formerly VP/CTO for VERITAS Software, VP/Chief Architect of Storage Systems at Quantum, and Chief Scientist for IR at Sun Microsystems. Paul was also the founding Chairman of the Storage Networking Industry Association (SNIA).

Contact information:

Paul Borrill
4287 Miranda Avenue
Palo Alto
CA 94306
(650) 917-9084

paul@replicus.com