Stanford EE Computer Systems Colloquium

4:15PM, Wednesday, March 2, 2011
HP Auditorium, Gates Computer Science Building B01
http://ee380.stanford.edu

Python in Python: the PyPy system

Armin Rigo
Heinrich-Heine Universität Düsseldorf
About the talk:

PyPy is a complete Python implementation in Python, in the old tradition of Squeak and Scheme48 --- but there is more to PyPy than just this.

During this talk I will describe what PyPy is: a mature, 8 year old project of roughly 200K lines of code and 150K lines of tests, implementing the full Python language. I will show our results: faster execution of most programs (by a factor between 1.5x and 20x), and smaller total memory usage for large programs.

I will then focus on the architecture of PyPy. On the one hand, we have written a straightforward interpreter for the Python language, using a (large) subset of Python called RPython. On the other hand, we have a complex translation toolchain which is able to compile interpreters from RPython to efficient C code. (We also have experimental backends for producing JVM and .NET code.)

There are two distinct benefits from keeping the interpreter and the translation toolchain separate. On the one hand, we keep our interpreter simple, and we can easily write interpreters for other languages. We have a complete Prolog interpreter and have at least played with versions for Smalltalk and JavaScript. On the other hand, the fact that our source interpreter does not contain any architectural choices makes for unprecedented flexibility. Our toolchain "weaves" into the final executable various aspects including the object memory layout, choice of garbage collection (GC), of execution model (regular vs. "Stackless"), choice of backend (C/JVM/.NET), and even the Just-in-Time Compiler (JIT). There are great practical benefits to this. For example, CPython's GC is stuck with using reference counting, while we offer a number of choices.

I will explain how this is done, and describe in more details the JIT Compiler. It is a "tracing JIT", as pioneered in recent years for Java and JavaScript. However, it is also a "meta JIT": it works for any language, by tracing at the level of the language's interpreter. In other words, from an interpreter for any language, we produce quasi-automatically a JIT suited to this language.

I will conclude by comparing PyPy to other projects, old and new: Squeak, CPython, Jython and IronPython, the Jikes RVM, as well as the various recent tracing JITs such as TraceMonkey.

Download broadside about a PyPy: High-Level Languages Need Not Be Slow!.

Slides:

View a HTML version of the slides for this presentation. Navigate using the Pg-Up/Pg-Down keys.

About the speaker:

Armin Rigo is a researcher at Heinrich-Heine-Universitüt in Düsseldorf, Germany. His academic interests include Programming Languages and Implementation Techniques.

He is the lead designer of the PyPy project and one of its original founders. He is also the author of Psyco, a hand-coded Just-in-Time specializing compiler for Python, which can be used transparently with 32-bit x86 versions of CPython. Since 2003 he has worked on all aspects of PyPy: its Python interpreter (written in Python), its translation toolchain (which produces C code), its garbage collectors, and its Tracing Just-in-Time compiler generator. Since the end of 2010, the Just in Time compiler generated by PyPy has outperformed Psyco, while being much more general and robust.

Contact information:

Armin Rigo
Institut für Informatik
Universitütsstr. 1
40225 Düsseldorf, GERMANY
+49 211 8110713

arigo@tunes.org