PyPy Status Blog: April 2011

Saturday, April 30, 2011

PyPy 1.5 Released: Catching Up

We're pleased to announce the 1.5 release of PyPy. This release updates PyPy with the features of CPython 2.7.1, including the standard library. Thus all the features of CPython 2.6 and CPython 2.7 are now supported. It also contains additional performance improvements. You can download it here:

http://pypy.org/download.html

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7.1. It's fast (pypy 1.5 and cpython 2.6.2 performance comparison) due to its integrated tracing JIT compiler.

This release includes the features of CPython 2.6 and 2.7. It also includes a large number of small improvements to the tracing JIT compiler. It supports Intel machines running Linux 32/64 or Mac OS X. Windows is beta (it roughly works but a lot of small issues have not been fixed so far). Windows 64 is not yet supported.

Numerous speed achievements are described on our blog. Normalized speed charts comparing pypy 1.5 and pypy 1.4 as well as pypy 1.5 and cpython 2.6.2 are available on our benchmark website. The speed improvement over 1.4 seems to be around 25% on average.

More highlights

The largest change in PyPy's tracing JIT is adding support for loop invariant code motion, which was mostly done by Håkan Ardö. This feature improves the performance of tight loops doing numerical calculations.
The CPython extension module API has been improved and now supports many more extensions. For information on which ones are supported, please refer to our compatibility wiki.
These changes make it possible to support Tkinter and IDLE.
The cProfile profiler is now working with the JIT. However, it skews the performance in unstudied ways. Therefore it is not yet usable to analyze subtle performance problems (the same is true for CPython of course).
There is an external fork which includes an RPython version of the postgresql. However, there are no prebuilt binaries for this.
Our developer documentation was moved to Sphinx and cleaned up.
and many small things :-)

Cheers,

Carl Friedrich Bolz, Laura Creighton, Antonio Cuni, Maciej Fijalkowski, Amaury Forgeot d'Arc, Alex Gaynor, Armin Rigo and the PyPy team

Posted by Carl Friedrich Bolz-Tereick at 16:59 20 Comments

Wednesday, April 20, 2011

Using Tkinter and IDLE with PyPy

We are pleased to announce that Tkinter, the GUI library based on TCL/TK, now works with PyPy.
Tkinter is composed of two parts:

_tkinter, a module written in C which interfaces with the TCL world

Tkinter, a pure Python package which wraps _tkinter to expose the pythonic API we are used to

The PyPy version of _tkinter reuses the C code as found in CPython and compiles it through the PyPy C-API compatibility layer, cpyext. To make it work with PyPy, we had to modify it slightly in order to remove the dependency on some API functions which are not supported by PyPy. In particular, we removed the dependency on the PyOS_InputHook variable, which allows a nice integration of Tkinter and the Python interactive prompt. The result is that, unlike in CPython, Tk windows created at the PyPy interactive prompt are not shown until we manually call the mainloop method. Apart from this inconvenience, everything else works fine.
At the moment, _tkinter is not distributed with PyPy because our build system does not support automatic compilation of C extensions. Instead, it is necessary to install it manually, either directly from source or by installing tkinter-pypy from PyPI with easy_install or pip.
For everything to work correctly, you need a recent build of PyPy: the following is a step-by-step guide to install _tkinter in a PyPy nightly build for Linux 64 bit; for other architectures, look at the nightly build page:

$ wget http://buildbot.pypy.org/nightly/trunk/pypy-c-jit-43485-1615dfd7d8f1-linux64.tar.bz2

$ tar xfv pypy-c-jit-43485-1615dfd7d8f1-linux64.tar.bz2

$ cd pypy-c-jit-43485-1615dfd7d8f1-linux64/

$ wget http://peak.telecommunity.com/dist/ez_setup.py

$ ./bin/pypy ez_setup.py    # install setuptools

$ ./bin/easy_install tkinter-pypy

Once you complete the steps above, you can start using Tkinter from your python programs. In particular, you can use IDLE, the IDE which is part of the Python standard library. To start IDLE, type:

$ ./bin/pypy -m idlelib.idle

Have fun :-)

Posted by Antonio Cuni at 12:22 13 Comments

Wednesday, April 6, 2011

Tutorial Part 2: Adding a JIT

This is the second part of a tutorial written by Andrew Brown. The first part described how to write an interpreter with PyPy.

Adding JIT

Translating RPython to C is pretty cool, but one of the best features of PyPy is its ability to generate just-in-time compilers for your interpreter. That's right, from just a couple hints on how your interpreter is structured, PyPy will generate and include a JIT compiler that will, at runtime, translate the interpreted code of our BF language to machine code!

So what do we need to tell PyPy to make this happen? First it needs to know where the start of your bytecode evaluation loop is. This lets it keep track of instructions being executed in the target language (BF).

We also need to let it know what defines a particular execution frame. Since our language doesn't really have stack frames, this boils down to what's constant for the execution of a particular instruction, and what's not. These are called "green" and "red" variables, respectively.

Refer back to example2.py for the following.

In our main loop, there are four variables used: pc, program, bracket_map, and tape. Of those, pc, program, and bracket_map are all green variables. They define the execution of a particular instruction. If the JIT routines see the same combination of green variables as before, they know execution has skipped back and must be in a loop. The variable "tape" is our red variable: it's what's being manipulated by the execution.

So let's tell PyPy this info. Start by importing the JitDriver class and making an instance:

from pypy.rlib.jit import JitDriver
jitdriver = JitDriver(greens=['pc', 'program', 'bracket_map'],
        reds=['tape'])

And we add this line to the very top of the while loop in the mainloop function:

jitdriver.jit_merge_point(pc=pc, tape=tape, program=program,
        bracket_map=bracket_map)

We also need to define a JitPolicy. We're not doing anything fancy, so this is all we need somewhere in the file:

def jitpolicy(driver):
    from pypy.jit.codewriter.policy import JitPolicy
    return JitPolicy()

See this example at example3.py
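One practical wrinkle: once the file imports from pypy.rlib.jit, it no longer runs under a plain Python interpreter that doesn't have the PyPy source tree on its path. Here is a minimal sketch of one way around that, assuming you still want to test the interpreter untranslated. The stand-in class below is an assumption made for illustration, not part of PyPy; it only accepts the same keyword arguments and does nothing with them.

```python
try:
    from pypy.rlib.jit import JitDriver
except ImportError:
    # Hypothetical stand-in, used only when the PyPy source tree is not
    # importable. It mimics the two calls this tutorial uses and is a no-op.
    class JitDriver(object):
        def __init__(self, **kwargs):
            self.greens = kwargs.get('greens', [])
            self.reds = kwargs.get('reds', [])

        def jit_merge_point(self, **kwargs):
            pass

jitdriver = JitDriver(greens=['pc', 'program', 'bracket_map'],
        reds=['tape'])

# Under plain Python this call does nothing at all
jitdriver.jit_merge_point(pc=0, tape=None, program="", bracket_map={})
```

With a fallback like this, the same file can be run directly for quick tests and still be handed to translate.py.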

Now try translating again, but with the flag --opt=jit:

$ python ./pypy/pypy/translator/goal/translate.py --opt=jit example3.py

It will take significantly longer to translate with JIT enabled, almost 8 minutes on my machine, and the resulting binary will be much larger. When it's done, try having it run the mandelbrot program again. A world of difference: 12 seconds, compared to 45 seconds before!

Interestingly enough, you can see when the JIT compiler switches from interpreted to machine code with the mandelbrot example. The first few lines of output come out pretty fast, and then the program gets a boost of speed and gets even faster.

A bit about Tracing JIT Compilers

It's worth it at this point to read up on how tracing JIT compilers work. Here's a brief explanation: The interpreter is usually running your interpreter code as written. When it detects that a loop of code in the target language (BF) is executed often, that loop is considered "hot" and marked to be traced. The next time that loop is entered, the interpreter gets put in tracing mode where every executed instruction is logged.

When the loop is finished, tracing stops. The trace of the loop is sent to an optimizer, and then to an assembler which outputs machine code. That machine code is then used for subsequent loop iterations.

This machine code is often optimized for the most common case, and depends on several assumptions about the code. Therefore, the machine code will contain guards, to validate those assumptions. If a guard check fails, the runtime falls back to regular interpreted mode.

A good place to start for more information is http://en.wikipedia.org/wiki/Just-in-time_compilation
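To make the idea of a "hot" loop concrete, here is a toy model in plain Python. It is not how PyPy's JIT is implemented; it only counts how often each green key (here just the pc) is seen and reports the ones that cross a threshold. The function name and the threshold value are hypothetical, chosen for this sketch.

```python
HOT_THRESHOLD = 3  # hypothetical value, chosen only for this toy

def find_hot_loops(pc_history, threshold=HOT_THRESHOLD):
    """Return the pcs seen at least `threshold` times, in the order they got hot.

    pc_history is the sequence of green keys (here just pc values)
    observed each time execution reaches the merge point.
    """
    counts = {}
    hot = []
    for pc in pc_history:
        counts[pc] = counts.get(pc, 0) + 1
        if counts[pc] == threshold:
            hot.append(pc)
    return hot

# A program whose loop body sits at positions 2..4 and runs three times
history = [0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4, 5]
print(find_hot_loops(history))
```

Positions outside the loop are seen only once and never become hot, which is why straight-line code stays interpreted.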

Debugging and Trace Logs

Can we do any better? How can we see what the JIT is doing? Let's do two things.

First, let's add a get_printable_location function, which is used during debug trace logging:

def get_location(pc, program, bracket_map):
    return "%s_%s_%s" % (
            program[:pc], program[pc], program[pc+1:]
            )
jitdriver = JitDriver(greens=['pc', 'program', 'bracket_map'], reds=['tape'],
        get_printable_location=get_location)

This function is passed in the green variables, and should return a string. Here, we're printing out the BF code, surrounding the currently executing instruction with underscores so we can see where it is.
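Since get_location is ordinary Python, you can try it by hand to see what the log labels will look like. The short program string below is made up for the demonstration.

```python
def get_location(pc, program, bracket_map):
    return "%s_%s_%s" % (
            program[:pc], program[pc], program[pc+1:]
            )

program = "+[>+<-]."   # made-up sample program
for pc in (0, 3, 7):
    # bracket_map is not used by get_location, so an empty dict is enough
    print(get_location(pc, program, {}))
```

Each printed line is the whole program with the instruction at pc wrapped in underscores, which is the string that shows up in the debug_merge_point entries of the log.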

Download this as example4.py and translate it the same as example3.py.

Now let's run a test program (test.b, which just prints the letter "A" 15 or so times in a loop) with trace logging:

$ PYPYLOG=jit-log-opt:logfile ./example4-c test.b

Now take a look at the file "logfile". This file is quite hard to read, so here's my best shot at explaining it.

The file contains a log of every trace that was performed, and is essentially a glimpse at what instructions it's compiling to machine code for you. It's useful to see if there are unnecessary instructions or room for optimization.

Each trace starts with a line that looks like this:

[3c091099e7a4a7] {jit-log-opt-loop

and ends with a line like this:

[3c091099eae17d] jit-log-opt-loop}

The next line tells you which loop number it is, and how many ops are in it. In my case, the first trace looks like this:

  [3c167c92b9118f] {jit-log-opt-loop
  # Loop 0 : loop with 26 ops
  [p0, p1, i2, i3]
  debug_merge_point('+<[>[_>_+<-]>.[<+>-]<<-]++++++++++.', 0)
  debug_merge_point('+<[>[>_+_<-]>.[<+>-]<<-]++++++++++.', 0)
  i4 = getarrayitem_gc(p1, i2, descr=<SignedArrayDescr>)
  i6 = int_add(i4, 1)
  setarrayitem_gc(p1, i2, i6, descr=<SignedArrayDescr>)
  debug_merge_point('+<[>[>+_<_-]>.[<+>-]<<-]++++++++++.', 0)
  debug_merge_point('+<[>[>+<_-_]>.[<+>-]<<-]++++++++++.', 0)
  i7 = getarrayitem_gc(p1, i3, descr=<SignedArrayDescr>)
  i9 = int_sub(i7, 1)
  setarrayitem_gc(p1, i3, i9, descr=<SignedArrayDescr>)
  debug_merge_point('+<[>[>+<-_]_>.[<+>-]<<-]++++++++++.', 0)
  i10 = int_is_true(i9)
  guard_true(i10, descr=<Guard2>) [p0]
  i14 = call(ConstClass(ll_dict_lookup__dicttablePtr_Signed_Signed), ConstPtr(ptr12), 90, 90, descr=<SignedCallDescr>)
  guard_no_exception(, descr=<Guard3>) [i14, p0]
  i16 = int_and(i14, -9223372036854775808)
  i17 = int_is_true(i16)
  guard_false(i17, descr=<Guard4>) [i14, p0]
  i19 = call(ConstClass(ll_get_value__dicttablePtr_Signed), ConstPtr(ptr12), i14, descr=<SignedCallDescr>)
  guard_no_exception(, descr=<Guard5>) [i19, p0]
  i21 = int_add(i19, 1)
  i23 = int_lt(i21, 114)
  guard_true(i23, descr=<Guard6>) [i21, p0]
  guard_value(i21, 86, descr=<Guard7>) [i21, p0]
  debug_merge_point('+<[>[_>_+<-]>.[<+>-]<<-]++++++++++.', 0)
  jump(p0, p1, i2, i3, descr=<Loop0>)
  [3c167c92bc6a15] jit-log-opt-loop}

I've trimmed the debug_merge_point lines a bit, they were really long.

So let's see what this does. This trace takes 4 parameters: 2 object pointers (p0 and p1) and 2 integers (i2 and i3). Looking at the debug lines, it seems to be tracing one iteration of this loop: "[>+<-]"

It starts executing the first operation on line 4, a ">", but immediately starts executing the next operation. The ">" had no instructions, and looks like it was optimized out completely. This loop must always act on the same part of the tape, so the tape pointer is constant for this trace. An explicit advance operation is unnecessary.

Lines 5 to 8 are the instructions for the "+" operation. First it gets the array item from the array in pointer p1 at index i2 (line 6), adds 1 to it and stores it in i6 (line 7), and stores it back in the array (line 8).

Line 9 starts the "<" instruction, but it is another no-op. It seems that i2 and i3 passed into this routine are the two tape pointers used in this loop already calculated. Also deduced is that p1 is the tape array. It's not clear what p0 is.

Lines 10 through 13 perform the "-" operation: get the array value (line 11), subtract (line 12) and set the array value (line 13).

Next, on line 14, we come to the "]" operation. Lines 15 and 16 check whether i9 is true (non-zero). Looking up, i9 is the array value that we just decremented and stored, now being checked as the loop condition, as expected (remember the definition of "]"). Line 16 is a guard: if the condition is not met, execution jumps somewhere else, in this case to the routine called <Guard2>, which is passed one parameter: p0.

Assuming we pass the guard, lines 17 through 23 are doing the dictionary lookup to bracket_map to find where the program counter should jump to. I'm not too familiar with what the instructions are actually doing, but it looks like there are two external calls and 3 guards. This seems quite expensive, especially since we know bracket_map will never change (PyPy doesn't know that). We'll see below how to optimize this.

Line 24 increments the newly acquired instruction pointer. Lines 25 and 26 make sure it's less than the program's length.

Additionally, line 27 guards that i21, the incremented instruction pointer, is exactly 86. This is because it's about to jump to the beginning (line 29) and the instruction pointer being 86 is a precondition to this block.

Finally, the loop closes up at line 28 so the JIT can jump to loop body <Loop0> to handle that case (line 29), which is the beginning of the loop again. It passes in parameters (p0, p1, i2, i3).
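Reading a trace by eye gets tedious, so it can help to tally the operations. The following is a rough sketch that assumes the log format shown above (section markers and the argument list start with "[", the header starts with "#", and results are written as "name = op(...)"). The count_ops helper is hypothetical and only string-splits the text; it is not a PyPy tool.

```python
TRACE = """
[3c167c92b9118f] {jit-log-opt-loop
# Loop 0 : loop with 26 ops
[p0, p1, i2, i3]
debug_merge_point('+<[>[_>_+<-]>.[<+>-]<<-]++++++++++.', 0)
debug_merge_point('+<[>[>_+_<-]>.[<+>-]<<-]++++++++++.', 0)
i4 = getarrayitem_gc(p1, i2, descr=<SignedArrayDescr>)
i6 = int_add(i4, 1)
setarrayitem_gc(p1, i2, i6, descr=<SignedArrayDescr>)
debug_merge_point('+<[>[>+_<_-]>.[<+>-]<<-]++++++++++.', 0)
debug_merge_point('+<[>[>+<_-_]>.[<+>-]<<-]++++++++++.', 0)
i7 = getarrayitem_gc(p1, i3, descr=<SignedArrayDescr>)
i9 = int_sub(i7, 1)
setarrayitem_gc(p1, i3, i9, descr=<SignedArrayDescr>)
debug_merge_point('+<[>[>+<-_]_>.[<+>-]<<-]++++++++++.', 0)
i10 = int_is_true(i9)
guard_true(i10, descr=<Guard2>) [p0]
i14 = call(ConstClass(ll_dict_lookup__dicttablePtr_Signed_Signed), ConstPtr(ptr12), 90, 90, descr=<SignedCallDescr>)
guard_no_exception(, descr=<Guard3>) [i14, p0]
i16 = int_and(i14, -9223372036854775808)
i17 = int_is_true(i16)
guard_false(i17, descr=<Guard4>) [i14, p0]
i19 = call(ConstClass(ll_get_value__dicttablePtr_Signed), ConstPtr(ptr12), i14, descr=<SignedCallDescr>)
guard_no_exception(, descr=<Guard5>) [i19, p0]
i21 = int_add(i19, 1)
i23 = int_lt(i21, 114)
guard_true(i23, descr=<Guard6>) [i21, p0]
guard_value(i21, 86, descr=<Guard7>) [i21, p0]
debug_merge_point('+<[>[_>_+<-]>.[<+>-]<<-]++++++++++.', 0)
jump(p0, p1, i2, i3, descr=<Loop0>)
[3c167c92bc6a15] jit-log-opt-loop}
"""

def count_ops(trace_text):
    """Tally operation names in one jit-log-opt-loop section."""
    counts = {}
    for line in trace_text.splitlines():
        line = line.strip()
        # Skip blanks, section markers, the "# Loop" header and the
        # input-argument list; none of these are operations.
        if not line or line.startswith(('[', '#')):
            continue
        if ' = ' in line.split('(', 1)[0]:
            line = line.split(' = ', 1)[1]   # drop the "i4 = " result part
        name = line.split('(', 1)[0]
        counts[name] = counts.get(name, 0) + 1
    return counts

counts = count_ops(TRACE)
print(sorted(counts.items()))
print(sum(counts.values()))
```

The total should agree with the op count in the "# Loop" header, which is a quick sanity check that nothing was miscounted.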

Optimizing

As mentioned, every loop iteration does a dictionary lookup to find the corresponding matching bracket for the final jump. This is terribly inefficient: the jump target is not going to change from one loop to the next. This information is constant and should be compiled in as such.

The problem is that the lookups are coming from a dictionary, and PyPy is treating it as opaque. It doesn't know the dictionary isn't being modified or isn't going to return something different on each query.

What we need to do is provide another hint to the translation to say that the dictionary query is a pure function, that is, its output depends only on its inputs and the same inputs should always return the same output.

To do this, we use a provided function decorator pypy.rlib.jit.purefunction, and wrap the dictionary call in a decorated function:

from pypy.rlib.jit import purefunction

@purefunction
def get_matching_bracket(bracket_map, pc):
    return bracket_map[pc]

This version can be found at example5.py
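As a rough sketch of how the helper gets used, the bracket branches of mainloop call get_matching_bracket instead of indexing bracket_map directly. The fallback decorator below is an assumption so the snippet also runs without the PyPy source tree; it simply returns the function unchanged.

```python
try:
    from pypy.rlib.jit import purefunction
except ImportError:
    # Hypothetical fallback: a decorator that returns the function unchanged
    def purefunction(func):
        return func

@purefunction
def get_matching_bracket(bracket_map, pc):
    return bracket_map[pc]

# For the program "[>+<-]", the "[" at index 0 matches the "]" at index 5
bracket_map = {0: 5, 5: 0}
pc = 5
# In the "]" branch of mainloop, look the jump target up through the helper
pc = get_matching_bracket(bracket_map, pc)
print(pc)
```

The hint is only safe because bracket_map is never modified after parsing; if the dictionary could change, the same inputs would no longer guarantee the same output.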

Translate again with the JIT option and observe the speedup. Mandelbrot now only takes 6 seconds! (from 12 seconds before this optimization)

Let's take a look at the trace from the same function:

[3c29fad7b792b0] {jit-log-opt-loop
# Loop 0 : loop with 15 ops
[p0, p1, i2, i3]
debug_merge_point('+<[>[_>_+<-]>.[<+>-]<<-]++++++++++.', 0)
debug_merge_point('+<[>[>_+_<-]>.[<+>-]<<-]++++++++++.', 0)
i4 = getarrayitem_gc(p1, i2, descr=<SignedArrayDescr>)
i6 = int_add(i4, 1)
setarrayitem_gc(p1, i2, i6, descr=<SignedArrayDescr>)
debug_merge_point('+<[>[>+_<_-]>.[<+>-]<<-]++++++++++.', 0)
debug_merge_point('+<[>[>+<_-_]>.[<+>-]<<-]++++++++++.', 0)
i7 = getarrayitem_gc(p1, i3, descr=<SignedArrayDescr>)
i9 = int_sub(i7, 1)
setarrayitem_gc(p1, i3, i9, descr=<SignedArrayDescr>)
debug_merge_point('+<[>[>+<-_]_>.[<+>-]<<-]++++++++++.', 0)
i10 = int_is_true(i9)
guard_true(i10, descr=<Guard2>) [p0]
debug_merge_point('+<[>[_>_+<-]>.[<+>-]<<-]++++++++++.', 0)
jump(p0, p1, i2, i3, descr=<Loop0>)
[3c29fad7ba32ec] jit-log-opt-loop}

Much better! Each loop iteration is an add, a subtract, two array loads, two array stores, and a guard on the exit condition. That's it! This code doesn't require any program counter manipulation.
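For comparison, here is a hand-written Python equivalent of that trace. It is a sketch for reading purposes only, with the function name run_trace made up for the example; the list tape stands in for p1, and i2 and i3 are the two cell indexes.

```python
def run_trace(tape, i2, i3):
    """Plain-Python rendering of the 15-op trace.

    Returns True when the guard fails, i.e. the BF loop is finished.
    Like the trace, it assumes the loop body runs at least once.
    """
    while True:
        tape[i2] = tape[i2] + 1   # getarrayitem_gc, int_add, setarrayitem_gc
        tape[i3] = tape[i3] - 1   # getarrayitem_gc, int_sub, setarrayitem_gc
        if not tape[i3]:          # int_is_true + guard_true
            return True           # guard failed: leave the compiled loop
        # jump: go around again with the same arguments

tape = [3, 0]
run_trace(tape, 1, 0)
print(tape)
```

This is the BF loop "[>+<-]" with the pointer moves folded into two fixed indexes, which is exactly why the ">" and "<" instructions produced no operations in the trace.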

I'm no expert on optimizations, this tip was suggested by Armin Rigo on the pypy-dev list. Carl Friedrich has a series of posts on how to optimize your interpreter that are also very useful: http://bit.ly/bundles/cfbolz/1

Final Words

I hope this has shown some of you what PyPy is all about other than a faster implementation of Python.

For those that would like to know more about how the process works, there are several academic papers explaining the process in detail that I recommend. In particular: Tracing the Meta-Level: PyPy's Tracing JIT Compiler.

See http://readthedocs.org/docs/pypy/en/latest/extradoc.html

Posted by Carl Friedrich Bolz-Tereick at 14:51 10 Comments

Tuesday, April 5, 2011

Tutorial: Writing an Interpreter with PyPy, Part 1

This is a guest blog post written by Andrew Brown, with help from the PyPy developers on the pypy-dev mailing list.

This tutorial's master copy and supporting files live at https://bitbucket.org/brownan/pypy-tutorial/

When I first learned about the PyPy project, it took me a while to figure out exactly what it was about. For those that don't already know, it's two things:

A set of tools for implementing interpreters for interpreted languages
An implementation of Python using this toolchain

The second part is probably what most people think PyPy is, but this tutorial is not about their Python interpreter. It is about writing your own interpreter for your own language.

This is the project I undertook to help myself better understand how PyPy works and what it's all about.

This tutorial assumes you know very little about PyPy, how it works, and even what it's all about. I'm starting from the very beginning here.

What PyPy Does

Here's a brief overview of what PyPy can do. Let's say you want to write an interpreted language. This involves writing some kind of source code parser, a bytecode interpretation loop, and lots of standard library code.

That's quite a bit of work for moderately complicated languages, and there's a lot of low level work involved. Writing the parser and compiler code usually isn't fun, which is why there are tools out there to generate parsers and compilers for you.

Even then, you still must worry about memory management in your interpreter, and you're going to be re-implementing a lot if you want data types like arbitrary precision integers, nice general hash tables, and such. It's enough to put someone off from implementing their idea for a language.

Wouldn't it be nice if you could write your language in an existing high level language like, for example, Python? That sure would be ideal, you'd get all the advantages of a high level language like automatic memory management and rich data types at your disposal. Oh, but an interpreted language interpreting another language would be slow, right? That's twice as much interpreting going on.

As you may have guessed, PyPy solves this problem. PyPy is a sophisticated toolchain for analyzing and translating your interpreter code to C code (or JVM or CLI). This process is called "translation", and it knows how to translate quite a lot of Python's syntax and standard libraries, but not everything. All you have to do is write your interpreter in RPython, a subset of the Python language carefully defined to allow this kind of analysis and translation, and PyPy will produce for you a very efficient interpreter.

Because efficient interpreters should not be hard to write.

The Language

The language I've chosen to implement is dead simple. The language runtime consists of a tape of integers, all initialized to zero, and a single pointer to one of the tape's cells. The language has 8 commands, described here:

>: Moves the tape pointer one cell to the right

<: Moves the tape pointer one cell to the left
+: Increments the value of the cell underneath the pointer
-: Decrements the value of the cell underneath the pointer

[: If the cell under the current pointer is 0, skip to the instruction after the matching ]

]: Skip back to the matching [ (evaluating its condition)

.: Print out a single byte to stdout from the cell under the pointer

,: Read in a single byte from stdin to the cell under the pointer

Any unrecognized bytes are ignored.

Some of you may recognize this language. I will be referring to it as BF.

One thing to notice is that the language is its own bytecode; there is no translation from source code to bytecode. This means that the language can be interpreted directly: the main eval loop of our interpreter will operate right on the source code. This simplifies the implementation quite a bit.

First Steps

Let's start out by writing a BF interpreter in plain old Python. The first step is sketching out an eval loop:

def mainloop(program):
    tape = Tape()
    pc = 0
    while pc < len(program):
        code = program[pc]

        if code == ">":
            tape.advance()
        elif code == "<":
            tape.devance()
        elif code == "+":
            tape.inc()
        elif code == "-":
            tape.dec()
        elif code == ".":
            sys.stdout.write(chr(tape.get()))
        elif code == ",":
            tape.set(ord(sys.stdin.read(1)))
        elif code == "[" and tape.get() == 0:
            # Skip forward to the matching ]
            pass
        elif code == "]" and tape.get() != 0:
            # Skip back to the matching [
            pass

        pc += 1

As you can see, a program counter (pc) holds the current instruction index. The first statement in the loop gets the instruction to execute, and then a compound if statement decides how to execute that instruction.

The implementations of [ and ] are left out here, but they should change the program counter to the position of the matching bracket. (The pc then gets incremented, so the condition is evaluated once when entering a loop, and once at the end of each iteration.)

Here's the implementation of the Tape class, which holds the tape's values as well as the tape pointer:

class Tape(object):
    def __init__(self):
        self.thetape = [0]
        self.position = 0

    def get(self):
        return self.thetape[self.position]
    def set(self, val):
        self.thetape[self.position] = val
    def inc(self):
        self.thetape[self.position] += 1
    def dec(self):
        self.thetape[self.position] -= 1
    def advance(self):
        self.position += 1
        if len(self.thetape) <= self.position:
            self.thetape.append(0)
    def devance(self):
        self.position -= 1

As you can see, the tape expands as needed to the right, indefinitely. We should really add some error checking to make sure the pointer doesn't go negative, but I'm not worrying about that now.
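For reference, the missing check could look something like the sketch below. Without it, a negative position does not fail right away in Python: list indexing with -1 silently reads the last cell. The choice of IndexError and the error message are assumptions made for this example, not part of the tutorial's code.

```python
class Tape(object):
    def __init__(self):
        self.thetape = [0]
        self.position = 0

    def get(self):
        return self.thetape[self.position]
    def set(self, val):
        self.thetape[self.position] = val
    def inc(self):
        self.thetape[self.position] += 1
    def dec(self):
        self.thetape[self.position] -= 1
    def advance(self):
        self.position += 1
        if len(self.thetape) <= self.position:
            self.thetape.append(0)
    def devance(self):
        # Hypothetical check: refuse to move left of the first cell
        if self.position == 0:
            raise IndexError("tape pointer moved left of cell 0")
        self.position -= 1

tape = Tape()
tape.advance()
tape.devance()          # fine, back at cell 0
try:
    tape.devance()      # would make the position negative
except IndexError as e:
    print("error: %s" % e)
```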

Except for the omission of the "[" and "]" implementation, this code will work fine. However, if the program has a lot of comments, it will have to skip over them one byte at a time at runtime. So let's parse those out once and for all.

At the same time, we'll build a dictionary mapping between brackets, so that finding a matching bracket is just a single dictionary lookup. Here's how:

def parse(program):
    parsed = []
    bracket_map = {}
    leftstack = []

    pc = 0
    for char in program:
        if char in ('[', ']', '<', '>', '+', '-', ',', '.'):
            parsed.append(char)

            if char == '[':
                leftstack.append(pc)
            elif char == ']':
                left = leftstack.pop()
                right = pc
                bracket_map[left] = right
                bracket_map[right] = left
            pc += 1

    return "".join(parsed), bracket_map

This returns a string with all invalid instructions removed, and a dictionary mapping bracket indexes to their matching bracket index.
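A quick way to check what parse produces is to feed it a small program with a comment in it. The function is repeated here so the snippet runs on its own; the sample input is made up.

```python
def parse(program):
    parsed = []
    bracket_map = {}
    leftstack = []

    pc = 0
    for char in program:
        if char in ('[', ']', '<', '>', '+', '-', ',', '.'):
            parsed.append(char)

            if char == '[':
                leftstack.append(pc)
            elif char == ']':
                left = leftstack.pop()
                right = pc
                bracket_map[left] = right
                bracket_map[right] = left
            pc += 1

    return "".join(parsed), bracket_map

# The word "copy" and the space are not BF commands, so they are dropped
program, bracket_map = parse("++[>+<-] copy")
print(program)
print(sorted(bracket_map.items()))
```

Note that the indexes in bracket_map refer to positions in the cleaned-up string, not in the original input, because pc only advances for recognized commands.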

All we need is some glue code and we have a working BF interpreter:

def run(fp):
    program, bracket_map = parse(fp.read())
    mainloop(program, bracket_map)

if __name__ == "__main__":
    import sys
    run(open(sys.argv[1], 'r'))

If you're following along at home, you'll also need to change the signature of mainloop() and implement the bracket branches of the if statement. Here's the complete example: example1.py
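For readers who want to see the bracket branches filled in, here is one way they could look. This is a sketch and not necessarily identical to example1.py: the optional out parameter is my own addition so the output can be captured, and the bracket map in the usage lines is written by hand instead of coming from parse().

```python
import sys


class Tape(object):
    # Same Tape as above, repeated so this sketch runs on its own.
    def __init__(self):
        self.thetape = [0]
        self.position = 0

    def get(self):
        return self.thetape[self.position]

    def set(self, val):
        self.thetape[self.position] = val

    def inc(self):
        self.thetape[self.position] += 1

    def dec(self):
        self.thetape[self.position] -= 1

    def advance(self):
        self.position += 1
        if len(self.thetape) <= self.position:
            self.thetape.append(0)

    def devance(self):
        self.position -= 1


def mainloop(program, bracket_map, out=None):
    # `out` is an assumption added for this sketch; it defaults to stdout.
    if out is None:
        out = sys.stdout
    tape = Tape()
    pc = 0
    while pc < len(program):
        code = program[pc]

        if code == ">":
            tape.advance()
        elif code == "<":
            tape.devance()
        elif code == "+":
            tape.inc()
        elif code == "-":
            tape.dec()
        elif code == ".":
            out.write(chr(tape.get()))
        elif code == ",":
            tape.set(ord(sys.stdin.read(1)))
        elif code == "[" and tape.get() == 0:
            # Jump to the matching ]; the pc += 1 below steps past it.
            pc = bracket_map[pc]
        elif code == "]" and tape.get() != 0:
            # Jump back to the matching [; the pc += 1 below re-enters the loop body.
            pc = bracket_map[pc]

        pc += 1


if __name__ == "__main__":
    # 8 * 8 + 1 = 65, the character code of "A"; brackets sit at indexes 8 and 20.
    mainloop("++++++++[>++++++++<-]>+.", {8: 20, 20: 8})
```

Jumping straight to the matching bracket and then letting the shared pc += 1 run keeps both branches to a single dictionary lookup.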

At this point you can try it out to see that it works by running the interpreter under python, but be warned, it will be very slow on the more complex examples:

$ python example1.py 99bottles.b

You can find mandel.b and several other example programs (not written by me) in my repository.

PyPy Translation

But this tutorial is not about writing a BF interpreter; it is about PyPy. So what does it take to get PyPy to translate this into a super-fast executable?

As a side note, there are some simple examples in the pypy/translator/goal directory of the PyPy source tree that are helpful here. My starting point for learning this was the example "targetnopstandalone.py", a simple hello world for PyPy.

For our example, the module must define a name called "target" which returns the entry point. The translation process imports your module and looks for that name, calls it, and the function object returned is where it starts the translation.

import os
import sys

def run(fp):
    program_contents = ""
    while True:
        read = os.read(fp, 4096)
        if len(read) == 0:
            break
        program_contents += read
    os.close(fp)
    program, bm = parse(program_contents)
    mainloop(program, bm)

def entry_point(argv):
    try:
        filename = argv[1]
    except IndexError:
        print "You must supply a filename"
        return 1

    run(os.open(filename, os.O_RDONLY, 0777))
    return 0

def target(*args):
    return entry_point, None

if __name__ == "__main__":
    entry_point(sys.argv)

The entry_point function is passed the command line arguments when you run the resulting executable.
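The lookup described above can be imitated in a few lines of plain Python. This is only an illustration of the idea, not the translator's actual code: find_entry_point and the stand-in module are hypothetical names, and the real toolchain may pass arguments to target(), which the *args signature simply absorbs.

```python
import types


def find_entry_point(module):
    # Hypothetical helper: look up the name "target", call it, and take the
    # first item of the returned tuple as the function to start from.
    target = getattr(module, "target")
    result = target()
    return result[0]


def entry_point(argv):
    # Stand-in entry point: argv[0] is the program name, as in a real executable.
    if len(argv) < 2:
        return 1
    return 0


def target(*args):
    return entry_point, None


# Build a throwaway module object to stand in for the example module.
demo = types.ModuleType("demo_target")
demo.entry_point = entry_point
demo.target = target

entry = find_entry_point(demo)
exit_code = entry(["demo", "mandel.b"])  # behaves like running: ./demo mandel.b
```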

A few other things have changed here too. See the next section...

About RPython

Let's talk a bit about RPython at this point. PyPy can't translate arbitrary Python code because Python is a bit too dynamic. There are restrictions on what standard library functions and what syntax constructs one can use. I won't be going over all the restrictions, but for more information see http://readthedocs.org/docs/pypy/en/latest/coding-guide.html#restricted-python

In the example above, you'll see a few things have changed. I'm now using low level file descriptors with os.open and os.read instead of file objects. The implementations of "." and "," are similarly tweaked (not shown above). Those are the only changes needed in this code; the rest is simple enough for PyPy to digest.

That wasn't so hard, was it? I still get to use dictionaries, expandable lists, and even classes and objects! And if low level file descriptors are too low for you, there are some helpful abstractions in the rlib.streamio module included with PyPy's "RPython standard library."
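Since the tweaked "." and "," handling is not shown, here is a sketch of what byte-level I/O on file descriptors could look like. The helper names write_byte and read_byte are my own, returning 0 at end of input is an assumption, and the bytearray conversions are there only so the snippet runs on current Python versions. I have not checked this against the RPython translator, and example2.py may do it differently.

```python
import os


def write_byte(fd, value):
    # Assumes value is in the range 0..255; os.write wants a byte string.
    os.write(fd, bytes(bytearray([value])))


def read_byte(fd):
    data = os.read(fd, 1)
    if len(data) == 0:
        # Assumption: treat end of input as a 0 byte.
        return 0
    return bytearray(data)[0]


# Try it on a pipe instead of the real stdin (fd 0) and stdout (fd 1).
r, w = os.pipe()
write_byte(w, 65)
os.close(w)
first = read_byte(r)   # the byte written above
second = read_byte(r)  # the pipe is at end of input now
os.close(r)
```

In the eval loop, the "." branch would then call write_byte(1, tape.get()) and the "," branch would call tape.set(read_byte(0)).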

For the example thus far, see example2.py

Translating

If you haven't already, check out the latest version of PyPy from their bitbucket.org repository:

$ hg clone https://bitbucket.org/pypy/pypy

(A recent revision is necessary because of a bugfix that makes my example possible)

The script to run is in "pypy/translator/goal/translate.py". Run this script, passing in our example module as an argument.

[A note added much later: this script has been moved to "rpython/bin/rpython".]

$ python ./pypy/pypy/translator/goal/translate.py example2.py

(You can use PyPy's python interpreter for extra speed, but it's not necessary)

PyPy will churn for a bit, drawing some nice looking fractals to your console while it works. It takes around 20 seconds on my machine.

The result is an executable binary that interprets BF programs. Included in my repository are some example BF programs, including a Mandelbrot fractal generator, which takes about 45 seconds to run on my computer. Try it out:

$ ./example2-c mandel.b

Compare this to running the interpreter un-translated on top of python:

$ python example2.py mandel.b

Takes forever, doesn't it?

So there you have it. We've successfully written our own interpreter in RPython and translated it with the PyPy toolchain.

(more in the next blog post...)

This is a guest blog post written by Andrew Brown, with help from the PyPy developers on the pypy-dev mailing list.

This tutorial's master copy and supporting files live at https://bitbucket.org/brownan/pypy-tutorial/

Posted by Carl Friedrich Bolz-Tereick at 14:44 16 Comments

Monday, April 4, 2011

PyPy Göteborg Post-Easter Sprint April 25 - May 1 2011

The next PyPy sprint will be in Gothenburg, Sweden. It is a public sprint, very suitable for newcomers. We'll focus on making the 1.5 release (if it hasn't already happened) and whatever interests the Sprint attendees.

Topics and goals

The main goal is to polish and release PyPy 1.5, supporting Python 2.7 as well as the last few months' improvements in the JIT (provided that it hasn't already happened). Other topics:

Going over our documentation, and classifying our docs in terms of mouldiness. Deciding what needs writing, and maybe writing it.
Helping people get their code running with PyPy
Maybe working on EuroPython training and talks
Summer of Code preparation
speed.pypy.org
Any other programming task is welcome too, e.g. tweaking the Python or JavaScript interpreter, Stackless support, and so on.

Location

The sprint will be held in the apartment of Laura Creighton and Jacob Hallén, which is at Götabergsgatan 22 in central Gothenburg, Sweden. Here is a map. It is between the tram stops of Vasaplatsen and Valand (a distance of 4 blocks), where many lines call: the 2, 3, 4, 5, 7, 10 and 13.

The cheapest option that is not too far away is probably to book accommodation at SGS Veckobostader. The Elite Park Avenyn Hotel is a luxury hotel just a few blocks away. There are scores of hotels a short walk away from the sprint location, suitable for every budget, desire for luxury, and desire for the unusual. You could, for instance, stay on a boat. Options are too numerous to go into here. Just ask on the mailing list or on the blog.

Hours will be from 10:00 until people have had enough. It's a good idea to arrive a day before the sprint starts and leave a day later. In the middle of the sprint there usually is a break day and it's usually ok to take half-days off if you feel like it.

Good to Know

Sweden is not part of the Euro zone. One SEK (krona in singular, kronor in plural) is roughly 1/10th of a Euro (9.36 SEK to 1 Euro).

The venue is central in Gothenburg. There is a large selection of places to get food nearby, from edible-and-cheap to outstanding. We often cook meals together, so let us know if you have any food allergies, dislikes, or special requirements.

Sweden uses the same kind of plugs as Germany. 230V AC.

The Sprint will be held the week following Easter. This means, as always, that Gothcon will be taking place the weekend before (Easter weekend). Gothcon, now in its 35th year, is the largest European game players' conference. Some of you may be interested in arriving early for the board games. The conference site is only in Swedish, alas. You don't need to register in advance unless you are planning to host a tournament (and it's too late for that anyway).

Getting Here

If you are coming by train, you will arrive at the Central Station. It is about 12 blocks to the site from there, or you can take a tram.

There are two airports local to Göteborg: Landvetter (the main one) and Gothenburg City Airport (where some budget airlines fly). If you arrive at Landvetter, take the airport bus. Its second stop is right downtown at the Elite Park Avenyn Hotel, 4 blocks from the Sprint site, and the end of the line is the Central Station. If you arrive at Gothenburg City Airport, take the bus to the end of the line. You will be at the Central Station.

You can also arrive by ferry, from either Kiel in Germany or Frederikshavn in Denmark.

Who's Coming?

If you'd like to come, please let us know when you will be arriving and leaving, as well as your interests. We'll keep a list of people, which we'll update (and which you can update yourself if you have bitbucket pypy commit rights).