Thursday, April 30, 2009

4 weeks of GDB

Hello.

So, according to our jit plan we're mostly done with point 1, that is to provide a JIT that compiles python code to assembler in the most horrible manner possible but doesn't break. That meant mostly 4 weeks of glaring at GDB and megabytess of assembler generated by C code generated from python code. The figure of 4 weeks proves that our approach is by far superior to the one of psyco, since Armin says it's "only 4 weeks" :-)

Right now, pypy compiled with JIT can run the whole CPython test suite without crashing, which means we're done with obvious bugs and the only ones waiting for us are really horrible. (Or they really don't exist. At least they should never be about obscure Python corner cases: they can only be in the 10'000 lines of relatively clear code that is our JIT generator.)

But... the fun thing is that we can actually concentrate on optimizations! So the next step is to provide a JIT that is correct *and* actually speeds up python. Stay tuned for more :-)

Cheers,
fijal, armin & benjamin

UPDATE: for those of you blessed with no knowledge of C, gdb stands for GNU debugger, a classic debugger for C. (It's also much more powerful than python debugger, pdb, which is kind of surprising).

Tuesday, April 28, 2009

1.1 final released

We just released PyPy 1.1 final. Not much changed since the beta, apart from some more fixed bugs. Have fun with it!

Tuesday, April 21, 2009

Roadmap for JIT

Hello.

First a disclaimer. This post is more about plans for future than current status. We usually try to write about things that we have done, because it's much much easier to promise things than to actually make it happen, but I think it's important enough to have some sort of roadmap.

In recent months we came to the point where the 5th generation of JIT prototype was working as nice or even a bit nicer than 1st one back in 2007. Someone might ask "so why did you spend all this time without going forward?". And indeed, we spend a lot of time moving sideways, but as posted, we also spent a lot of time doing some other things, which are important as well. The main advantage of current JIT incarnation is much much simpler than the first one. Even I can comprehend it, which is much of an improvement :-)

So, the prototype is working and gives very nice speedups in range of 20-30x over CPython. We're pretty confident this prototype will work and will produce fast python interpreter eventually. So we decided that now we'll work towards changing prototype into something stable and solid. This might sound easy, but in fact it's not. Having stable assembler backend and optimizations that keep semantics is not as easy as it might sound.

The current roadmap, as I see it, looks like as following:

  • Provide a JIT that does not speedup things, but produce assembler without optimizations turned on, that is correct and able to run CPython's library tests on a nightly basis.
  • Introduce simple optimizations, that should make above JIT a bit faster than CPython. With optimizations disabled JIT is producing incredibly dumb assembler, which is slower than correspoding C code, even with removal of interpretation overhead (which is not very surprising).
  • Backport optimizations from JIT prototype, one by one, keeping an eye on how they perform and making sure they don't break anything.
  • Create new optimizations, like speeding up attribute access.
  • Profit.

This way, we can hopefully provide a working JIT, which gives fast python interpreter, which is a bit harder than just a nice prototype.

Tell us what you think about this plan.

Cheers,
fijal & others.

Leysin Sprint Report

The Leysin sprint is nearing its end, as usual here is an attempt at a summary

of what we did.

Beautiful Leysin Landscape

Release Work

Large parts of the sprint were dedicated to fixing bugs. Since the easy bugs seem to have been fixed long ago, those were mostly very annoying and hard bugs. This work was supported by our buildbots, which we tried to get free of test-failures. This was worked on by nearly all participants of the sprint (Samuele, Armin, Anto, Niko, Anders, Christian, Carl Friedrich). One particularly annoying bug was the differences in the tracing events that PyPy produces (fixed by Anders, Samuele and Christian). Some details about larger tasks are in the sections below.

The work culminated in the beta released on Sunday.

Stackless

A large number of problems came from our stackless features, which do some advanced things and thus seem to contain advanced bugs. Samuele and Carl Friedrich spent some time fixing tasklet pickling and unpickling. This was achieved by supporting the (un)pickling of builtin code objects. In addition they fixed some bugs in the finalization of tasklets. This needs some care because the __del__ of a tasklet cannot run at arbitrary points in time, but only at safe points. This problem was a bit subtle to get right, and popped up nearly every morning of the sprint in form of a test failure.

Armin and Niko added a way to restrict the stack depth of the RPython-level stack. This can useful when using stackless, because if this is not there it is possible that you fill your whole heap with stack frames in the case of an infinite recursion. Then they went on to make stackless not segfault when threads are used at the same time, or if a callback from C library code is in progress. Instead you get a RuntimeError now, which is not good but better than a segfault.

Anto and Armin working on the JIT

Killing Features

During the sprint we discussed the fate of the LLVM and the JS backends. Both have not really been maintained for some time, and even partially untested (their tests were skipped). Also their usefulness appears to be limited. The JS backend is cool in principle, but has some serious limitations due to the fact that JavaScript is really a dynamic language, while RPython is rather static. This made it hard to use some features of JS from RPython, e.g. RPython does not support closures of any kind.

The LLVM backend had its own set of problems. For a long time it produced the fastest form of PyPy's Python interpreter, by first using the LLVM backend, applying the LLVM optimizations to the result, then using LLVM's C backend to produce C code, then apply GCC to the result :-). However, it is not clear that it is still useful to directly produce LLVM bitcode, since LLVM has rather good C frontends nowadays, with llvm-gcc and clang. It is likely that we will use LLVM in the future in our JIT (but that's another story, based on different code).

Therefore we decided to remove these two backends from SVN, which Samuele and Carl Friedrich did. They are not dead, only resting until somebody who is interested in maintaining them steps up.

Windows

One goal of the release is good Windows-support. Anders and Samuele set up a new windows buildbot which revealed a number of failures. Those were attacked by Anders, Samuele and Christian as well as by Amaury (who was not at the sprint, but thankfully did a lot of Windows work in the last months).

OS X

Christian with some help by Samuele tried to get translation working again under Mac OS X. This was a large mess, because of different behaviours of some POSIX functionality in Leopard. It is still possible to get the old behaviour back, but whether that was enabled or not depended on a number of factors such as which Python is used. Eventually they managed to successfully navigate that maze and produce something that almost works (there is still a problem remaining about OpenSSL).

Samuele and Carl Friedrich pretending to work on something

Documentation

The Friday of the sprint was declared to be a documentation day, where (nearly) no coding was allowed. This resulted in a newly structured and improved getting started document (done by Carl Friedrich, Samuele and some help of Niko) and a new document describing differences to CPython (Armin, Carl Friedrich) as well as various improvements to existing documents (everybody else). Armin undertook the Sisyphean task of listing all talks, paper and related stuff of the PyPy project.

Various Stuff

Java Backend Work

Niko and Anto worked on the JVM backend for a while. First they had to fix translation of the Python interpreter to Java. Then they tried to improve the performance of the Python interpreter when translated to Java. Mostly they did a lot of profiling to find performance bottlenecks. They managed to improve performance by 40% by overriding fillInStackTrace of the generated exception classes. Apart from that they found no simple-to-fix performance problems.

JIT Work

Armin gave a presentation about the current state of the JIT to the sprinters as well as Adrian Kuhn, Toon Verwaest and Camillo Bruni of the University of Bern who came to visit for one day. There was a bit of work on the JIT going on too; Armin and Anto tried to get closer to having a working JIT on top of the CLI.

Sunday, April 19, 2009

Beta for 1.1.0 released

Today we are releasing a beta of the upcoming PyPy 1.1 release. There are some Windows and OS X issues left that we would like to address between now and the final release but apart from this things should be working. We would appreciate feedback.

The PyPy development team.

PyPy 1.1: Compatibility & Consolidation

Welcome to the PyPy 1.1 release - the first release after the end of EU funding. This release focuses on making PyPy's Python interpreter more compatible with CPython (currently CPython 2.5) and on making the interpreter more stable and bug-free.

PyPy's Getting Started lives at:

http://codespeak.net/pypy/dist/pypy/doc/getting-started.html

Highlights of This Release

Other Changes

What is PyPy?

Technically, PyPy is both a Python interpreter implementation and an advanced compiler, or more precisely a framework for implementing dynamic languages and generating virtual machines for them.

The framework allows for alternative frontends and for alternative backends, currently C, Java and .NET. For our main target "C", we can "mix in" different garbage collectors and threading models, including micro-threads aka "Stackless". The inherent complexity that arises from this ambitious approach is mostly kept away from the Python interpreter implementation, our main frontend.

Socially, PyPy is a collaborative effort of many individuals working together in a distributed and sprint-driven way since 2003. PyPy would not have gotten as far as it has without the coding, feedback and general support from numerous people.

Have fun,

the PyPy release team, [in alphabetical order]

Amaury Forgeot d'Arc, Anders Hammerquist, Antonio Cuni, Armin Rigo, Carl Friedrich Bolz, Christian Tismer, Holger Krekel, Maciek Fijalkowski, Samuele Pedroni

and many others: http://codespeak.net/pypy/dist/pypy/doc/contributor.html

Wednesday, April 15, 2009

Leysin Sprint Started

The Leysin Sprint started today. The weather is great and the view is wonderful, as usual. Technically we are working on the remaining test failures of the nightly test runs and are generally trying to fix various long-postponed bugs. I will try to give more detailed reports as the sprint progresses.

Tuesday, April 7, 2009

Pycon videos are online

Hi.

We didn't yet write full pycon summary, but both of our talks are now online: PyPy status talk and python in a sandbox.

Update:

Slides are also available: PyPy status talk and Python in a sandbox.

Enjoy!
fijal & holger