Tuesday, April 1, 2008

Trying to get PyPy to run on Python 3.0

As you surely know, Python 3.0 is coming; recently, they released Python 3.0 alpha 3, and the final version is expected around September.

As suggested by the migration guide (in PEP 3000), we started by applying 2to3 to our standard interpreter, which is written in RPython (though we should now call it RPython 2.4, as opposed to RPython 3.0 -- see below).

The conversion was not seamless, but most of the resulting bugs were due to the new dict views, the str/unicode changes and the missing "reduce" built-in. After forking and refactoring both our interpreter and the 2to3 script, the Python interpreter runs on Python 3.0 alpha 3!
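For readers who have not tried the alpha yet, the three kinds of change mentioned above look roughly like this in Python 3. This is a minimal sketch with made-up data (the `counts` dict and the other names are hypothetical), not code taken from the PyPy interpreter:

```python
from functools import reduce  # reduce is no longer a built-in in Python 3

counts = {"a": 1, "b": 2}

# dict.keys() returns a live view, not a list: it cannot be indexed,
# and it reflects later changes to the dict.
keys = counts.keys()
counts["c"] = 3
assert "c" in keys
first_key = sorted(keys)[0]  # wrap in sorted()/list() when a real list is needed

# str is always Unicode text; encoded data lives in the separate bytes type.
text = "caf\u00e9"
data = text.encode("utf-8")

# reduce still works, but it must be imported from functools.
total = reduce(lambda acc, n: acc + n, counts.values(), 0)
print(first_key, len(text), len(data), total)  # a 4 5 6
```

Code that indexed the result of `keys()`, mixed text with encoded bytes, or called `reduce` directly is exactly the kind that breaks after a mechanical conversion.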

The next step was to run 2to3 over the whole translation toolchain, i.e. the part of PyPy that analyzes the interpreter in order to produce efficient executables. After the good results we got with the standard interpreter, we were confident that running 2to3 over it would be relatively easy; unfortunately, it was not :-(.

After letting 2to3 run uninterrupted for days and days, we decided to kill it: we suspect that the toolchain is simply too complex to be converted in a reasonable amount of time.

So we needed to think of something else. THE great idea we had was to turn everything upside down: if we can't port PyPy to Py3k, we can always port Py3k to PyPy!

Under the hood, the 2to3 conversion tool operates as a graph transformer: it takes the graph of your program (in the form of a Python 2.x source file) and returns a transformed graph of the same program (in the form of a Python 3.0 source file). Since the entire PyPy translation toolchain is based on graph transformations, we could reuse it to modify the behaviour of the 2to3 tool. We wrote a general graph-inverter algorithm which, as the name suggests, takes a graph transformation and builds the inverse transformation; then we applied the graph inverter to 2to3, obtaining something we called 3to2. It is important to underline that 3to2 was built by automatically analysing 2to3 and reversing its operation, with the help of only a few manual hints. For this reason, and because we do not keep generated files under version control, we do not need to maintain this new tool in the Subversion repository.

Once we built 3to2, it was relatively easy to pipe its result to our interpreter, getting something that can run Python 3.0 programs.

Performance-wise, this approach has the drawback of being slower at import time, because it needs to run 3to2 (automatically) every time the source is modified. In the future, we plan to apply our JIT techniques to this part of the interpreter as well, to mitigate the slowdown until it is no longer noticeable to the end user.

In the coming weeks, we will work on the transformation (and probably publish the technique as a research paper, with a title like "Automatic Program Reversion on Intermediate Languages").

UPDATE: In case anybody didn't guess or didn't spot the acronym: The above was an April Fool's joke. Nearly nothing of it is true.

5 comments:

Anonymous said...

"After letting 2to3 run for days and days uninterrupted, we decided to kill it: we assume that the toolchain is simply too complex to be converted in a reasonable amount of time."

That was silly. Twisted got converted. I suppose that not even a meta-programming-languages-framework can be bigger than Twisted. Better luck next year.

Anonymous said...

I have a working implementation of the Parrot virtual machine in Py3K. After running it through your converter (my hosting service only runs 2.1!), I find that my implementation now only supports Perl 5 and Snobol. What gives?

fumanchu said...

Nice acronym, that.

Paddy3118 said...

Nice one ;-)

And the best I've read all day!

- Paddy.

Anonymous said...

Looks like hosting Python 2.5 scripts on a Py3k interpreter might become a USP for PyPy ;)