Tuesday, June 10, 2008

List comprehension implementation details

List comprehensions are a nice feature in Python. They are, however, just syntactic sugar for for loops. E.g. the following list comprehension:

def f(l):
    return [i ** 2 for i in l if i % 3 == 0]

is sugar for the following for loop:

def f(l):
    result = []
    for i in l:
        if i % 3 == 0:
            result.append(i ** 2)
    return result

The interesting bit about this is that list comprehensions are actually implemented in almost exactly this way. If one disassembles the two functions above one gets sort of similar bytecode for both (apart from some details, like the fact that the append in the list comprehension is done with a special LIST_APPEND bytecode).
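One can verify this with the standard dis module; a rough sketch (the exact opcodes differ between interpreter versions):

```python
# Disassemble both versions and compare (exact opcodes vary by version).
import dis

def f_comp(l):
    return [i ** 2 for i in l if i % 3 == 0]

def f_loop(l):
    result = []
    for i in l:
        if i % 3 == 0:
            result.append(i ** 2)
    return result

dis.dis(f_comp)  # the comprehension uses the special LIST_APPEND opcode
dis.dis(f_loop)  # the explicit loop calls list.append the normal way
```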

Now, when doing this sort of expansion there are some classical problems: what name should the intermediate list that is being built get? (I say classical because this is indeed one of the problems of many macro systems.) What CPython does is give the list the name _[1] (and _[2], ... with nested list comprehensions). You can observe this behaviour with the following code:

$ python
Python 2.5.2 (r252:60911, Apr 21 2008, 11:12:42)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> [dir() for i in [0]][0]
['_[1]', '__builtins__', '__doc__', '__name__', 'i']
>>> [[dir() for i in [0]][0] for j in [0]][0]
['_[1]', '_[2]', '__builtins__', '__doc__', '__name__', 'i', 'j']

That is a sort of nice decision, since you can not reach that name by any "normal" means. Of course you can confuse yourself in funny ways if you want:

>>> [locals()['_[1]'].extend([i, i + 1]) for i in range(10)]
[0, 1, None, 1, 2, None, 2, 3, None, 3, 4, None, 4, 5, None, 5, 6, None, 6, 7, None, 7, 8, None, 8, 9, None, 9, 10, None]

Now to the real reason why I am writing this blog post. PyPy's Python interpreter implements list comprehensions in more or less exactly the same way, with one tiny difference: the name of the variable:

$ pypy-c-53594-generation-allworking
Python 2.4.1 (pypy 1.0.0 build 53594) on linux2
Type "help", "copyright", "credits" or "license" for more information.
``the globe is our pony, the cosmos our real horse''
>>>> [dir() for i in [0]][0]
['$list0', '__builtins__', '__doc__', '__name__', 'i']

Now, that shouldn't really matter for anybody, should it? Turns out it does. The following way too clever code is apparently used a lot:

__all__ = [__name for __name in locals().keys() if not __name.startswith('_')
               or __name == '_']

In PyPy this will give you a "$list0" in __all__, which will prevent the import of that module :-(. I guess I need to change the name to match CPython's.

Lesson learned: no detail is obscure enough to not have some code depending on it. Most of the problems we are fixing in PyPy at the moment are at this level of obscurity.

Monday, June 9, 2008

Better Profiling Support for PyPy

As PyPy is getting more and more usable, we need better tools to work on applications running on top of PyPy. Out of this interest, I spent some time implementing the _lsprof module, which has been part of the standard library since Python 2.5. It is needed by the cProfile module, which can profile Python programs with high accuracy and a lot less overhead than the older, pure-Python profile module. Together with the excellent lsprofcalltree script, you can display this data using kcachegrind, which gives you great visualization possibilities for your profile data.
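As a quick illustration (not PyPy-specific), this is what using cProfile together with pstats looks like; the profiled function here is just a stand-in:

```python
# Profile a function with cProfile (backed by _lsprof) and print the
# five entries with the highest cumulative time.
import cProfile
import io
import pstats

def work():  # stand-in for real application code
    return sum(i * i for i in range(100000))

pr = cProfile.Profile()
pr.enable()
work()
pr.disable()

s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats("cumulative").print_stats(5)
print(s.getvalue())
```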

Cheers,
fijal

Wednesday, May 28, 2008

Threads and GCs

Hi all,

We can now compile a pypy-c that includes both thread support and one of our semi-advanced garbage collectors. This means that threaded Python programs can now run not only with better performance, but also without the annoyances of the Boehm garbage collector. (For example, Boehm doesn't cope well with large numbers of __del__() methods, and our implementation of ctypes uses them everywhere.)

Magic translation command (example):

   translate.py --thread --gc=hybrid targetpypystandalone --faassen --allworkingmodules

Note that multithreading in PyPy is based on a global interpreter lock, as in CPython. I imagine that we will get rid of the global interpreter lock at some point in the future -- I can certainly see how this might be done in PyPy, unlike in CPython -- but it will be a lot of work nevertheless. Given our current priorities, it will probably not occur soon unless someone steps in.

Progresses on the CLI JIT backend front

In the last months, I've actively worked on the CLI backend for PyPy's JIT generator, whose goal is to automatically generate JIT compilers that produce .NET bytecode on the fly.

The CLI JIT backend is far from complete and there is still a lot of work to be done before it can handle PyPy's full Python interpreter; nevertheless, yesterday I finally got the first .NET executable that contains a JIT for a very simple toy language called tlr, which implements an interpreter for a minimal register-based virtual machine with only 8 operations.

To compile the tlr VM, follow these steps:

  1. get a fresh checkout of the oo-jit branch, i.e. the branch where the CLI JIT development goes on:

    $ svn co http://codespeak.net/svn/pypy/branch/oo-jit
    
  2. go to the oo-jit/pypy/jit/tl directory, and compile the tlr VM with the CLI backend and JIT enabled:

    $ cd oo-jit/pypy/jit/tl/
    $ ../../translator/goal/translate.py -b cli --jit --batch targettlr
    

The goal of our test program is to compute the square of a given number; since the only operations supported by the VM are addition and negation, we compute the result by doing repeated additions; I won't describe the exact meaning of all the tlr bytecodes here, as they are quite self-documenting:

ALLOCATE,    3,   # make space for three registers
MOV_A_R,     0,   # i = a
MOV_A_R,     1,   # copy of 'a'

SET_A,       0,
MOV_A_R,     2,   # res = 0

# 10:
SET_A,       1,
NEG_A,
ADD_R_TO_A,  0,
MOV_A_R,     0,   # i--

MOV_R_A,     2,
ADD_R_TO_A,  1,
MOV_A_R,     2,   # res += a

MOV_R_A,     0,
JUMP_IF_A,  10,   # if i!=0: goto 10

MOV_R_A,     2,
RETURN_A          # return res
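To make the control flow easier to follow, here is a hypothetical plain-Python rendering of what the bytecode above computes (variable names are mine, mirroring registers 0, 1 and 2; the bytecode's loop tests at the bottom, rendered here as a while loop):

```python
# square(a) by repeated addition, as in the tlr program above.
def square(a):
    i = a           # register 0: loop counter
    copy_of_a = a   # register 1: copy of the input
    res = 0         # register 2: accumulator
    while i != 0:                # JUMP_IF_A 10
        i = i - 1                # SET_A 1 / NEG_A / ADD_R_TO_A 0
        res = res + copy_of_a    # MOV_R_A 2 / ADD_R_TO_A 1
    return res                   # MOV_R_A 2 / RETURN_A

print(square(16))  # prints 256
```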

You can also find the program at the end of the tlr module; to get an assembled version of the bytecode, ready to be interpreted, run this command:

$ python tlr.py assemble > square.tlr

Now we are ready to execute the code through the tlr VM; if you are using Linux/Mono, you can simply execute the targettlr-cli script that has been created for you; if you are on Windows, however, you have to manually fish the executable out of the targettlr-cli-data directory:

# Linux
$ ./targettlr-cli square.tlr 16
256

# Windows
> targettlr-cli-data\main.exe square.tlr 16
256

Cool, our program computed the result correctly! But how can we be sure that it really JIT-compiled our code instead of interpreting it? To inspect the code that is generated by our JIT compiler, we simply set the PYPYJITLOG environment variable to a filename, so that the JIT will create a .NET assembly containing all the code that has been generated by the JIT:

$ PYPYJITLOG=generated.dll ./targettlr-cli square.tlr 16
256
$ file generated.dll
generated.dll: MS-DOS executable PE  for MS Windows (DLL) (console) Intel 80386 32-bit

Now we can inspect the DLL with any IL disassembler, such as ildasm or monodis; here is an excerpt of the disassembled code, which shows how our square.tlr bytecode has been compiled to .NET bytecode:

.method public static  hidebysig default int32 invoke (object[] A_0, int32 A_1)  cil managed
{
    .maxstack 3
    .locals init (int32 V_0, int32 V_1, int32 V_2, int32 V_3, int32 V_4, int32 V_5)

    ldc.i4 -1
    ldarg.1
    add
    stloc.1
    ldc.i4 0
    ldarg.1
    add
    stloc.2
    IL_0010:  ldloc.1
    ldc.i4.0
    cgt.un
    stloc.3
    ldloc.3
    brfalse IL_003b

    ldc.i4 -1
    ldloc.1
    add
    stloc.s 4
    ldloc.2
    ldarg.1
    add
    stloc.s 5
    ldloc.s 5
    stloc.2
    ldloc.s 4
    stloc.1
    ldarg.1
    starg 1

    nop
    nop
    br IL_0010

    IL_003b:  ldloc.2
    stloc.0
    br IL_0042

    ldloc.0
    ret
}

If you know a bit of IL, you can see that the generated code is not optimal, as there are some redundant operations like all those stloc/ldloc pairs; however, while not optimal, it is still quite good code, not much different from what you would get by writing the square algorithm directly in e.g. C#.

As I said before, all of this is still work in progress and there is still much to be done. Stay tuned :-).

Monday, May 26, 2008

More windows support

Recently, thanks to Amaury Forgeot d'Arc and Michael Schneider, Windows became more of a first-class platform for PyPy's Python interpreter. Most RPython extension modules are now considered working (apart from some POSIX-specific modules). Even ctypes now works on Windows!

The next step is to have better buildbot support for all supported platforms (Windows, Linux and OS X), so that we can detect and react to regressions quickly. (The buildbot is maintained by JP Calderone.)

Cheers,
fijal

Friday, May 23, 2008

S3-Workshop Potsdam 2008 Writeup

Here are some notes about the S3 Workshop in Potsdam that several PyPyers and Spies (Armin, Carl Friedrich, Niko, Toon, Adrian) attended before the Berlin sprint. We presented a paper about SPy there. Below are some mostly random notes about my (Carl Friedrich's) impressions of the conference and some talk notes. Before that I'd like to thank the organizers, who did a great job. The workshop was well organized, the social events were wonderful (a very relaxing boat trip in the many lakes around Potsdam and a conference dinner).

Video recordings of all the talks can be found on the program page.

Invited Talks

"Late-bound Object Lambda Architectures" by Ian Piumarta was quite an inspiring talk about VPRI's attempt at writing a flexible and understandable computing system in 20K lines of code. The talk was lacking a bit in technical details, so while it was inspiring I couldn't really say much about their implementation. Apart from that, I disagree with some of their goals, but that's the topic of another blog post.

"The Lively Kernel – A Self-supporting System on a Web Page" by Dan Ingalls. Dan Ingalls is one of the inventors of the original Smalltalk and of Squeak. He was talking about his latest work, the attempts of bringing a Squeak-like system to a web browser using JavaScript and SVG. To get some feel for what exactly The Lively Kernel is, it is easiest to just try it out (only works in Safari and Firefox 3 above Beta 5 though). I guess in a sense the progress of the Lively Kernel over Squeak is not that great but Dan seems to be having fun. Dan is an incredibly enthusiastic, friendly and positive person, it was really great meeting him. He even seemed to like some of the ideas in SPy.

"On Sustaining Self" by Richard P. Gabriel was a sort of deconstructivist multi-media-show train wreck of a presentation that was a bit too weird for my taste. There was a lot of music, there were sections in the presentation where Richard discussed with an alter ego, whose part he had recorded in advance and mangled with a sound editor. There was a large bit of a documentary about Levittown. Even the introduction and the questions were weird, with Pascal Constanza staring down the audience, without saying a word (nobody dared to ask questions). I am not sure I saw the point of the presentation, apart from getting the audience to think, which probably worked. It seems that there are people (e.g. Christian Neukirchen) that liked the presentation, though.

Research Papers

"SBCL - A Sanely Bootstrappable Common Lisp" by Christophe Rhodes described the bootstrapping process of SBCL (Steel Bank Common Lisp). SBCL can be bootstrapped by a variety of Common Lisps, not just by itself. SBCL contains a complete blueprint of the initial image instead of always getting the new image by carefully mutating the old one. This bootstrapping approach is sort of similar to that of PyPy.

"Reflection for the Masses" by Charlotte Herzeel, Pascal Costanza, and Theo D'Hondt retraced some of the work of Brian Smith on reflection in Lisp. The talk was not very good: it was way too long (40 min) and quite hard to understand because Charlotte Herzeel was talking in a very low voice. The biggest mistake in her talk was, in my opinion, that she spent too much time explaining a more or less standard meta-circular interpreter for Lisp and then ran out of time while trying to explain the modifications. I guess it would have been a fair assumption that large parts of the audience know such interpreters, so glossing over the details would have been fine. A bit of a pity, since the paper seems interesting.

"Back to the Future in One Week - Implementing a Smalltalk VM in PyPy" by Carl Friedrich Bolz, Adrian Kuhn, Adrian Lienhard, Nicholas D. Matsakis, Oscar Nierstrasz, Lukas Renggli, Armin Rigo and Toon Verwaest, the paper with the longest author list. We just made everybody an author who was at the sprint in Bern. Our paper had more authors than all the other papers together :-). I gave the presentation at the workshop, which went quite well, judging from the feedback I got.

"Huemul - A Smalltalk Implementation" by Guillermo Adrián Molina. Huemul is a Smalltalk implementation that doesn't contain an interpreter but directly compiles all methods to assembler (and also saves the assembler in the image). In addition, as much functionality as possible (such as threading and the GUI) is delegated to libraries instead of being reimplemented in Smalltalk (as e.g. Squeak does). The approach seems to suffer from the usual problems of manually writing a JIT, e.g. the VM seems to segfault pretty often. Also, I don't agree with some of the design decisions of the threading scheme: there is no automatic locking of objects at all; instead, the user code is responsible for preventing concurrent accesses from messing things up (which even seems to lead to segfaults in the default image).

"Are Bytecodes an Atavism?" by Theo D'Hondt argued that AST-based interpreters can be as fast as bytecode-based interpreters, which he demonstrated by writing two AST interpreters, one for Pico and one for Scheme. Both of these implementations seem to perform pretty well. Theo seems to hold many views similar to PyPy's, for example that writing simple straightforward interpreters is often preferable to writing complex (JIT-)compilers.

Berlin Sprint Finished

The Berlin sprint is finished, below some notes on what we worked on during the last three days:

  • Camillo worked tirelessly on the gameboy emulator with some occasional input by various people. He is making good progress, some test ROMs run now on the translated emulator. However, the graphics are still not completely working for unclear reasons. Since PyBoy is already taken as a project name, we considered calling it PyGirl (another name proposition was "BoyBoy", but the implementation is not circular enough for that).
  • On Monday Armin and Samuele fixed the problem with our multimethods so that the builtin shortcut works again (the builtin shortcut is an optimization that speeds up all operations on builtin non-subclassed types quite a bit).
  • Antonio and Holger (who hasn't been on a sprint in a while, great to have you back!) worked on writing a conftest file (the plugin mechanism of py.test) that would allow us to run Django tests using py.test, which seems to be not completely trivial. They also fixed some bugs in PyPy's Python interpreter, e.g. related to dictionary subclassing.
  • Karl started adding sound support to the RPython SDL-bindings, which will be needed both by the Gameboy emulator and eventually by the SPy VM.
  • Armin and Maciek continued the work that Maciek had started a while ago of improving the speed of PyPy's IO operations. In the past, doing IO usually involved copying lots of memory around, which should be improved now. Armin and Maciek improved and then merged the first of the two branches that contained IO improvements, which speeds up IO on non-moving GCs (mostly the Boehm GC). Then they continued working on the hybrid-io branch, which is supposed to improve IO on the hybrid GC (which was partially designed exactly for this).
  • Toon and Carl Friedrich finished cleaning up the SPy improvement branch and fixed all warnings that occur when you translate SPy there. An obscure bug in an optimization prevented them from getting working executables, which at the moment blocks the merging of that branch.

By now everybody is home again (except for Anto, who booked his return flight two days too late, accidentally) and mostly resting. It was a good sprint, with some interesting results and several new people joining. And it was definitely the most unusual sprint location ever :-).

Sunday, May 18, 2008

Berlin Sprint Day 1 + 2

After having survived the S3-Workshop which took place in Potsdam on Thursday and Friday (a blog-post about this will follow later) we are now sitting in the c-base in Berlin, happily sprinting. Below are some notes on what progress we made so far:

  • The Gameboy emulator in RPython that Camillo Bruni is working on for his Bachelor project at Uni Bern does now translate. It took him (assisted by various people) a while to figure out the translation errors (essentially because he wrote nice Python code that passed bound methods around, which the RTyper doesn't completely like). Now that is fixed and the Gameboy emulator translates and runs a test ROM. You cannot really see anything yet, because there is no graphics support in RPython.
  • To get graphics support in RPython Armin and Karl started writing SDL bindings for RPython, which both the Gameboy emulator and the SPy VM need. They have basic stuff working, probably enough to support the Gameboy already.
  • Alexander, Armin, Maciek and Samuele discussed how to approach separate compilation for RPython, which isn't easy because the RPython type analysis is a whole-program analysis.
  • Stephan, Peter and Adrian (at least in the beginning) worked on making PyPy's stackless module more complete. They added channel preferences which change details of the scheduling semantics.
  • Toon, Carl Friedrich and Adrian (a tiny bit) worked on SPy. There is a branch that Toon started a while ago which contains many improvements but is also quite unclear in many respects. There was some progress in cleaning that up. This involved implementing the Smalltalk process scheduler (Smalltalk really is an OS). There is still quite some work left, though. While doing so, we discovered many funny facts about Squeak's implementation details (most of which are exposed to the user). I guess we should collect them and blog about them eventually.
  • Samuele and Maciek improved the ctypes version of pysqlite that Gerhard Häring started.
  • Armin, Samuele and Maciek found an obscure bug in the interaction between the builtin-type-shortcut that Armin recently implemented and our multimethod implementation. It's not clear which of the two is to blame. However, it also seems rather unclear how to fix the problem: Armin and Samuele have been stuck in a discussion about how to approach a solution for a while now and are hard to talk to.
  • Stijn Timbermont, a Ph.D. student at the Vrije Universiteit Brussel who is visiting the sprint for two days was first looking at how our GCs are implemented to figure out whether he can use PyPy for some experiments. The answer to that seems to be no. Today he was hacking on a Pico interpreter (without knowing too much about Python) and is making some nice progress, it seems.

Will try to blog more as the sprint progresses.

Saturday, May 10, 2008

General performance improvements

Hi all,

During the past two weeks we invested some more efforts on the baseline performance of pypy-c. Some of the tweaks we did were just new ideas, and others were based on actual profiling. The net outcome is that we now expect PyPy to be at worst twice as slow as CPython on real applications. Here are some small-to-medium-size benchmark results. The number is the execution time, normalized to 1.0 for CPython 2.4:

  • 1.90 on templess (a simple templating language)
  • 1.49 on gadfly (pure Python SQL database)
  • 1.49 on translate.py (pypy's own translation toolchain)
  • 1.44 on mako (another templating system)
  • 1.21 on pystone
  • 0.78 on richards

(This is all without the JIT, as usual. The JIT is not ready yet.)

You can build yourself a pypy-c with this kind of speed with the magic command line (gcrootfinder is only for a 32-bit Linux machine):

    pypy/translator/goal/translate.py --gc=hybrid --gcrootfinder=asmgcc targetpypystandalone --allworkingmodules --faassen

The main improvements come from:

  • A general shortcut for any operation between built-in objects: for example, a subtraction of two integers or floats now dispatches directly to the integer or float subtraction code, without looking up the '__sub__' in the class.
  • A shortcut for getting attributes out of instances of user classes when the '__getattribute__' special method is not overridden.
  • The so-called Hybrid Garbage Collector is now a three-generations collector. More about our GCs...
  • Some profiling showed bad performance in our implementation of the built-in id() -- a trivial function to write in CPython, but a lot more fun when you have a moving GC and your object's real address can change.
  • The bytecode compiler's parser had a very slow linear search algorithm that we replaced with a dictionary lookup.
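To give an idea of the first point, here is an illustrative sketch in plain Python (not PyPy's actual code, and the function name is made up) of what such a shortcut could look like:

```python
# Dispatch directly on exact built-in types, skipping the generic
# special-method lookup that the slow path needs.
def fast_sub(a, b):
    ta, tb = type(a), type(b)
    if ta is int and tb is int:
        return a - b              # direct integer subtraction
    if ta is float and tb is float:
        return a - b              # direct float subtraction
    return type(a).__sub__(a, b)  # generic path: look up '__sub__'

print(fast_sub(7, 4))  # prints 3
```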

These benchmarks are doing CPU-intensive operations. You can expect a similar blog post soon about the I/O performance, as the io-improvements branch gets closer to being merged :-) The branch could also improve the speed of string operations, as used e.g. by the templating systems.

Sunday, May 4, 2008

Next Sprint: Berlin, May 17th-22nd

Our next PyPy sprint will take place in the crashed c-base space station, Berlin, Germany, Earth, Solar System. This is a fully public sprint: newcomers (from all planets) are welcome. Suggestion of topics (other topics are welcome too):

  • work on PyPy's JIT generator: we are refactoring parts of the compiling logic, in ways that may also allow generating better machine code for loops (people or aliens with knowledge on compilers and SSA, welcome)
  • work on the SPy VM, PyPy's Squeak implementation, particularly the graphics capabilities
  • work on PyPy's GameBoy emulator, which also needs graphics support
  • trying some large pure-Python applications or libraries on PyPy and fixing the resulting bugs. Possibilities are Zope 3, Django and others.

For more information, see the full announcement.

Tuesday, April 22, 2008

Google's Summer of Code

PyPy got one proposal accepted for Google's Summer of Code under the Python Software Foundation's umbrella. We welcome Bruno Gola into the PyPy community. He will work on supporting all Python 2.5 features in PyPy and will also update PyPy's standard library to support the modules that were modified or new in Python 2.5.

Right now PyPy supports only Python 2.4 fully (some Python 2.5 features have already sneaked in, though).

Thursday, April 17, 2008

Float operations for JIT

Recently, we taught the JIT x86 backend how to produce code for the x87 floating point coprocessor. This means that the JIT is able to nicely speed up float operations (this is not true for our Python interpreter yet - we have not integrated it yet). This is the first time we have gone beyond what is feasible in psyco - it would take a lot of effort to make floats work on top of psyco, way more than it takes on PyPy.

This work is at a very early stage and lives on the jit-hotpath branch, which includes all our recent experiments on JIT compiler generation, including the tracing JIT experiments and a huge JIT refactoring.

Because we don't encode Python's semantics in our JIT (which is really a JIT generator), it is expected that our Python interpreter with a JIT will become fast "suddenly", once our JIT generator is good enough. When that point is reached, we would also get fast interpreters for Smalltalk or JavaScript with relatively little effort.

Stay tuned.

Cheers,
fijal

Tuesday, April 8, 2008

Wrapping pyrepl in the readline API

If you translate a pypy-c with --allworkingmodules and start it, you will probably not notice anything strange about its prompt - except when typing multiline statements. You can move the cursor up and continue editing previous lines. And the history is multiline-statements-aware as well. Great experience! Ah, and completion using tab is nice too.

Truth be told, there is nothing new here: it was all done by Michael Hudson's pyrepl many years ago. We had already included pyrepl in PyPy some time ago. What is new is a pure Python readline.py which exposes the most important parts of the API of the standard readline module by wrapping pyrepl under the hood, without needing the GNU readline library at all. The PyPy prompt is based on this, benefitting automagically from pyrepl's multiline editing capabilities, with minor tweaks so that the prompt looks much more like CPython's than a regular pyrepl prompt does.
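For illustration, this is roughly the slice of the readline API such a wrapper has to cover; the snippet below just exercises the standard readline module (the pure-Python readline.py exposes the same names):

```python
# A few of the readline API calls that the prompt machinery relies on:
# key bindings, history manipulation and history inspection.
import readline

readline.parse_and_bind("tab: complete")
readline.add_history("print('hello')")
last = readline.get_history_item(readline.get_current_history_length())
print(last)
```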

You can also try and use this multiline prompt with CPython: check out pyrepl at http://codespeak.net/svn/pyrepl/trunk/pyrepl and run the new pythoni1 script.

Wednesday, April 2, 2008

Other April's Fools Ideas

While discussing what to post as an April Fool's joke yesterday, we had a couple of other ideas, listed below. Most of them were rejected because they are too incredible, others because they are too close to our wish list.

  • quantum computer backend
  • Perl6 interpreter in RPython
  • Ruby backend to allow running "python on rails"
  • mandatory static typing at app-level, because it's the only way to increase performance
  • rewrite PyPy in Haskell, because we discovered that dynamic typing is just not suitable for a project of this size
  • a C front-end, so that we can interpret the C source of Python C extensions and JIT it. This would work by writing an interpreter for LLVM bytecode in RPython.
  • an elisp backend
  • a TeX backend (use PyPy for your advanced typesetting needs)
  • an SQL JIT backend, pushing remote procedures into the DB engine

Tuesday, April 1, 2008

Trying to get PyPy to run on Python 3.0

As you surely know, Python 3.0 is coming; recently, they released Python 3.0 alpha 3, and the final version is expected around September.

As suggested by the migration guide (in the PEP 3000), we started by applying 2to3 to our standard interpreter, which is written in RPython (though we should call it RPython 2.4 now, as opposed to RPython 3.0 -- see below).

Converting was not seamless, but most of the resulting bugs were due to the new dict views, str/unicode changes and the missing "reduce" built-in. After forking and refactoring both our interpreter and the 2to3 script, the Python interpreter runs on Python 3.0 alpha 3!

The next step was to run 2to3 over the whole translation toolchain, i.e. the part of PyPy which takes care of analyzing the interpreter in order to produce efficient executables; after the good results we got with the standard interpreter, we were confident that it would be relatively easy to run 2to3 over it. Unfortunately, it was not :-(.

After letting 2to3 run for days and days uninterrupted, we decided to kill it: we assume that the toolchain is simply too complex to be converted in a reasonable amount of time.

So, we needed to think of something else; THE great idea we had was to turn everything upside-down: if we can't port PyPy to Py3k, we can always port Py3k to PyPy!

Under the hood, the 2to3 conversion tool operates as a graph transformer: it takes the graph of your program (in the form of Python 2.x source file) and returns a transformed graph of the same program (in the form of Python 3.0 source file). Since the entire translation toolchain of PyPy is based on graph transformations, we could reuse it to modify the behaviour of the 2to3 tool. We wrote a general graph-inverter algorithm which, as the name suggests, takes a graph transformation and build the inverse transformation; then, we applied the graph inverter to 2to3, getting something that we called 3to2: it is important to underline that 3to2 was built by automatically analysing 2to3 and reversing its operation with only the help of a few manual hints. For this reason and because we are not keeping generated files under version control, we do not need to maintain this new tool in the Subversion repository.

Once we built 3to2, it was relatively easy to pipe its result to our interpreter, getting something that can run Python 3.0 programs.

Performance-wise, this approach has the problem of being slower at import time, because it needs to (automatically) run 3to2 every time the source is modified; in the future, we plan to apply our JIT techniques to this part of the interpreter as well, trying to mitigate the slowdown until it is no longer noticeable to the final user.

In the next weeks, we will work on the transformation (and probably publish the technique as a research paper, with a title like "Automatic Program Reversion on Intermediate Languages").

UPDATE: In case anybody didn't guess or didn't spot the acronym: The above was an April Fool's joke. Nearly nothing of it is true.

Sunday, March 30, 2008

Py-Lib 0.9.1 released

The Py-Lib 0.9.1 release is out! The Py-Lib is a very important support library that PyPy uses for a lot of things – most importantly it contains py.test, which PyPy uses for testing.

This is mostly a bugfix release, with a couple of new features sneaked in. Most important changes:

  • some new functionality (authentication, export, locking) in py.path's Subversion APIs
  • numerous small fixes in py.test's rsession (experimental pluggable session) and generative test features
  • some fixes in the py.test core

Download/Install: http://codespeak.net/py/0.9.1/download.html

Documentation/API: http://codespeak.net/py/0.9.1/index.html

UPDATE: the py-lib is now easy-installable with:

easy_install py

Friday, March 28, 2008

PyPy Summer of Code Participation

As in the last years, PyPy will again participate in Google's Summer of Code program under the umbrella of the Python Software Foundation. Unfortunately we were a bit disorganized this year, so that our project ideas are only put up now. The list of project ideas of PyPy can be found here.

Any interested student should mail to our mailing list or just come to the #pypy channel on irc.freenode.net to discuss things.

Monday, March 17, 2008

ctypes configuration tool

As a part of implementing ctypes, we decided to make coding with ctypes better on its own (regardless of which Python interpreter you use). The concrete problem we're trying to solve is to make ctypes code more platform-independent than it is. Say you want to create a ctypes type for size_t: ctypes itself provides no mechanism for doing that, so you need to use a concrete integer type (c_int, c_long, c_short etc.). Your code either becomes platform-dependent if you pick one of them, or is littered with conditionals for all sorts of platforms. We created a small library, called ctypes_configure (which is actually a variation of something we use somewhere in the PyPy source tree), which tries to solve some platform dependencies by compiling and running small chunks of C code through a C compiler. It's sort of like configure in the Linux world, except for Python using ctypes.
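To make the size_t problem concrete, here is a hand-rolled sketch of one heuristic (the helper name is made up; ctypes_configure instead asks the C compiler directly, which is more reliable):

```python
# Which ctypes integer type matches the platform's size_t?  We
# approximate by assuming sizeof(size_t) == sizeof(void *), which
# holds on common platforms.
import ctypes

def guess_size_t():
    for candidate in (ctypes.c_uint, ctypes.c_ulong, ctypes.c_ulonglong):
        if ctypes.sizeof(candidate) == ctypes.sizeof(ctypes.c_void_p):
            return candidate
    raise RuntimeError("no matching unsigned integer type found")

print(guess_size_t().__name__)
```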

To install the library, you can just type easy_install ctypes_configure. The code is in an svn repository on codespeak and there is even some documentation and sample code. Also, even though the code lives in the pypy repository, it depends only on pylib, not on the whole of pypy.

The library is in its early infancy (but we think it is already rather useful). In the future we could add extra features; for example, it might be possible to check whether the argtypes that are attached to the external functions are consistent with what is in the C headers, so that the following code wouldn't segfault but give a nice error:

import ctypes

libc = ctypes.CDLL("libc.so")
time = libc.time
time.argtypes = [ctypes.c_double, ctypes.c_double]  # wrong signature
time(0.0, 0.0)
Also, we plan to add a way to install a package that uses ctypes_configure in such a way that the installed library doesn't need to call the C compiler any more later.

Bittorrent on PyPy

Hi all,

Bittorrent now runs on PyPy! I tried the no-GUI BitTornado version (btdownloadheadless.py). It behaves correctly and I fixed the last few obvious places which made noticeable pauses. (However we know that there are I/O performance issues left: we make too many internal copies of the data, e.g. in a file.read() or os.read().)

We are interested in people trying out other real-world applications that, like the GUI-less Bittorrent, don't have many external dependencies to C extension modules. Please report all the issues to us!

The current magic command line for creating a pypy-c executable with as many of CPython's modules as possible is:

  cd pypy/translator/goal
  ./translate.py --thread targetpypystandalone.py --allworkingmodules --withmod-_rawffi --faassen

(This gives you a thread-aware pypy-c, which requires the Boehm gc library. The _rawffi module gives you ctypes support but is only tested for Linux at the moment.)

Tuesday, March 4, 2008

As fast as CPython (for carefully taken benchmarks)

Good news everyone. A tuned PyPy compiled to C is nowadays as fast as CPython on the richards benchmark and slightly faster on the gcbench benchmark.
IMPORTANT: These are very carefully chosen benchmarks where we expect pypy to be fast! PyPy is still quite a bit slower than CPython on other benchmarks and on real-world applications (but we're working on it). The point of this post is just that for the first time (not counting JIT experiments) we are faster than CPython on *one* example :-)
The exact times as measured on my notebook (which is a Core Duo machine) are here:
Compiled pypy with options:
./translate.py --gcrootfinder=asmgcc --gc=generation targetpypystandalone.py --allworkingmodules --withmod-_rawffi --faassen (allworkingmodules and withmod-_rawffi are very likely irrelevant to those benchmarks)
CPython version 2.5.1, release.
  • richards 800ms pypy-c vs 809ms cpython (1% difference)
  • gcbench 53700ms pypy-c vs 60215ms cpython (11% difference)
PyPy shines on gcbench, which is mostly just about allocating and freeing many objects. Our gc is simply better than refcounting, even though we've got shortcomings in other places.
About richards, there is a catch. We use a method cache optimization, and an optimization which helps to avoid creating bound methods each time a method is called. This speeds up the benchmark by about 20%. Although a method cache was even implemented for CPython, it didn't make its way into the core because some C modules directly modify the dictionary of new-style classes. In PyPy, the greater level of abstraction means that this operation is simply illegal.
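For readers wondering what a method cache is, here is an illustrative sketch (not PyPy's real implementation): lookups are cached per class and attribute name, and the cached entries must be thrown away whenever the class is mutated, which is exactly what those C modules made impossible to track in CPython.

```python
# Attribute lookups walk the MRO only on a cache miss; a mutation of
# the class must invalidate its cache entries.
class MethodCache:
    def __init__(self):
        self._cache = {}

    def lookup(self, cls, name):
        key = (cls, name)
        if key in self._cache:
            return self._cache[key]   # fast path: cache hit
        for klass in cls.__mro__:     # slow path: walk the MRO
            if name in klass.__dict__:
                self._cache[key] = klass.__dict__[name]
                return self._cache[key]
        raise AttributeError(name)

    def invalidate(self, cls):
        # must be called whenever cls is mutated; a real implementation
        # would also have to invalidate entries for subclasses of cls
        self._cache = {k: v for k, v in self._cache.items()
                       if k[0] is not cls}

cache = MethodCache()
print(cache.lookup(int, '__add__') is int.__dict__['__add__'])  # prints True
```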