Wednesday, February 20, 2008

Running Pyglet on PyPy

As part of our efforts to make PyPy's Python interpreter usable, we put quite some effort into interfacing with external libraries. In a fairly short amount of time (starting, I think, around the Leysin sprint or slightly earlier) we were able to provide a prototype of the ctypes library. It is written in completely normal Python, at applevel, on top of a very thin wrapper around the libffi library. This makes development a lot easier, but it also makes the resulting ctypes implementation rather slow. The implementation is not complete yet, and it will still take quite some effort to make it feature-complete (ctypes has lots of details, special cases and do-what-I-mean magic). Making it faster is another goal, but that's for much later.
The implementation is good enough to run those parts of Pyglet that don't depend on PIL (which PyPy doesn't have). Here are a few pictures of running Pyglet demos on top of compiled pypy-c. To compile a version of PyPy that supports ctypes, use this highly sophisticated command line:
./ --gc=generation ./ --allworkingmodules --withmod-_rawffi
Note: this works on Linux only right now.
The list of missing small ctypes features is quite extensive, but I consider the current implementation to be usable for most common cases. I would love to hear about libraries written in pure Python (using ctypes), so that we can run them on top of PyPy and use them as test cases. If you know of such a library, please provide a link.
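
For readers wondering what "pure Python using ctypes" looks like in practice, here is a minimal sketch of the kind of code such a library contains. It is not taken from Pyglet; the names Point and c_strlen are made up for illustration, and the lookup of the C library via ctypes.util.find_library is an assumption that holds on typical Linux systems but may fail elsewhere (in which case the helper returns None).

```python
import ctypes
import ctypes.util


class Point(ctypes.Structure):
    # Hypothetical C struct: struct { int x; int y; }
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]


def c_strlen(data):
    """Call strlen() from the C library; return None if libc cannot be loaded."""
    path = ctypes.util.find_library("c")
    if path is None:
        return None
    try:
        libc = ctypes.CDLL(path)
    except OSError:
        return None
    # Declare the C signature so ctypes converts arguments and result correctly.
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t
    return libc.strlen(data)


p = Point(3, 4)
ptr = ctypes.pointer(p)      # a C-level pointer to the same memory
ptr.contents.x = 10          # writing through the pointer changes p itself
length = c_strlen(b"pyglet")
```

Structures, pointers and foreign function calls like these are exactly the features that the applevel ctypes implementation has to emulate on top of libffi.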

Python Finalizers Semantics, Part 2: Resurrection

Continuing the last blog post about GC semantics in Python.

Another consequence of reference counting is that resurrection is easy to detect. A dead object can resurrect itself if its finalizer stores it into a globally reachable position, like this:

class C(object):
    def __init__(self, num):
        self.num = num
    def __del__(self):
        global c
        if c is None:
            c = self
c = C(1)
while c is not None:
    c = None
    print "again"

This is an infinite loop in CPython: every time c is set to None in the loop, the __del__ method resets it to the C instance again. (Note that this is terribly bad programming style, of course, in case anybody was wondering :-).) CPython can detect resurrection by checking whether the reference count has become bigger after the call to __del__.
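
The reference count check can be illustrated with a small toy model. This is only a sketch of the idea, not CPython's actual C code: ToyObject, incref, decref and toy_globals are hypothetical names invented for this example.

```python
class ToyObject(object):
    """Hypothetical stand-in for an object with an explicit reference count."""

    def __init__(self, name, finalizer=None):
        self.name = name
        self.refcount = 0
        self.finalizer = finalizer
        self.freed = False


def incref(obj):
    obj.refcount += 1


def decref(obj):
    """Drop one reference; return 'alive', 'resurrected' or 'freed'."""
    obj.refcount -= 1
    if obj.refcount > 0:
        return "alive"
    if obj.finalizer is not None:
        obj.refcount = 1          # keep the object valid while the finalizer runs
        obj.finalizer(obj)
        obj.refcount -= 1
        if obj.refcount > 0:      # a new reference was stored: resurrection
            return "resurrected"
    obj.freed = True
    return "freed"


toy_globals = {}


def resurrecting_finalizer(obj):
    # Store the dying object in a globally reachable place, like __del__ above.
    toy_globals["c"] = obj
    incref(obj)


c_obj = ToyObject("c", resurrecting_finalizer)
incref(c_obj)
first = decref(c_obj)             # finalizer runs and resurrects the object

plain = ToyObject("plain")
incref(plain)
second = decref(plain)            # no finalizer, so the object is simply freed
```

In this model the first decref reports a resurrection because the count is nonzero after the finalizer ran, while the second object is freed right away.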

There are even worse examples of perpetual resurrection, in particular in combination with the cycle GC. If you want to see a particularly horrible one, see this discussion started by Armin Rigo. In the ensuing thread, Tim Peters proposes to follow Java's example and call the finalizer of every object at most once.

In PyPy the resurrection problem is slightly more complex, since we have GCs that run a collection from time to time and don't really get to know at which precise time an object dies. If the GC discovers during a collection that an object is dead, it will call the finalizer after the collection is finished. If the object is dead again at the next collection, the GC does not know whether the object was resurrected by the finalizer and then died in the meantime, or whether it was never resurrected. Therefore it seemed sanest to follow Tim's solution and to never call the finalizer of an object a second time, which has many other benefits as well.
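
The "at most once" policy can be sketched with a toy tracing collector. This is a simplified model under stated assumptions, not PyPy's GC: ToyHeap, Obj and their attributes are hypothetical names, and reachability is computed by a plain graph walk from an explicit list of roots.

```python
class Obj(object):
    """Hypothetical heap object with outgoing references and a finalizer."""

    def __init__(self, name, finalizer=None):
        self.name = name
        self.refs = []
        self.finalizer = finalizer
        self.finalized = False    # set once the finalizer has been called


class ToyHeap(object):
    def __init__(self):
        self.objects = []
        self.roots = []
        self.log = []

    def new(self, name, finalizer=None):
        obj = Obj(name, finalizer)
        self.objects.append(obj)
        return obj

    def _reachable(self, starts):
        seen = set()
        stack = list(starts)
        while stack:
            obj = stack.pop()
            if obj not in seen:
                seen.add(obj)
                stack.extend(obj.refs)
        return seen

    def collect(self):
        live = self._reachable(self.roots)
        dead = [o for o in self.objects if o not in live]
        # Only dead objects whose finalizer has never run are finalized.
        pending = [o for o in dead if o.finalizer and not o.finalized]
        # Everything a pending finalizer can still see survives this round.
        keep = self._reachable(pending)
        self.objects = [o for o in self.objects if o in live or o in keep]
        # Finalizers run after the collection itself is finished.
        for o in pending:
            o.finalized = True
            self.log.append("finalize " + o.name)
            o.finalizer(self, o)


heap = ToyHeap()
c = heap.new("c", finalizer=lambda h, obj: h.roots.append(obj))
heap.collect()                    # c is unreachable: finalizer runs, c resurrects
survived_first = c in heap.objects
heap.roots.remove(c)              # c dies a second time
heap.collect()                    # already finalized: freed without a second call
survived_second = c in heap.objects
```

Because the collector only remembers a single finalized flag, it never needs to know whether the object was resurrected in between; the second death is handled like that of an object without a finalizer.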

Monday, February 18, 2008

Python Finalizers Semantics, Part 1

Python's garbage collection semantics have very much grown historically and are implementation-driven. Samuele Pedroni therefore likes to call it the "'there is no such thing as too much chocolate'-approach to GC semantics" :-). In this two-part post series I am going to talk about the semantics of finalization (__del__ methods) in CPython and PyPy.

The current behaviour is mostly a consequence of the fact that CPython uses reference counting for garbage collection. The first consequence is that if several objects die at the same time, their finalizers are called in a so-called topological order, a feature that some GCs provide deliberately and that CPython offers by chance. This ensures that when a __del__ method runs, none of the object's attributes have had their own __del__ called yet. A simple example:

class B(object):
    def __init__(self, logfile):
        self.logfile = logfile
    def __del__(self):
        self.logfile.write("done doing stuff")
b = B(file("logfile.txt", "w"))

If the instance of B dies now, both it and the logfile are dead. Both will get their __del__ methods called, and it's important that the file's __del__ gets called second, because otherwise the __del__ of B would try to write to a closed file.

The correct ordering happens completely automatically if you use reference counting: setting b to None will decref the old value of b. This reduces the reference count of that instance to 0, so its finalizer is called. After the __del__ has finished, the object is freed and all the objects it points to are decrefed as well, which decreases the reference count of the file to 0 and calls its __del__ too, which closes the file.
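
The ordering can be observed directly with a small experiment. This is a sketch with made-up names (finalization_order, LogFile); it records the order in a list instead of writing to a real file, and it is written so that it also runs on current Python versions. The gc.collect() call does nothing useful on CPython here and is only included for implementations without reference counting.

```python
import gc


def finalization_order():
    """Return the order in which the two finalizers were called."""
    order = []

    class LogFile(object):
        # Hypothetical stand-in for the file object of the example above.
        def __del__(self):
            order.append("LogFile")

    class B(object):
        def __init__(self, logfile):
            self.logfile = logfile

        def __del__(self):
            order.append("B")

    b = B(LogFile())
    del b            # drops the only reference to the B instance
    gc.collect()     # only needed on implementations without refcounting
    return order


observed = finalization_order()
```

On CPython, observed should be ['B', 'LogFile']: the finalizer of B runs first, and only when the instance is freed afterwards does the reference count of the LogFile drop to 0.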

The behaviour of PyPy's semispace and generational GCs wasn't very nice so far: it just called the finalizers in an essentially random order. Last week Armin came up with a somewhat complicated algorithm that solves this by emulating CPython's finalization order, which we subsequently implemented. So PyPy does what you expect now! The Boehm GC does a topological ordering by default, so it wasn't a problem there.

A small twist on the above is when there is a cycle of objects involving finalizers: in this case a topological ordering is not possible, so CPython refuses to guess the finalization order and puts such cycles into gc.garbage. This would be very hard for PyPy to do, since our GC implementation is essentially independent from the Python interpreter. After all, the same GCs work for our other interpreters too. Therefore we decided to break such a cycle at an arbitrary place, which doesn't sound too insane. The insane thing is for a Python program to create a cycle of objects with finalizers and depend on the order in which the finalizers are called. Don't do that :-) (After all, CPython wouldn't even call the finalizers in this case.)

Tuesday, February 12, 2008

PyPy presence on various conferences in the near future

Hello! I will have the pleasure of presenting PyPy at various conferences in the near future. They are (in chronological order):
  • Studencki Festiwal Informatyczny in Krakow, POLAND, 6-8 March 2008. I think this might only be interesting for Polish people (website, in Polish).
  • Pycon Chicago, IL, USA, 14-17 March 2008. There should also be a PyPy sprint afterwards, including a newbie-friendly tutorial; everybody is welcome to join us! (Provided that I get a US visa, which seems to be a non-trivial issue for a Polish citizen.)
  • RuPy, Poznan, POLAND, 13-14 April 2008 (website). This is a small but very friendly Ruby and Python conference. Last year was amazing, and I can strongly recommend going there (Poznan is only 2h by train from Berlin and also has its own airport).
Hope to see you at those places!