Python's garbage collection semantics is very much historically grown and
implementation-driven. Samuele Pedroni therefore likes to call it the "'there
is no such thing as too much chocolate'-approach to GC semantics" :-). In this
two-part post series I am going to talk about the semantics of finalization
(__del__ methods) in CPython and PyPy.
The current behaviour is mostly all a consequence of the fact that CPython uses
reference counting for garbage collection. The first consequence is that if
several objects die at the same time, their finalizers are called in a
so-called topological order, which is a feature that some GCs have that
CPython offers by chance. This ensures, that in a __del__ method, all the
attributes of the object didn't get their __del__ called yet. A simple
example:
class B(object):
def __init__(self, logfile):
self.logfile = logfile
def __del__(self):
self.logfile.write("done doing stuff")
b = B(file("logfile.txt", "w"))
If the instance of B dies now, both it and the logfile are dead. They will
get their __del__``s called and it's important that the file's ``__del__
gets called second, because otherwise the __del__ of B would try to
write to a closed file.
The correct ordering happens completely automatically if you use reference
counting: Setting b to None will decref the old value of b. This reduces
the reference count of this instance to 0, so the finalizer will be called.
After the __del__ has finished, this object will be freed and all the
objects it points to decrefed as well, which decreases the reference count of
the file to 0 and call its `` __del__`` as well, which closes the file.
The behaviour of PyPy's semispace and generational GCs wasn't very nice so far:
it just called the finalizers in an essentially random order. Last week Armin
came up with a somewhat complicated algorithm that solves this by emulating
CPython's finalization order, which we subsequently implemented. So PyPy does
what you expect now! The Boehm GC does a topological ordering by default, so it
wasn't a problem there.
A small twist on the above is when
there is a cycle of objects involving finalizers:
In this case a topological ordering is not possible, so that CPython refuses to
guess the finalization order and puts such cycles into gc.garbage. This
would be very hard for PyPy to do, since our GC implementation is essentially
independent from the Python interpreter. The same GCs work for our other
interpreters after all too. Therefore we decided to break such a cycle at an
arbitrary place, which doesn't sound too insane. The insane thing is for
a Python program to create a cycle of objects with finalizers and depend
on the order in which the finalizers are called. Don't do that :-) (After
all, CPython wouldn't even call the finalizers in this case.)