As you know, PyPy can emulate the CPython C API to some extent. In this post I will describe an important optimization that we merged to improve the performance and stability of the C-API emulation layer.
The C-API is implemented by passing around PyObject * pointers in the C code. The problem with providing the same interface in PyPy is that objects don't natively have the same PyObject * structure at all; and additionally their memory address can change. PyPy handles the difference by maintaining two sets of objects. More precisely, starting from a PyPy object, it can allocate on demand a PyObject structure and fill it with information that points back to the original PyPy object; and conversely, starting from a C-level object, it can allocate a PyPy-level object and fill it with information in the opposite direction.
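To make the two directions concrete, here is a toy Python model of the on-demand linking. It is only a sketch: the class and function names (CStruct, WObject, get_c_struct, get_w_obj) are invented for illustration and are not the names used inside PyPy, and real PyObject structures of course live in C memory rather than in Python objects.

```python
class CStruct:
    """Stand-in for a C-level PyObject structure (hypothetical model)."""
    def __init__(self, w_obj=None):
        self.ob_refcnt = 1
        self.w_obj = w_obj        # link back to the PyPy-level object, if any


class WObject:
    """Stand-in for a PyPy interpreter-level object (hypothetical model)."""
    def __init__(self, value=None, c_struct=None):
        self.value = value
        self.c_struct = c_struct  # link to the C-level structure, if any


def get_c_struct(w_obj):
    """PyPy object -> C structure, allocating the structure on demand."""
    if w_obj.c_struct is None:
        w_obj.c_struct = CStruct(w_obj)
    return w_obj.c_struct


def get_w_obj(c_struct):
    """C structure -> PyPy object, allocating the object on demand."""
    if c_struct.w_obj is None:
        c_struct.w_obj = WObject(c_struct=c_struct)
    return c_struct.w_obj
```

In this model, asking twice for the counterpart of the same object returns the same counterpart, whichever side the link was created from.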
I have merged a rewrite of the interaction between C-API C-level objects and PyPy's interpreter-level objects. This is mostly a simplification based on a small hack in our garbage collector. This hack makes the garbage collector aware of the reference-counted PyObject structures. When it considers a pair consisting of a PyPy object and a PyObject, it will always free either none or both of them at the same time. They both stay alive if either there is a regular GC reference to the PyPy object, or the reference counter in the PyObject is bigger than zero.
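The survival rule for a pair can be written down in a few lines. The following is a simplified sketch of the rule as stated above, not the garbage collector's actual code; pair_survives and collect are hypothetical names, and each pair is reduced to two facts: whether the PyPy object is still reachable through regular GC references, and the current reference count of its PyObject.

```python
def pair_survives(has_gc_reference, refcount):
    """A (PyPy object, PyObject) pair lives if either side is still in use."""
    return has_gc_reference or refcount > 0


def collect(pairs):
    """Keep or free each pair as a unit; never only one half of it.

    `pairs` is a list of (has_gc_reference, refcount) tuples.
    Returns the list of surviving pairs.
    """
    return [pair for pair in pairs if pair_survives(*pair)]
```

Only a pair with no GC reference and a reference count of zero is freed, and then both halves go together.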
This gives a more stable result. Previously, a PyPy object might grow a corresponding PyObject, lose it (when its reference counter goes to zero), and later have another corresponding PyObject re-created at a different address. Now, once a link is created, it remains alive until both objects die.
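The difference in stability can be illustrated with a small simulation. This is again a toy model with invented names (Link, round_trip) and fake addresses from a counter; it assumes the PyPy object stays reachable from the GC the whole time, so only the C side comes and goes.

```python
import itertools

# Fake, ever-increasing "addresses" for newly allocated C structures.
_addresses = itertools.count(0x1000, 0x10)


class Link:
    """Model of one PyPy object and its (possibly absent) PyObject."""

    def __init__(self, keep_until_both_die):
        self.keep_until_both_die = keep_until_both_die
        self.address = None   # no C structure allocated yet
        self.refcount = 0

    def incref(self):
        if self.address is None:
            self.address = next(_addresses)   # allocate a structure on demand
        self.refcount += 1
        return self.address

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0 and not self.keep_until_both_die:
            self.address = None   # previous behaviour: the structure is lost


def round_trip(link):
    """Hand the object to C twice; report whether the address was stable."""
    first = link.incref()
    link.decref()
    second = link.incref()
    link.decref()
    return first == second
```

With keep_until_both_die=False the second trip sees a different address, which is the previous behaviour described above; with keep_until_both_die=True the address is the same both times.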
The rewrite significantly simplifies our previous code (which used to be based on at least 4 different dictionaries), and should make using the C-API somewhat faster (though it is still slower than using pure Python or cffi).
A side effect of this work is that PyPy now actually supports the upstream lxml package, which is one of the most popular packages on PyPI. (Specifically, you need version 3.5.0 with this pull request to remove old PyPy-specific hacks that were not really working. See details.) At this point, we no longer recommend using the cffi-lxml alternative: although it may still be faster, it might be incomplete and old.
We are actively working on extending our C-API support, and hope to soon merge a branch to support more of the C-API functions (some numpy news coming!). Please try it out and let us know how it works for you.
Armin Rigo and the PyPy team