Thursday, February 25, 2016

C-API Support update

As you know, PyPy can emulate the CPython C API to some extent. In this post I will describe an important optimization that we merged to improve the performance and stability of the C-API emulation layer.

The C-API is implemented by passing around PyObject * pointers in the C code. The problem with providing the same interface in PyPy is that its objects don't natively have the PyObject * structure at all, and additionally their memory address can change. PyPy handles the difference by maintaining two sets of objects. More precisely, starting from a PyPy object, it can allocate on demand a PyObject structure and fill it with information that points back to the original PyPy object; and conversely, starting from a C-level object, it can allocate a PyPy-level object and fill it with information in the opposite direction.
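To make the two-way mapping concrete, here is a toy model in plain Python. It is only a sketch of the idea, not PyPy's actual code: the names PyPyObject, CStruct, as_c_struct and from_c_struct are made up for illustration, and a real PyObject structure lives in C memory rather than in a Python class.

```python
class PyPyObject:
    """Hypothetical stand-in for an interpreter-level PyPy object."""
    def __init__(self, value=None):
        self.value = value
        self.c_struct = None      # C-level counterpart, created on demand


class CStruct:
    """Hypothetical stand-in for a C-level PyObject structure."""
    def __init__(self, refcnt=0):
        self.ob_refcnt = refcnt
        self.pypy_obj = None      # PyPy-level counterpart, created on demand


def as_c_struct(w_obj):
    """Starting from a PyPy object, allocate its C structure if missing."""
    if w_obj.c_struct is None:
        c = CStruct()
        c.pypy_obj = w_obj        # points back to the original PyPy object
        w_obj.c_struct = c
    return w_obj.c_struct


def from_c_struct(c):
    """Starting from a C-level object, allocate its PyPy object if missing."""
    if c.pypy_obj is None:
        w_obj = PyPyObject()
        w_obj.c_struct = c        # link in the opposite direction
        c.pypy_obj = w_obj
    return c.pypy_obj
```

In this model, asking twice for the counterpart of the same object returns the same counterpart both times, which is the property the rest of this post is about.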

I have merged a rewrite of the interaction between C-API C-level objects and PyPy's interpreter level objects. This is mostly a simplification based on a small hack in our garbage collector. This hack makes the garbage collector aware of the reference-counted PyObject structures. When it considers a pair consisting of a PyPy object and a PyObject, it will always free either none or both of them at the same time. They both stay alive if either there is a regular GC reference to the PyPy object, or the reference counter in the PyObject is bigger than zero.
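The liveness rule for a pair can be written down as a tiny Python sketch. This is an illustration under assumptions, not the garbage collector's real code: Pair, gc_reachable, pair_survives and collect are hypothetical names, and the real GC discovers reachability by tracing rather than reading a flag.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical record for one (PyPy object, PyObject) pair."""
    gc_reachable: bool   # is there a regular GC reference to the PyPy object?
    ob_refcnt: int       # reference counter stored in the PyObject


def pair_survives(pair):
    # Both halves stay alive if either condition holds.
    return pair.gc_reachable or pair.ob_refcnt > 0


def collect(pairs):
    # Free none or both: a pair is kept or dropped as a single unit.
    return [p for p in pairs if pair_survives(p)]
```

For example, a pair with no GC reference but a reference counter of 1 survives a collection, while a pair with no GC reference and a counter of 0 is freed as a whole.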

This gives a more stable result. Previously, a PyPy object might grow a corresponding PyObject, lose it (when its reference counter goes to zero), and later have another corresponding PyObject re-created at a different address. Now, once a link is created, it remains alive until both objects die.

The rewrite significantly simplifies our previous code (which used to be based on at least 4 different dictionaries), and should make using the C-API somewhat faster (though it is still slower than using pure Python or cffi).

A side effect of this work is that PyPy now actually supports the upstream lxml package, which is one of the most popular packages on PyPI. (Specifically, you need version 3.5.0 with this pull request to remove old PyPy-specific hacks that were not really working. See details.) At this point, we no longer recommend using the cffi-lxml alternative: although it may still be faster, it might be incomplete and old.

We are actively working on extending our C-API support, and hope to soon merge a branch to support more of the C-API functions (some numpy news coming!). Please try it out and let us know how it works for you.

Armin Rigo and the PyPy team


mathgl said...

Wow, that's good news. When trying to pick up a new lib, I always check whether it supports PyPy first.

Anonymous said...

Really looking forward to hearing news from the numpy front!

Esteban Echeverry said...

Great. Maybe now Odoo will work with PyPy!

Anonymous said...

Great, in particular the native lxml. This is used in many large production systems that will now be even more interested in PyPy.