Friday, July 9, 2010

CERN Sprint Report – Wrapping C++ Libraries

For the last five days we have been sprinting in a meeting room in the Computing Center at CERN in Genève, Switzerland. Present are Armin Rigo, Antonio Cuni, Carl Friedrich Bolz and Wim Lavrijsen (LBL). The goal of the sprint was to use some of the C++ technology developed at CERN to make it possible to use C++ libraries from PyPy's Python interpreter. For this we used the Reflex library, which provides reflection information for C++ classes. We discussed using Reflex in PyPy during the Düsseldorf sprint of 2008; please read that blog post if you want more details on how Reflex works. This sort of C++/Python integration is also available for CPython, through the PyROOT module.

The sprint was very successful. On Monday we had a few discussions about how Reflex could best be integrated with PyPy. One of the goals of the sprint was to make the approach JIT-friendly from the start, so that calls to C++ libraries can be reasonably fast. After the discussions we started coding on the reflex-support branch. This branch adds a new cppyy builtin module to PyPy's Python interpreter (why we chose that name is left as an exercise to the reader). This module can be used to load C++ classes, construct instances and call static and instance methods on them.

The work has only just started. As of now, the argument and return types of methods are restricted to a few simple C types, such as int, double and char*, plus pointers to class instances. Most of the work necessary to properly resolve overloaded methods is done, but default arguments are not yet supported.
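To give a feeling for what resolving overloaded methods involves on the Python side, here is a minimal pure-Python sketch. It is not the code on the reflex-support branch; it only illustrates one common strategy, which is to try each candidate signature in turn and pick the first one whose arguments all convert. The names OverloadedMethod, convert_int, convert_double and convert_cstring are made up for this illustration.

```python
def convert_int(value):
    # bool is a subclass of int in Python; reject it to keep the sketch strict.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    raise TypeError("expected int")


def convert_double(value):
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    raise TypeError("expected double")


def convert_cstring(value):
    if isinstance(value, str):
        return value.encode("utf-8")
    raise TypeError("expected char*")


class OverloadedMethod:
    """Hypothetical dispatcher: tries candidates in registration order."""

    def __init__(self, name):
        self.name = name
        self.candidates = []  # list of (converters, implementation) pairs

    def add(self, converters, implementation):
        self.candidates.append((converters, implementation))

    def __call__(self, *args):
        for converters, implementation in self.candidates:
            if len(converters) != len(args):
                continue  # wrong number of arguments for this overload
            try:
                converted = [conv(arg) for conv, arg in zip(converters, args)]
            except TypeError:
                continue  # an argument did not fit; try the next overload
            return implementation(*converted)
        raise TypeError("no matching overload for %s%r" % (self.name, args))


# Three stand-in "C++ overloads" of one method, implemented in Python.
describe = OverloadedMethod("describe")
describe.add([convert_int], lambda i: "int:%d" % i)
describe.add([convert_double], lambda d: "double:%.1f" % d)
describe.add([convert_cstring], lambda s: "char*:%s" % s.decode("utf-8"))
```

Because candidates are tried in order, the int overload has to be registered before the double one here; otherwise an integer argument would silently be converted to a double.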

As an example, suppose there is a C++ class like this:

class example01 {
private:
    static int count;
    int somedata;
public:

    example01(int a) : somedata(a) {
        count++;
    }
    ~example01() {
        count--;
    }
    static int getCount() {
        return count;
    }

    int addDataToInt(int a) {
        return somedata + a;
    }
};
int example01::count = 0;

You can now use it from PyPy's Python interpreter in the following way, after you have used Reflex to generate reflection information for the class:

import cppyy
cppyy.load_lib("example01Dict.so") # contains the Reflex information
example01_class = cppyy.gbl.example01
instance = example01_class(7)
assert example01_class.getCount() == 1
res = instance.addDataToInt(4)
assert res == 11
res = instance.addDataToInt(-4)
assert res == 3
instance.destruct() # so far explicit destruction needed
assert example01_class.getCount() == 0
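Since destruction is explicit for now, it is easy to forget the destruct() call or to skip it when an exception is raised. One possible way to guard against that is a small context manager. The sketch below is only an illustration: the destructing helper is our own invention and not part of cppyy, and FakeExample01 is a pure-Python stand-in for the C++ class so that the snippet runs without the reflex-support branch.

```python
from contextlib import contextmanager


@contextmanager
def destructing(instance):
    """Yield the instance and call its destruct() on exit, even on error."""
    try:
        yield instance
    finally:
        instance.destruct()


class FakeExample01:
    """Pure-Python stand-in for cppyy.gbl.example01 (illustration only)."""
    count = 0

    def __init__(self, a):
        self.somedata = a
        FakeExample01.count += 1

    def addDataToInt(self, a):
        return self.somedata + a

    def destruct(self):
        FakeExample01.count -= 1

    @staticmethod
    def getCount():
        return FakeExample01.count


with destructing(FakeExample01(7)) as inst:
    result = inst.addDataToInt(4)
    count_inside = FakeExample01.getCount()
count_after = FakeExample01.getCount()
```

With a real cppyy class, the same helper should work unchanged as long as the instance exposes destruct() as in the example above.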

We also did some very early JIT work and some early performance measurements. The rough figures are that cppyy is about twice as fast as PyROOT at calling a simple C++ method from Python. To get a feeling for how fast things could go in the end, we also implemented a proof of concept for some more advanced JIT technology (which requires a patch for Reflex and uses a GCC extension). With this, the speedup over PyROOT is a factor of 20. Of course, this is still a lot slower than a C++-to-C++ method call (probably by at least an order of magnitude).

The sprint was very productive because we managed to get the right people into the same room working together. Wim has a lot of experience with C++ and Reflex, and is the author of PyROOT, and of course the others know a lot about PyPy (at the end of the sprint, Anto was very glad that he stopped using C++ a long time ago). Also, working at CERN was very cool. The atmosphere is amazing, and we got to visit the ATLAS control room. Extremely advanced technology, and also research on a completely different scale than what we are used to.
