Thursday, February 28, 2013

cppyy status update

The cppyy module provides C++ bindings for PyPy by using the reflection information extracted from C++ header files by means of the Reflex package. In order to support C++11, the goal is to move away from Reflex and instead use cling, an interactive C++ interpreter, as the backend. Cling is based on llvm's clang. The use of a real compiler under the hood has the advantage that it is now possible to cover every conceivable corner case. The disadvantage, however, is that every corner case actually has to be covered. Life is somewhat easier when calls come in from the python interpreter, as those calls have already been vetted for syntax errors and all lookups are well scoped. Furthermore, the real hard work of getting sane responses from and for C++ in an interactive environment is done in cling, not in the bindings. Nevertheless, it is proving a long road (but for that matter clang does not support all of C++11 yet), so here's a quick status update showing that good progress is being made.

The following example is on CPython, not PyPy, but moving a third (after Reflex and CINT) backend into place underneath cppyy is straightforward compared to developing the backend in the first place. Take this snippet of C++11 code (cpp11.C):

    constexpr int data_size() { return 5; }

    auto N = data_size();

    template<class L, class R>
    struct MyMath {
       static auto add(L l, R r) -> decltype(l+r) { return l + r; }
    };

    template class MyMath<int, int>;

As a practical matter, most usage of new C++11 features will live in implementations, not in declarations, and are thus never seen by the bindings. The above example is therefore somewhat contrived, but it will serve to show that these new declarations actually work. The new features used here are constexpr, auto, and decltype. Here is how you could use these from CPython, using the PyROOT package, which has more than a passing resemblance to cppyy, as one is based on the other:

    import ROOT as gbl
    gbl.gROOT.LoadMacro('cpp11.C')

    print 'N =', gbl.N
    print '1+1 =', gbl.MyMath(int, int).add(1,1)
which, when entered into a file (cpp11.py) and executed, prints the expected results:

    $ python cpp11.py
    N = 5
    1+1 = 2
In the example, the C++ code is compiled on-the-fly, rather than first generating a dictionary as is needed with Reflex. A deployment model that utilizes stored pre-compiled information is foreseen to work with larger projects, which may have to pull in headers from many places.

Work is going to continue first on C++03 on cling with CPython (about 85% of unit tests currently pass), with a bit of work on C++11 support on the side. Once fully in place, it can be brought into a new backend for cppyy, after which the remaining parts of C++11 can be fleshed out for both interpreters.

Cheers,
Wim Lavrijsen

2 comments:

Anonymous said...

How would memory management work for C++ objects which own PyPy objects? In CPython, or any similar reference counting system, a C++ class can hold only references via special smart pointers. These smart pointers don't need to be registered in any way with the outer class, since there's no need for a garbage collector to traverse from the outer object to the inner smart pointer instances.

For decent garbage collection to work, presumably one needs to be able to enumerate the PyPy objects pointed to by a C++ object. How would this work?

Wim Lavrijsen said...

Right now, there are no PyPy objects exposed as such, but only PyObjects through cpyext in support of the python C-API. In cppyy, cpyext is used for any interface that has a PyObject* as argument or return value. It is cpyext that takes care of marrying the ref-count API with the garbage collector.

Don't pin me down on the details, but from what I understand of cpyext, a wrapper object with the proper C layout is created, and given a life line by putting it in an internal container holding all such objects safe from the gc simply by existing. When the ref count hits zero, the life line gets removed. Object identity is preserved by finding objects in the internal container and reusing them.