Monday, August 13, 2012

C++ objects in cppyy, part 1: Data Members

The cppyy module makes it possible to call into C++ from PyPy through the Reflex package. Documentation and setup instructions are available here. Recent work has focused on STL, low-level buffers, and code quality, but also a lot on pythonizations for the CINT backend, which is mostly for High Energy Physics (HEP) use only. A previous posting walked through the high-level structure and organization of the module, where it was argued why it is necessary to write cppyy in RPython and generate bindings at run-time for the best performance. This posting details how access to C++ data structures is provided and is part of a series of 3 postings on C++ object representation in Python: the second posting will be about method dispatching, the third will tie up several odds and ends by showing how the choices presented here and in part 2 work together to make features such as auto-casting possible.

Wrapping Choices

Say we have a plain old data type (POD), which is the simplest possible data structure in C++, for example:

    struct A {
        int    m_i;
        double m_d;
    };

What should such a POD look like when represented in Python? Let's start by looking at a Python data structure that is functionally similar, in that it also carries two public data members of the desired types. Something like this:

    class A(object):
        def __init__(self):
            self.m_i = 0
            self.m_d = 0.

Alright, now how to go about connecting this Python class with the former C++ POD? Or rather, how to connect instances of either. The exact memory layout of a Python A instance is up to Python, and likewise the layout of a C++ A instance is up to C++. Both layouts are implementation details of the underlying language, language implementation, language version, and the platform used. It should be no surprise then, that for example an int in C++ looks nothing like a PyIntObject, even though it is perfectly possible, in both cases, to point out in memory where the integer value is. The two representations can thus not make use of the same block of memory internally. However, the requirement is that the access to C++ from Python looks and feels natural in its use, not that the mapping is exact. Another requirement is that we want access to the actual object from both Python and C++. In practice, it is easier to provide natural access to C++ from Python than the other way around, because the choices of memory layout in C++ are far more restrictive: the memory layout defines the access, as the actual class definition is gone at run-time. The best choice then, is that the Python object will act as a proxy to the C++ object, with the actual data always being in C++.

From here it follows that if the m_i data member lives in C++, then Python needs some kind of helper to access it. Conveniently, since version 2.2, Python has a property construct that can take a getter and setter function that are called when the property is used in Python code, and present it to the programmer as if it were a data member. So we arrive at this (note how the property instance is a variable at the class level):

    class A(object):
        def __init__(self):
            self._cppthis = construct_new_A()
        m_i = property(get_m_i, set_m_i)
        m_d = property(get_m_d, set_m_d)

The construct_new_A helper is not very interesting (the reflection layer can provide for it directly), and methods are a subject for part 2 of this posting, so focus on get_m_i and set_m_i. In order for the getter to work, the method needs to have access to the C++ instance for which the Python object is a proxy. On access, Python will call the getter function with the proxy instance for which it is called. The proxy has a _cppthis data member from which the C++ instance can be accessed (think of it as a pointer) and all is good, at least for m_i. The second data member m_d, however, requires some more work: it is located at some offset into _cppthis. This offset can be obtained from the reflection information, which lets the C++ compiler calculate it, so details such as byte padding are fully accounted for. Since the setter also needs the offset, and since both share some more details such as the containing class and type information of the data member, it is natural to create a custom property class. The getter and setter methods then become bound methods of an instance of that custom property, CPPDataMember, and there is one such instance per data member. Think of something along these lines:

    def make_datamember(cppclass, name):
        cppdm = cppyy.CPPDataMember(cppclass, name)
        return property(cppdm.get, cppdm.set)

where the make_datamember function replaces the call to property in the class definition above.
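To make the idea concrete, here is a minimal, self-contained sketch that uses ctypes to stand in for the reflection layer. The A_layout structure, the DataMember class and the _storage attribute are assumptions made for this illustration only; this is not the cppyy implementation. ctypes computes the field offsets here, much like the C++ compiler does in the real thing.

```python
import ctypes

# Stand-in for the C++ POD; ctypes works out padding and offsets.
class A_layout(ctypes.Structure):
    _fields_ = [("m_i", ctypes.c_int), ("m_d", ctypes.c_double)]

class DataMember(object):
    """Hypothetical custom property helper: one instance per data member."""
    def __init__(self, layout, name):
        self.offset = getattr(layout, name).offset   # offset within the struct
        self.ctype = dict(layout._fields_)[name]     # type of the member

    def get(self, proxy):
        # read the value straight out of the "C++" memory
        return self.ctype.from_address(proxy._cppthis + self.offset).value

    def set(self, proxy, value):
        self.ctype.from_address(proxy._cppthis + self.offset).value = value

def make_datamember(layout, name):
    dm = DataMember(layout, name)
    return property(dm.get, dm.set)

class A(object):
    def __init__(self):
        self._storage = A_layout()                       # keeps the memory alive
        self._cppthis = ctypes.addressof(self._storage)  # "pointer" to the instance
    m_i = make_datamember(A_layout, "m_i")
    m_d = make_datamember(A_layout, "m_d")

a = A()
a.m_i = 42
a.m_d = 3.14
```

The proxy itself holds no data: both assignments end up in the memory block that _cppthis points to, which is the point of the proxy approach.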

Now hold on a minute! Before it was argued that Python and C++ can not share the same underlying memory structure, because of choices internal to the language. But if on the Python side choices are being made by the developer of the language bindings, that is no longer a limitation. In other words, why not go through e.g. the Python extension API, and do this:

    struct A_pyproxy {
        PyObject_HEAD
        int    m_i;
        double m_d;
    };

Doing so would save on malloc overhead and remove a pointer indirection. There are some technical issues specific to PyPy for such a choice: there is no such thing as PyPyObject_HEAD and the layout of objects is not a given as that is decided only at translation time. But assume that those issues can be solved, and also accept that there is no problem in creating structure definitions like this at run-time, since the reflection layer can provide both the required size and access to the placement new operator (compare e.g. CPython's struct module). There is then still a more fundamental problem: it must be possible to take over ownership in Python from instances created in C++ and vice-versa. With a proxy scheme, that is trivial: just pass the pointer and do the necessary bookkeeping. With an embedded object, however, not every use case can be implemented: e.g. if an object is created in Python, passed to C++, and deleted in C++, it must have been allocated independently. The proxy approach is therefore still the best choice, although embedding objects may provide for optimizations in some use cases.

Inheritance

The next step is to take a more complicated C++ class, one with inheritance (I'm leaving out details such as constructors etc., for brevity):

    class A {
    public:
        virtual ~A() {}
        int    m_i;
        double m_d;
    };

    class B : public A {
    public:
        virtual ~B() {}
        int    m_j;
    };

From the previous discussion, it should already be clear what this will look like in Python:

    class A(object):
        def __init__(self):
            self._cppthis = construct_new_A()
        m_i = make_datamember('A', 'm_i')
        m_d = make_datamember('A', 'm_d')

    class B(A):
        def __init__(self):
            self._cppthis = construct_new_B()
        m_j = make_datamember('B', 'm_j')

There are some minor adjustments needed, however. For one, the offset of the m_i data member may be no longer zero: it is possible that a virtual function dispatch table (vtable) pointer is added at the beginning of A (an alternative is to have the vtable pointer at the end of the object). But if m_i is handled the same way as m_d, with the offset provided by the compiler, then the compiler will add the bits, if any, for the vtable pointer and all is still fine. A real problem could come in however, with a call of the m_i property on an instance of B: in that case, the _cppthis points to a B instance, whereas the getter/setter pair expect an A instance. In practice, this is usually not a problem: compilers will align A and B and calculate an offset for m_j from the start of A. Still, that is an implementation detail (even though it is one that can be determined at run-time and thus taken advantage of by the JIT), so it can not be relied upon. The m_i getter thus needs to take into account that it can be called with a derived type, and so it needs to add an additional offset. With that modification, the code looks something like this (as you would have guessed, this is getting more and more into pseudo-code territory, although it is conceptually close to the actual implementation in cppyy):

    def get_m_i(self):
        return int(self._cppthis + offset(A, m_i) + offset(self.__class__, A))

Which is a shame, really, because the offset between B and A is going to be zero most of the time in practice, and the JIT can not completely elide the offset calculation (as we will see later; it is easy enough to elide if self.__class__ is A, though). One possible solution is to repeat the properties for each derived class, i.e. to have a get_B_m_i etc., but that looks ugly on the Python side and anyway does not work in all cases: e.g. with multiple inheritance where there are data members with the same name in both bases, or if B itself has a public data member called m_i that shadows the one from A. The optimization then, is achieved by making B in charge of the offset calculations, by making offset a method of B, like so:

    def get_m_i(self):
        return int(self._cppthis + offset(A, m_i) + self.offset(A))

The insight is that by scanning the inheritance hierarchy of a derived class like B, you can know statically whether it may sometimes need offsets, or whether the offsets are always going to be zero. Hence, if the offsets are always zero, the method offset on B will simply return the literal 0 as its implementation, with the JIT taking care of the rest through inlining and constant folding. If the offset could be non-zero, then the method will perform an actual calculation, and it will let the JIT elide the call only if possible.
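As a rough illustration of putting the derived class in charge, the sketch below again uses ctypes to simulate C++ layouts. All names in it (the *_layout structures, DataMember, base_offset, _base_offsets) are made up for this example, and the multiple-inheritance layout of Z is modelled with nested structures, which is an assumption about how a compiler might lay things out. The method is called base_offset here, rather than offset, to keep it apart from the member offset.

```python
import ctypes

# Simulated layouts: Z derives from X and Y, so the Y part of a Z
# instance sits at a non-zero offset (modelled with nested structs).
class X_layout(ctypes.Structure):
    _fields_ = [("m_x", ctypes.c_int)]

class Y_layout(ctypes.Structure):
    _fields_ = [("m_y", ctypes.c_double)]

class Z_layout(ctypes.Structure):
    _fields_ = [("x_part", X_layout), ("y_part", Y_layout),
                ("m_z", ctypes.c_int)]

class DataMember(object):
    def __init__(self, owner, layout, name):
        self.owner = owner                          # class declaring the member
        self.offset = getattr(layout, name).offset  # offset within that class
        self.ctype = dict(layout._fields_)[name]

    def _address(self, proxy):
        # the proxy's own class decides the base offset
        return proxy._cppthis + proxy.base_offset(self.owner) + self.offset

    def get(self, proxy):
        return self.ctype.from_address(self._address(proxy)).value

    def set(self, proxy, value):
        self.ctype.from_address(self._address(proxy)).value = value

def make_datamember(owner, layout, name):
    dm = DataMember(owner, layout, name)
    return property(dm.get, dm.set)

class X(object):
    def __init__(self):
        self._storage = X_layout()
        self._cppthis = ctypes.addressof(self._storage)
    def base_offset(self, base):
        return 0            # known statically: a literal the JIT can fold
    m_x = make_datamember("X", X_layout, "m_x")

class Y(object):
    def __init__(self):
        self._storage = Y_layout()
        self._cppthis = ctypes.addressof(self._storage)
    def base_offset(self, base):
        return 0
    m_y = make_datamember("Y", Y_layout, "m_y")

class Z(X, Y):
    _base_offsets = {"X": Z_layout.x_part.offset,
                     "Y": Z_layout.y_part.offset,
                     "Z": 0}
    def __init__(self):
        self._storage = Z_layout()
        self._cppthis = ctypes.addressof(self._storage)
    def base_offset(self, base):
        return self._base_offsets[base]   # may be non-zero: real lookup
    m_z = make_datamember("Z", Z_layout, "m_z")

z = Z()
z.m_x, z.m_y, z.m_z = 1, 2.5, 3
```

The getters for m_x and m_y are defined once, on X and Y, yet work unchanged on a Z instance, because Z supplies the extra offset.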

Multiple Virtual Inheritance

Next up would be multiple inheritance, but that is not very interesting: we already have the offset calculation between the actual and base class, which is all that is needed to resolve any multiple inheritance hierarchy. So, skip that and move on to multiple virtual inheritance. That this is going to be a tad more complicated becomes clear if you show the following code snippet to any old C++ hand and see how they respond. Most likely you will be told: "Don't ever do that." But if code can be written, it will be written, and so, for the sake of argument, what would this look like in Python:

    class A {
    public:
        virtual ~A() {}
        int m_a;
    };

    class B : public virtual A {
    public:
        virtual ~B() {}
        int m_b;
    };

    class C : public virtual A {
    public:
        virtual ~C() {}
        int m_c;
    };

    class D : public virtual B, public virtual C {
    public:
        virtual ~D() {}
        int m_d;
    };

Actually, nothing changes from what we have seen so far: the scheme as laid out above is fully sufficient. For example, D would simply look like:

    class D(B, C):
        def __init__(self):
            self._cppthis = construct_new_D()
        m_d = make_datamember('D', 'm_d')

Point being, the only complication added by multiple virtual inheritance is that navigation of the C++ instance happens with pointers internal to the instance rather than with offsets. However, it is still a fixed offset from any location to any other location within the instance, as its parts are laid out consecutively in memory (this is not a requirement, but it is the most efficient, so it is what is used in practice). But what you can not do is determine the offset statically: you need a live (i.e. constructed) object for any offset calculations. In Python, everything is always done dynamically, so that is of itself not a limitation. Furthermore, self is already passed to the offset calculation (remember that this was done to put the calculation in the derived class, to optimize the common case of zero offset), thus a live C++ instance is there precisely when it is needed. The call to the offset calculation is hard to elide, since the instance will be passed to a C++ helper and so the most the JIT can do is guard on the instance's memory address, which is likely to change between traces. Instead, explicit caching is needed on the base and derived types, allowing the JIT to elide the lookup in the explicit cache.
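The explicit cache could look something like the following sketch. Everything in it is hypothetical: reflection_base_offset stands in for the C++ helper that needs a live instance, the offsets it returns are made-up numbers, and the cache assumes that the derived name is the actual type of the instance being passed.

```python
# Hypothetical stand-in for the reflection-layer helper; with virtual
# inheritance it needs the address of a live instance to do its work.
calls = []

def reflection_base_offset(derived, base, address):
    calls.append((derived, base))                       # record each real call
    return {"A": 32, "B": 0, "C": 16, "D": 0}[base]     # made-up offsets

class OffsetCache(object):
    """Cache keyed on (derived type, base type), not on the address."""
    def __init__(self):
        self._cache = {}

    def base_offset(self, derived, base, address):
        key = (derived, base)
        try:
            return self._cache[key]
        except KeyError:
            # first use for this pair of types: ask the helper once
            offset = reflection_base_offset(derived, base, address)
            self._cache[key] = offset
            return offset

cache = OffsetCache()
first = cache.base_offset("D", "A", 0x1000)    # calls the helper
second = cache.base_offset("D", "A", 0x2000)   # different instance, cache hit
```

Because the key consists of types only, a changing instance address no longer matters for the lookup, which is what gives the JIT something stable to guard on.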

Static Data Members and Global Variables

That, so far, covers all access to instance data members. Next up are static data members and global variables. A complication here is that a Python property needs to live on the class in order to work its magic. Otherwise, if you get the property, you simply get the property object itself back, and if you set it, it is overwritten and disappears. The logical conclusion, then, is that a property representing a static or global variable needs to live on the class of the class, i.e. the metaclass. If done directly, though, that would mean that every static data member is available from every class, since all Python classes have the same metaclass, which is class type (and which is its own metaclass). To prevent that from happening, and because type is actually immutable, each proxy class needs to have its own custom metaclass. Furthermore, since static data can also be accessed on the instance, the class, too, gets a property object for each static data member. Expressed in code, for a basic C++ class, this looks as follows:

    class A {
    public:
        static int s_i;
    };

Paired with some Python code such as this, needed to expose the static variable both on the class and the instance level:

    meta_A = type('meta_A', (CPPMetaBase,), {})
    meta_A.s_i = make_datamember('A', 's_i')

    class A(object):
        __metaclass__ = meta_A
        s_i = make_datamember('A', 's_i')

Inheritance adds no complications for the access of static data per se, but there is the issue that the metaclasses must follow the same hierarchy as the proxy classes, for the Python method resolution order (MRO) to work. In other words, there are two complete, parallel class hierarchies that map one-to-one: a hierarchy for the proxy classes and one for their metaclasses.

A parallel class hierarchy is also used in other highly dynamic, object-oriented environments, such as Smalltalk. In Smalltalk as well, class-level constructs, such as class methods and data members, are defined for the class in the metaclass. A metaclass hierarchy has further uses, such as lazy loading of nested classes and member templates (this would be coded up in the base class of all metaclasses: CPPMetaBase), and makes it possible to distribute these over different reflection libraries. With this in place, you can write Python code like so:

    >>>> from cppyy.gbl import A
    >>>> a = A()
    >>>> a.s_i = 42
    >>>> print A.s_i == a.s_i
    True
    >>>> # etc.

The implementation of the getter for s_i is a lot easier than for instance data: the static data lives at a fixed, global, address, so no offset calculations are needed. The same is done for global data or global data living in namespaces: namespaces are represented as Python classes, and global data are implemented as properties on them. The need for a metaclass is one of the reasons why it is easier for namespaces to be classes: module objects are too restrictive. And even though namespaces are not modules, you still can, with some limitations, import from them anyway.

It is common that global objects themselves are pointers, and therefore it is allowed that the stored _cppthis is not a pointer to a C++ object, but rather a pointer to a pointer to a C++ object. A double pointer, as it were. This way, if the C++ code updates the global pointer, it will automatically reflect on the Python side in the proxy. Likewise, if on the Python side the pointer gets set to a different variable, it is the pointer that gets updated, and this will be visible on the C++ side. In general, however, the same caveat as for normal Python code applies: in order to set a global object, it needs to be set within the scope of that global object. As an example, consider the following code for a C++ namespace NS with global variable g_a, which behaves the same as Python code for what concerns the visibility of changes to the global variable:

    >>>> from cppyy.gbl import NS, A
    >>>> from NS import g_a
    >>>> g_a = A(42)                     # does NOT update C++ side
    >>>> print NS.g_a.m_i
    13                                   # the old value happens to be 13
    >>>> NS.g_a = A(42)                  # does update C++ side
    >>>> print NS.g_a.m_i
    42
    >>>> # etc.
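For comparison, the pure-Python analogue of the same caveat can be written down in a few lines. The NS class here is just an ordinary Python class playing the role of the namespace; nothing in it involves cppyy.

```python
class NS(object):
    g_a = 13          # "global" living in the namespace

g_a = NS.g_a          # like "from NS import g_a": binds a new local name
g_a = 42              # rebinds the local name only; NS.g_a is untouched
before = NS.g_a       # still the old value

NS.g_a = 42           # set within the scope of the namespace
after = NS.g_a        # now the change is visible to everyone using NS
```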

Conclusion

That covers all there is to know about data member access of C++ classes in Python through a reflection layer! A few final notes: RPython does not support metaclasses, and so the construction of proxy classes (code like make_datamember above) happens in Python code instead. There is an overhead penalty of about 2x over pure RPython code associated with that, due to extra guards that get inserted by the JIT. A factor of 2 sounds like a lot, but the overhead is tiny to begin with, and 2x of tiny is still tiny and it's not easy to measure. The class definition of the custom property, CPPDataMember, is in RPython code, to be transparent to the JIT. The actual offset calculations are in the reflection layer. Having the proxy class creation in Python, with structural code in RPython, complicates matters if proxy classes need to be constructed on-demand. For example, if an instance of an as-of-yet unseen type is returned by a method. Explaining how that is solved is a topic of part 2, method calls, so stay tuned.

This posting laid out the reasoning behind the object representation of C++ objects in Python by cppyy for the purpose of data member access. It explained how the chosen representation of offsets gives rise to a very pythonic representation, which allows Python introspection tools to work as expected. It also explained some of the optimizations done for the benefit of the JIT. Next up are method calls, which will be described in part 2.

3 comments:

Sindwiller said...

On a related note, do you know when Reflex will discard gccxml? I'm using Boost.Python with Ogre3D (among other things) right now and I'm looking into the pypy option. Gccxml, however, complains about some C++11 related stuff (which is somewhat odd, to the least, as I don't expose any Ogre-internal class or anything like that).

Wim Lavrijsen said...

Reflex itself will be discarded in favor of clang from llvm. That is, however, still experimental, but we're getting there.

heemanshu bhalla said...

Complete explanation of static data members with classes and program go to link :-

http://geeksprogrammings.blogspot.in/2013/09/static-data-members.html