Wednesday, January 6, 2016

Using CFFI for embedding

Introduction

CFFI has been a great success so far to call C libraries in your Python programs, in a way that is both simple and that works across CPython 2.x and 3.x and PyPy.

This post assumes that you know what CFFI is and how to use it in API mode (ffi.cdef(), ffi.set_source(), ffi.compile()). A quick overview can be found in this paragraph.

The major news of CFFI 1.4, released last december, was that you can now declare C functions with extern "Python" in the cdef(). These magic keywords make the function callable from C (where it is defined automatically), but calling it will call some Python code (which you attach with the @ffi.def_extern() decorator). This is useful because it gives a more straightforward, faster and libffi-independent way to write callbacks. For more details, see the documentation.

You are, in effect, declaring a static family of C functions which call Python code. The idea is to take pointers to them, and pass them around to other C functions, as callbacks. However, the idea of a set of C functions which call Python code opens another path: embedding Python code inside non-Python programs.

Embedding

Embedding is traditionally done using the CPython C API: from C code, you call Py_Initialize() and then some other functions like PyRun_SimpleString(). In the simple cases it is, indeed, simple enough; but it can become a complicated story if you throw in supporting application-dependent object types; and a messy story if you add correctly running on multiple threads, for example.

Moreover, this approach is specific to CPython (2.x or 3.x). It does not work at all on PyPy, which has its own very different, minimal embedding API.

The new-and-coming thing about CFFI 1.5, meant as replacement of the above solutions, is direct embedding support---with no fixed API at all. The idea is to write some Python script with a cdef() which declares a number of extern "Python" functions. When running the script, it creates the C source code and compiles it to a dynamically-linked library (.so on Linux). This is the same as in the regular API-mode usage. What is new is that these extern "Python" can now also be exported from the .so, in the C sense. You also give a bit of initialization-time Python code directly in the script, which will be compiled into the .so too.

This library can now be used directly from any C program (and it is still importable in Python). It exposes the C API of your choice, which you specified with the extern "Python" declarations. You can use it to make whatever custom API makes sense in your particular case. You can even directly make a "plug-in" for any program that supports them, just by exporting the API expected for such plugins.

Trying it out on CPython

This is still being finalized, but please try it out. You can see embedding.py directly online for a quick glance. Or see below the instructions on Linux with CPython 2.7 (CPython 3.x and non-Linux platforms are still a work in progress right now, but this should be quickly fixed):

  • get the branch static-callback-embedding of CFFI:

    hg clone https://bitbucket.org/cffi/cffi
    hg up static-callback-embedding
    
  • make the _cffi_backend.so:

    python setup_base.py build_ext -f -i
    
  • run embedding.py in the demo directory:

    cd demo
    PYTHONPATH=.. python embedding.py
    
  • this produces _embedding_cffi.c. Run gcc to build it. On Linux:

    gcc -shared -fPIC _embedding_cffi.c -o _embedding_cffi.so  \
        -lpython2.7 -I/usr/include/python2.7
    
  • try out the demo C program in embedding_test.c:

    gcc embedding_test.c _embedding_cffi.so
    PYTHONPATH=.. LD_LIBRARY_PATH=. ./a.out
    

Note that if you get ImportError: cffi extension module '_embedding_cffi' has unknown version 0x2701, it means that the _cffi_backend module loaded is a pre-installed one instead of the more recent one in "..". Be sure to use PYTHONPATH=.. for now. (Some installations manage to be confused enough to load the system-wide cffi even if another version is in the PYTHONPATH. I think a virtualenv can be used to work around this issue.)

Try it out on PyPy

Very similar steps can be followed on PyPy, but it requires the cffi-static-callback-embedding branch of PyPy, which you must first translate from sources. The difference is then that you need to adapt the first gcc command line: replace -lpython2.7 with -lpypy-c and to fix the -I path (and possibly add a -L path).

More details

How it works, more precisely, is by automatically initializing CPython/PyPy the first time any of the extern "Python" functions is called from the C program. This is done using locks in case of multi-threading, so several threads can concurrently do this "first call". This should work even if two different threads call the first time a function from two different embedded CFFI extensions that happen to be linked with the same program. Explicit initialization is never needed.

The custom initialization-time Python code you put in ffi.embedding_init_code() is executed at that time. If this code starts to be big, you can move it to independent modules or packages. Then the initialization-time Python code only needs to import them. In that case, you have to carefully set up sys.path if the modules are not installed in the usual Python way.

If the Python code is big and full of dependencies, a better alternative would be to use virtualenv. How to do that is not fully fleshed out so far. You can certainly run the whole program with the environment variables set up by the virtualenv's activate script first. There are probably other solutions that involve using gcc's -Wl,-rpath=\$ORIGIN/ or -Wl,-rpath=/fixed/path/ options to load a specific libpython or libypypy-c library. If you try it out and it doesn't work the way you would like, please complain :-)

Another point: right now this does not support CPython's notion of multiple subinterpreters. The logic creates a single global Python interpreter, and runs everything in that context. Maybe a future version would have an explicit API to do that — or maybe it should be the job of a 3rd-party extension module to provide a Python interface over the notion of subinterpreters...

More generally, any feedback is appreciated.

Have fun,

Armin

7 comments:

  1. Thanks to apollo13 on irc for early feedback. Main change: the code put in embedded_init_code() should now start with "from xx import ffi", where "xx" is the name of the module (first argument to set_source()). The goal is to clearly say that you need the same line in other modules imported from there.

    ReplyDelete
  2. This is very exciting! Just waiting for Python 3.x support now. :)

    ReplyDelete
  3. Python 3 is implemented and tested now.

    ReplyDelete
  4. Windows support is now done (tested on Python 2.7). Expect a release soon :-)

    ReplyDelete
  5. Excelent feature!!

    CFFI rocks, and the documentation keeps improving :)

    ReplyDelete
  6. Awesome, pypyInstaller in cross-hairs!

    ReplyDelete

See also PyPy's IRC channel: #pypy at freenode.net, or the pypy-dev mailing list.
If the blog post is old, it is pointless to ask questions here about it---you're unlikely to get an answer.