Friday, April 9, 2010

Using CPython extension modules with PyPy natively, or: PyPy can load .pyd files with CPyExt!

PyPy is now able to load and run CPython extension modules (i.e. .pyd and .so files) natively by using the new CPyExt subsystem. Unlike the solution presented in another blog post (where extension modules like numpy etc. were run on CPython and proxied through TCP), this solution does not require a running CPython anymore. We do not achieve full binary compatiblity yet (like Ironclad), but recompiling the extension is generally enough.

The only prerequisite is that the necessary functions of the C API of CPython are already implemented in PyPy. If you are a user or an author of a module and miss certain functions in PyPy, we invite you to implement them. Up until now, a lot of people (including a lot of new committers) have stepped up and implemented a few functions to get their favorite module running. See the end of this post for a list of names.

Regarding speed, we tried the following: even though there is a bit of overhead when running these modules, we could run the regular expression engine of CPython (_sre.so) and execute the spambayes benchmark of the Unladen Swallow benchmark suite (cf. speed.pypy.org) and experience a speedup: It became two times faster on pypy-c than with the built-in regular expression engine of PyPy. From Amdahl's Law it follows that the _sre.so must run several times faster than the built-in engine.

Currently pursued modules include PIL and others. Distutils support is nearly ready. If you would like to participate or want information on how to use this new feature, come and join our IRC channel #pypy on freenode.

Amaury Forgeot d'Arc and Alexander Schremmer

Further CPyExt Contributors:

  • Alex Gaynor
  • Benjamin Peterson
  • Jean-Paul Calderone
  • Maciej Fijalkowski
  • Jan de Mooij
  • Lucian Branescu Mihaila
  • Andreas Stührk
  • Zooko Wilcox-O Hearn

18 comments:

Anonymous said...

Holy crap, this is huge! Is it available in the PPA already? I guess this would put all benchmarks past CPython speed (except for outliers like the euler14 thing).

Anonymous said...

Great news! What is the status of numpy/scipy support?

Alex said...

@Anonymous I don't think anyone has started trying to test numpy or scipy yet, however fundamentally it's just a matter of implementing missing functions. For me starting on numpy in my next goal, after PIL.

Anonymous said...

This is very good news. JIT compiled Python can never fully replace extension modules (existing ones, or the need for new ones), so extension support should be a high priority for the PyPy project. I hope you can eventually get rid of that overhead.

holger krekel said...

wow, just coming back from vacation and have to say: great news and great work, guys! Historically speaking, this is the third approach to the "ext" module issue and if the promise works out as it seems to do, probably the last as far as leveraging cpy ext modules are concerned! I wonder - does it still make sense to have "native" extension modules, the ones we currently have as "mixed" modules?

Anonymous said...

Let me ask for a bit more detail. I depend on a module (http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html), that is currently unsupported, as far as I know. I'd really like to port it to pypy. Where to start?

Is it possible that the module runs without modifications? Can I check this simply by building a pypy-trunk, and write "import cmaxent"?

Bartosz SKOWRON said...

@Anonymous: No it's not in the PPA. We provide only the latest release (1.2 in this case) and weekly builds for trunk (which haven't been announced on the blog yet). CPython extension modules live in their own branch. The branch will be merged into the trunk sooner or later.

PS. The weekly builds are available here at https://launchpad.net/~pypy

Alexander Schremmer said...

@Anonymous

To test your module, you need to compile and load it. For compilation, you can use a compiled pypy binary and run setup.py build_ext with your setup file. For hints about manual compilation and module loading, visit our IRC channel.

Alexander Schremmer said...

@holger

MixedModules allow you to implement modules in RPython (using the PyPy API) and Python at the same time. CPyExt is for modules written in C using the CPython API. So both solutions are for different needs.

horace said...

what about embedding pypy? will this work too in the future?

the reason i ask is blender. there were some security concerns among blender developers recently. blender uses embedded cpython for scripting. normal scripts (like exporters) which have to be evoked by the user aren't that much of a problem but blender also supports python expressions for animation parameters. without a sandbox downloading and opening .blend files from unknown sources is kind of risky since a malicious python expression theoretically could wipe your harddisk.

pypy with its support for a sandbox could be a very good replacement for cpython in blender (also because of its speed) but if it isn't compatible with the cpython api then a swap probably would be way too much effort.

Alexander Schremmer said...

@horace
what about embedding pypy?

That should work as easy as extending.

holger krekel said...

@alexander True, mixed modules are for rpython-implemented modules and need to be translated together with the pypy interpreter and could make use of the JIT. My question more aimed at the issue for which use cases / goals which kind of extension module mechanism makes sense.
IOW, some discussion and web page regarding rpy-ext/ctypes/cpy-ext would make sense, i guess. Or is it somewhere already?

Alexander Schremmer said...

@holger
some discussion and web page regarding rpy-ext/ctypes/cpy-ext would make sense

Yes, someone could write down guidelines. Using the C API runs your module fast in case of CPython. A bit slower on ironpython and PyPy.

Using ctypes gives your module access to these three interpreters as well, but it will run slower. One advantage here is that you do not need to write C to create a wrapper around a library. If your objective is speed and lower memory usage, then CTypes does not work either.

Mixed modules make your module work only on PyPy and provide a decent speed and a mixture of a decent (Python) and a bit harder to grasp (RPython) programming language. This only makes sense as a platform if your users are also using PyPy.

illume said...

Super awesome! Can't wait to get home and try it out.

Gary Robinson said...

It's a few months later, and I'm wondering what progress has been made. Early comments mentioned that nobody had tried numpy or scipy yet -- has that changed?

Also, does this make the multiprocessing library available? Or, is pp (parallel processing) available?

I'm very excited about PyPy because of the JIT. But for my work I also need some form of utilizing multiple CPU's. Right now I'm using unladen swallow with the multiprocessing module.

Anonymous said...

Yup, I'd love to hear about the progress on this.

Anonymous said...

Any chance this will be released sometime?

Alexander Schremmer said...

It was already released, just check out the current PyPy release.