Wednesday, March 26, 2014

pygame_cffi: pygame on PyPy

The Raspberry Pi aims to be a low-cost educational tool that anyone can use to learn about electronics and programming. Python and pygame are included in the Pi's programming toolkit. And since last year, thanks in part to sponsorship from the Raspberry Pi Foundation, PyPy also works on the Pi (read more here).

With PyPy working on the Pi, game logic written in Python stands to gain an awesome performance boost. However, the original pygame is a Python C extension. This means it performs poorly on PyPy and negates any speedup in the Python parts of the game code.

One solution to making pygame games run faster on PyPy, and eventually on the Raspberry Pi, comes in the form of pygame_cffi. pygame_cffi uses CFFI to wrap the underlying SDL library instead of a C extension. A few months ago, the Raspberry Pi Foundation sponsored a Cape Town Python User Group hackathon to build a proof-of-concept pygame using CFFI. This hackathon was a success and it produced an early working version of pygame_cffi.
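
The general idea of calling C through CFFI can be sketched roughly as follows. This is not pygame_cffi's code: to stay small it binds `strlen` from the C standard library rather than anything in SDL, `load_libc` is a made-up helper name, and it assumes a POSIX system where `dlopen(None)` exposes libc.

```python
# Illustrative sketch: binds libc's strlen rather than SDL.
# Assumes a POSIX system; the cffi package ships with PyPy and is a
# separate install on CPython.
try:
    from cffi import FFI
except ImportError:  # cffi is not available in this interpreter
    FFI = None


def load_libc():
    """Return (ffi, lib) for the C standard library, or None without cffi."""
    if FFI is None:
        return None
    ffi = FFI()
    # Declare the C signatures we want to call; a wrapper for SDL would
    # declare SDL's functions and structs here instead.
    ffi.cdef("size_t strlen(const char *s);")
    # dlopen(None) opens the standard C library namespace on POSIX.
    lib = ffi.dlopen(None)
    return ffi, lib


if __name__ == "__main__":
    loaded = load_libc()
    if loaded is not None:
        ffi, lib = loaded
        print(lib.strlen(b"pygame"))
```

The declarations are plain C text and the call sites are plain Python, so no compiled extension module is involved at this level.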

So for the last 5 weeks Raspberry Pi has been funding work on pygame_cffi. The goal was a complete implementation of the core modules. We also wanted benchmarks to illuminate performance differences between pygame_cffi on PyPy and pygame on CPython. We are happy to report that those goals were met. So without further ado, here's a rundown of what works.

Current functionality

[Screenshots: Invention; Mutable Mamba]

With the above-mentioned functionality in place we could get 10+ of the pygame examples to work, and a number of PyWeek games. At the time of writing, if a game doesn't work it is most likely due to an unimplemented transform or draw function. That will be remedied soon.

Performance

In terms of performance, pygame_cffi on PyPy is showing a lot of promise. It beats pygame on CPython by a significant margin in our events processing and collision detection benchmarks, while blit and fill benchmarks perform similarly. The pygame examples we checked also perform better.

However, there is still work to be done to identify and eliminate bottlenecks. On the Raspberry Pi, performance is markedly worse than pygame's (barring collision detection). The PyWeek games we tested also performed slightly worse. Fortunately there is room for improvement in various places.

[Embedded media: Invention & Mutable Mamba (x86); standard pygame examples (Raspberry Pi)]

Here's a summary of some of the benchmarks. Relative speed refers to the frame rate obtained in pygame_cffi on PyPy relative to pygame on CPython.

Benchmark                                      Relative speed (PyPy speedup)
Events (x86)                                   1.41
Events (Pi)                                    0.58
N² collision detection on 100 sprites (x86)    4.14
N² collision detection on 100 sprites (Pi)     1.01
Blit 100 surfaces (x86)                        1.06
Blit 100 surfaces (Pi)                         0.60
Invention (x86)                                0.95
Mutable Mamba (x86)                            0.72
stars example (x86)                            1.95
stars example (Pi)                             0.84
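
To make the collision rows and the "relative speed" column concrete, here is a minimal pure-Python sketch of an all-pairs (N²) collision check and of the frame-rate ratio. It is not the benchmark we ran: the rectangle representation, the function names and the frame count are all assumptions made for illustration.

```python
import random
import time
from itertools import combinations


def overlaps(a, b):
    """Axis-aligned overlap test; rects are (x, y, width, height) tuples."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def count_collisions(rects):
    """Test every pair once: N * (N - 1) / 2 checks, so work grows as N^2."""
    return sum(1 for a, b in combinations(rects, 2) if overlaps(a, b))


def frames_per_second(rects, frames=50):
    """Run the all-pairs check once per simulated frame and time it."""
    start = time.perf_counter()
    for _ in range(frames):
        count_collisions(rects)
    return frames / (time.perf_counter() - start)


def relative_speed(fps_candidate, fps_baseline):
    """Frame rate of the candidate divided by that of the baseline."""
    return fps_candidate / fps_baseline


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are comparable
    rects = [(rng.randrange(640), rng.randrange(480), 32, 32)
             for _ in range(100)]
    print("frames per second:", frames_per_second(rects))
```

Running the same script under both interpreters and passing the two frame rates to `relative_speed` gives a number comparable in spirit to the table: above 1 means the candidate is faster. A loop like this is all Python, which is the kind of code a JIT helps most.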

OpenGL

Some not-so-great news is that PyOpenGL performs poorly on PyPy since PyOpenGL uses ctypes. This translates into a nasty reduction in frame rate for games that use OpenGL surfaces. It might be worthwhile creating a CFFI-powered version of PyOpenGL as well.

Where to now?

Work on pygame_cffi is ongoing. Here are some things that are in the pipeline:

  • Get pygame_cffi on PyPy to a place where it is consistently faster than pygame on CPython.
  • Implement the remaining modules and functions, starting with draw and transform.
  • Improve test coverage.
  • Reduce the time it takes for CFFI to parse the cdef. This makes the initial pygame import slow.

If you want to contribute you can find pygame_cffi on GitHub. Feel free to find us on #pypy on freenode or post issues on GitHub.

Cheers,
Rizmari Versfeld


41 comments:

  1. Pygame should be an excellent way to benchmark the performance of pypy, so this is great! I wanted to let you fellas know of another project that is using pypy that looks really neat as well... https://github.com/rfk/pypyjs

  2. pygame seems outdated, because it is based on first SDL version.

    It will be interesting to see CFFI comparison for newer, SDL2 bindings, such as PySDL2, which is ctypes based at the moment.

    https://pypi.python.org/pypi/PySDL2

  3. Anatoly, pygame is outdated but has no clear replacement. PySDL2 is nice, but it's only a low-level binding; it does not really help in the case of writing games.

  4. Is it not wrapping the current SDL? I thought that it was... On github it says it's a pygame based wrapper(copies the api) for SDL, would that not make it the current SDL?

  5. I looked into PyOpenGL's code to see if there is an easy way to upgrade to CFFI.

    It's a bag of cats EVERYWHERE.

    ctypes are defined all over the place, unlike most ctypes->cffi projects, where there is a single source file (api.py) that is easy to convert due to it being the raw interface to the C library.

  6. @Maciej, pygame includes a lot of helpers and good documentation, but it is not a promising technology to play with. I'd say there are more interesting libs out there that get more interesting results, and speeding up dynamic binding for them would be very cool to make things like these - https://devart.withgoogle.com/ - possible.


    @Anonymous, if I were to provide OpenGL bindings, I'd start with looking at https://github.com/p3/regal project and binding generator in scripts/

  7. This comment has been removed by the author.

  8. I've actually been working to see if I can get my own Pygame release, Sky Eraser, optimised enough to work on a Raspberry Pi -- it'd be worth seeing how implementing it under this configuration would work on top of the optimisations I've been working on in the background (boy are there a lot to make).

    I might also be rewriting the APIs for Allegro 5.1 as an experiment though, to test under both CPython and PyPy.

  9. I started to work on a newer and experimental OpenGL wrapper for Python, proudly blessed PyOpenGLng.

    In comparison to PyOpenGL, it generates the requested OpenGL API from the OpenGL XML Registry and uses an automatic translator to map the C API to Python. The translator is quite lightweight in comparison to the PyOpenGL source code. And it is already able to run a couple of examples for OpenGL V3 and V4.

    Currently the wrapper uses ctypes. But I am looking for tips to do the same for cffi, as well as feedback on performance and comments.

    The project is hosted on https://github.com/FabriceSalvaire/PyOpenGLng.

  10. @Fabrice, how is your newer and experimental OpenGL wrapper generator better than existing ones? I am not saying that there is a NIH effect - probably some omission from documentation.

  11. I mean that if PyOpenGL doesn't use wrapper generator then there are a couple around not limiting themselves to Python. I am especially interested to know the comparison with regal.

  12. It was my impression that OpenGL isn't hardware accelerated on the pi anyway... or am I incorrect?

  13. @anatoly: The only real replacement for pygame which I know is pyglet. It is not quite as game-optimized as pygame, but very versatile and a joy to use.

    http://pyglet.org

  14. I've actually made a CFFI OpenGL binding, as part of my successor to my old PyGL3Display project. It's not hosted anywhere yet, but I'll see about getting up somewhere soon.

  15. And... done. A mostly drop-in replacement for PyOpenGL on CFFI, or at least for OpenGL 3.2 core spec.

    https://www.dropbox.com/s/rd44asge17xjbn2/gl32.zip

  16. @Arne, pyglet rocks, because it is just `clone and run` unlike all other engines. But it looks a little outdated, that's why I started to look for alternatives.

  17. @David, if you want people to comment on this, Bitbucket would be a better way to share sources than Dropbox.

  18. @anatoly techtonick:
    Actually, it'll end up on Launchpad in the near future (probably within 2 weeks?). However, it's the output of a wrapper generator and the wrapper generator is in pretty poor shape at the moment, in terms of packaging its output. I just figured people might be able to use it in the near future, even if it is in 'source-code-dump' form. If there's a better temporary home for it somewhere, I'm all ears.

  19. @David, why reinvent the wheel? There are many wrapper generators around. Also, your project is not a replacement for PyOpenGL, because of GPL restrictions.

  20. @anatoly

    I never claimed my project is a replacement for PyOpenGL - it's not API compatible, for a start. Regarding license, it'll probably get changed for the bindings at some point, probably to 3-clause BSD.

    On the wrapper generator: Really, the only actively maintained wrapper generator for Python that I'm aware of (which isn't project specific) is SWIG, which is not appropriate (at the very least, googling for 'python wrapper generator -swig' doesn't seem to give many results). In any case, the wrapper generator isn't a lot of code.

  21. @anatoly: pyglet seems to be in maintenance mode right now. There are commits every few days, but only small stuff.

    On the other hand I understand that: pyglet supplies everything a backend for a game-engine needs (I use it¹), so the next step should be to use it for many games and see whether shared needs arise.

    ¹: See http://1w6.org/deutsch/anhang/programme/hexbattle-mit-zombies and https://bitbucket.org/ArneBab/hexbattle/

  22. @David, I am speaking about OpenGL specific wrapper generators. I've added information to this page - https://www.opengl.org/wiki/Related_toolkits_and_APIs#OpenGL_loading_libraries

    The OpenGL generator in Python is included in regal project here https://github.com/p3/regal/scripts

    pyglet also has one.

  23. Sorry, the correct link is https://github.com/p3/regal/tree/master/scripts

  24. @Arne, kissing elves trick is low. =) Otherwise looks wesnothy and 2D. I don't see why it should use OpenGL. 3D models would be cool.

    I'd try to make it run on PySDL2 with "from sdl2.ext.api import pyglet". There is no pyglet API there, but would be interesting to see if it is possible to provide one.

  25. @anatoly

    Pyglet's GL wrapper generator creates a lot of chained functions (fairly slow in cPython). I'm also not sure if there's enough development activity in Pyglet to allow modifying core code, and given the size of the Pyglet project I'm not going to fork it. PyOpenGL has more or less the same issues.

    Regal appears to be a very large project (a 68MB checkout), which has a scope much greater than just its wrapper generator - the sheer scope of the project does cause some barriers to entry. I'm still looking through, but I am fairly certain that it would take more effort to adapt Regal's binding generator than I have expended on my own.

  26. @anatoly: I like kissing elves ☺ (and when I get to write the next part of the story, I intend to keep them as player characters: That someone starts out in an intimate moment does not mean he or she is watchmeat).

    @David: I guess modifying core-code in pyglet is not that big of a problem, especially *because* it is mostly being maintained right now: Little danger of breaking the in-progress work of someone else.

  27. @anatoly: more specifically, I do not consider intimate moments as cheap (and WTactics has the image, so I could pull this off). Instead I try to rid myself of baseless inhibitions, though that’s not always easy: Killing off no longer needed societal conditioning is among the hardest battles…

  28. @Arne: Maybe it'd be worth looking at integrating it then; however, it really is a completely different approach - gl32 is a source code writer, whereas Pyglet uses Python's inbuilt metaprogramming capabilities - and so it would mean completely rewriting a large chunk of Pyglet's core. Once I've got the binding generator finalised, it might be worth seeing if it's possible to replace Pyglet's OpenGL bindings with these ones.

    That said, in the interest of full disclosure: I'm not a fan of Pyglet's per-object draw method, again in the interests of speed. The per-object draw method that Pyglet encourages with its API is not very scalable and eliminates a large number of the advantages of using OpenGL. So whilst I might see if gl32 can be plugged in for interesting benchmarks/proof-of-concept, I probably wouldn't try to get it bug-free and integrated into upstream Pyglet.

  29. @Arne: Regarding Pyglet integration - it seems it would require a lot of work. There's two major issues - firstly, Pyglet only has raw OpenGL bindings, which are used everywhere and hence the "more pythonic" bindings of gl32 would be hard to integrate without editing every file using GL in Pyglet. Secondly, Pyglet uses GL functions which were removed in 3.2, and hence are not in gl32, so the API generator would have to be extended to handle any special cases on these functions.

  30. @David: The per-object draw-method is very convenient for programming. As soon as you need more performance, most of the objects are grouped into batches, though. That way only the draw method of the batch is called and the batch can do all kinds of optimizations.

  31. For Python 3.2 you might find useful stuff in the python-3 port of pyglet, though that hasn’t been released, yet, IIRC.

  32. @Arne:

    I'd argue that objects with Z-order would be more convenient programmatically, but frankly that's a matter of opinion. (Incidentally, this is something I'm working on as well, and I think I'm mostly done on it).

    However, per-object-draw is only one concern I have about Pyglet's speed credentials, as I do not believe Pyglet was written with speed as a design goal. For a different example, see pyglet.graphics.vertexbuffer; copying a ctypes object into a list in order to get slices to work is not a smart thing to do, performance-wise!

    I'm not sure where you got Python 3.2 from, but what I meant was that currently I'm restricting myself to OpenGL 3.2, which means that certain older OpenGL functions do not exist. Pyglet uses some of these removed functions (e.g. glPushClientAttrib), and hence the bindings I'm generating at the moment do not provide all the features Pyglet uses.

  33. I'd like to remind readers of these comments that this thread has gone farther and farther from both the original post and the whole blog -- which is supposed to be related to PyPy. I'm rather sure that you're now discussing performance on CPython, which in this case is very different from performance on PyPy (or would be if it supported all packages involved). Maybe move this discussion somewhere more appropriate?

  34. @Armin: You’re right… actually I would be pretty interested, though, whether pypy also has a performance issue with pyglet's chained functions.

  35. @Arne: In principle, PyPy seems to handle Pyglet's chained functions relatively well (non-scientifically, running the Astraea example's title screen sees CPU usage start very high but eventually drop to about 80% of CPython's after the JIT warms up). There is one caveat preventing better testing: the moment keyboard input is given to Astraea on PyPy, PyPy segfaults.

  36. @David: That is really important feedback for Armin and Anatoly, I think.

  37. @David: Can you give some more background on the error (how to get the code, how to reproduce the segfault)?

  38. @Arne: It's as simple as running the Astraea example in Pyglet and pressing a key (under PyPy 2.2, Pyglet 1.2-beta, Ubuntu 14.04). As far as I remember, this has been the case for some time (at least as far back as Ubuntu 12.10/PyPy 2.0 beta - although back then the major issue was PyPy using a lot more CPU; I didn't report this then due to a blog post at the time saying how cTypes would be rewritten). The error reported by Apport is "Cannot access memory at address 0x20"

    Doing a cursory scan through other examples, the noisy and text_input examples also have problems. noisy segfaults when a spawned ball collides with a boundary (occasionally giving a partial rpython traceback); text_input appears to have a random chance of any of the input boxes being selectable.

    Maybe it's time to file a proper bug report on this...

  39. @Arne: I've now submitted a bug on the PyPy Bug tracker (Issue 1736), with more detail etc. Probably best to move conversation on any Pyglet related issues over there.

  40. I came up with a funny idea: why not make emscripten generate code targeting RPython, so that we could use C/C++ in PyPy directly? An LLVM-to-RPython compiler, how about that?


See also PyPy's IRC channel: #pypy at freenode.net, or the pypy-dev mailing list.
If the blog post is old, it is pointless to ask questions here about it---you're unlikely to get an answer.