Tuesday, December 27, 2011

Leysin Winter Sprint

PyPy Leysin Winter Sprint: 15-22nd January 2012

The next PyPy sprint will be in Leysin, Switzerland, for the eighth time. This is a fully public sprint: newcomers and topics other than those proposed below are welcome.

Goals and topics of the sprint

  • Py3k: work towards supporting Python 3 in PyPy
  • NumPyPy: work towards supporting the numpy module in PyPy
  • JIT backends: integrate tests for ARM; look at the PowerPC 64; maybe try again to write an LLVM- or GCC-based one
  • STM and STM-related topics; or the Concurrent Mark-n-Sweep GC
  • And as usual, the main side goal is to have fun in winter sports :-) We can take a day off for ski.

Exact times

The work days should be 15-21 January 2011 (Sunday-Saturday). The official plans are for people to arrive on the 14th or the 15th, and to leave on the 22nd.

Interested?

Thursday, December 22, 2011

Come see us at PyCon 2012

PyCon 2012 is coming up in just a few short months, and PyPy will be well
represented there. We'll be delivering a tutorial, two talks, plus we'll be
around for the sprints.

Here are the abstracts for the tutorials and talks:

  • How to get the most out of your PyPy, by Maciej Fijalkowski, Alex Gaynor
    and Armin Rigo: For many applications PyPy can provide performance benefits
    right out of the box. However, little details can push your application to
    perform much better. In this tutorial we'll give you insights on how to push
    PyPy to its limits. We'll focus on understanding the performance
    characteristics of PyPy, and learning the analysis tools in order to maximize
    your applications' performance. This is the tutorial.
  • Why PyPy by example, by Maciej Fijalkowski, Alex Gaynor and Armin Rigo:
    One of the goals of PyPy is to make existing Python code faster; however an
    even broader goal was to make it possible to write things in Python that
    previously would needed to be written in C or other low-level language. This
    talk will show examples of this, and describe how they represent the
    tremendous progress PyPy has made, and what it means for people looking at
    using PyPy.
  • How the PyPy JIT works, by Benjamin Peterson: The Python community is
    abuzz about the major speed gains PyPy can offer for pure Python code. But how
    does the PyPy JIT actually work? This talk will discuss how the PyPy JIT is
    implemented. It will include descriptions of the tracing, optimization, and
    assembly generation phases. I will demonstrate each step with an example loop.

If you have any questions let us know! We look forward to seeing people at
PyCon and chatting about PyPy and the entire Python ecosystem.

See you there,
Maciej Fijalkowski, Alex Gaynor, Benjamin Peterson, Armin Rigo, and the entire PyPy team

Thursday, December 8, 2011

Plotting using matplotlib from PyPy

Big fat warning This is just a proof of concept. It barely works. There are missing pieces left and right, which were replaced with hacks so I can get this to run and prove it's possible. Don't try this at home, especially your home. You have been warned.

There has been a lot of talking about PyPy not integrating well with the current scientific Python ecosystem, and numpypy (a NumPy reimplementation on top of pypy) was dubbed "a fancy array library". I'm going to show that integration with this ecosystem is possible with our design.

First, the demo:

#!/usr/bin/env pypy

# numpy, pypy version
import numpypy as numpy
# DRAGONS LIVE THERE (fortunately hidden)
from embed.emb import import_mod

pylab = import_mod('matplotlib.pylab')

if __name__ == '__main__':
    a = numpy.arange(100, dtype=int)
    b = numpy.sin(a)
    pylab.plot(a, b)

And you get:

Now, how to reproduce it:

  • You need a PyPy without cpyext, I did not find a linker that would support overriding symbols. Right now there are no nightlies like this, so you have to compile it yourself, like:

    ./translate.py -Ojit targetpypystandalone.py --withoutmod-cpyext

    That would give you a PyPy that's unable to load some libraries like PIL, but perfectly working otherwise.

  • Speaking of which, you need a reasonably recent PyPy.

  • The approach is generally portable, however the implementation has been tested only on 64bit linux. Few tweaks might be required.

  • You need to install python2.6, the python2.6 development headers, and have numpy and matplotlib installed on that python.

  • You need a checkout of my hacks directory and put embedded on your PYTHONPATH, your pypy checkout also has to be on the PYTHONPATH.

Er wait, what happened?

What didn't happen is we did not reimplement matplotlib on top of PyPy. What did happen is we embed CPython inside of PyPy using ctypes. We instantiate it. and follow the embedding tutorial for CPython. Since numpy arrays are not movable, we're able to pass around an integer that's represents the memory address of the array data and reconstruct it in the embedded interpreter. Hence with a relatively little effort we managed to reuse the same array data on both sides to plot at array. Easy, no?

This approach can be extended to support anything that's not too tied with python objects. SciPy and matplotlib both fall into the same category but probably the same strategy can be applied to anything, like GTK or QT. It's just a matter of extending a hack into a working library.

To summarize, while we're busy making numpypy better and faster, it seems that all external libraries on the C side can be done using an embedded Python interpreter with relatively little effort. To get to that point, I spent a day and a half to learn how to embed CPython, with very little prior experience in the CPython APIs. Of course you should still keep as much as possible in PyPy to make it nice and fast :)

Cheers, fijal