Monday, November 14, 2011

Gothenburg sprint report

In the past week, we have been busy hacking on PyPy at the Gothenburg sprint, the second of this 2011. The sprint was hold at Laura's and Jacob's place, and here is a brief report of what happened.

In the first day we welcomed Mark Pearse, who was new to PyPy and at his first sprint. Mark worked the whole sprint in the new SpecialisedTuple branch, whose aim is to have a special implementation for small 2-items and 3-items tuples of primitive types (e.g., ints or floats) to save memory. Mark paired with Antonio for a couple of days, then he continued alone and did an amazing job. He even learned how to properly do Test Driven Development :-).

Antonio spent a couple of days investigating whether it is possible to use application checkpoint libraries such as BLCR and DMTCP to save the state of the PyPy interpreter between subsequent runs, thus saving also the JIT-compiled code to reduce the warmup time. The conclusion is that these are interesting technologies, but more work would be needed (either on the PyPy side or on the checkpoint library side) before it can have a practical usage for PyPy users.

Then, Antonio spent most of the rest of the sprint working on his ffistruct branch, whose aim is to provide a very JIT-friendly way to interact with C structures, and eventually implement ctypes.Structure on top of that. The "cool part" of the branch is already done, and the JIT already can compile set/get of fields into a single fast assembly instruction, about 400 times faster than the corresponding ctypes code. What is still left to do is to add a nicer syntax (which is easy) and to implement all the ctypes peculiarities (which is tedious, at best :-)).

As usual, Armin did tons of different stuff, including fixing a JIT bug, improving the performance of file.readlines() and working on the STM branch (for Software Transactional Memory), which is now able to run RPython multithreaded programs using software transaction (as long as they don't fill up all the memory, because support for the GC is still missing :-)). Finally, he worked on improving the Windows version of PyPy. While doing so he discovered together with Anto a terrible bug which lead to a continuous leak of stack space because the JIT called some functions using the wrong calling convention.

Håkan, with some help from Armin, worked on the jit-targets branch, whose goal is to heavily refactor the way the traces are internally represented by the JIT, so that in the end we can produce (even :-)) better code than what we do nowadays. More details in this mail.

Andrew Dalke worked on a way to integrate PyPy with FORTRAN libraries, and in particular the ones which are wrapped by Numpy and Scipy: in doing so, he wrote f2pypy, which is similar to the existing f2py but instead of producing a CPython extension module it produces a pure python modules based on ctypes. More work is needed before it can be considered complete, but f2pypy is already able to produce a wrapper for BLAS which passes most of the tests under CPython, although there's still work left to get it working for PyPy.

Armin and Håkan with Laura's "5x faster" cake
Christian Tismer worked the whole sprint on the branch to make PyPy compatible with Windows 64 bit. This needs a lot of work because a lot of PyPy is written under the assumption that the long type in C has the same bit size than void*, which is not true on Win64. Christian says that in the past Genova-Pegli sprint he completed 90% of the work, and in this sprint he did the other 90% of the work. Obviously, what is left to complete the task is the third 90% :-). More seriously, he estimated a total of 2-4 person-weeks of work to finish it.

But, all in all, the best part of the sprint has been the cake that Laura baked to celebrate the "5x faster than CPython" achievement. Well, actually our speed page reports "only" 4.7x, but that's because in the meantime we switched from comparing against CPython 2.6 to comparing against CPython 2.7, which is slightly faster. We are confident that we will reach the 5x goal again, and that will be the perfect excuse to eat another cake :-)


Albien said...

Freaking amazing guys together!!!

Kumo said...

"5x faster than CPython cake". Sounds delicious.

Anonymous said...

awesome! what do you think? how much room for improvement is there? is 10x possible? :)

Luis said...

Congratulations! I guess that 5x faster (Unladen Swallow's performance goal) means that pypy is now "officially" fast.

As Anonymous asked above, I also wonder how much room for improvement there is from now on.
Have all the low hanging fruits been picked already? Can we expect this pace of improvement to go on for a while? Or you are close to hit the limit?

Well, I know it's hard to predict... I'd just like to know what your heart tells you :-)

Thank you guys for all the hard work!

Anonymous said...

does pygame work with pypy? would be awesome... what about pyopengl?

Anonymous said...

Sorry, but pyopengl require either numpy or Numeric, which unfortunatly ain't supported yet.

Anonymous said...

Five times faster than CPython. Great! How does it compare to C?