Friday, July 17, 2009

PyPy numeric experiments

Because PyPy will be presenting at the upcoming EuroSciPy conference, I have recently been playing with the idea of NumPy and PyPy integration. My idea is to integrate PyPy's JIT with NumPy, or at least a very basic subset of it. Time constraints make it impossible to hand-write a JIT compiler that understands NumPy. But given PyPy's architecture we actually have a JIT generator, so we don't need to write one :-)

Our JIT has shown that it can speed up small arithmetic examples significantly. What happens with something like NumPy?

I wrote a very minimal subset of NumPy in RPython, called micronumpy (supporting only one-dimensional int arrays with item getting and setting), and a benchmark against it. The point of this benchmark is to compare the performance of a builtin function (numpy.minimum) against an equivalent hand-written function, written in pure Python and compiled by our JIT.

The goal is to prove that it is possible to write algorithms in Python instead of C without loss of efficiency. Sure, we can write some functions (like minimum in the following example), but there is a whole universe of other ufuncs which would be cool to have in Python instead, assuming this could be done without a huge loss in efficiency.
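
To make this concrete, here is a minimal sketch of what such a hand-written minimum can look like, together with a toy timing harness. Plain Python lists stand in for micronumpy's int arrays, and the array size and harness are illustrative assumptions, not the actual benchmark code:

    # Sketch of a pure-Python element-wise minimum (illustrative, not the
    # exact benchmark code); plain lists stand in for micronumpy arrays.
    def minimum(a, b):
        if len(a) != len(b):
            raise ValueError("arrays must have the same length")
        res = [0] * len(a)
        for i in range(len(a)):
            res[i] = a[i] if a[i] < b[i] else b[i]
        return res

    # Toy timing harness (the array size here is an assumption):
    import time
    a = range(1000000)
    b = range(1000000, 0, -1)
    start = time.time()
    minimum(a, b)
    print(time.time() - start)

The loop body is exactly the kind of simple, type-stable arithmetic that a tracing JIT can compile down to a tight machine-code loop.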

Here are the results. This compares PyPy svn revision 66303 in the pyjitpl5 branch against Python 2.6 with NumPy 1.2.1. The builtin numpy.minimum in PyPy is just a naive implementation in RPython, comparable in speed to a naive implementation written in C (and thus a bit slower than the optimized version in NumPy):

NumPy (builtin function):             0.12s
PyPy's micronumpy (builtin function): 0.28s
CPython (pure Python):                11s
PyPy with JIT (pure Python):          0.91s

As we can see, PyPy's JIT is slower than NumPy's optimized C version, but still much faster than CPython (about 12x).

Why is it slower? When you actually look at the generated assembler, it's pretty obvious that it's atrocious. There is a lot of speedup to be gained just from simple optimizations on the resulting assembler. There are also pretty obvious limitations, like the x86 backend not being able to emit opcodes for floats, and the absence of an x86_64 backend. These limitations are not fundamental in any sense and should be relatively straightforward to overcome. Therefore it seems we can get C-level speeds for pure Python implementations of numeric algorithms using NumPy arrays in PyPy. I think this is an interesting prospect: Python has the potential to become less of a glue language and more of a real implementation language in the scientific field.

Cheers,
fijal

Thursday, July 16, 2009

ECOOP 2009

Last week (from the 6th to the 10th of July) Anto, Armin and I (Carl Friedrich) were in the magnificent city of Genova, Italy, at the ECOOP conference. In this blog post I want to give a (necessarily personal) account of what we did there.

Workshop days: ICOOOLPS

The first two days of the conference were the workshop days. On Monday we attended the ICOOOLPS workshop (see the programme of the workshop). We had two papers accepted at the workshop (one about layering PyPy's JIT on top of the CLR and one about the basic idea of PyPy's tracing JIT) and thus gave two presentations there, one by Anto and the other by me. Both went reasonably well, and we got some positive feedback.

Nearly all the other talks were rather interesting as well. I particularly liked the one by Hans Schippers, who presented a machine model built on delegation called delMDSOC. The model is meant to implement most features a language needs to make it possible to separate cross-cutting concerns. In his talk at ICOOOLPS he presented an extension to the model that adds concurrency support, using a combination of actors and coroutines. He then showed that the concurrency mechanisms of Java, Salsa (an extension of Java that adds actors) and Io can be mapped to this model.

Furthermore there were two interesting invited talks, one by Andreas Gal (Mozilla) and one by Cliff Click (Azul Systems). Andreas explained how TraceMonkey works. This was very useful for me, because his talk was just before mine and I could thus cut most of my introduction about tracing JIT compilers and have more time for the really interesting stuff :-). Cliff talked about implementing other languages on top of the JVM and some of the pitfalls in getting them to perform well.

All in all, ICOOOLPS was a very enjoyable workshop, also with many interesting discussions.

On Tuesday there were more workshops, but also the PyPy tutorial, so I only went to a few talks at the COP workshop and spent the rest of the morning preparing the tutorial (see next section).

Tutorial

On Tuesday afternoon we gave a PyPy tutorial as part of the ECOOP summer school. The first lesson we learned was that (as opposed to at a community conference) people don't necessarily want to take out their laptops and try things themselves. We gave a slow walk-through of the full life-cycle of developing a dynamic language interpreter with PyPy's tool-chain: starting from writing your interpreter in RPython, to testing it on top of CPython, to translating it to C, .NET or Java, to finally adding hints to get a JIT inserted. A minimal sketch of that starting point follows below.
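
For readers who have not seen the tool-chain, here is a toy sketch of an RPython translation target; the file name and interpreter are illustrative stand-ins, and only the target() convention comes from PyPy's translate.py of that era:

    # targettoy.py -- minimal sketch of an RPython translation target
    # (toy example; a real interpreter's main loop would replace this)

    def entry_point(argv):
        # RPython entry point: takes the argument list, returns an exit code.
        print "hello from a toy RPython interpreter"
        return 0

    def target(driver, args):
        # translate.py calls this to obtain the function to compile.
        return entry_point, None

    if __name__ == '__main__':
        # Testing on top of CPython: just run the module directly.
        import sys
        entry_point(sys.argv)

Translating it to C is then a matter of running translate.py on the file; JIT hints come later, once the interpreter has an actual bytecode loop to trace.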

There were about seven people attending the tutorial, a couple of whom were very interested, asking questions and discussing. Some of the discussions were quite technical, e.g. one about the details of our type-inference algorithm for RPython and why we cannot do a bottom-up analysis but have to use forward propagation instead.

Jan Vitek of Purdue University told us about some of the problems of the OVM project, which is (among other things) a Java implementation in Java (OVM also aims to support implementing VMs for other languages, if I understood correctly). He said that the project has essentially grown too large and complicated, which makes it very hard for new people to get into. While PyPy doesn't have some of the problems of a full Java implementation (e.g. right now our concurrency support is minimal), I definitely think that some of these risks apply to PyPy as well, and we should find ways to improve the situation in this regard. Channeling Samuele: somewhere inside the large lumbering blob of PyPy there is an elegant core trying to get out.

Main Conference

From Wednesday till Friday the main conference took place. Many of the talks were not all that interesting to me, being quite Java-centric. One talk I liked a lot was "Making Sense of Large Heaps", presented by Nick Mitchell (IBM). He presented a tool called "Yeti" that can be used to analyze the large heaps of Java programs. The tool uses some clever algorithms and heuristics to summarize the heap usage of data structures in intelligent ways, making it easier to find potential memory-wasters in a program. Nick also gave Anto and me a demo of the tool, where we tried to apply it to pypy-jvm (we found out that a fifth of the static data in there belongs to the parser/compiler :-( ).

On each day of the conference there was a keynote. I missed the one by Simon Peyton-Jones on Wednesday about type classes in Haskell. On Thursday, David Ungar was awarded the Dahl-Nygaard Prize for his work on the Self programming language. Subsequently he gave a really inspiring keynote with the title "Self and Self: Whys and Wherefores", in which he recounted Self's history, both on a technical and on a social level. Parts of the talk were snippets from the movies Self: The Movie and Alternate Reality Kit, both of which I highly recommend.

The keynote on Friday was by Cliff Click, with the title "Java on 1000 Cores: Tales of Hardware/Software Co-design". He described the custom CPU architecture that Azul Systems developed to run Java server applications on hundreds of cores. The talk mostly focused on the hardware, which I found very interesting (though some people didn't care for it much). Azul's CPU is essentially 54 in-order RISC cores on a single processor. The cores have a lot of extensions that make it easier to run Java on them, e.g. hardware read- and write-barriers, hardware transactional memory and hardware escape-detection (!).

In addition to the talks, there is of course always the hallway track (or coffee track), which is the track where you stand in the hallway and discuss with people. As usual, this was the most interesting part of the conference. One of those discussions was Anto and me giving a PyPy demo to David Ungar. We had a very interesting discussion about VM implementation in general, and in particular about the sort of debugging tools you need when writing one. He liked PyPy a lot, which makes me very happy. He also liked the fact that I have actually read most Self papers :-).