Wednesday, March 2, 2011

US Trip Report: POPL, Microsoft, IBM

Some notes from my recent trip (from 23rd of January to 17th of February) to the US where, I presented PyPy at various scientifically oriented places. In summary, there seems to be quite a bit of interest in PyPy within the research community, details below.

PEPM/POPL/STOP

From the 24th to the 29th of January I was in Austin, Texas at the POPL conference, where I gave a talk at one of the workshops, PEPM (Partial Evaluation and Program Manipulation). The title of our paper is "Allocation Removal by Partial Evaluation in a Tracing JIT", the abstract is:

The performance of many dynamic language implementations suffers from high allocation rates and runtime type checks. This makes dynamic languages less applicable to purely algorithmic problems, despite their growing popularity. In this paper we present a simple compiler optimization based on online partial evaluation to remove object allocations and runtime type checks in the context of a tracing JIT. We evaluate the optimization using a Python VM and find that it gives good results for all our (real-life) benchmarks.

The talk (slides) seemed to be well-received and there was a good discussion afterwards. PEPM in general was a very enjoyable workshop with many interesting talks on partial evaluation (which I am very interested in) and a great keynote by Olivier Danvy about "A Walk in the Semantic Park".

POPL itself was a bit outside of the area I am most knowledgeable in, most of the talks being on formal topics. Some of the talks that stuck to my mind:

  • "The Design of Kodu: A Tiny Visual Programming Language for Children on the Xbox 360", the keynote by Matthew MacLaurin from Microsoft Research. I didn't know about Kodu before, and was very impressed by it.
  • "Automating String Processing in Spreadsheets using Input-Output Examples" (paper) by Sumit Gulwani (also from MS Research) describes a plugin to Excel that can automate many common string processing tasks by giving a couple of examples, which are then abstracted into a generic string manipulation. Very cool.
  • "Dynamic Inference of Static Types for Ruby" (paper) by Michael Furr, Jong-hoon (David) An, Jeffrey S. Foster and Michael Hicks describes an approach to type inference that works by observing the actual types seen during unit-testing. Similar things have been done a few times before, however, the paper actually gives a correctness result.
  • "The Essence of Compiling with Traces" (paper) by Shu-Yu Guo and Jens Palsberg describes a formalization of a simple imperative language and proves that executing it using trace compilation will do exactly the same thing than using an interpreter. It also looks at what conditions an optimization on traces must fulfill to still produce valid results.

After the main conference, I took part in the STOP (Scripts to Programs) workshop. It had a great keynote "Scripting in a Concurrent World" by John Field about the Thorn language and a few interesting other talks.

Microsoft Research

After POPL I went to Redmond to visit Microsoft Research for a week, specifically the RiSE group. This is the group that did the SPUR project, a meta-tracing JIT for C# applied to a JavaScript interpreter in C#. I compared PyPy to SPUR last year. I am very grateful for Microsoft for inviting me there.

At Microsoft I gave a talk about "PyPy's Approach to Implementing Dynamic Languages Using a Tracing JIT Compiler", the slides of which can be found here. The talk was filmed and is online. People seemed to be impressed with the "product qualities" of PyPy, e.g. the buildbot infrastructure and speed tracking website.

The rest of the time I discussed with various researchers in the RiSE group, particularly with Nikolai Tillmann. We talked a lot about similarities and differences between SPUR and PyPy and tried to understand our respective projects better. SPUR is a really great project and I learned a lot in the discussions, for example about the optimizations and heuristics their trace compiler uses.

Another very cool project done by the RiSE group that I learned more about is PEX. PEX is a unit test generator for C# that tries to produce unit tests for so-far untested execution paths within methods. There is an online puzzle version of it, if you want to get an impression of the technology (including a very impressive C# IDE in the browser).

IBM

For the last part of the trip I stayed in New York City for two weeks, mostly as a vacation. However, I also visited IBM Watson Research Center for two days, to which I had been invited by David Edelsohn.

The first day I gave the same presentation I had given at Microsoft (with some improvements to the slides), again it was quite well received. The rest of the time I spent in (very fruitful) discussions with various people and teams, among them the Liquid Metal team and the Thorn team.

The second day I met with members of the FIORANO group, who are working on dynamic compilation for dynamic languages and Java. They explored various ways to speed up Python, both by improving the CPython interpreter as well as with JIT compilation techniques.

Another of their projects is to add a trace compiler to IBM's J9 JVM, about which the paper "A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler" is going to appear at CGO. I discussed tracing JITs with Peng Wu, one of the authors of that paper. Peng tries to systematically look at the various heuristics found in the different VMs that use tracing JITs. This is a very different perspective from the one I usually have, focusing on how to improve PyPy's specific heuristics. Therefore that discussion helped me thinking about the issues more generally.

Another goal of the group is to try to find benchmarks that are representative for typical Python workloads, which is something that has been done very carefully for Java e.g. when developing the DaCapo benchmark suite. The benchmarks that the Python community uses have not been selected in such a careful and measured way, so I think that trying to be more systematic there is a very worthwhile endeavour.

6 comments:

holger krekel said...

Thanks for the interesting overview of your travels and research interactions! I i agree that getting better and more systematic benchmarks for Python would be worthwhile.

Ivan said...

I find this project fascinating.

I wonder what's the theoretical limit of this approach for improving the performance of python (or any other language implemented in pypy)?

Do you have any rought estimation on how far you can go? Have you reached a limit or you are just scratching the possibilities?

For example, do you think you can compete with javascript v8 or luajit?

Maciej Fijalkowski said...

Hi Ivan.

In general I don't think there are limits of approach other than say time and money. Python is a complex language.

Can you come up with an example where PyPy is actually slower than V8 *other* than computer language shootout? Programs on computer language shootout are just not nicely optimized for PyPy.

Ivan said...

Hi Fijall,

I'm afraid I don't know about benchmarks and comparison between these languages, other than the shootout. I guess this is the first reference someone gets when comparing languages, since it's the most popular out there.

But it would be great if there was a resource to compare against other languages. At least, from a marketing point of view, it would be very good for pypy.

May I know why the shootout is not a good parameter?

And, is there any other benchmarks comparing pypy against v8, tracemonkey/j├Ągermonkey, etc..?

Maciej Fijalkowski said...

Hi Ivan.

Shootout is not good because it contains heavily tuned programs, some of them even massively stretching the benchmark restrictions. They're tailored towards specific implementations, contain specific per-benchmark options etc. Nobody looked at python programs at detail and especially from PyPy perspective. This would need to be done first to compare those fairly, until it's not done, it's comparing naive version to a heavily optimized one and not comparing languages.

From what I measured roughly PyPy comes on par with tracemonkey and about 2x slower V8. But those were very unscientific experiments and I'll deny everything :)

I don't think there is any good cross-language comparison and that's at least partly due to the fact that workloads differ in different languages. Most shootout programs for example are tailored towards C workloads. Optimizing precisely for them (even if you have a good programs) is kind of fun, but it does not represent what we try to achieve, that is speeding up large python programs.

I hope this answers your question.

Cheers,
fijal

Anonymous said...

to me it seems like you have reached the goals of unladen swallow and unladen swallow was a bit of a failure?

if google wants a faster python, why don't they fund you? it would be awesome if the core team could work on it full-time. :)