Monday, January 25, 2010

Nightly graphs of PyPy's performance


In the past few months, we made tremendous progress on the JIT front. To monitor the progress daily, we recently introduced some cool graphs that plot revision vs performance. They are based on the Unladen Swallow benchmark runner and they're written entirely in JavaScript, using canvas via the jQuery and Flot libraries. It's amazing what you can do in JavaScript these days... They are also tested via the very good oejskit plugin, which integrates py.test with JavaScript testing, driven from the command line.
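
For the curious, here is a rough sketch of the kind of data a revision-vs-performance plot needs. It is not our actual code: the `to_series` helper and the timings are made up for illustration. It only assumes that per-revision timings get turned into a sorted list of [revision, seconds] pairs, which is the shape of data series that Flot plots.

```python
import json


def to_series(samples):
    """Turn {revision: seconds} into a list of [revision, seconds] pairs,
    sorted by revision so the line is drawn left to right."""
    return [[rev, samples[rev]] for rev in sorted(samples)]


# Hypothetical timings in seconds, keyed by revision number (made-up values).
timings = {70702: 1.5, 70700: 2.0, 70701: 1.75}

series = to_series(timings)
print(json.dumps(series))  # JSON that a JavaScript plotting page could load
```

Sorting by revision matters because the samples may arrive in any order, while the plot has to follow history.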

As you can probably see, we're very good on some benchmarks and not that great on others. Some of the bad results come from the fact that while we did a lot of JIT-related work, other PyPy parts did not see that much love. Some of our algorithms on the builtin data types are inferior to those of CPython. This is going to be an ongoing focus for a while.

We want to first improve on the benchmarks for a couple of weeks before doing a release to gather further feedback.

Cheers, fijal


Bill Mill said...

So... what's a revision number that I can use? Am I just supposed to guess? The page should have a reasonable default revision number.

Bill Mill said...

for anyone else looking, 70700 is a reasonable place to start. (The graphs are really nice by the way, I'm not hating!)

Anonymous said...

a couple of suggestions:

1. scale for X axis (dates are likely to be more interesting than revision numbers)

1a. scale for Y axis

2. Add another line: unladen swallow performance

Gaëtan de Menten said...

+1 for Anonymous's suggestions 1 and 2.

RPG said...

This is cool.

Unladen Swallow's perf should also be considered if possible.

Maciej Fijalkowski said...


Regarding revisions - by default it points to the first one we have graphs from, so you can just slice :) Also, yeah, revision numbers and dates should show up, will fix that. We don't build nightly unladen swallow and we don't want to run it against some older version, because they're improving constantly.


Anonymous said...

Wonderful idea, great implementation (axes are needed, tooltips would be interesting for long series), impressive results.

I hope you guys exploit this to raise interest in PyPy in this pre-release period. Just take a look at the response you get to posts involving numbers, benchmarks, etc. (BTW, keep linking to the funding post) :)

A series of short posts discussing hot topics would be a sure way to keep Pypy around the news until the release, so you get as much feedback as possible.


- Possible factors in slower results (discuss points in the Some Benchmarking post);

- One-off comparisons to different CPython versions, Unladen Swallow, ShedSkin, [C|J|IronP]ython (revisit old benchmarks posts?);

- Mention oprofile and the need for better profiling tools in blog, so you can crowdsource a review of options;

- Ping the Phoronix Test Suite folks to include Pypy translation (or even these benchmarks) in their tests: Python is an important part of Linux distros;

- Don't be afraid to post press-quotable numbers and pics, blurbs about what Pypy is and how much it's been improving, etc. Mention unrelated features of the interpreter (sandboxable!), the framework (free JIT for other languages), whatever;

- The benchmark platform (code, hardware, plans for new features).

Ilya said...

Regarding comparison with unladen swallow: I think having a point per month would be good enough for comparison purposes.

Maciej Fijalkowski said...

@Anonymous: Great suggestions! I'll look at these issues. In fact, things like profiling have been high on our to-do list, but we should advertise it more. We surely miss someone who'll be good at PR :-)

Luis said...

Something's wrong with plot one's scale: the speed ups are represented by a first line of 2x, a second one of 4x and the third one is 8x. Shouldn't it be 6x instead?