Wednesday, December 1, 2010

PyPy 1.4 release aftermath

A couple of days have passed since the announcement of the 1.4 release, and this is a short summary of what happened afterwards. Let's start with numbers:

  • 16k visits to the release announcement on our blog
  • we don't have download statistics unfortunately
  • 10k visits to speed center
  • most traffic comes from referring sites, with reddit alone accounting for more than a third of it

Not too bad for a project that doesn't have a well-established user base.

Lessons learned:

  • Releases are very important. They're still the major way projects communicate with the community, even if we have nightly builds that are mostly stable.
  • No segfaults were reported, nor any incompatibilities between the JIT and normal interpretation. We think that proves (or at least provides a lot of experimental evidence) that our write-once-and-then-transform method is effective.
  • A lot of people complained about their favorite module in C not working. We should have made it clearer that CPyExt is in alpha state. Indeed, we would like to know which C extension modules do work :-).
  • Some people reported massive speedups, others reported slowdowns compared to CPython. Most of those slowdowns relate to modules being inefficient (or doing happy nonsense), like ctypes. This is expected, given that not all modules are even jitted (although having them jitted is usually a matter of a couple of minutes).
  • Nobody complained about a missing stdlib module. We implemented the ones which are used more often, but this makes us wonder if the less-used stdlib modules have any users at all.

In general, feedback has been overwhelmingly positive, and we would like to thank everyone who tried the release (and especially those who reported problems).

Cheers,
fijal

8 comments:

Bryan Murphy said...

I love what you guys are doing. Keep up the good work!

Leonardo Santagada said...

There was a complaint about the lack of an ssl module from someone trying to use pg8000 with pypy. I wonder if pypy should focus on openssl or on the ssl module.

Anonymous said...

pyglet actually seems to use the MacOS module to create some windows.

Paul Boddie said...

I'm very impressed with what you've all achieved!

I've been testing PyPy 1.4 with some code I'm working on which only depends on two pure-Python non-stdlib libraries, and although the result was a 50% longer running time than with Python 2.5, it's remarkable that the code behaves in the same way and produces the same results. When trying to produce a fully compatible implementation of something, it's no trivial task to get identical behaviour (even though I'm not really using "exotic" or "frivolous" language features): some corner case usually comes along and makes things difficult. To see a non-trivial example work just like "normal Python" is surely evidence that PyPy is ready for a wider audience.

As for my code, I've been doing some profiling generally - it uses things like the array and bisect modules substantially - and will attempt to see how profile-directed improvements affect PyPy's performance.

Keep up the good work!

The Cannon Family said...

A lot of the standard library looks like it has volumes of _legacy_ code depending on it, even if the current bleeding edge people use it less. In my mind, supporting essentially all the standard library is a good long term goal, but as pointed out, parts of it can wait. Eventually I would like to see Tkinter support, and I would surmise that it is the most used of the stuff that is not implemented. We use it in a couple of items (+/- 10% of total code, not likely to change). I would guess that the situations where these obscure parts of the standard library are being used are the ones where speed is maybe not the most important thing; supporting an existing workflow or parsing legacy data is the key.

Maciej Fijalkowski said...

@The Cannon Family

The question is why those legacy people would move to PyPy. PyPy is bleeding edge in a way.

Besides, a lot of those modules are like audioop or ossaudiodev. I don't see a legitimate use case for those, even in legacy code.

Richard Jones said...

I'm very, very impressed and can't wait to use pypy in a real project. I'm blocked at the moment because I need pyglet on OS X (no MacOS module).

I gave an introduction to cython at the local Python user group and for a lark I ran the original pure-Python code up against the cython version.

cpython: 1.4s
cython: 0.2s
pypy: 0.2s

Hmm :-)

Xavier Combelle said...

I don't know how representative it is, but for this use case
there is a factor of 7 between pypy and cpython 2.7

cpython 2.7
>>> timeit.Timer('sum(x for x in xrange(100000))').repeat(10,100)
[1.3338480523322112, 1.5916376967269201, 1.5959533140645483, 1.8427266639818676,
1.3473615220676294, 1.842070271069737, 1.3346074032759319, 1.5859678554627408,
1.8533299541683306, 1.5872797264355398]

pypy 1.4
>>>> timeit.Timer('sum(x for x in xrange(100000))').repeat(10,100)
[7.5079355199007978, 7.9444552948765477, 7.2710836043080178, 7.5406516611307666,
7.5192312421594352, 7.4927645588612677, 7.5075613773735768, 7.5201248774020826,
7.7839006757141931, 7.5898334809973278]

but maybe it is not representative
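
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of the same timeit approach wrapped in a small helper. The helper name `benchmark` and its default arguments are made up for illustration, and the statement uses `range` rather than `xrange` so that it also runs on Python 3; on Python 2 that builds a list first, so the numbers will not be directly comparable with the ones above. It reports the minimum of the repeats, which is what the timeit documentation suggests looking at, since the higher values mostly reflect other activity on the machine.

```python
import timeit


def benchmark(stmt, repeat=5, number=10):
    """Time `stmt` with timeit and return (best, per_loop, timings).

    `benchmark` is a hypothetical helper, not part of timeit or PyPy.
    `timings` holds one total per repeat, each covering `number` loops.
    """
    timings = timeit.Timer(stmt).repeat(repeat, number)
    best = min(timings)  # the least-disturbed run
    return best, best / number, timings


if __name__ == "__main__":
    # Same generator-sum statement as above, with range instead of xrange.
    best, per_loop, timings = benchmark("sum(x for x in range(100000))")
    print("best of %d runs: %.4f s total, %.6f s per loop"
          % (len(timings), best, per_loop))
```

Running the same script under both interpreters and comparing the two "best" values gives a rough ratio for this one statement only; it says nothing about other workloads.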