PyPy Status Blog: General performance improvements

Saturday, May 10, 2008

General performance improvements

Hi all,

During the past two weeks we put some more effort into the baseline performance of pypy-c. Some of the tweaks were simply new ideas, and others were based on actual profiling. The net outcome is that we now expect PyPy to be, in the worst case, twice as slow as CPython on real applications. Here are some small-to-medium-size benchmark results. Each number is the execution time, normalized to 1.0 for CPython 2.4:

1.90 on templess (a simple templating language)
1.49 on gadfly (pure Python SQL database)
1.49 on translate.py (pypy's own translation toolchain)
1.44 on mako (another templating system)
1.21 on pystone
0.78 on richards

(This is all without the JIT, as usual. The JIT is not ready yet.)

You can build yourself a pypy-c with this kind of speed with the magic command line (gcrootfinder is only for a 32-bit Linux machine):

    pypy/translator/goal/translate.py --gc=hybrid --gcrootfinder=asmgcc targetpypystandalone --allworkingmodules --faassen

The main improvements come from:

A general shortcut for any operation between built-in objects: for example, a subtraction of two integers or floats now dispatches directly to the integer or float subtraction code, without looking up the '__sub__' in the class.
A shortcut for getting attributes out of instances of user classes when the '__getattribute__' special method is not overridden.
The so-called Hybrid Garbage Collector is now a three-generation collector. More about our GCs...
Some profiling showed bad performance in our implementation of the built-in id() -- a trivial function to write in CPython, but a lot more fun when you have a moving GC and your object's real address can change.
The bytecode compiler's parser had a very slow linear search algorithm that we replaced with a dictionary lookup.
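
To make the first item in the list above more concrete, here is a minimal sketch of the idea in plain Python. It is not PyPy's actual code: the names FAST_PATHS, generic_sub and binary_sub are hypothetical, and the slow path is a simplified version of the usual '__sub__' / '__rsub__' lookup (it ignores the rule that gives a subclass's reflected method priority).

```python
# Hypothetical table of fast paths, keyed by the exact operand types.
FAST_PATHS = {
    (int, int): int.__sub__,
    (float, float): float.__sub__,
}


def generic_sub(left, right):
    """Slow path (simplified): look up __sub__, then __rsub__, on the classes."""
    method = getattr(type(left), "__sub__", None)
    if method is not None:
        result = method(left, right)
        if result is not NotImplemented:
            return result
    reflected = getattr(type(right), "__rsub__", None)
    if reflected is not None:
        result = reflected(right, left)
        if result is not NotImplemented:
            return result
    raise TypeError("unsupported operand types for -")


def binary_sub(left, right):
    # Exact type match, not isinstance(): a subclass may override __sub__,
    # so it has to go through the generic lookup.
    fast = FAST_PATHS.get((type(left), type(right)))
    if fast is not None:
        return fast(left, right)
    return generic_sub(left, right)
```

The point of the exact type check is that the shortcut stays invisible to user code: binary_sub(7, 3) takes the fast path, while an instance of a user-defined subclass of int still gets its own '__sub__' called.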

These benchmarks are doing CPU-intensive operations. You can expect a similar blog post soon about the I/O performance, as the io-improvements branch gets closer to being merged :-) The branch could also improve the speed of string operations, as used e.g. by the templating systems.
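
On the id() item above: the difficulty is that an identity cannot simply be the object's address if the GC is allowed to move objects. The following toy model sketches one way around this, loosely following the description given in the comments (a side mapping from addresses to integers, filled only for objects that were passed to id(), and updated at collection time). Everything here is an assumption made for illustration: the class MovingHeapSketch, its methods, and the 'forwarding' dictionary are hypothetical, and addresses are modelled as plain integers.

```python
class MovingHeapSketch:
    """Toy model of stable ids on top of a moving collector (hypothetical)."""

    def __init__(self):
        self._next_id = 1
        # address -> id, with entries only for objects that were asked for an id
        self._id_by_address = {}

    def object_id(self, address):
        """Return a stable id for the object currently living at 'address'."""
        try:
            return self._id_by_address[address]
        except KeyError:
            new_id = self._next_id
            self._next_id += 1
            self._id_by_address[address] = new_id
            return new_id

    def collect(self, forwarding):
        """Simulate a collection.

        'forwarding' maps old address -> new address for surviving objects;
        objects that died are simply absent from it.
        """
        updated = {}
        for old_address, ident in self._id_by_address.items():
            new_address = forwarding.get(old_address)
            if new_address is not None:
                updated[new_address] = ident  # the id follows the object
        self._id_by_address = updated
```

In this model the cost of id() is paid mostly at collection time, when the side table has to be walked and rewritten, which is why the size of that table matters.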

Posted by Armin Rigo at 18:34

11 comments:

Anonymous said...: We had the same problem with id() (called object_id()) in Rubinius. We currently hide an object's ID inside its metaclass (allocating one if there isn't one).

Where did you guys store it?; May 10, 2008 at 9:07 PM
Anonymous said...: The ID is stored in a special dictionary (a normal dictionary specialized to be allocated so that the GC won't see it) that is used in the GC as a mapping from addresses to integers. This dict is updated when necessary (usually when collecting).; May 11, 2008 at 1:48 AM
Unknown said...: Wow. That sure is nice.; May 11, 2008 at 7:56 AM
Anonymous said...: My my, that must be a huge dictionary.; May 11, 2008 at 8:27 AM
Anonymous said...: The dictionary is of course only filled for objects that were used in an id() call.; May 11, 2008 at 10:12 AM
Armin Rigo said...: There are actually two dictionaries, at least when using one of the generational GCs: one for the first generation objects and one for the rest. The dictionary for the rest of the objects can probably get quite large, but it needs to be traversed once during each full collection only. It seems that full collections are rare enough: the full dictionary updating doesn't stand out in profiled runs.

I didn't think about implementing id() at the language level, e.g. by extending the class of the object to add a field.
We can't really do that in RPython. Moreover, that seems impractical for Python: if someone asks for the id() of an integer object, do all integers suddenly need to grow an 'id' field?; May 11, 2008 at 10:19 AM
Daivd said...: Great work!

I have a few questions not answered by the FAQ that I hope someone will be able to answer.

When might the JIT be ready enough? (no stress, just asking :)

How much faster are CPython 2.5, 2.6 and 3.0? That seems to be relevant to the statement "we now expect PyPy to be in the worst case twice as slow than CPython".

If I understand correctly, one of the purposes of PyPy is to make experimentation easier - so will making it compatible with 3.0 be fairly easy? Are there plans to do so?

Is PyPy expected to one day become a serious "competitor" to CPython, in that you might want to run it in production? Is there a time set for when it will be ready for use by the general public (i.e me ;)?; May 11, 2008 at 10:41 AM
Maciej Fijalkowski said...: So, answering questions one by one:

JIT will be ready when it'll be ready, not earlier.

CPython 2.5 is slightly faster for some operations. No real difference there. 2.6 was optimized for certain operations, but again, don't expect a huge difference. I think you can expect pypy to be within 2x of any cpython. It is not even certain what 3.0 will look like, but being ultra fast is certainly not its primary goal.

Regarding making pypy compatible with 3.0 - yes, that should be fairly easy, although we don't have any immediate plans to do that.

The final date for making pypy production ready is not set (and this is a gradual process), but as you can see here and here we're trying more and more to make it run existing applications.

Cheers,
fijal; May 11, 2008 at 11:19 AM
Anonymous said...: Note that current benchmarks suggest that CPython 3.0 is still much slower than CPython 2.x. It might be interesting to see whether this means that PyPy is much faster than CPython 3.0 running e.g. Pystone.
Of course this fact would not be very surprising, esp. given that PyPy does not implement any CPy3k features.; May 11, 2008 at 11:43 AM
Luis said...: "JIT will be ready when it'll be ready, not earlier."

Alright, alright... we know.
But could you at least give us mere mortals a very rough estimate? What does your heart tell you? :-); October 13, 2008 at 12:43 AM
Spencer said...: What kind of computations are done in richards? I.e., what sort of applications can expect better performance in PyPy than in CPy?; November 2, 2009 at 7:04 PM