Sunday, September 27, 2009

First results of the JIT

Hi all,

Just a quick note to tell you that we are progressing on the JIT front. Here are the running times of the richards benchmark on my laptop:

  • 8.18 seconds with CPython 2.5.2;
  • 2.61 seconds with pypy-c-jit (3x faster than CPython);
  • 1.04 seconds if you ignore the time spent making assembler (8x faster than CPython);
  • 1.59 seconds on Psyco, for reference (5x faster than CPython).

Yes, as this table shows, we are spending 1.57 seconds in the JIT support code. That's too much -- even ridiculously so -- for anything but a long-running process. We are working on that :-)
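For the curious, the ratios quoted above follow directly from the timings; a quick sanity check (the numbers are the ones from the list):

```python
# Timings (seconds) for the richards benchmark, as reported above.
cpython = 8.18         # CPython 2.5.2
pypy_c_jit = 2.61      # pypy-c-jit, total
assembler_only = 1.04  # ignoring time spent making assembler
psyco = 1.59           # Psyco, for reference

print(round(cpython / pypy_c_jit, 1))         # ~3x faster than CPython
print(round(cpython / assembler_only, 1))     # ~8x
print(round(cpython / psyco, 1))              # ~5x
print(round(pypy_c_jit - assembler_only, 2))  # 1.57s in JIT support code
```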

If you want to build your own pypy-c-jit (for x86-32 only for now):

  • you need a Subversion checkout of trunk;
  • run pypy/translator/goal/translate.py with the -Ojit option;
  • as usual, wait a long time (and be sure you have more than 1GB of RAM).
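Concretely, the steps above boil down to something like the following (a sketch only; the exact trunk URL is assumed from the codespeak.net links elsewhere on this page, and the layout is the 2009 trunk layout):

```shell
# Sketch of the build steps above -- not an official recipe.
# The svn URL is an assumption; take the real one from codespeak.net.
svn co http://codespeak.net/svn/pypy/trunk pypy-trunk
cd pypy-trunk/pypy/translator/goal
# x86-32 only for now; needs more than 1GB of RAM and a long time.
python translate.py -Ojit
```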

For now pypy-c-jit spews a lot of debugging output and there are a few known examples where it crashes. As we like to repeat, however, it's a complete JIT: apart from the crashes (the bugs are probably in the JIT support code), it supports the whole Python language from the start -- in the sense of doing correct things. Future work includes Python-specific improvements, e.g. tweaking the data structures used to store Python objects so that they are more JIT-friendly.

EDIT: Oh yes, fijal reminds me that CPython 2.6 is 30% faster than CPython 2.5 on this benchmark (which is mostly my "fault", as I extracted a small part of PyPy and submitted it as a patch to CPython that works particularly well for examples like richards). It does not fundamentally change the fact that we are way faster though.

41 comments:

Cory said...

This thing just got interesting.

Why this particular benchmark?

Caleb said...

Fantastic!

At this point, it would be a really good idea for the pypy team to prepare downloadable binaries, setup tools, or eggs to make it extremely easy for a new user to try it out. Now that the performance is starting to become interesting, many more people will want to experiment with it, and you don't want that enthusiasm hampered by a somewhat involved setup process.

Cory said...

> it would be a really good idea for the pypy team to prepare downloadable binaries or setup tools, or eggs

I second this notion. I am among the group of people who are quite tempted to try things out, but not sure how much work I'll have to do first.

Anonymous said...

Me too, I'd like pre-built binaries.

Anonymous said...

I agree. Please put some binaries on your page to make it easier for everyone to survey what you've done!

Armin Rigo said...

This particular benchmark happens to be the one we use; there is no deep reason besides its relative simplicity (but it is not a microbenchmark, it's really computing something). We will of course make more tests with a better range of benchmarks when things start to settle a bit. Right now we are busy developing, and the numbers change every week.

It's also for this reason that there is no nicely-packaged release, sorry :-)
Note that translating your own pypy-c-jit is not a lot of work for you. It is just a lot of work for your CPU :-) You just do "svn co", "cd pypy/translator/goal" and "./translate -Ojit".

Anonymous said...

I would appreciate binaries because I don't have a computer with multi-GB RAM. I tried translating pypy a few months ago but gave up after several hours (the computer was just swapping constantly).

I can wait some longer, but regular binary releases (even if just unstable trunk snapshots) would be useful.

Anyway, keep up the good work! This is looking really promising.

Ideal said...

Quite nice, thank you..

PavPanchekha said...

Perhaps there is some way to store generated assembler code? I don't know too much about assembler or the JIT backend, but I assume that it'd be possible to stick the generated assembler code into a comment (or, if those don't exist, a docstring) in the .pyc file, so that a library that is commonly imported won't have to waste time generating assembler.

Benjamin said...

@PavPanchekha We specialize the assembler aggressively, so that probably wouldn't be so useful. We have a lot of room to improve on assembly generation, though.

Ben said...

Thanks for the update!

Anonymous said...

Anonymous said: I would appreciate binaries because I don't have a computer with multi-GB RAM.

I do have such a computer, but I would still appreciate binaries, because the current trunk does not translate for me:

[translation:ERROR] File "/tmp/pypy/pypy/annotation/annrpython.py", line 227, in addpendingblock
[translation:ERROR] assert s_oldarg.contains(s_newarg)
[translation:ERROR] AssertionError':
[translation:ERROR] .. v1703 = simple_call((function mmap), v1702, map_size_0, (7), (34), (-1), (0))
[translation:ERROR] .. '(pypy.rlib.rmmap:628)alloc'

Armin Rigo said...

You are probably missing a dependency. See http://codespeak.net/pypy/dist/pypy/doc/getting-started-python.html#translating-the-pypy-python-interpreter

della said...

Great work! Is it possible to build the 32-bit binary on a 64-bit machine without too much effort? Having those instructions would certainly help us 64-bit people :)

Luis said...

I guess the time spent making assembler is only the first time the code is executed. Is that right? If so, we can consider an 8x speedup as the most accurate result. Or not?

Neil said...

@della: I use a 32-bit chroot on my own x64 machine. I don't know if that counts as "too much effort" (certainly it theoretically shouldn't require that), but it has been for me the most painless way to do it.

Maciej Fijalkowski said...

@Luis: yes, it's only the first time. Well, it depends how you count, but it can be considered an 8x speedup...
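To illustrate the point with a toy model (the 1.57s/1.04s/8.18s figures are the ones measured above; the model itself is just an illustration, not how the JIT accounts for time): the assembling cost is paid once, so the effective speedup over CPython climbs toward the 8x figure as the benchmark is repeated:

```python
# Toy amortization model: one-time assembling cost plus per-run time.
compile_once = 1.57  # seconds spent making assembler (from the post)
per_run = 1.04       # seconds actually running the assembler
cpython_run = 8.18   # CPython 2.5.2, per run

def effective_speedup(runs):
    """Speedup over CPython after `runs` repetitions of the benchmark."""
    return (cpython_run * runs) / (compile_once + per_run * runs)

for runs in (1, 5, 50):
    print(runs, round(effective_speedup(runs), 2))
```

The first run gives the ~3x figure from the post; with many runs the ratio approaches the ~8x "ignoring assembling time" figure.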

Armin Rigo said...

Here are prebuilt C sources (in which "tracing" time was reduced by 20-30% since the blog post):

http://wyvern.cs.uni-duesseldorf.de/~arigo/chain.tar.bz2

Linux x86-32 only. You still need a svn checkout of PyPy, and you still need to compile them with gcc -- but it does not take too long: edit the first entry of the Makefile to point to your checkout of PyPy and type "make". This still assumes that all dependencies have been installed first. Don't complain if the #includes are at the wrong location for your distribution; you would get them right if you translated the whole thing yourself. In fact, don't complain if it does not compile for any reason, please :-) C sources like that are not really supposed to be portable, because they are just intermediates in the translation process.

Anonymous said...

> You are probably missing a dependency. See http://codespeak.net/pypy/dist/pypy/doc/getting-started-python.html#translating-the-pypy-python-interpreter

Dear Armin, it seems like this document should mention libexpat1-dev and libssl-dev as dependencies, too. Anyway, I managed to build pypy-c, and here are the results for some small benchmarks I wrote. (Is there a way here at blogger.com to not break the formatting?)

                 python 2.5    psyco    pypy-c
richards               14.9      2.9       3.9
mergesort              27.6      4.8      26.3
convexhull              9.4      5.6       6.3
bigcityskyline         46.9      3.1       7.6
fft                    14.1     15.4      25.0

Thank you all for your efforts.

Armin Rigo said...

Thanks for the missing dependencies; added to the development version of the page. Thanks also for the numbers you report. The next obvious thing we miss is float support (coming soon!), which shows in some of your benchmarks.

illume said...

Hi,

this is so unbelievably awesome, it's going to take me a while to recover from all the awesomeness.

CONGRATS!

ps. a nice improvement for users would be to have your ./configure script find dependencies and report the ones missing, and the ones used (s/configure/setup.py/g).

Anonymous said...

nice!

so what is your guess at the moment? how fast can pypy get if you further optimize the jit?

Anonymous said...

Dear Pypy developers, is it possible to switch off the very aggressive JIT logging in pypy-c? First, this could make pypy-c a drop-in replacement for cpython. (Many more beta-testers.) Second, the logging itself seems to be somewhat resource-intensive.

Very cool Mandelbrot ascii art, by the way.

Maciej Fijalkowski said...

Dear anonymous.

you can compile with ./translate.py -Ojit --jit-debug=profile

There is no runtime switch so far, unfortunately.

Cheers,
fijal

Anonymous said...

Thank you! For many of us, the translation-time switch will be just as good.

della said...

I can't seem to compile (32-bit Ubuntu 9.10 chroot), by manually executing the Makefile in /tmp/usession-0/testing_1 I get this traceback:

  File "/home/della/pkg/pypy/trunk/pypy/translator/c/gcc/trackgcroot.py", line 1210, in <module>
    tracker.process(f, g, filename=fn)
  File "/home/della/pkg/pypy/trunk/pypy/translator/c/gcc/trackgcroot.py", line 229, in process
    lines = self.process_function(lines, entrypoint, filename)
  File "/home/della/pkg/pypy/trunk/pypy/translator/c/gcc/trackgcroot.py", line 244, in process_function
    table = tracker.computegcmaptable(self.verbose)
  File "/home/della/pkg/pypy/trunk/pypy/translator/c/gcc/trackgcroot.py", line 285, in computegcmaptable
    self.parse_instructions()
  File "/home/della/pkg/pypy/trunk/pypy/translator/c/gcc/trackgcroot.py", line 364, in parse_instructions
    meth = self.find_missing_visit_method(opname)
  File "/home/della/pkg/pypy/trunk/pypy/translator/c/gcc/trackgcroot.py", line 390, in find_missing_visit_method
    raise UnrecognizedOperation(opname)
__main__.UnrecognizedOperation: jc

There are also some type warnings for pointers; I don't know if they could be of any use. Maybe you can help me?

Armin Rigo said...

Thanks for the report, della. Fixed, if you want to try again. Parsing gcc output is a bit delicate as the exact set of operations used depends on the specific version and command-line options passed to gcc.

Armin Rigo said...

Since the blog post, here are the updated numbers: we run richards.py in 2.10 seconds (almost 4x faster than CPython), and only spend 0.916 seconds actually running the assembler (almost 9x faster than CPython).

Anonymous said...

Very nice. Do you expect to get faster than psyco?

Luis said...

This is very exciting! Please, try to post updates to these figures... thanks!

Andrew said...

I was having the same problem as della, and your fix seems to work, but it's breaking somewhere else now. I don't think I have a dependency problem, I can build a working pypy-c without jit. Running make manually produces heaps of warnings about incompatible pointers, some probably harmless (int* vs long int*, these should be the same on x86-32), but others worry me more, like struct stat* vs struct stat64*, or struct SSL* vs char**. I put the complete output of a manual run of make online.

NickDaly said...

Interestingly, the translation itself seems to consume at most about 960MB of RAM. It's easy to translate on a system even with only a gig of ram if you stop everything else.

Try switching run levels or the like.

The -Ojit option seems to cause an error in translation with Revision 68125, when translated using Python 2.5.2 on Debian Lenny.

proteusguy said...

First off - congratulations and good job on the great progress. I've been watching this project since the 2007 PyCon in DFW and it's great to see these promising results.

That said, while I know there's still a lot of work to do and this is very much an in-progress thing, I'm very much looking forward to an excuse to try this stuff out in anger - real practical situations. For me that means some statistical calculation engines (Monte Carlo analysis) front-ended by web services. In both situations this brings up two constraints: a) must support 64bit (because our data sets rapidly go above 4GB RAM) and b) must not be overly memory hungry (because any significant incremental overhead really hurts when your data sets are already over 4GB RAM).

For now we use Psyco for small stuff but have to re-implement in C++ once we hit that 32-bit limit. PyPy is very exciting as a practical alternative to Psyco because of anticipated 64bit support. I wonder if, due to the existence of Psyco already, PyPy shouldn't focus first on 64bit instead?

Few things would speed up progress more than getting PyPy used out in the wild - even if only by those of us who appreciate that it's very much in flux but still understand how to benefit from it.

I understand you guys have your focus and goals and encourage you to keep up the good work. Just thought I'd throw this out as an idea to consider. I'm sure there are a lot like me anxious to give it a spin.

-- Ben Scherrey

Armin Rigo said...

Andrew: can you update and try again? If you still have the .c files around it is enough to go there and type "make"; otherwise, restart the build. It should still crash, but give us more information about why it does.

Andrew said...

The new traceback is:

Traceback (most recent call last):
  File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 1211, in <module>
    assert fn.endswith('.s')
AssertionError

Is the position in the input tracked? That might help, or I could package my .gcmap files.

Andrew said...

The trouble seems to be implement.gcmap and implement_9.gcmap. These are both empty, and trigger the assertion error.

Running trackgcroot as the Makefile does, but without those two files, permits compilation to continue, but linking fails with undefined references to various symbols with the prefix 'pypy_g_'.

I suspected the changes might have invalidated the old .gcmap files, so I tried removing them, and got this when it tried to generate implement.gcmap:

Traceback (most recent call last):
  File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 1214, in <module>
    tracker.process(f, g, filename=fn)
  File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 229, in process
    lines = self.process_function(lines, entrypoint, filename)
  File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 244, in process_function
    table = tracker.computegcmaptable(self.verbose)
  File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 285, in computegcmaptable
    self.parse_instructions()
  File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 365, in parse_instructions
    insn = meth(line)
  File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 741, in visit_jmp
    self.conditional_jump(line)
  File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 757, in conditional_jump
    raise UnrecognizedOperation(line)
__main__.UnrecognizedOperation: jmp T.14141

NickDaly said...

A correction/clarification to last night's post:

There isn't a bug in the -Ojit translation process; I was just missing a dependency that I could've sworn I'd installed before.

The translation process only takes < 1GB memory if done without any options. Attempting to translate with the -Ojit option takes at least 2.5GB of RAM, as I tried last night (with it as the only running process) and it consumed my swapfile and ran out of memory.

Is there any documented way to use a translated pypy binary to build other pypy translations? That might help reduce the build requirements, and would also be mighty cool.

Armin Rigo said...

NickDaly: checked in, please try. Also, please come to the mailing list instead of posting here if you have further comments to do... http://codespeak.net/mailman/listinfo/pypy-dev

Michael Allman said...

Is pypy-c-jit written in C or Python or something else? I ask because of the "c" in pypy-c-jit.

Carl Friedrich Bolz said...

Michael: It is written in RPython (a subset of Python) but then translated to C. By convention we therefore call the executable-name pypy-c. If the executable also contains a JIT, we call it pypy-c-jit.

Carl Friedrich Bolz said...

Ben Scherrey: 64bit support might happen not too far in the future. Not using too much memory is a different problem, that might take a while longer. It has two aspects, one is that the JIT itself uses way too much memory at the moment. We will work on that soon.

The other aspect is making sure that your dataset does not take too much heap. It depends a bit which data structures you use, but it's not likely to be that great right now. That might change at some point; I have some ideas in that direction, but not really the time to work on them soon.