Sunday, September 27, 2009

First results of the JIT

Hi all,

Just a quick note to tell you that we are progressing on the JIT front. Here are the running times of the richards benchmark on my laptop:

  • 8.18 seconds with CPython 2.5.2;
  • 2.61 seconds with pypy-c-jit (3x faster than CPython);
  • 1.04 seconds if you ignore the time spent making assembler (8x faster than CPython);
  • 1.59 seconds on Psyco, for reference (5x faster than CPython).

Yes, as this table shows, we are spending 1.57 seconds in the JIT support code. That's too much -- even ridiculously so -- for anything but a long-running process. We are working on that :-)
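For the curious, the arithmetic behind these figures can be checked directly (the times are the ones reported above; the rounding is mine):

```python
# Times in seconds, as reported above for the richards benchmark.
cpython = 8.18
pypy_c_jit = 2.61        # total time, including assembler generation
assembler_only = 1.04    # time actually spent running the generated assembler

# Time spent in the JIT support code (tracing + making assembler):
jit_overhead = pypy_c_jit - assembler_only
print(round(jit_overhead, 2))              # 1.57

# Speedups relative to CPython:
print(round(cpython / pypy_c_jit, 1))      # 3.1x
print(round(cpython / assembler_only, 1))  # 7.9x
```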

If you want to build your own pypy-c-jit (for x86-32 only for now):

  • you need a Subversion checkout of trunk;
  • run pypy/translator/goal/translate.py with the -Ojit option;
  • as usual, wait a long time (and be sure you have more than 1GB of RAM).

For now pypy-c-jit spews a lot of debugging output and there are a few known examples where it crashes. As we like to repeat, however, it's a complete JIT: apart from the crashes (the bugs are probably in the JIT support code), it supports the whole Python language from the start -- in the sense of doing correct things. Future work includes Python-specific improvements, e.g. tweaking the data structures used to store Python objects so that they are more JIT-friendly.

EDIT: Oh yes, fijal reminds me that CPython 2.6 is 30% faster than CPython 2.5 on this benchmark (which is mostly my "fault", as I extracted a small part of PyPy and submitted it as a patch to CPython that works particularly well for examples like richards). It does not fundamentally change the fact that we are way faster though.

41 comments:

  1. This thing just got interesting.

    Why this particular benchmark?

    ReplyDelete
  2. Fantastic!

    At this point, it would be a really good idea for the pypy team to prepare downloadable binaries, setup tools, or eggs to make it extremely easy for a new user to try it out. Now that the performance is starting to become interesting, many more people will want to experiment with it, and you don't want that enthusiasm hampered by a somewhat involved setup process.

    ReplyDelete
  3. > it would be a really good idea for the pypy team to prepare downloadable binaries or setup tools, or eggs

    I second this notion. I am among the group of people who are quite tempted to try things out, but not sure how much work I'll have to do first.

    ReplyDelete
  4. Me too, I'd like pre-built binaries.

    ReplyDelete
  5. I agree. Please put some binaries on your page to make it easier for everyone to survey what you've done!

    ReplyDelete
  6. This particular benchmark happens to be the one we use; there is no deep reason besides its relative simplicity (but it is not a microbenchmark, it's really computing something). We will of course make more tests with a better range of benchmarks when things start to settle a bit. Right now we are busy developing, and the numbers change every week.

    It's also for this reason that there is no nicely-packaged release, sorry :-)
    Note that translating your own pypy-c-jit is not a lot of work for you. It is just a lot of work for your CPU :-) You just do "svn co", "cd pypy/translator/goal" and "./translate -Ojit".

    ReplyDelete
  7. I would appreciate binaries because I don't have a computer with multi-GB RAM. I tried translating pypy a few months ago but gave up after several hours (the computer was just swapping constantly).

    I can wait some longer, but regular binary releases (even if just unstable trunk snapshots) would be useful.

    Anyway, keep up the good work! This is looking really promising.

    ReplyDelete
  8. Quite nice, thank you..

    ReplyDelete
  9. Perhaps there is some way to store generated assembler code? I don't know too much about assembler or the JIT backend, but I assume that it'd be possible to stick the generated assembler code into a comment (or, if those don't exist, a docstring) in the .pyc file, so that a library that is commonly imported won't have to waste time generating assembler.

    ReplyDelete
  10. @PavPanchekha We specialize the assembler aggressively, so that probably wouldn't be so useful. We have a lot of room to improve on assembly generation, though.

    ReplyDelete
  11. Anonymous said: I would appreciate binaries because I don't have a computer with multi-GB RAM.

    I do have such a computer, but I would still appreciate binaries, because the current trunk does not translate for me:

    [translation:ERROR] File "/tmp/pypy/pypy/annotation/annrpython.py", line 227, in addpendingblock
    [translation:ERROR] assert s_oldarg.contains(s_newarg)
    [translation:ERROR] AssertionError':
    [translation:ERROR] .. v1703 = simple_call((function mmap), v1702, map_size_0, (7), (34), (-1), (0))
    [translation:ERROR] .. '(pypy.rlib.rmmap:628)alloc'

    ReplyDelete
  12. You are probably missing a dependency. See http://codespeak.net/pypy/dist/pypy/doc/getting-started-python.html#translating-the-pypy-python-interpreter

    ReplyDelete
  13. Great work! Is it possible to build the 32-bit binary on a 64-bit machine without too much effort? Having those instructions would certainly help us 64-bit people :)

    ReplyDelete
  14. I guess the time spent making assembler is only the first time the code is executed. Is that right? If so, we can consider an 8x speedup as the most accurate result. Or not?

    ReplyDelete
  15. @della: I use a 32-bit chroot on my own x64 machine. I don't know if that counts as "too much effort" (certainly it theoretically shouldn't require that), but it has been for me the most painless way to do it.

    ReplyDelete
  16. @Luis: yes, it's only first time.
    Well, depends how you count, but it
    can be considered 8x speedup...
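To make the "depends how you count" point concrete, here is a small sketch of how the one-time assembler-generation cost gets amortized over repeated runs (the numbers are the ones from the post; the helper is purely illustrative):

```python
# Illustrative amortization of the one-time JIT cost (seconds).
compile_time = 1.57   # one-time cost of generating assembler
run_time = 1.04       # per-run cost once the assembler exists
cpython_time = 8.18   # CPython's per-run cost

def effective_speedup(n_runs):
    """Overall speedup vs CPython after n_runs executions."""
    pypy_total = compile_time + n_runs * run_time
    return cpython_time * n_runs / pypy_total

print(round(effective_speedup(1), 1))     # first run: ~3.1x
print(round(effective_speedup(1000), 1))  # long-running: approaches ~7.9x
```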

    ReplyDelete
  17. Here are prebuilt C sources (in which "tracing" time was reduced by 20-30% since the blog post):

    http://wyvern.cs.uni-duesseldorf.de/~arigo/chain.tar.bz2

    Linux x86-32 only. You still need a svn checkout of PyPy, and you still need to compile them with gcc -- but it does not take too long: edit the first entry of the Makefile to point to your checkout of PyPy and type "make". This still assumes that all dependencies have been installed first. Don't complain if the #includes are at the wrong location for your distribution; you would get them right if you translated the whole thing yourself. In fact, don't complain if it does not compile for any reason, please :-) C sources like that are not really supposed to be portable, because they are just intermediates in the translation process.

    ReplyDelete
  18. > You are probably missing a dependency. See http://codespeak.net/pypy/dist/pypy/doc/getting-started-python.html#translating-the-pypy-python-interpreter

    Dear Armin, it seems like this document should mention libexpat1-dev and libssl-dev as dependencies, too. Anyway, I managed to build pypy-c, and here are the results for some small benchmarks I wrote. (Is there a way here at blogger.com to not break the formatting?)

                    python 2.5   psyco   pypy-c
    richards              14.9     2.9      3.9
    mergesort             27.6     4.8     26.3
    convexhull             9.4     5.6      6.3
    bigcityskyline        46.9     3.1      7.6
    fft                   14.1    15.4     25.0
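The comment does not say how these numbers were collected; a minimal wall-clock harness along the following lines would produce figures of this shape (the `bench` helper is my own sketch, not from the post):

```python
import time

def bench(fn, repeat=3):
    # Run fn() several times and keep the best wall-clock time,
    # which filters out some scheduling noise.
    best = float("inf")
    for _ in range(repeat):
        t0 = time.time()
        fn()
        best = min(best, time.time() - t0)
    return best

# Example: time a toy workload (a stand-in for richards, mergesort, ...)
elapsed = bench(lambda: sorted(range(100000, 0, -1)))
print("%.3f seconds" % elapsed)
```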

    Thank you all for your efforts.

    ReplyDelete
  19. Thanks for the missing dependencies; added to the development version of the page. Thanks also for the numbers you report. The next obvious thing we miss is float support (coming soon!), which shows in some of your benchmarks.

    ReplyDelete
  20. Hi,

    this is so unbelievably awesome, it's going to take me a while to recover from all the awesomeness.

    CONGRATS!

    ps. a nice improvement for users is to get your ./configure script to find dependencies and report the ones missing, and ones used (s/configure/setup.py/g).

    ReplyDelete
  21. nice!

    so what is your guess at the moment? how fast can pypy get if you further optimize the jit?

    ReplyDelete
  22. Dear Pypy developers, is it possible to switch off the very aggressive JIT logging in pypy-c? First, this could make pypy-c a drop-in replacement for cpython. (Many more beta-testers.) Second, the logging itself seems to be somewhat resource-intensive.

    Very cool Mandelbrot ascii art, by the way.

    ReplyDelete
  23. Dear anonymous.

    you can compile with ./translate.py -Ojit --jit-debug=profile

    There is no runtime switch unfortunately, so far.

    Cheers,
    fijal

    ReplyDelete
  24. Thank you! For many of us, the translation-time switch will be just as good.

    ReplyDelete
  25. I can't seem to compile (32-bit Ubuntu 9.10 chroot), by manually executing the Makefile in /tmp/usession-0/testing_1 I get this traceback:

    File "/home/della/pkg/pypy/trunk/pypy/translator/c/gcc/trackgcroot.py", line 1210, in (module)
    tracker.process(f, g, filename=fn)
    File "/home/della/pkg/pypy/trunk/pypy/translator/c/gcc/trackgcroot.py", line 229, in process
    lines = self.process_function(lines, entrypoint, filename)
    File "/home/della/pkg/pypy/trunk/pypy/translator/c/gcc/trackgcroot.py", line 244, in process_function
    table = tracker.computegcmaptable(self.verbose)
    File "/home/della/pkg/pypy/trunk/pypy/translator/c/gcc/trackgcroot.py", line 285, in computegcmaptable
    self.parse_instructions()
    File "/home/della/pkg/pypy/trunk/pypy/translator/c/gcc/trackgcroot.py", line 364, in parse_instructions
    meth = self.find_missing_visit_method(opname)
    File "/home/della/pkg/pypy/trunk/pypy/translator/c/gcc/trackgcroot.py", line 390, in find_missing_visit_method
    raise UnrecognizedOperation(opname)
    __main__.UnrecognizedOperation: jc

    there are some type warnings also for pointers, I don't know if they could be any useful. Maybe you can help me?

    ReplyDelete
  26. Thanks for the report, della. Fixed, if you want to try again. Parsing gcc output is a bit delicate as the exact set of operations used depends on the specific version and command-line options passed to gcc.
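The failure mode della hit can be illustrated: trackgcroot dispatches each assembler opcode to a visit_<opname> method and raises when gcc emits one it has never seen. The class below is a simplified sketch of that pattern, not the real code:

```python
class UnrecognizedOperation(Exception):
    pass

class InsnTracker:
    # Simplified sketch of trackgcroot's dispatch: each opcode name
    # maps to a visit_<opname> method; unknown opcodes raise, so new
    # gcc output is noticed instead of being silently mis-parsed.
    def process_line(self, opname, line):
        meth = getattr(self, "visit_" + opname, None)
        if meth is None:
            raise UnrecognizedOperation(opname)
        return meth(line)

    def visit_jc(self, line):
        # 'jc' (jump if carry): handled like any conditional jump
        return ("cond_jump", line)

tracker = InsnTracker()
print(tracker.process_line("jc", "jc .L123"))   # ('cond_jump', 'jc .L123')
try:
    tracker.process_line("frobz", "frobz %eax")
except UnrecognizedOperation as e:
    print("unknown opcode:", e)
```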

    ReplyDelete
  27. Since the blog post, here are the updated numbers: we run richards.py in 2.10 seconds (almost 4x faster than CPython), and only spend 0.916 seconds actually running the assembler (almost 9x faster than CPython).

    ReplyDelete
  28. Very nice. Do you expect to get faster than psyco?

    ReplyDelete
  29. This is very exciting! Please, try to post updates to these figures... thanks!

    ReplyDelete
  30. I was having the same problem as della, and your fix seems to work, but it's breaking somewhere else now. I don't think I have a dependency problem, I can build a working pypy-c without jit. Running make manually produces heaps of warnings about incompatible pointers, some probably harmless (int* vs long int*, these should be the same on x86-32), but others worry me more, like struct stat* vs struct stat64*, or struct SSL* vs char**. I put the complete output of a manual run of make online.

    ReplyDelete
  31. Interestingly, the translation itself seems to consume at most about 960MB of RAM. It's easy to translate on a system even with only a gig of RAM if you stop everything else.

    Try switching run levels or the like.

    The -Ojit option seems to cause an error in translation with Revision 68125, when translated using Python 2.5.2 on Debian Lenny.

    ReplyDelete
  32. First off - congratulations and good job on the great progress. I've been watching this project since the 2007 PyCon in DFW and it's great to see these promising results.

    That said, while I know there's still a lot of work to do and this is very much an in-progress thing, I'm very much looking forward to an excuse to try this stuff out in anger - real practical situations. For me that means some statistical calculation engines (Monte Carlo analysis) front-ended by web services. In both situations this brings up two constraints: a) must support 64bit (because our data sets rapidly go above 4GB RAM) and b) must not be overly memory hungry (because any significant incremental overhead really hurts when your data sets are already over 4GB RAM).

    For now we use Psyco for small stuff but have to re-implement in C++ once we hit that 32-bit limit. PyPy is very exciting as a practical alternative to Psyco because of anticipated 64bit support. I wonder if, due to the existence of Psyco already, PyPy shouldn't focus first on 64bit instead?

    Few things would speed up progress more than getting PyPy used out in the wild - even if only by those of us who appreciate that it's very much in flux but still understand how to benefit from it.

    I understand you guys have your focus and goals and encourage you to keep up the good work. Just thought I'd throw this out as an idea to consider. I'm sure there are a lot like me anxious to give it a spin.

    -- Ben Scherrey

    ReplyDelete
  33. Andrew: can you update and try again? If you still have the .c files around it is enough to go there and type "make"; otherwise, restart the build. It should still crash, but give us more information about why it does.

    ReplyDelete
  34. The new traceback is:

    Traceback (most recent call last):
    File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 1211, in <module>
    assert fn.endswith('.s')
    AssertionError

    Is the position in the input tracked? That might help, or I could package my .gcmap files.

    ReplyDelete
  35. The trouble seems to be implement.gcmap and implement_9.gcmap. These are both empty, and trigger the assertion error.

    Running trackgcroot as the Makefile does, but without those two files, permits compilation to continue, but linking fails with undefined references to various symbols with the prefix 'pypy_g_'.

    I suspected the changes might have invalidated the old .gcmap files, so I tried removing them, and got this when it tried to generate implement.gcmap:

    Traceback (most recent call last):
    File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 1214, in <module>
    tracker.process(f, g, filename=fn)
    File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 229, in process
    lines = self.process_function(lines, entrypoint, filename)
    File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 244, in process_function
    table = tracker.computegcmaptable(self.verbose)
    File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 285, in computegcmaptable
    self.parse_instructions()
    File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 365, in parse_instructions
    insn = meth(line)
    File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 741, in visit_jmp
    self.conditional_jump(line)
    File "/home/chshrcat/build/pypy-trunk/pypy/translator/c/gcc/trackgcroot.py", line 757, in conditional_jump
    raise UnrecognizedOperation(line)
    __main__.UnrecognizedOperation: jmp T.14141

    ReplyDelete
  36. A correction/clarification to last night's post:

    There isn't a bug in the -Ojit translation process; I was just missing a dependency that I could've sworn I'd installed before.

    The translation process only takes < 1GB memory if done without any options. Attempting to translate with the -Ojit option takes at least 2.5GB of RAM, as I tried last night (with it as the only running process) and it consumed my swapfile and ran out of memory.

    Is there any documented way to use a translated pypy binary to build other pypy translations? That might help reduce the build requirements, and would also be mighty cool.

    ReplyDelete
  37. NickDaly: checked in, please try. Also, please come to the mailing list instead of posting here if you have further comments to do... http://codespeak.net/mailman/listinfo/pypy-dev

    ReplyDelete
  38. Is pypy-c-jit written in C or Python or something else? I ask because of the "c" in pypy-c-jit.

    ReplyDelete
  39. Michael: It is written in RPython (a subset of Python) but then translated to C. By convention we therefore call the executable-name pypy-c. If the executable also contains a JIT, we call it pypy-c-jit.

    ReplyDelete
  40. Ben Scherrey: 64bit support might happen not too far in the future. Not using too much memory is a different problem, that might take a while longer. It has two aspects, one is that the JIT itself uses way too much memory at the moment. We will work on that soon.

    The other aspect is making sure that your dataset does not take too much heap. It depends a bit which data structures you use, but it's not likely to be that great right now. That might change at some point; I have some ideas in that direction, but not really time to work on them soon.

    ReplyDelete

See also PyPy's IRC channel: #pypy at freenode.net, or the pypy-dev mailing list.
If the blog post is old, it is pointless to ask questions here about it---you're unlikely to get an answer.