Saturday, March 15, 2014

STMGC-C7 with PyPy

Hi all,

Here is one of the first full PyPys (edit: it was r69967+, but the general list of versions is currently here) compiled with the new StmGC-c7 library. It has no JIT so far, but it runs some small single-threaded benchmarks in around 40% more time than a corresponding non-STM, no-JIT version of PyPy. It scales --- up to two threads only, which is the hard-coded maximum so far in the c7 code. But the scaling looks perfect in these small benchmarks without conflict: starting two threads, each running a copy of the benchmark, takes almost exactly the same amount of total time as running one copy, simply using two cores.
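To make the scaling claim concrete, here is a minimal sketch of how such a measurement could be done. It is not the benchmark we actually used: the benchmark() function and its loop size are hypothetical stand-ins for any small, conflict-free, CPU-bound workload, and time_copies() is an illustrative helper name.

```python
import threading
import time


def benchmark(n=200000):
    # Hypothetical stand-in for a small benchmark: pure-Python arithmetic
    # that touches no data shared with other threads (so no conflicts).
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_copies(num_threads, n=200000):
    # Wall-clock seconds needed to run one copy of benchmark() per thread.
    threads = [threading.Thread(target=benchmark, args=(n,))
               for _ in range(num_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == "__main__":
    one = time_copies(1)
    two = time_copies(2)
    # A ratio near 1.0 means the second copy ran "for free" on another core;
    # a ratio near 2.0 means the two threads were effectively serialized.
    print("1 thread:  %.3fs" % one)
    print("2 threads: %.3fs (ratio %.2f)" % (two, two / one))
```

On an interpreter with a GIL you would expect the ratio to be close to 2; the claim above is that on this build it stays close to 1 for this kind of workload.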

Feel free to try it! It is not actually useful so far, because it is limited to two cores and CPython is something like 2.5x faster. One of the important next steps is to re-enable the JIT. Based on our current understanding of the "40%" figure, we can probably reduce it with enough effort; but also, the JIT should be able to easily produce machine code that suffers a bit less than the interpreter from these effects. This seems to mean that we're looking at 20%-ish slow-downs for the future PyPy-STM-JIT.

Interesting times :-)

For reference, this is what you get by downloading the PyPy binary linked above: a Linux 64 binary (Ubuntu 12.04) that should behave mostly like a regular PyPy. (One main missing feature is that destructors are never called.) It uses two cores, but obviously only if the Python program you run is multithreaded. The only new built-in feature is with __pypy__.thread.atomic: this gives you a way to enforce that a block of code runs "atomically", which means without any operation from any other thread randomly interleaved.
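As a rough illustration of the atomic block, here is a small sketch of two threads incrementing a shared counter. Only the name __pypy__.thread.atomic comes from the text above; everything else (the counter, add_many(), run()) is made up for the example. The fallback to a plain threading.RLock is an assumption added so that the snippet also runs on interpreters without STM, and it is only an approximation: a lock excludes just the other threads that take the same lock, whereas atomic excludes every other thread.

```python
import threading

try:
    # Only present on a PyPy-STM build.
    import __pypy__
    atomic = __pypy__.thread.atomic
except (ImportError, AttributeError):
    # Assumption for portability: approximate "atomic" with a global lock.
    atomic = threading.RLock()

counter = {"value": 0}


def add_many(times):
    for _ in range(times):
        with atomic:
            # Read-modify-write that must not be interleaved with
            # the same sequence running in another thread.
            current = counter["value"]
            counter["value"] = current + 1


def run(num_threads=2, times=10000):
    # Start the threads, wait for them, and return the final count.
    counter["value"] = 0
    threads = [threading.Thread(target=add_many, args=(times,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print(run())
```

Note that threads which all update the same counter conflict with each other by construction, so this example shows the semantics of the block, not a workload that would scale.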

If you want to translate it yourself, you need a trunk version of clang with three patches applied. That's the number of bugs that we couldn't find workarounds for, not the total number of bugs we found by (ab)using the address_space feature...

Stay tuned for more!

Armin & Remi

11 comments:

Armin Rigo said...

The provided pypy-c crashes when calling fork(). Sadly fork() is indirectly called by a lot of things, including the subprocess module --- which can be executed just by importing random modules...

ArneBab said...

That sounds pretty huge!

Do you require clang for that? (why is it named on https://bitbucket.org/pypy/pypy/raw/stmgc-c7/TODO )

Armin Rigo said...

Only clang has the address_space extension mentioned in the blog post; gcc does not.

Unknown said...

I want to hear more talks on this. When is your next talk... PyCon 2014? It would be hilarious if the pypy group were able to create naive concurrency in python, no one would have seen that coming! Many would have thought, "surely Haskell", or some other immutable, static language would get us there first. But no, it might just be that pypy allows any language that targets it to be concurrent, kiss style... amazing! Anyway, enough gushing, time for a random question. Mainstream VMs like the JVM have added ways of speeding up dynamic languages; what advantages does pypy have over these traditional VMs (other than the concurrency one that might come to fruition)? I think this would be a good question to answer at the next talk for pypy.

Armin Rigo said...

As it turns out there will be no PyPy talk at PyCon 2014.

The JVM runs Jython at a speed that is around that of CPython. PyPy runs substantially faster than this. One difference is that PyPy contains a small number of annotations targeted specifically towards RPython's JIT generator, whereas the JVM has no support for this.

Armin Rigo said...

Update containing the most obvious fixes: http://cobra.cs.uni-duesseldorf.de/~buildmaster/misc/pypy-c-r70103-70091-stm.tbz2 (Ubuntu 12.04 Linux 64-bit)

Unknown said...

Oh, I do not want to know personally about the superiority of pypy vs the jvm. I was just suggesting a talking point; basically, show others that pypy is a better alternative (for dynamic languages, possibly all languages with naive concurrency working!) than llvm, jvm, etc... I do have a question though: would you suppose that the performance of pypy-stm would be better than that of something like the approach clojure has? I have heard that immutable data structures are nice for correctness but that they are bad for performance.

Anonymous said...

So PyPy-STM is Python without GIL? And it's possible to make it only 20% slower than "regular" PyPy? That would be quite an achievement.

Could you publish a build of PyPy-STM for Debian Stable?

Armin Rigo said...

The PyPy-STM we have so far doesn't include any JIT. If you want to try it out anyway on other Linux platforms than Ubuntu, you need to translate it yourself, or possibly hack around with symlinks and LD_LIBRARY_PATH.

Anonymous said...

> The PyPy-STM we have so far doesn't include any JIT

Yep, that's what the blog post said :) But also PyPy-STM doesn't include GIL, does it?

Armin Rigo said...

Indeed, which is the point :-) You're welcome to try it out, but I'm just saying that I don't want to go to great lengths to provide precompiled binaries that work on Linux XYZ when I could basically release an updated version every couple of days... It's still experimental and in-progress. Early versions are limited to two cores; later versions to 4 cores. We still have to determine the optimal number for this limit; maybe around 8? (Higher numbers imply a bit of extra overhead.) It's an example of in-progress work. Another example is that so far you don't get feedback from cross-transaction conflicts; you used to in previous versions, but we didn't port it yet.