PyPy-STM 2.5.1 - Mawhrin-Skel
We're pleased to announce PyPy-STM 2.5.1, codenamed Mawhrin-Skel. This is the second official release of PyPy-STM. You can download this release here (64-bit Linux only):
http://pypy.org/download.html
Documentation:
http://pypy.readthedocs.org/en/latest/stm.html
PyPy is an implementation of the Python programming language which focuses on performance. So far we've been relentlessly optimizing for the single core/process scenario. PyPy STM brings to the table a version of PyPy that does not have the infamous Global Interpreter Lock, hence can run multiple threads on multiple cores. Additionally it comes with a set of primitives that make writing multithreaded applications a lot easier, as explained below (see TransactionQueue) and in the documentation.
Internally, PyPy-STM is based on the Software Transactional Memory plug-in called stmgc-c7. This version comes with a relatively reasonable single-core overhead but scales only up to around 4 cores on some examples; the next version of the plug-in, stmgc-c8, is in development and should address that limitation (as well as reduce the overhead). These versions only support 64-bit Linux; we'd welcome someone to port the upcoming stmgc-c8 to other (64-bit) platforms.
This release passes all regular PyPy tests, except for a few special cases. In other words, you should be able to drop in PyPy-STM instead of the regular PyPy and your program should still work. See current status for more information.
This work was done by Remi Meier and Armin Rigo. Thanks to all donors for crowd-funding the STM work so far! As usual, it took longer than we would have thought. I really want to thank the people that kept making donations anyway. Your trust is greatly appreciated!
What's new?
Compared to the July 2014 release, the main addition is a way to get reports about STM conflicts. This is an essential new feature.
To understand why this is so important, consider that if you already played around with the previous release, chances are that you didn't get very far. It probably felt like a toy: on very small examples it would nicely scale, but on any larger example it would not scale at all. You didn't get any feedback about why, but the underlying reason is that, in a typical large example, there are some STM conflicts that occur all the time and that won't be immediately found just by thinking. This prevents any parallelization.
Now PyPy-STM is no longer a black box: you have a way to learn about these conflicts, fix them, and try again. The tl;dr version is to run:
PYPYSTM=stmlog ./pypy-stm example.py ./print_stm_log.py stmlog
More details in the STM user guide.
Performance
The performance is now more stable than it used to be. More precisely, the best case is still "25%-40% single-core slow-down with very good scaling up to 4 threads", but the average performance seems not too far from that. There are still dark spots --- notably, the JIT is still slower to warm up, though it was improved a lot. These are documented in the current status section. Apart from that, we should not get more than 2x single-core slow-down in the worst case. Please report such cases as bugs!
TransactionQueue
As explained before, PyPy-STM is more than "just" a Python without GIL. It is a Python in which you can do minor tweaks to your existing, non-multithreaded programs and get them to use multiple cores. You identify medium- or large-sized, likely-independent parts of the code and to ask PyPy-STM to run these parts in parallel. An example would be every iteration of some outermost loop over all items of a dictionary. This is done with a new API: transaction.TransactionQueue(). See help(TransactionQueue) or read more about it in the STM user guide.
This is not a 100% mechanical change: very likely, you need to hunt for and fix "STM conflicts" that prevent parallel execution (see docs). However, at all points your program runs correctly, and you can stop the hunt when you get acceptable performance. You don't get deadlocks or corrupted state.
Thanks for reading!
Armin, Remi, Fijal
From your explanation in this post, STM sounds similar to OpenMP. Can you explain the differences?
ReplyDelete→ http://openmp.org/wp/openmp-specifications/
This is explained in http://pypy.readthedocs.org/en/latest/stm.html#how-to-write-multithreaded-programs-the-10-000-feet-view
ReplyDeleteNice - thanks!
ReplyDelete»TransactionQueue is in part similar: your program needs to have “some chances” of parallelization before you can apply it. But I believe that the scope of applicability is much larger with TransactionQueue than with other approaches. It usually works without forcing a complete reorganization of your existing code, and it works on any Python program which has got latent and imperfect parallelism. Ideally, it only requires that the end programmer identifies where this parallelism is likely to be found«
If I understand that correctly, for STM the parallelism only needs to be likely and can be imperfect, because it can recover from errors.
This would fix a whole class of problems I experienced in OpenMP Fortran code: Turning a crash or (worse) undefined behavior into a mere performance loss - and that’s really cool!
Thank you for working on that!
Why do you always ask for money if nothing actually works?
ReplyDeletethe alternative is to ask for money for stuff that already works, and that's a terrible strategy. suggest better alternatives
ReplyDeleteYour comment suggests PyPy-STM doesn't actually work for you. If you have found a bug, please contribute a bug report, even if only if you have an example of program that should parallelize and doesn't; such bug reports are very useful. Alternatively, you're complaining that PyPy-STM is useless for you. Maybe I've been bad at explaining what you should expect and not expect from it in the first place, so I've given you wrong expectations. In that case, sorry. (The 3rd alternative would be that you're just trolling, but let's discard it for now.)
ReplyDelete