Sunday, June 10, 2012

STM with threads

Hi all,

A quick update. The first version of pypy-stm based on regular threads
is ready. With no JIT yet and a 4-or-5-times performance hit, it is
not particularly fast, but I am happy that it turns out not to be much
slower than the previous thread-less attempts. It is at
least fast enough to run faster (in real time) than an equivalent no-STM
PyPy, if fed with an eight-threaded program on an eight-core machine
(provided, of course, you don't mind it eating all 8 cores' CPU power
instead of just one :-).
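
To make the "eight-threaded program" above concrete, here is a minimal sketch of the kind of CPU-bound workload I have in mind. The function names (count_primes, run_threaded) are made up for illustration and are not part of PyPy. On an interpreter with a GIL the threads take turns, so the wall-clock time should be about the same as with one thread; on the STM binary the same unmodified program is the kind of thing that can spread over all cores.

```python
import threading
import time


def count_primes(lo, hi):
    """CPU-bound work: count primes in [lo, hi) by trial division."""
    total = 0
    for n in range(max(lo, 2), hi):
        d = 2
        while d * d <= n:
            if n % d == 0:
                break          # found a divisor, n is not prime
            d += 1
        else:
            total += 1         # no divisor found, n is prime
    return total


def run_threaded(limit, num_threads=8):
    """Split [0, limit) into num_threads chunks, one thread per chunk.

    Returns (number_of_primes, wall_clock_seconds).
    """
    results = [0] * num_threads
    step = (limit + num_threads - 1) // num_threads

    def worker(i):
        # Each thread writes only to its own slot, so no locking is needed.
        results[i] = count_primes(i * step, min((i + 1) * step, limit))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results), time.time() - start
```

Calling run_threaded(2000000) on both interpreters and comparing the elapsed times is a rough way to see the effect; the exact numbers will of course depend on the machine.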

You can download and play around with this binary for Linux 64. It
was made from the stm-thread branch of the PyPy repository (translate.py --stm -O2 targetpypystandalone.py). (Be sure
to put it where it can find its stdlib, e.g. by putting it inside the
directory from the official 1.9 release.)

This binary supports the thread module and runs without the GIL.
So, despite the factor-of-4 slow-down issue, it should be the fourth
complete Python interpreter in which we can reasonably claim to have
resolved the problem of the GIL. (The first one was Greg Stein's Python
1.4, re-explored here; the second one is Jython; the third one is
IronPython.) Unlike the previous three, it is also the first one to
offer full GIL semantics to the programmer, and additionally
thread.atomic (see below). I should also add that we're likely to
see in the next year a 5th such interpreter, too, based on Hardware
Transactional Memory (same approach as with STM, but using e.g.
Intel's HTM).

The binary I linked to above supports all built-in modules from PyPy
apart from signal, which is still being worked on (its absence can be a
bit annoying because standard library modules like subprocess depend on
it). The sys.get/setcheckinterval() functions can be used to tweak
the frequency of the automatic commits. Additionally, it offers
thread.atomic, described in the previous blog post as a way to
create longer atomic sections (with the observable effect of preventing
the "GIL" from being released during that time). A complete
transaction.py module based on it is available from the sources.
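
As a rough illustration of thread.atomic, here is a minimal sketch. It assumes that thread.atomic is used as a context manager ("with thread.atomic:"), and it falls back to an ordinary re-entrant lock on interpreters that do not have it, so that the example runs anywhere. The fallback is only an approximation: a lock excludes just the threads that take the same lock, whereas an atomic section keeps all other threads from running in between. The names counter, worker and run_counter are hypothetical.

```python
import threading

try:
    # Assumption: on the pypy-stm binary, the built-in thread module
    # exposes `atomic` as a context manager.
    from thread import atomic
except ImportError:
    # Stand-in for other interpreters: NOT equivalent, it only excludes
    # threads that also enter `with atomic:`.
    atomic = threading.RLock()

counter = {"value": 0}


def worker(n):
    for _ in range(n):
        # The read-modify-write below runs as one atomic section, so no
        # other thread can interleave between the read and the write.
        with atomic:
            counter["value"] += 1


def run_counter(num_threads=8, n=1000):
    """Start num_threads workers and return the final counter value."""
    counter["value"] = 0
    threads = [threading.Thread(target=worker, args=(n,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

With the atomic section in place, run_counter(8, 1000) should always return 8000. Keeping the atomic sections short, as here, is a choice made for the example only; the point of thread.atomic is that they are allowed to be much longer.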

The main missing features are:

  • the signal module;
  • the Garbage Collector, which does not do major collections so far, only
    minor ones;
  • and finally, the JIT, which needs some amount of integration to generate
    the correctly-tweaked assembler.

Have fun!

Armin.

5 comments:

Anonymous said...

STM has so much potential. I wonder if it gets the attention of the hacker community it deserves. And if not, why not? I hope this gets more recognition in the future.

Paul Jaros said...

Ah... didn't mean to post it anonymously.

Armin Rigo said...

@Paul: my guess would be that the majority of people that know STM are still looking at it from the point of view of short or very short transactions, as a replacement of locking. Even gcc 4.7 got an STM extension, but it cannot be used with long-running transactions: the performance is not at all tuned for this case, and you cannot express things you need in real long-running transactions, like interrupting them for I/O.

Moreover, the single-core 4x performance hit is usually far more than what people are willing to accept. They do not realize that in many cases it will soon be outdated as a way of measuring performance: the future is toward many-core machines.

Anonymous said...

For a casual Python programmer like me, how does STM affect the way I write my programs? I know about the suggested benefits of STM on multi-core machines. However, what I'm asking is: what do I have to do differently to get that benefit?

Thanks

Armin Rigo said...

@Anonymous: https://bitbucket.org/pypy/pypy/raw/stm-thread/pypy/doc/stm.rst