Wednesday, May 28, 2008

Threads and GCs

Hi all,

We can now compile a pypy-c that includes both thread support and one of our semi-advanced garbage collectors. This means that threaded Python programs can now run not only with better performance, but also without the annoyances of the Boehm garbage collector. (For example, Boehm does not much like seeing large numbers of __del__() methods, and our implementation of ctypes uses them everywhere.)

Magic translation command (example):

   translate.py --thread --gc=hybrid targetpypystandalone --faassen --allworkingmodules

Note that multithreading in PyPy is based on a global interpreter lock, as in CPython. I imagine that we will get rid of the global interpreter lock at some point in the future -- I can certainly see how this might be done in PyPy, unlike in CPython -- but it will be a lot of work nevertheless. Given our current priorities, it will probably not occur soon unless someone steps in.

10 comments:

Anonymous said...

How could GIL be removed from PyPy?

Armin Rigo said...

By using fine-grained locking: locking every dictionary and list while it is used. This is what Jython does (or more precisely, what Jython asks the JVM to do for it). This certainly comes with a performance penalty, so it would only pay off if you actually have and can use multiple CPUs -- which is fine in PyPy: you would just translate different pypy-c's depending on the use case.

This would be a pain to implement in CPython, in particular because of refcounting. Even if the Py_INCREF and Py_DECREF macros were made thread-safe, all C-level APIs that manipulate borrowed references might have to be redesigned.
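To make the fine-grained locking idea above concrete, here is a minimal Python-level sketch. It is an illustration only, not how PyPy or Jython implement it: the `LockedDict` class and the `run_workers` helper are hypothetical names, and a real interpreter would do the locking inside its own dictionary and list implementations rather than in a wrapper.

```python
import threading


class LockedDict:
    """Hypothetical wrapper: each instance carries its own lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def increment(self, key, amount=1):
        # The read-modify-write below is protected only by this object's
        # lock, so threads working on other objects are not blocked.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)


def run_workers(shared, n_threads=4, n_increments=10000):
    """Have several threads increment the same key, then return its value."""

    def work():
        for _ in range(n_increments):
            shared.increment("hits")

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.get("hits")
```

Every operation pays for acquiring and releasing a lock, which is the performance penalty mentioned above; it only pays off when several CPUs can actually run threads at the same time.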

Anonymous said...

Pyprocessing may serve multi-core cpu needs for the time being, as it's an almost drop-in replacement for the threading module.

I think it uses ctypes, so it should work with pypy.

Maciej Fijalkowski said...

pyprocessing has its own problems (not that threads have no problems at all :)

1. Memory usage: you basically need n times more memory, where n is the number of processes.

2. You cannot pass arbitrary data between processes, only things that you can marshal/pickle, which is quite a big limitation.

3. On the other hand, multiple processes give you better control, although not through a drop-in replacement for threading.

Cheers,
fijal
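The pickling limitation in point 2 can be seen with the standard `pickle` module alone. The sketch below is an illustration, with `can_pickle` being a hypothetical helper; it checks which kinds of objects could be sent to another process at all.

```python
import pickle
import threading


def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Plain containers of basic types can cross a process boundary.
plain_data = {"numbers": [1, 2, 3], "name": "pypy"}

# An anonymous function has no importable name, so pickle rejects it.
anonymous_function = lambda x: x + 1

# Objects tied to the state of the current process, such as a lock,
# cannot be pickled either.
process_local_lock = threading.Lock()
```

With threads, all three objects could simply be shared; with processes, only the first one can be passed along as-is.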

Anonymous said...

The live demos seem to be down... :(

Maciej Fijalkowski said...

Back online. Our test server is down as well, which makes it a bit hard to know stuff :(

Connelly Barnes said...

In response to maciej: OSes that implement copy-on-write fork (Linux, but not Windows; unsure about Mac OS X) don't take n times more memory. Fine-grained locking and an OpenMP-like syntax would be potentially useful. Maybe you could get a student to prototype these for you. But I'm sure someone will find a way to parallelize Python eventually, or we'll all switch to some other language, as the number of cores goes to infinity.

Connelly Barnes said...

In my previous comment, I was partly wrong: COW reduces memory usage. However, in CPython the refcounting will cause the interpreter to write to every area of memory, so the reduction may not be that significant. Also, IronPython supports fine-grained locks.
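The refcounting point can be observed directly, assuming the code runs on CPython, since `sys.getrefcount` is CPython-specific. This small sketch shows that merely storing another reference to an object changes a counter kept inside that object, which is a memory write and therefore dirties a copy-on-write page:

```python
import sys

shared_value = object()
holders = []

before = sys.getrefcount(shared_value)
# No field of the object is modified here; we only keep one more
# reference to it. CPython still updates the object's reference count.
holders.append(shared_value)
after = sys.getrefcount(shared_value)

refcount_delta = after - before
```

The absolute values reported by `sys.getrefcount` depend on the interpreter version, so only the difference is meaningful here.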

nekto0n said...

Would it be better to lock not the whole mutable object but just an element or a slice (for lists), and not to lock the object at all for read operations?
It's a common method used in DBMSs. A small and fast implementation in PyPy (if it's possible to create one) would be great =)

Rushen Aly said...

Is there a target date for removal of the GIL, or is it just a wish? Secondly, what is your speed aim compared with Java?
Thanks...
Rushen