Sunday, August 18, 2013

Update on STM

Hi all,

A quick update on Software Transactional Memory. We are working on two fronts.

On the one hand, the integration of the "c4" C library with PyPy is done and works well, but is still subject to improvements. The "PyPy-STM" executable (without the JIT) seems to be stable, as far as it has been tested. It runs a simple benchmark like Richards with a 3.2x slow-down over a regular JIT-less PyPy.

The main factor in this slow-down is the numerous "barriers" in the code: checks that are needed almost everywhere to verify that a pointer to an object points to a recent enough version, and if not, to go to the most recent version. These barriers are inserted automatically during the translation; there is no need for us to manually put 42 million barriers in the source code of PyPy. But this automatic insertion uses a primitive algorithm right now, which usually ends up putting more barriers than the theoretical optimum. I (Armin) am trying to improve that, and making progress: last week the slow-down was around 4.5x. This is done in the branch stmgc-static-barrier.
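To illustrate what "more barriers than the theoretical optimum" can mean, here is a toy sketch that compares a naive insertion pass with one that skips barriers already covered earlier in the same block. It is not the algorithm in stmgc-static-barrier; the operation names and the rules (a write barrier also covers later reads, a call invalidates everything) are assumptions made for the example.

```python
# Hypothetical toy model of automatic barrier insertion; NOT the algorithm
# used in PyPy's stmgc-static-barrier branch.
# A block is a list of (opname, variable) pairs. Assumptions:
#   - "getfield" needs a read barrier, "setfield" needs a write barrier;
#   - a write barrier also makes later reads of the same variable safe;
#   - a "call" may end the transaction, so it invalidates all barriers.

def insert_barriers_naive(block):
    """Put a barrier in front of every load and every store."""
    out = []
    for op, var in block:
        if op == "getfield":
            out.append(("read_barrier", var))
        elif op == "setfield":
            out.append(("write_barrier", var))
        out.append((op, var))
    return out


def insert_barriers_tracking(block):
    """Skip barriers that an earlier barrier in the block already covers."""
    out = []
    done = {}  # variable -> strongest barrier applied so far ("read"/"write")
    for op, var in block:
        if op == "call":
            done.clear()
        elif op == "getfield" and var not in done:
            out.append(("read_barrier", var))
            done[var] = "read"
        elif op == "setfield" and done.get(var) != "write":
            out.append(("write_barrier", var))
            done[var] = "write"
        out.append((op, var))
    return out


def count_barriers(ops):
    return sum(1 for op, _ in ops if op.endswith("_barrier"))


block = [("getfield", "p"), ("getfield", "p"), ("setfield", "p"),
         ("getfield", "p"), ("call", "f"), ("getfield", "p")]
print(count_barriers(insert_barriers_naive(block)))     # 5
print(count_barriers(insert_barriers_tracking(block)))  # 3
```

The real difficulty is deciding soundly which operations can invalidate a barrier; the single "call" rule here is only a stand-in for that analysis.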

On the other hand, Remi is progressing on the JIT integration in the branch stmgc-c4. It has been working in simple cases for a couple of weeks now, but the resulting "PyPy-JIT-STM" still often crashes: while the basics are not really hard, we keep hitting new issues that must be resolved.

The basics are that whenever the JIT is about to generate assembler corresponding to a load or a store in a GC object, it must first generate a bit of extra assembler that corresponds to the barrier that we need. This works fine now (but could benefit from the same kind of optimizations described above, to reduce the number of barriers). The additional issues are all more subtle. I will describe the current one as an example: how to write constant pointers inside the assembler.

Remember that the STM library classifies objects as either "public" or "protected/private". A "protected/private" object is one which has not been seen by another thread so far. This is essential as an optimization, because we know that no other thread will access our protected or private objects in parallel, and thus we are free to modify their content in place. By contrast, public objects are frozen, and to do any change, we first need to build a different (protected) copy of the object. See this blog post for more details.
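The difference between the two kinds of objects shows up in the write barrier. Below is a hypothetical toy sketch (the names are illustrative, not the c4 library's API): a protected object is returned as-is and modified in place, while a public object stays frozen and the thread gets its own copy, created once and then reused. Making that copy visible to other threads at commit time is not modelled.

```python
# Hypothetical toy model of the public vs protected/private distinction;
# the names Obj, ThreadState and write_barrier are illustrative only.

class Obj:
    def __init__(self, fields, public=False):
        self.fields = dict(fields)
        self.public = public  # public objects are frozen: never modified in place


class ThreadState:
    def __init__(self):
        # public object -> this thread's own (protected) copy of it
        self.copies = {}


def write_barrier(thread, obj):
    """Return the object that `thread` may safely modify in place."""
    if not obj.public:
        # Protected/private: no other thread has seen it, so write in place.
        return obj
    copy = thread.copies.get(obj)
    if copy is None:
        # Public: leave the original frozen and write to a separate copy.
        copy = Obj(obj.fields)
        thread.copies[obj] = copy
    return copy


t = ThreadState()
shared = Obj({"x": 1}, public=True)
write_barrier(t, shared).fields["x"] = 2
print(shared.fields["x"], write_barrier(t, shared).fields["x"])  # 1 2
```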

So far so good, but the JIT will sometimes (actually often) hard-code constant pointers into the assembler it produces. For example, this is the case when the Python code being JITted creates an instance of a known class; the corresponding assembler produced by the JIT will reserve the memory for the instance and then write the constant type pointer in it. This type pointer is a GC object (in the simple model, it's the Python class object; in PyPy it's actually the "map" object, which is a different story).

The problem right now is that this constant pointer may point to a protected object. This is a problem because the same piece of assembler can later be executed by a different thread. If it does, then this different thread will create instances whose type pointer is bogus: looking like a protected object, but actually protected by a different thread. Any attempt to use this type pointer to change anything on the class itself will likely crash: the threads will all think they can safely change it in-place. To fix this, we need to make sure we only write pointers to public objects in the assembler. This is a bit involved because we need to ensure that there is a public version of the object to start with.
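To make the constraint concrete, here is a hypothetical sketch: before a pointer is hard-coded, the JIT asks for a public version of the object, creating one if necessary, and every later compilation reuses that same public version. All names are illustrative; in particular, we assume that publishing produces a separate frozen copy, whereas the real library may instead turn the object itself public.

```python
# Hypothetical sketch; GCObj, make_public and embed_constant are
# illustrative names, not PyPy's or stmgc's API.

class GCObj:
    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner            # owning thread id; None means public
        self.public_version = None    # public copy, created on demand


def make_public(obj):
    """Return a public version of obj, creating one if needed."""
    if obj.owner is None:
        return obj
    if obj.public_version is None:
        # Assumption: publishing produces a frozen public copy.
        obj.public_version = GCObj(obj.name, owner=None)
    return obj.public_version


def embed_constant(code, obj):
    """Append a 'load constant pointer' instruction to the code list,
    hard-coding only a pointer that every thread may safely see."""
    code.append(("CONST_PTR", make_public(obj)))


cls_map = GCObj("map", owner=1)   # protected by thread 1
code = []
embed_constant(code, cls_map)
print(code[0][1].owner)           # None: the embedded pointer is public
```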

When this is done, we will likely hit the next problem, and the next one; but at some point it should converge (hopefully!) and we'll give you our first PyPy-JIT-STM ready to try. Stay tuned :-)

A bientôt,

Armin.

Posted by Armin Rigo at 19:54

Thursday, August 8, 2013

NumPyPy Status Update

Hello everyone

As expected, nditer is a lot of work. I'm going to pause my work on it for now and focus on simpler and more important things. Here is a list of what I implemented:

Fixed a bug on 32-bit that made int32(123).dtype == dtype("int32") fail
Fixed a bug on the pickling of array slices
The external loop flag is implemented on the nditer class
The c_index, f_index and multi_index flags are also implemented
Added dtype("double") and dtype("str")
C-style iteration is available for nditer
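The index flags follow numpy's nditer: multi_index gives the N-dimensional index of the current element, while c_index and f_index give its flat index in C order and in Fortran order respectively. As a pure-Python sketch (not NumPyPy's code), assuming a C-contiguous array visited in C order, the three values relate like this:

```python
from itertools import product


def iter_with_indices(shape):
    """Yield (multi_index, c_index, f_index) for each element of an array of
    the given shape, visiting the elements in C order (last axis fastest)."""
    for c_index, multi_index in enumerate(product(*(range(n) for n in shape))):
        # In Fortran order the first axis varies fastest, so its stride is 1.
        f_index, stride = 0, 1
        for i, n in zip(multi_index, shape):
            f_index += i * stride
            stride *= n
        yield multi_index, c_index, f_index


for multi_index, c_index, f_index in iter_with_indices((2, 3)):
    print(multi_index, c_index, f_index)
```

For a 2x3 array the f_index sequence is 0, 2, 4, 1, 3, 5, since stepping along the last axis skips over a whole column in Fortran order.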

Cheers
Romain Guillebert

Posted by Romain Guillebert at 20:01

Thursday, August 1, 2013

PyPy 2.1 - Considered ARMful

We're pleased to announce PyPy 2.1, which targets version 2.7.3 of the Python
language. This is the first release with official support for ARM processors in the JIT.
This release also contains several bugfixes and performance improvements.

You can download the PyPy 2.1 release here:

We would like to thank the Raspberry Pi Foundation for supporting the work
to finish PyPy's ARM support.

The first beta of PyPy3 2.1, targeting version 3 of the Python language, was
just released, more details can be found here.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (pypy 2.1 and cpython 2.7.2 performance comparison) due to its integrated tracing JIT compiler.

This release supports x86 machines running Linux 32/64, Mac OS X 64 or Windows 32. This release also supports ARM machines running 32-bit Linux: anything with ARMv6 (like the Raspberry Pi) or ARMv7 (like the Beagleboard, Chromebook, Cubieboard, etc.) that supports VFPv3 should work. Both hard-float armhf/gnueabihf and soft-float armel/gnueabi builds are provided. The armhf builds for Raspbian are created using the Raspberry Pi custom cross-compilation toolchain based on gcc-arm-linux-gnueabihf and should work on ARMv6 and ARMv7 devices running Debian or Raspbian. The armel builds are built using the gcc-arm-linux-gnueabi toolchain provided by Ubuntu and currently target ARMv7.

Windows 64 work is still stalled; we would welcome a volunteer to handle that.

Highlights

JIT support for ARM, architecture versions 6 and 7, hard- and soft-float ABI
Stacklet support for ARM
Support for os.statvfs and os.fstatvfs on unix systems
Improved logging performance
Faster sets for objects
Interpreter improvements
During packaging, compile the CFFI-based Tk extension
Pickling of numpy arrays and dtypes
Subarrays for numpy
Bugfixes to numpy
Bugfixes to cffi and ctypes
Bugfixes to the x86 stacklet support
Fixed issue 1533: fix an RPython-level OverflowError for space.float_w(w_big_long_number).
Fixed issue 1552: GreenletExit should inherit from BaseException.
Fixed issue 1537: numpypy __array_interface__
Fixed issue 1238: Writing to an SSL socket in PyPy sometimes failed with a "bad write retry" message.
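The GreenletExit fix matters because of catch-all handlers in user code. In the greenlet module, GreenletExit derives from BaseException, like GeneratorExit, presumably so that an `except Exception:` block cannot swallow the request to kill a greenlet. A minimal sketch with a stand-in class (the real one comes from the greenlet module; run_task is hypothetical):

```python
# Illustrative stand-in; the real GreenletExit comes from the greenlet module.
class GreenletExit(BaseException):
    """Raised inside a greenlet to ask it to terminate."""


def run_task(kill):
    try:
        if kill:
            raise GreenletExit()
        return "finished"
    except Exception:
        # A catch-all handler, as often found in user code. Because
        # GreenletExit is not an Exception subclass, it is not caught here.
        return "error swallowed"
```

Calling run_task(True) lets GreenletExit propagate; had it derived from Exception, the handler would have returned "error swallowed" and the greenlet would have kept running.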

Cheers,

David Schneider for the PyPy team.

Posted by David Schneider at 15:38

Wednesday, July 31, 2013

PyPy Demo Evening in London, August 27, 2013

As promised in the London sprint announcement we are organising a PyPy demo evening during the London sprint on Tuesday, August 27 2013, 18:30-19:30 (BST). The description of the event is below. If you want to come, please register on the Eventbrite page.

PyPy is a fast Python VM. Maybe you've never used PyPy and want to find out what use it might be for you? Or you and your organisation have been using it and you want to find out more about how it works under the hood? If so, this demo session is for you!

Members of the PyPy team will give a series of lightning talks on PyPy: its benefits; how it works; research currently being undertaken to make it faster; and unusual uses it can be put to. Speakers will be available afterwards for informal discussions. This is the first time an event like this has been held in the UK, and is a unique opportunity to speak to core people. Speakers confirmed thus far include: Armin Rigo, Maciej Fijałkowski, Carl Friedrich Bolz, Lukas Diekmann, Laurence Tratt, Edd Barrett.

The event will take place at King's College London and is hosted by its Software Development Team. The main entrance is on the Strand, from where the room for the event will be clearly signposted. Travel directions can be found at http://www.kcl.ac.uk/campuslife/campuses/directions/strand.aspx

If you have any questions about the event, please contact Laurence Tratt.

Posted by Carl Friedrich Bolz-Tereick at 10:10

Tuesday, July 30, 2013

PyPy3 2.1 beta 1

We're pleased to announce the first beta of the upcoming 2.1 release of
PyPy3. This is the first release of PyPy which targets Python 3 (3.2.3)
compatibility.

We would like to thank all of the people who donated to the py3k proposal
for supporting the work that went into this and future releases.

You can download the PyPy3 2.1 beta 1 release here:

Highlights

The first release of PyPy3: support for Python 3, targeting CPython 3.2.3!
- There are some known issues including performance regressions (issues
  #1540 & #1541) slated to be resolved before the final release.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for
CPython 2.7.3 or 3.2.3. It's fast due to its integrated tracing JIT compiler.

This release supports x86 machines running Linux 32/64, Mac OS X 64 or Windows
32. Also this release supports ARM machines running Linux 32bit - anything with
ARMv6 (like the Raspberry Pi) or ARMv7 (like Beagleboard,
Chromebook, Cubieboard, etc.) that supports VFPv3 should work.

Windows 64 work is still stalling and we would welcome a volunteer to handle
that.

How to use PyPy?

We suggest using PyPy from a virtualenv. Once you have a virtualenv
installed, you can follow instructions from pypy documentation on how
to proceed. This document also covers other installation schemes.

Cheers,
the PyPy team

Posted by Philip Jenvey at 22:35

Friday, July 26, 2013

PyPy 2.1 beta 2

We're pleased to announce the second beta of the upcoming 2.1 release of PyPy.
This beta adds one new feature to the 2.1 release and contains several bugfixes listed below.

You can download the PyPy 2.1 beta 2 release here:

Highlights

Support for os.statvfs and os.fstatvfs on unix systems.
Fixed issue 1533: fix an RPython-level OverflowError for space.float_w(w_big_long_number).
Fixed issue 1552: GreenletExit should inherit from BaseException.
Fixed issue 1537: numpypy __array_interface__
Fixed issue 1238: Writing to an SSL socket in PyPy sometimes failed with a "bad write retry" message.
distutils: copy CPython's implementation of customize_compiler, don't call split on environment variables, honour CFLAGS, CPPFLAGS, LDSHARED and LDFLAGS.
During packaging, compile the CFFI tk extension.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for
CPython 2.7.3. It's fast due to its integrated tracing JIT compiler.

Windows 64 work is still stalled; we would welcome a volunteer to handle that.

How to use PyPy?

Cheers,
The PyPy Team.

Posted by David Schneider at 11:33

PyPy San Francisco Sprint July 27th 2013

The next PyPy sprint will be in San Francisco, California. It is a public
sprint, suitable for newcomers. It will run on Saturday July 27th.

Some possible things people will be hacking on at the sprint:

running your software on PyPy
making your software fast on PyPy
improving PyPy's JIT
improving Twisted on PyPy
any exciting stuff you can think of

If there are newcomers, we'll run an introduction to hacking on PyPy.

Location
The sprint will be held at the Rackspace Office:

620 Folsom St, Ste 100

Doors open at 10AM, and the sprint runs until 6PM.

Posted by Alex at 02:17

Friday, July 19, 2013

PyPy London Sprint (August 26 - September 1 2013)

The next PyPy sprint will be in London, United Kingdom for the first time. This is a fully public sprint. PyPy sprints are a very good way to get into PyPy development and no prior PyPy knowledge is necessary.

Goals and topics of the sprint

For newcomers:

bring your application/library and we'll help you port it to PyPy, benchmark and profile
come and write your favorite missing numpy function
help us work on developer tools like jitviewer

We'll also work on:

refactoring the JIT optimizations
STM and STM-related topics
anything else attendees are interested in

Exact times

The work days should be August 26 - September 1 2013 (Monday-Sunday). The official plans are for people to arrive on the 26th, and to leave on the 2nd. There will be a break day in the middle. We'll typically start at 10:00 in the morning.

Location

The sprint will happen in a room on King's College's Strand Campus in Central London, UK. There are travel instructions on how to get there. We are being hosted by Laurence Tratt and the Software Development Team.

Demo Session

If you don't want to come to the full sprint, but still want to chat a bit, we are planning to have a demo session on Tuesday August 27. We will announce this separately on the blog. If you are interested, please leave a comment.

Registration

If you want to attend, please register by adding yourself to the "people.txt" file in Mercurial:

https://bitbucket.org/pypy/extradoc/
https://bitbucket.org/pypy/extradoc/raw/extradoc/sprintinfo/london-2013

or on the pypy-dev mailing list if you do not yet have check-in rights:

http://mail.python.org/mailman/listinfo/pypy-dev

Remember that you may need a (insert country here)-to-UK power adapter. Please note that the UK is not within the Schengen zone, so non-EU and non-Swiss citizens may require a specific visa. Please check travel regulations. Also, the UK uses pound sterling (GBP).

Posted by Carl Friedrich Bolz-Tereick at 15:58

Friday, July 12, 2013

Software Transactional Memory lisp experiments

As covered in the previous blog post, the STM subproject of PyPy has been back on the drawing board. The result of this experiment is an STM-aware garbage collector written in C. This is finished by now: thanks to Armin's and Remi's work, we have a fully functional garbage collector and an STM system that can be used from any C program with enough effort. Using it directly is more than a little tedious, since you have to insert read and write barriers by hand everywhere your code reads or writes garbage-collector-controlled memory. In the PyPy integration, this manual work is done automatically by the STM transformation in the interpreter.

However, to experiment some more, we created a minimal lisp-like/scheme-like interpreter (called Duhton) that closely follows CPython's implementation strategy. For anyone familiar with CPython's source code, it should be pretty readable. This interpreter works like a normal and very basic lisp variant; however, it comes with a transaction builtin that lets you spawn transactions using the STM system. We implemented a few demos that let you play with the transaction system. All the demos run without conflicts, which means there are no conflicting writes to global memory, and hence the demos are very amenable to parallelization. They exercise:

arithmetics - demo/many_sqare_roots.duh
read-only access to globals - demo/trees.duh
read-write access to local objects - demo/trees2.duh

The latter ones are very similar to the classic gcbench. STM-aware Duhton can be found in the stmgc repo, while the STM-less Duhton, which uses refcounting, can be found in the duhton repo under the base branch.

Below are some benchmarks. Note that this is a bit of an apples-to-oranges comparison, since the single-threaded Duhton uses a refcounting GC while the STM version uses a generational GC. Future PyPy benchmarks will compare apples to apples. Moreover, none of the benchmarks has any conflicts. Time is the total wall-clock time that the benchmark took (not the CPU time), and there was very little variation between consecutive runs (definitely below 5%).

benchmark	1 thread (refcount)	1 thread (stm)	2 threads	4 threads
square	1.9s	3.5s	1.8s	0.9s
trees	0.6s	1.0s	0.54s	0.28s
trees2	1.4s	2.2s	1.1s	0.57s

As you can see, the slowdown of single-threaded STM compared to the refcounting version is significant (1.8x, 1.7x and 1.6x respectively), but still below 2x. However, running on multiple threads parallelizes the problem almost perfectly: with 4 threads, the STM version runs roughly 3.6x to 3.9x faster than with 1 thread.
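The ratios in the previous paragraph can be recomputed from the raw timings in the table; this small script also gives the speedup of the STM version on 2 and 4 threads relative to 1 thread.

```python
# Wall-clock timings in seconds, copied from the table above:
# (1 thread refcount, 1 thread stm, 2 threads stm, 4 threads stm)
TIMINGS = {
    "square": (1.9, 3.5, 1.8, 0.9),
    "trees": (0.6, 1.0, 0.54, 0.28),
    "trees2": (1.4, 2.2, 1.1, 0.57),
}


def ratios(refcount, stm1, stm2, stm4):
    """Return (STM slowdown vs refcount, speedup on 2 threads, speedup on 4)."""
    return stm1 / refcount, stm1 / stm2, stm1 / stm4


for name, times in TIMINGS.items():
    slowdown, speedup2, speedup4 = ratios(*times)
    print(f"{name}: STM slowdown {slowdown:.1f}x, "
          f"speedup {speedup2:.2f}x on 2 threads, {speedup4:.2f}x on 4 threads")
```

A 4-thread speedup close to 4 is what "almost perfectly" means here; dividing it by the thread count gives a parallel efficiency of roughly 0.9 or better for all three benchmarks.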

While this is a significant milestone, we hope the next blog post will cover an STM-enabled PyPy that's fully working, with JIT work ongoing.

Cheers,
fijal on behalf of Remi Meier and Armin Rigo

Posted by Maciej Fijalkowski at 11:07

Thursday, July 11, 2013

PyPy 2.1 beta

We're pleased to announce the first beta of the upcoming 2.1 release of PyPy. This beta contains many bugfixes and improvements, including numerous improvements to the numpy-in-PyPy effort. The main feature is that ARM processor support is no longer considered alpha level.

We would like to thank the Raspberry Pi Foundation for supporting the work to finish PyPy's ARM support.

You can download the PyPy 2.1 beta release here:

Highlights

Bugfixes to the ARM JIT backend, so that ARM is now an officially supported processor architecture
Stacklet support on ARM
Interpreter improvements
Various numpy improvements
Bugfixes to cffi and ctypes
Bugfixes to the stacklet support
Improved logging performance
Faster sets for objects

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7.3. It's fast due to its integrated tracing JIT compiler. This release supports x86 machines running Linux 32/64, Mac OS X 64 or Windows 32. Also this release supports ARM machines running 32-bit Linux: anything with ARMv6 (like the Raspberry Pi) or ARMv7 (like Beagleboard, Chromebook, Cubieboard, etc.) that supports VFPv3 should work. Both hard-float armhf/gnueabihf and soft-float armel/gnueabi builds are provided. armhf builds for Raspbian are created using the Raspberry Pi custom cross-compilation toolchain based on gcc-arm-linux-gnueabihf and should work on ARMv6 and ARMv7 devices running Debian or Raspbian. armel builds are built using the gcc-arm-linux-gnueabi toolchain provided by Ubuntu and currently target ARMv7.

Windows 64 work is still stalled; we would welcome a volunteer to handle that.

How to use PyPy?

We suggest using PyPy from a virtualenv. Once you have a virtualenv installed, you can follow instructions from pypy documentation on how to proceed. This document also covers other installation schemes.

Cheers,

the PyPy team.

Posted by David Schneider at 11:36

Thursday, July 4, 2013

EuroPython

Hi all,

A short note: if you're at EuroPython right now and wondering if PyPy is dead because you don't see the obviously expected talk about PyPy, don't worry. PyPy is still alive and kicking. The truth is two-fold: (1) we missed the talk deadline (duh!)... but as importantly, (2) for various reasons we chose not to travel to Florence this year after our trip to PyCon US. (Antonio Cuni is in Florence but doesn't have a talk about PyPy either.)

Armin

Posted by Armin Rigo at 19:46

Wednesday, June 12, 2013

Py3k status update #11

This is the 11th status update about our work on the py3k branch, which we
can work on thanks to all of the people who donated to the py3k proposal.

Here are some highlights of the progress made since the previous update:

PyPy py3k now matches CPython 3's hash code for
int/float/complex/Decimal/Fraction
Various outstanding unicode identifier related issues were
resolved. E.g. test_importlib/pep263/ucn/unicode all now fully pass. Various
usage of identifiers (in particular type and module names) have been fixed to
handle non-ascii names -- mostly around display of reprs and exception
messages.
The unicodedata database has been upgraded to 6.0.0.
Windows support has greatly improved, though it could still use some more
help (but so does the default branch to a certain degree).
Probably the last of the parsing related bugs/features have been taken care
of.
Of course various other smaller miscellaneous fixes
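Matching CPython 3 here means reproducing one rule shared by all numeric types: a number equal to the rational m/n hashes to m times the inverse of n modulo a prime P (`sys.hash_info.modulus`), so equal values of different types hash equally. The Python documentation ("Hashing of numeric types") gives reference code for this; the sketch below follows it and is not PyPy's RPython implementation. NaN is left out because its hash changed in Python 3.10.

```python
import math
import sys

P = sys.hash_info.modulus  # 2**61 - 1 on 64-bit builds, 2**31 - 1 on 32-bit


def hash_fraction(m, n):
    """Hash of the rational m / n (n > 0), following CPython 3's rule."""
    while m % P == n % P == 0:
        m, n = m // P, n // P
    if n % P == 0:
        value = sys.hash_info.inf
    else:
        # pow(n, P - 2, P) is the inverse of n modulo the prime P (Fermat).
        value = (abs(m) % P) * pow(n, P - 2, P) % P
    if m < 0:
        value = -value
    if value == -1:
        value = -2  # -1 is reserved as an error marker in the C API
    return value


def hash_float(x):
    """Hash of a finite or infinite float (NaN is left out of this sketch)."""
    if math.isnan(x):
        raise ValueError("NaN hashing differs between Python versions")
    if math.isinf(x):
        return sys.hash_info.inf if x > 0 else -sys.hash_info.inf
    return hash_fraction(*x.as_integer_ratio())


def hash_complex(z):
    """Hash of a complex number, reduced to a signed machine-width integer."""
    value = hash_float(z.real) + sys.hash_info.imag * hash_float(z.imag)
    M = 2 ** (sys.hash_info.width - 1)
    value = (value & (M - 1)) - (value & M)
    if value == -1:
        value = -2
    return value
```

Because int, float, Fraction and Decimal all reduce to hash_fraction, `hash(0.5) == hash(Fraction(1, 2)) == hash(Decimal("0.5"))` holds, which is what lets these values be used interchangeably as dict keys.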

This leaves the branch with only about 5 outstanding failures in the stdlib test suite:

test_float
1 failing test about containment of floats in collections.
test_memoryview
Various failures: requires some bytes/str changes among other things (Manuel Jacob has made some progress on this in the py3k-memoryview branch)
test_multiprocessing
1 or more tests deadlock on some platforms
test_sys and test_threading
2 failing tests for the New GIL's new API

Probably the biggest feature left to tackle is the New GIL.

We're now pretty close to pushing an initial release. We had planned for one
around PyCon, but having missed that we've put some more effort into the branch
to provide a more fully-fledged initial release.

Thanks to the following for their contributions: Manuel Jacob, Amaury Forgeot
d'Arc, Karl Ramm, Jason Chu and Christian Hudon.

cheers,
Phil

Posted by Philip Jenvey at 20:17

Wednesday, June 5, 2013

STM on the drawing board

Hi all!

This is an update about the Software Transactional Memory subproject of PyPy. I have some good news of progress. Also, Remi Meier will likely help me this summer. He did various investigations with PyPy-STM for his Master's Thesis and contributed back a lot of ideas and some code. Welcome again Remi!

I am also sorry that it seems to advance so slowly. Beyond the usual excuses --- I was busy with other things, e.g. releasing PyPy 2.0 --- I would like to reassure people: I'm again working on it, and the financial contributions are still there and reserved for STM (almost half the money is left, a big thank you again if you contributed!).

The real reason for the apparent slowness, though, is that it is really a research project. It's possible to either have hard deadlines, or to follow various tracks and keep improving the basics, but not both at the same time.

During the past month, in which I have been working on STM again, I kept following the second option; and I believe it was worth every second of it. Let me try to convince you :-)

The main blocker was that the STM subsystem, written in C, and the Garbage Collection (GC) subsystem, written in RPython, were getting harder and harder to coordinate. So what I did instead is to give up using RPython in favor of using only C for both. C is a good language for some things, which includes low-level programming where we must take care of delicate multithreading issues; RPython is not a good fit in that case, and wasn't designed to be.

I started a fresh Mercurial repo which is basically a stand-alone C library. This library (in heavy development right now!) gives any C program some functions to allocate and track GC-managed objects, and gives an actual STM+GC combination on these objects. It's possible (though rather verbose) to use it directly in C programs, like in a small example interpreter. Of course the eventual purpose is to link it with PyPy during translation to C, with all the verbose calls automatically generated.

Since I started this, bringing the GC closer to the STM, I kept finding new ways that the two might interact to improve the performance, maybe radically. Here is a summary of the current ideas.

When we run multiple threads, there are two common cases: one is to access (read and write) objects that have only been seen by the current thread; the other is to read objects seen by all threads, like in Python the modules/functions/classes, but not to write to them. Of course, writing to the same object from multiple threads occurs too, and it is handled correctly (that's the whole point), but it is a relatively rare case.

So each object is classified as "public" or "protected" (or "private", when they belong to the current transaction). Newly created objects, once they are no longer private, remain protected until they are read by a different thread. Now, the point is to use very different mechanisms for public and for protected objects. Public objects are visible by all threads, but read-only in memory; to change them, a copy must be made, and the changes are written to the copy (the "redolog" approach to STM). Protected objects, on the other hand, are modified in-place, with (if necessary) a copy of them being made for the sole purpose of a possible abort of the transaction (the "undolog" approach).
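The undolog side can be sketched in a few lines: a protected object is modified in place, and a copy of its previous contents is made only so that an abort can restore it. This is a hypothetical toy, with dicts standing in for GC objects; the public side, where changes go to a separate copy (the redolog), is not shown.

```python
# Hypothetical toy model of the "undolog" used for protected objects.
# Dicts stand in for GC objects; Transaction is an illustrative name.

class Transaction:
    def __init__(self):
        # id(obj) -> (obj, copy of its contents before the first write)
        self.backups = {}

    def write(self, obj, field, value):
        key = id(obj)
        if key not in self.backups:
            self.backups[key] = (obj, dict(obj))  # copy once, for abort only
        obj[field] = value                        # modify in place

    def commit(self):
        # In-place changes are already the new state: just drop the backups.
        self.backups.clear()

    def abort(self):
        # Restore every modified object from its saved copy.
        for obj, saved in self.backups.values():
            obj.clear()
            obj.update(saved)
        self.backups.clear()
```

The design choice is visible in commit(): since protected objects are never seen by other threads, committing costs nothing beyond discarding the backups.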

This is combined with a generational GC similar to PyPy's --- but here, each thread gets its own nursery and does its own "minor collections", independently of the others.

So objects are by default protected; when another thread tries to follow a pointer to them, then it is that other thread's job to carefully "steal" the object and turn it public (possibly making a copy of it if needed, e.g. if it was still a young object living in the original nursery).

The same object can exist temporarily in multiple versions: any number of public copies; at most one active protected copy; and optionally one private copy per thread (this is the copy as currently seen by the transaction in progress on that thread). The GC cleans up the unnecessary copies.

These ideas are variants and extensions of the same basic idea of keeping multiple copies with revision numbers to track them. Moreover, "read barriers" and "write barriers" are used by the C program calling into this library in order to be sure that it is accessing the right version of the object. In the currently investigated variant I believe it should be possible to have rather cheap read barriers, which would definitely be a major speed improvement over the previous variants. Actually, as far as I know, it would be a major improvement over most of the other existing STMs: in them, the typical read barrier involves following chains of pointers and checking some dictionary to see if this thread has a modified local copy of the object. A read barrier that instead resolves most cases in a few CPU cycles should make a huge difference.
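A hypothetical toy read barrier in the spirit of this paragraph: the common case costs a single flag test, and only flagged objects take the slow path that follows the chain of newer versions and then looks up the thread's private copy in a dictionary. The names, and the use of one flag shared by all threads, are simplifications, not how the C library actually encodes revisions.

```python
# Hypothetical toy model; Obj, supersede, make_private and read_barrier are
# illustrative names, not stmgc's API.

class Obj:
    def __init__(self, value):
        self.value = value
        self.newer = None          # next committed version, once superseded
        self.needs_check = False   # single header flag tested by the fast path


def supersede(old, new):
    """Record that `new` is a more recent committed version of `old`."""
    old.newer = new
    old.needs_check = True


def make_private(obj, private_copies):
    """Give the current transaction its own copy of obj."""
    copy = Obj(obj.value)
    private_copies[obj] = copy
    obj.needs_check = True  # toy simplification: the flag is shared by all threads
    return copy


def read_barrier(obj, private_copies):
    """Return the version of obj the current transaction should read."""
    if not obj.needs_check:
        return obj                        # fast path: a single flag test
    while obj.newer is not None:          # slow path: follow the version chain
        obj = obj.newer
    return private_copies.get(obj, obj)   # then look for our own modified copy
```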

So, this is research :-) It is progressing, and at some point I'll be satisfied with it and stop rewriting everything; and then the actual integration into PyPy should be straightforward (there is already code to detect where the read and write barriers need to be inserted, where transactions can be split, etc.). Then there is support for the JIT to be written, and so on. But more about it later.

The purpose of this post was to give you some glimpses into what I'm working on right now. As usual, no plan for release yet. But you can look forward to seeing the C library progress. I'll probably also start soon some sample interpreter in C, to test the waters (likely a revival of duhton). If you know nothing about Python but all about the C-level multithreading issues, now is a good time to get involved :-)

Thanks for reading!

Armin

Posted by Armin Rigo at 16:31

Monday, June 3, 2013

NumPyPy status update

Hello everyone,

May was the first month I was paid to work on NumPyPy (thanks to all who donated!). Here is what I worked on during this period:

It is now possible to use subarrays.
It is now possible to pickle ndarrays (including those using subarrays), dtypes and scalars; the pickling protocol is the same as numpy's.

For June, I plan to work on the nditer class, it seems that there's enough work for an entire month.

Cheers
Romain Guillebert

Posted by Romain Guillebert at 15:09

Tuesday, May 21, 2013

PyPy 2.0.2 - Fermi Panini

We're pleased to announce PyPy 2.0.2. This is a stable bugfix release over 2.0 and 2.0.1. You can download it here:

It fixes a crash in the JIT when calling external C functions (with ctypes/cffi) in a multithreaded context.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (pypy 2.0 and cpython 2.7.3 performance comparison) due to its integrated tracing JIT compiler.

This release supports x86 machines running Linux 32/64, Mac OS X 64 or Windows 32. Support for ARM is progressing but not bug-free yet.

Highlights

This release contains only the fix described above. A crash (or wrong results) used to occur if all these conditions were true:

your program is multithreaded;
it runs on a single-core machine or a heavily-loaded multi-core one;
it uses ctypes or cffi to issue external calls to C functions.

This was fixed in the branch emit-call-x86 (see the example file bug1.py).

Cheers, arigo et al. for the PyPy team

Posted by Armin Rigo at 17:18

Thursday, May 16, 2013

PyPy 2.0.1 - Bohr Smørrebrød

We're pleased to announce PyPy 2.0.1. This is a stable bugfix release over 2.0. You can download it here:

The fixes are mainly about fatal errors or crashes in our stdlib. See below for more details.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (pypy 2.0 and cpython 2.7.3 performance comparison) due to its integrated tracing JIT compiler.

This release supports x86 machines running Linux 32/64, Mac OS X 64 or Windows 32. Support for ARM is progressing but not bug-free yet.

Highlights

fix an occasional crash in the JIT that ends in RPython Fatal error: NotImplementedError.
id(x) is now always a positive number (except on int/float/long/complex). This fixes an issue in _sqlite.py (mostly for 32-bit Linux).
fix crashes of callback-from-C-functions (with cffi) when used together with Stackless features, on asmgcc (i.e. Linux only). Now gevent should work better.
work around an eventlet issue with socket._decref_socketios().

Cheers, arigo et al. for the PyPy team

Posted by Armin Rigo at 19:10 0 Comments

Saturday, May 11, 2013

Numpy Status Update

Hello Everyone,

I started working on NumPyPy at the end of April, and here is a short update:

I implemented pickling support for ndarrays and dtypes; it will be compatible with numpy's pickling protocol once the "numpypy" module is renamed to "numpy".
I am now working on subarrays.
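Why compatibility has to wait for the rename: a pickle does not store code, only a reference (module name plus qualified name) to the callable that rebuilds the object, which `__reduce__` returns. numpy's own pickles refer to functions inside the numpy package by name, so they can only be loaded by something importable under that name. The sketch below uses a hypothetical `Scalar` class, not numpy's real types, to show the module reference ending up in the payload.

```python
import pickle
import pickletools


class Scalar:
    """Hypothetical stand-in for an array-like type; not numpy's real class."""

    def __init__(self, value):
        self.value = value

    def __reduce__(self):
        # Pickle records a *reference* to the reconstruction callable
        # (defining module + qualified name) plus the arguments to call it with.
        return (Scalar, (self.value,))


payload = pickle.dumps(Scalar(42), protocol=2)
pickletools.dis(payload)  # shows a GLOBAL opcode naming the defining module

restored = pickle.loads(payload)  # looks the callable up again by name
print(restored.value)  # 42
```

If `Scalar` lived in a module called "numpypy", the payload would say "numpypy", and a reader expecting "numpy" could not resolve it.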

I would also like to thank everyone who donated and allowed me to work on this.

Cheers,

Romain Guillebert

Posted by Romain Guillebert at 18:19 6 Comments

Thursday, May 9, 2013

PyPy 2.0 - Einstein Sandwich

We're pleased to announce PyPy 2.0. This is a stable release that brings a swath of bugfixes, small performance improvements and compatibility fixes. PyPy 2.0 is a big step for us and we hope in the future we'll be able to provide stable releases more often.

You can download the PyPy 2.0 release here:

The two biggest changes since PyPy 1.9 are:

stackless is now supported, including greenlets, which means eventlet and gevent should work (but read below about gevent)
PyPy now contains release 0.6 of cffi as a builtin module, which is the preferred way of calling C from Python that works well on PyPy
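A minimal sketch of calling C through cffi's ABI mode, assuming cffi is importable (it ships with PyPy; on CPython it must be installed) and a POSIX system, where `ffi.dlopen(None)` opens the standard C library. The C declaration is copied straight from the C prototype of `strlen`.

```python
from cffi import FFI

ffi = FFI()
# Declare the C function we want, copied from its C prototype.
ffi.cdef("size_t strlen(const char *s);")

# ABI mode: on POSIX, dlopen(None) opens the standard C library.
C = ffi.dlopen(None)

length = C.strlen(b"hello, cffi")  # a Python bytes object is passed as char *
print(length)  # 11
```

ABI mode needs no C compiler at import time; cffi also has an API mode that compiles a small extension, which is more robust against mismatched declarations.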

If you're using PyPy for anything, it would help us immensely if you fill out the following survey: http://bit.ly/pypysurvey This is for the developers' eyes and we will not make any information public without your agreement.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (pypy 2.0 and cpython 2.7.3 performance comparison) due to its integrated tracing JIT compiler.

This release supports x86 machines running Linux 32/64, Mac OS X 64 or Windows 32. Windows 64 work is still stalled; we would welcome a volunteer to handle that. ARM support is on the way, as you can see from the recently released alpha for ARM.

Highlights

Stackless including greenlets should work. For gevent, you need to check out pypycore and use the pypy-hacks branch of gevent.
cffi is now a module included with PyPy. (cffi also exists for CPython; the two versions should be fully compatible.) It is the preferred way of calling C from Python that works on PyPy.
Callbacks from C are now JITted, which means XML parsing is much faster.
A lot of speed improvements in various language corners, most of them small, but speeding up some particular corners a lot.
The JIT was refactored to emit machine code which manipulates a "frame" that lives on the heap rather than on the stack. This is what makes Stackless work, and it could bring another future speed-up (not done yet).
A lot of stability issues fixed.
Much of the numpypy array code was refactored, which resulted in the removal of lazy expression evaluation. On the other hand, we now have more complete dtype support and support more array attributes.
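"Callbacks from C" is exactly what an XML parser like expat does: the C parser calls back into Python for every tag it sees, so the cost of each C-to-Python transition dominates. This minimal sketch uses the standard library's `xml.parsers.expat` to count elements through a `StartElementHandler` callback; the document and handler are illustrative only.

```python
import xml.parsers.expat

counts = {}


def start_element(name, attrs):
    # Called from the C-level expat parser for every opening tag.
    counts[name] = counts.get(name, 0) + 1


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element

document = "<root>" + "<item/>" * 1000 + "</root>"
parser.Parse(document, True)  # True marks this as the final chunk
print(counts)  # {'root': 1, 'item': 1000}
```

Here 1001 callbacks cross from C into Python, which is the kind of path that benefits when those callbacks are JITted.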

Cheers,
fijal, arigo and the PyPy team

Posted by Maciej Fijalkowski at 20:37 7 Comments

Tuesday, May 7, 2013

PyPy 2.0 alpha for ARM

Hello.

We're pleased to announce an alpha release of PyPy 2.0 for ARM. This is mostly a technology preview, as we know the JIT is not yet stable enough for the full release. However please try your stuff on ARM and report back.

This is the first release that supports a range of ARM devices: anything with ARMv6 (like the Raspberry Pi) or ARMv7 (like the Beagleboard, Chromebook, Cubieboard, etc.) that supports VFPv3 should work. We provide builds for both ARM EABI variants: hard-float and, for some older operating systems, soft-float.

This release comes with a list of limitations; consider it alpha quality and not suitable for production:

stackless support is missing.
the assembler produced is not always correct, but we successfully managed to run large parts of our extensive benchmark suite, so most things should work.

You can download the PyPy 2.0 alpha ARM release here (including a deb for raspbian):

Part of the work was sponsored by the Raspberry Pi foundation.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7.3. It's fast due to its integrated tracing JIT compiler.

This release supports ARM machines running 32-bit Linux. Both hard-float (armhf) and soft-float (armel) builds are provided. armhf builds are created using the Raspberry Pi custom cross-compilation toolchain based on gcc-arm-linux-gnueabihf and should work on ARMv6 and ARMv7 devices running at least Debian or Ubuntu. armel builds are built using the gcc-arm-linux-gnueabi toolchain provided by Ubuntu and currently target ARMv7. If there is interest in other builds, such as gnueabi for ARMv6 or builds that do not require a VFP, let us know in the comments or on IRC.

Benchmarks

Everybody loves benchmarks. Here is a table of our benchmark suite (unfortunately, we don't yet provide ARM results on http://speed.pypy.org).

This is a comparison of a Cortex A9 processor with 4MB of cache and a Xeon W3580 with 8MB of L3 cache. The set of benchmarks is a subset of what we run for http://speed.pypy.org that finishes in reasonable time. The ARM machine was provided by Calxeda. Columns are respectively:

benchmark name
PyPy speedup over CPython on ARM (Cortex A9)
PyPy speedup over CPython on x86 (Xeon)
speedup on Xeon vs Cortex A9, as measured on PyPy
speedup on Xeon vs Cortex A9, as measured on CPython
relative speedup (the x86 PyPy speedup divided by the ARM PyPy speedup; values below 1 mean PyPy gains more over CPython on ARM than on x86)

Benchmark	PyPy vs CPython (arm)	PyPy vs CPython (x86)	x86 vs arm (pypy)	x86 vs arm (cpython)	relative speedup
ai	3.61	3.16	7.70	8.82	0.87
bm_mako	3.41	2.11	8.56	13.82	0.62
chaos	21.82	17.80	6.93	8.50	0.82
crypto_pyaes	22.53	19.48	6.53	7.56	0.86
django	13.43	11.16	7.90	9.51	0.83
eparse	1.43	1.17	6.61	8.12	0.81
fannkuch	6.22	5.36	6.18	7.16	0.86
float	5.22	6.00	9.68	8.43	1.15
go	4.72	3.34	5.91	8.37	0.71
hexiom2	8.70	7.00	7.69	9.56	0.80
html5lib	2.35	2.13	6.59	7.26	0.91
json_bench	1.12	0.93	7.19	8.68	0.83
meteor-contest	2.13	1.68	5.95	7.54	0.79
nbody_modified	8.19	7.78	6.08	6.40	0.95
pidigits	1.27	0.95	14.67	19.66	0.75
pyflate-fast	3.30	3.57	10.64	9.84	1.08
raytrace-simple	46.41	29.00	5.14	8.23	0.62
richards	31.48	28.51	6.95	7.68	0.91
slowspitfire	1.28	1.14	5.91	6.61	0.89
spambayes	1.93	1.27	4.15	6.30	0.66
sphinx	1.01	1.05	7.76	7.45	1.04
spitfire	1.55	1.58	5.62	5.49	1.02
spitfire_cstringio	9.61	5.74	5.43	9.09	0.60
sympy_expand	1.42	0.97	3.86	5.66	0.68
sympy_integrate	1.60	0.95	4.24	7.12	0.60
sympy_str	0.72	0.48	3.68	5.56	0.66
sympy_sum	1.99	1.19	3.83	6.38	0.60
telco	14.28	9.36	3.94	6.02	0.66
twisted_iteration	11.60	7.33	6.04	9.55	0.63
twisted_names	3.68	2.83	5.01	6.50	0.77
twisted_pb	4.94	3.02	5.10	8.34	0.61
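The last column can be computed two equivalent ways. Writing A and X for ARM and x86 run times, with subscripts c (CPython) and p (PyPy), the x86 PyPy speedup over the ARM PyPy speedup is (X_c/X_p)/(A_c/A_p), which rearranges to (A_p/X_p)/(A_c/X_c), i.e. "x86 vs arm (pypy)" divided by "x86 vs arm (cpython)". The sketch below checks this on a few rows copied from the table; the tolerance of 0.01 allows for the inputs themselves being rounded to two decimals.

```python
# Rows copied from the table: (PyPy vs CPython on ARM, PyPy vs CPython on x86,
#  x86 vs arm on PyPy, x86 vs arm on CPython, relative speedup)
rows = {
    "ai": (3.61, 3.16, 7.70, 8.82, 0.87),
    "bm_mako": (3.41, 2.11, 8.56, 13.82, 0.62),
    "float": (5.22, 6.00, 9.68, 8.43, 1.15),
    "sympy_str": (0.72, 0.48, 3.68, 5.56, 0.66),
}

for name, (arm, x86, xa_pypy, xa_cpython, relative) in rows.items():
    from_speedups = x86 / arm              # x86 PyPy speedup / ARM PyPy speedup
    from_machines = xa_pypy / xa_cpython   # same ratio, via machine comparisons
    assert abs(from_speedups - relative) < 0.01
    assert abs(from_machines - relative) < 0.01
    print(f"{name}: {from_speedups:.3f} {from_machines:.3f} table={relative}")
```

Because both routes agree, the fourth column must be the Xeon-vs-Cortex ratio measured on PyPy and the fifth the one measured on CPython.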

It seems that the Cortex A9, while significantly slower than the Xeon, suffers a larger slowdown when running a large interpreter (CPython) than when running JIT-compiled code (PyPy). This comes as a surprise to me, especially since our ARM assembler is not nearly as polished as our x86 assembler. As for the causes, various people mentioned the branch predictor, but I would rather not speculate without actually knowing.

How to use PyPy?

We suggest using PyPy from a virtualenv. Once you have a virtualenv installed, you can follow instructions from pypy documentation on how to proceed. This document also covers other installation schemes.

We would not recommend using PyPy on ARM in production just yet; however, the day of a stable PyPy ARM release is not far off.

Cheers,
fijal, bivab, arigo and the whole PyPy team

Posted by Maciej Fijalkowski at 14:35 7 Comments

Sunday, April 7, 2013

PyPy 2.0 beta 2 released

We're pleased to announce the 2.0 beta 2 release of PyPy. This is a major release of PyPy and we're getting very close to 2.0 final; however, it includes quite a few new features that require further testing. Please test and report issues, so we can have a rock-solid 2.0 final. It also includes a performance regression of about 5% compared to 2.0 beta 1 that we hope to fix before 2.0 final. ARM support is not working yet, and we're working hard to make it happen before 2.0 final. The new major features are:

JIT now supports stackless features, that is greenlets and stacklets. This means that JIT can now optimize the code that switches the context. It enables running eventlet and gevent on PyPy (although gevent requires some special support that's not quite finished, read below).
This is the first PyPy release that includes cffi as a core library. Version 0.6 comes included in the PyPy library. cffi has seen a lot of adoption among library authors and we believe it's the best way to wrap C libraries. You can see examples of cffi usage in _curses.py and _sqlite3.py in the PyPy source code.

You can download the PyPy 2.0 beta 2 release here:

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7.3. It's fast (pypy 2.0 beta 2 and cpython 2.7.3 performance comparison) due to its integrated tracing JIT compiler.

This release supports x86 machines running Linux 32/64, Mac OS X 64 or Windows 32. It also supports ARM machines running Linux, however this is disabled for the beta 2 release. Windows 64 work is still stalled; we would welcome a volunteer to handle that.

How to use PyPy?

We suggest using PyPy from a virtualenv. Once you have a virtualenv installed, you can follow instructions from pypy documentation on how to proceed. This document also covers other installation schemes.

Highlights

cffi is officially supported by PyPy. It comes included in the standard library, just use import cffi
stackless support - eventlet just works and gevent requires pypycore and pypy-hacks branch of gevent (which mostly disables cython-based modules)
callbacks from C are now much faster: pyexpat is about 3x faster, and cffi callbacks see around the same gain
__length_hint__ is implemented (PEP 424)
a lot of numpy improvements

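PEP 424 lets an iterator report an estimate of how many items remain through `__length_hint__`, so that consumers such as `list()` can preallocate. This minimal sketch uses a hypothetical `Countdown` iterator and Python 3 spelling (`__next__`, and `operator.length_hint`, available from Python 3.4).

```python
import operator


class Countdown:
    """Hypothetical iterator that knows how many items it has left."""

    def __init__(self, start):
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining

    def __length_hint__(self):
        # PEP 424: only an estimate, but it must be a non-negative integer.
        return self.remaining


it = Countdown(5)
print(operator.length_hint(it))  # 5
next(it)
print(operator.length_hint(it))  # 4
print(list(it))                  # [3, 2, 1, 0]
```

The hint is advisory: a consumer must still handle the iterator producing more or fewer items than announced.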
Improvements since 1.9

JIT hooks are now a powerful tool to introspect the JITting process that PyPy performs
various performance improvements compared to 1.9 and 2.0 beta 1
operations on long objects are now as fast as in CPython (from roughly 2x slower)
we now have special strategies for dict/set/list which contain unicode strings, which means that now such collections will be both faster and more compact.
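The idea behind such "strategies" can be shown with a toy model, assuming a simplified picture of the technique rather than PyPy's RPython code. While every element is a string, the list keeps raw values with no per-element wrapper objects; the first non-string forces a one-time switch to a generic strategy that boxes everything. `W_Object` and `ListWithStrategies` are hypothetical names.

```python
class W_Object:
    """Hypothetical boxed interpreter object, as a generic list would store it."""

    def __init__(self, value):
        self.value = value


class ListWithStrategies:
    """Toy list: raw str storage until a non-str element arrives."""

    def __init__(self):
        self.strategy = "unicode"
        self.storage = []  # unicode strategy: raw str values, no boxes

    def append(self, value):
        if self.strategy == "unicode" and not isinstance(value, str):
            self._switch_to_object()
        if self.strategy == "unicode":
            self.storage.append(value)
        else:
            self.storage.append(W_Object(value))

    def _switch_to_object(self):
        # One-time conversion: box every existing element.
        self.storage = [W_Object(v) for v in self.storage]
        self.strategy = "object"

    def getitem(self, index):
        item = self.storage[index]
        return item if self.strategy == "unicode" else item.value


lst = ListWithStrategies()
lst.append("a")
lst.append("b")
print(lst.strategy)                    # unicode
lst.append(3)
print(lst.strategy)                    # object
print(lst.getitem(0), lst.getitem(2))  # a 3
```

In this toy, the homogeneous case avoids a wrapper per element; the cost of mixing types is paid once, at the switch.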

Posted by Maciej Fijalkowski at 10:19 8 Comments