Friday, August 30, 2013
Slides of the PyPy London Demo Evening
The slides of the London demo evening are now online:
Tuesday, August 27, 2013
NumPy road forward
Hello everyone.
This is the roadmap for the NumPy effort in PyPy, as discussed at the London sprint. First, the highest item on our priority list is to finish the low-level part of the numpy module. We will finish the RPython part of numpy and provide a pip-installable numpypy repository that includes the pure-Python part of NumPy. This would contain the original NumPy with a few minor changes.
Second, we need to work on the JIT support that will make NumPy on PyPy faster. In detail:
- re-enable the lazy loop evaluation
- optimize bridges, which depends on optimizer refactorings
- SSE support
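Lazy loop evaluation, roughly, means that an expression such as a + b * c is not computed one operator at a time with a temporary array per step; instead the operations are recorded and run as a single loop when the result is needed, which gives the JIT one loop to compile. A minimal pure-Python sketch of the idea follows, assuming one-dimensional arrays modelled as lists; the Lazy, Leaf and BinOp classes are hypothetical and are not PyPy's actual micronumpy code.

```python
import operator


class Lazy:
    """Base class for a node in a deferred array expression (hypothetical)."""

    def __add__(self, other):
        return BinOp(operator.add, self, other)

    def __mul__(self, other):
        return BinOp(operator.mul, self, other)

    def force(self):
        # Evaluate the whole expression tree in one loop, element by element,
        # without allocating an intermediate list for each operator.
        return [self.at(i) for i in range(len(self))]


class Leaf(Lazy):
    """A concrete array holding actual data."""

    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def at(self, i):
        return self.data[i]


class BinOp(Lazy):
    """A recorded, not yet computed, element-wise binary operation."""

    def __init__(self, op, left, right):
        if len(left) != len(right):
            raise ValueError("shape mismatch")
        self.op, self.left, self.right = op, left, right

    def __len__(self):
        return len(self.left)

    def at(self, i):
        return self.op(self.left.at(i), self.right.at(i))


a = Leaf([1, 2, 3])
b = Leaf([4, 5, 6])
c = Leaf([7, 8, 9])
expr = a + b * c      # only builds a tree; nothing is computed yet
print(expr.force())   # [29, 42, 57]
```

The design choice being illustrated is that the temporary for b * c never exists as a list; each output element is computed straight from the inputs.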
On the compatibility front, there have been some independent attempts at making the following work:
- f2py
- C API (in fact, PyArray_* API is partly present in the nightly builds of PyPy)
- matplotlib (both by using the PyArray_* API and by embedding the CPython runtime in PyPy)
- scipy
In order to make all of the above happen faster, it would be helpful to raise more funds. You can donate to PyPy's NumPy project on our website. Note that PyPy is a member of the SFC (Software Freedom Conservancy), a 501(c)(3) US non-profit, so donations from US companies can be tax-deductible.
Cheers,
fijal, arigo, ronan, rguillebert, anto and others
Tuesday, August 20, 2013
Preliminary London Demo Evening Agenda
We now have a preliminary agenda for the demo evening in London next week. It takes place on Tuesday, August 27, 2013, 18:30-19:30 (BST) at King's College London, Strand. The preliminary agenda is as follows:
- Laurence Tratt: Welcome from the Software Development Team
- Carl Friedrich Bolz: A Short Introduction to PyPy
- Maciej Fijałkowski: Numpy on PyPy, Present State and Outlook
- Lukas Diekmann: Collection Strategies for Fast Containers in PyPy
- Armin Rigo: Software Transactional Memory for PyPy
- Edd Barrett: Unipycation: Combining Prolog and Python
All the talks are lightning talks. Afterwards there will be plenty of time for discussion.
There are still free spots. If you want to come, please register on the Eventbrite page. Hope to see you there!
Sunday, August 18, 2013
Update on STM
Hi all,
A quick update on Software Transactional Memory. We are working on two fronts.
On the one hand, the integration of the "c4" C library with PyPy is done and works well, although it is still being improved. The "PyPy-STM" executable (without the JIT) seems to be stable, as far as it has been tested. It runs a simple benchmark like Richards with a 3.2x slow-down compared to a regular JIT-less PyPy.
The main cause of this slow-down is the numerous "barriers" in the code: checks that are needed almost everywhere to verify that a pointer to an object points to a recent enough version and, if not, to go to the most recent version. These barriers are inserted automatically during the translation; there is no need for us to manually put 42 million barriers in the source code of PyPy. However, this automatic insertion currently uses a primitive algorithm, which usually ends up inserting more barriers than the theoretical optimum. I (Armin) am trying to improve that and am making progress: last week the slow-down was around 4.5x. This work is done in the branch stmgc-static-barrier.
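To make "go to the most recent version" concrete, here is a toy model in which every object may link to a newer version of itself, and a read barrier follows those links before each access. This is a hypothetical Python sketch (StmObject, newer and read_barrier are our own names, not the c4 library's data layout); real barriers are a few machine instructions, but they run on almost every access, which is why inserting more of them than necessary shows up directly in the slow-down.

```python
class StmObject:
    """Toy model of one object version; `newer` links to a more recent copy."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self.newer = None  # set when a more recent version has been created


def read_barrier(obj):
    """Return the most recent version of obj by following the `newer` chain."""
    while obj.newer is not None:
        obj = obj.newer
    return obj


def get_field(obj, name):
    # Every read goes through the barrier first; in PyPy-STM the translator
    # inserts such checks automatically.
    return read_barrier(obj).fields[name]


old = StmObject(x=1)
mid = StmObject(x=2)
new = StmObject(x=3)
old.newer = mid
mid.newer = new
print(get_field(old, "x"))  # 3
```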
On the other hand, Remi is making progress on the JIT integration in the branch stmgc-c4. This has been working in simple cases for a couple of weeks now, but the resulting "PyPy-JIT-STM" often crashes. This is because, while the basics are not really hard, we keep hitting new issues that must be resolved.
The basics are that whenever the JIT is about to generate assembler corresponding to a load or a store in a GC object, it must first generate a bit of extra assembler that corresponds to the barrier that we need. This works fine now (but could benefit from the same kind of optimizations described above, to reduce the number of barriers). The additional issues are all more subtle. I will describe the current one as an example: how to write constant pointers inside the assembler.
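The insertion step can be pictured as a pass over the operations of a trace that emits a barrier before each load or store on a GC object. The sketch below uses a made-up tuple format for operations (("getfield", "p0", "x") and so on, not the real JIT's operation classes) and also shows the kind of simple optimization mentioned above: within one straight-line block, a second barrier on the same pointer is dropped. It assumes that nothing between two accesses (a call, a transaction break, etc.) can invalidate an earlier barrier, and that in this toy model a write barrier also covers later reads; a real implementation would have to check both.

```python
def insert_barriers(ops):
    """Return a new op list with a barrier before each GC load/store.

    ops is a list of tuples such as ("getfield", ptr, field) or
    ("setfield", ptr, field, value). A barrier is skipped when the same
    pointer already had a strong-enough barrier earlier in this block.
    """
    have_read = set()   # pointers already read-barriered
    have_write = set()  # pointers already write-barriered (covers reads too)
    out = []
    for op in ops:
        kind, ptr = op[0], op[1]
        if kind == "getfield":
            if ptr not in have_read and ptr not in have_write:
                out.append(("read_barrier", ptr))
                have_read.add(ptr)
        elif kind == "setfield":
            if ptr not in have_write:
                out.append(("write_barrier", ptr))
                have_write.add(ptr)
        out.append(op)
    return out


trace = [
    ("getfield", "p0", "x"),
    ("getfield", "p0", "y"),        # no second read barrier needed
    ("setfield", "p0", "x", "i1"),  # a write still needs a write barrier
    ("setfield", "p1", "z", "i2"),
]
for op in insert_barriers(trace):
    print(op)
```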
Remember that the STM library classifies objects as either "public" or "protected/private". A "protected/private" object is one which has not been seen by another thread so far. This is essential as an optimization, because we know that no other thread will access our protected or private objects in parallel, and thus we are free to modify their content in place. By contrast, public objects are frozen, and to do any change, we first need to build a different (protected) copy of the object. See this blog post for more details.
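The public/protected distinction can be modelled in a few lines: a write to a protected or private object happens in place, while a write to a public object first creates a protected copy owned by the writing thread, which that thread remembers so its later accesses see the copy. This is a hypothetical single-process Python model (Obj, ThreadState and write_barrier are our own names) that ignores the real synchronization and commit logic of the STM library; it only shows why public objects can stay frozen.

```python
class Obj:
    """Toy STM object: owner is None if public, else the owning thread id."""

    def __init__(self, fields, owner=None):
        self.fields = dict(fields)
        self.owner = owner

    @property
    def is_public(self):
        return self.owner is None


class ThreadState:
    """Per-thread view: remembers the protected copies this thread made."""

    def __init__(self, tid):
        self.tid = tid
        self.copies = {}  # public object -> this thread's protected copy

    def write_barrier(self, obj):
        """Return a version of obj that this thread may modify in place."""
        if not obj.is_public:
            # Protected/private: no other thread can see it, so modify it directly.
            assert obj.owner == self.tid, "object is protected by another thread"
            return obj
        copy = self.copies.get(obj)
        if copy is None:
            # Public objects are frozen, so build a protected copy instead.
            copy = Obj(obj.fields, owner=self.tid)
            self.copies[obj] = copy
        return copy

    def setfield(self, obj, name, value):
        self.write_barrier(obj).fields[name] = value


shared = Obj({"x": 0})          # public
local = Obj({"y": 1}, owner=1)  # protected by thread 1
t1 = ThreadState(1)
t1.setfield(shared, "x", 42)
t1.setfield(local, "y", 2)
print(shared.fields["x"])                    # 0: public object untouched
print(t1.write_barrier(shared).fields["x"])  # 42: thread 1's copy
print(local.fields["y"])                     # 2: modified in place
```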
So far so good, but the JIT will sometimes (actually often) hard-code constant pointers into the assembler it produces. For example, this is the case when the Python code being JITted creates an instance of a known class; the corresponding assembler produced by the JIT will reserve the memory for the instance and then write the constant type pointer in it. This type pointer is a GC object (in the simple model, it's the Python class object; in PyPy it's actually the "map" object, which is a different story).
The problem right now is that this constant pointer may point to a protected object. This is a problem because the same piece of assembler can later be executed by a different thread. If it does, then this different thread will create instances whose type pointer is bogus: looking like a protected object, but actually protected by a different thread. Any attempt to use this type pointer to change anything on the class itself will likely crash: the threads will all think they can safely change it in-place. To fix this, we need to make sure we only write pointers to public objects in the assembler. This is a bit involved because we need to ensure that there is a public version of the object to start with.
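A hypothetical sketch of that fix, in the same toy spirit: before a constant GC pointer is written into the generated code, the object is made public, so that every pointer hard-coded in assembler refers to a public object. Here publish simply marks the object public; the real STM library has to create or find a public version in a way other threads can rely on, which is the involved part mentioned above. The helper names and the ("store_type_ptr", ...) instruction format are made up for illustration.

```python
class Obj:
    """Toy object: owner is None if public, else the id of the owning thread."""

    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner


def publish(obj):
    """Hypothetical helper: ensure obj is public and return it.

    In this toy model we simply mark the object public. From then on no
    thread may modify it in place; writers must make a protected copy.
    """
    obj.owner = None
    return obj


def emit_constant_pointer(code, obj):
    """Record an instruction that hard-codes a pointer to obj."""
    if obj.owner is not None:
        obj = publish(obj)
    # Invariant: only public objects are ever embedded in generated code.
    assert obj.owner is None
    code.append(("store_type_ptr", obj.name))
    return obj


code = []
cls = Obj("MyClass", owner=1)  # protected by thread 1
emit_constant_pointer(code, cls)
print(cls.owner, code)  # None [('store_type_ptr', 'MyClass')]
```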
When this is done, we will likely hit the next problem, and the next one; but at some point it should converge (hopefully!) and we'll give you our first PyPy-JIT-STM ready to try. Stay tuned :-)
A bientôt,
Armin.
Thursday, August 8, 2013
NumPyPy Status Update
As expected, nditer is a lot of work. I'm going to pause my work on it for now and focus on simpler and more important things. Here is a list of what I implemented:
- Fixed a bug on 32 bit that made int32(123).dtype == dtype("int32") fail
- Fixed a bug on the pickling of array slices
- The external loop flag is implemented on the nditer class
- The c_index, f_index and multi_index flags are also implemented
- Added dtype("double") and dtype("str")
- C-style iteration is available for nditer
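For readers unfamiliar with the nditer flags: with multi_index the iterator tracks the N-dimensional index of the current element, while c_index and f_index track the flat index that element would have in C (row-major) or Fortran (column-major) order. The pure-Python sketch below reproduces those indices during C-style iteration over an array of a given shape; it does not use nditer itself, and the c_order_indices function is our own illustration, not part of NumPy.

```python
from itertools import product


def c_order_indices(shape):
    """Yield (multi_index, c_index, f_index) in C (row-major) iteration order."""
    c_index = 0
    for multi_index in product(*(range(n) for n in shape)):
        # Fortran order: the first axis varies fastest, so its stride is 1.
        f_index, stride = 0, 1
        for i, n in zip(multi_index, shape):
            f_index += i * stride
            stride *= n
        yield multi_index, c_index, f_index
        c_index += 1


for mi, ci, fi in c_order_indices((2, 3)):
    print(mi, ci, fi)
```

For a 2x3 array this prints the f_index sequence 0, 2, 4, 1, 3, 5, which should match what nditer with the f_index flag reports when iterating a C-contiguous array in C order.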
Romain Guillebert
Thursday, August 1, 2013
PyPy 2.1 - Considered ARMful
We're pleased to announce PyPy 2.1, which targets version 2.7.3 of the Python language. This is the first release with official support for ARM processors in the JIT.
This release also contains several bugfixes and performance improvements.
You can download the PyPy 2.1 release here:
http://pypy.org/download.html
We would like to thank the Raspberry Pi Foundation for supporting the work to finish PyPy's ARM support.
The first beta of PyPy3 2.1, targeting version 3 of the Python language, was just released; more details can be found here.
What is PyPy?
PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (PyPy 2.1 and CPython 2.7.2 performance comparison) due to its integrated tracing JIT compiler.
This release supports x86 machines running Linux 32/64, Mac OS X 64 or Windows 32. This release also supports ARM machines running 32-bit Linux: anything with ARMv6 (like the Raspberry Pi) or ARMv7 (like the Beagleboard, Chromebook, Cubieboard, etc.) that supports VFPv3 should work. Both hard-float armhf/gnueabihf and soft-float armel/gnueabi builds are provided. The armhf builds for Raspbian are created using the Raspberry Pi custom cross-compilation toolchain based on gcc-arm-linux-gnueabihf and should work on ARMv6 and ARMv7 devices running Debian or Raspbian. The armel builds are built using the gcc-arm-linux-gnueabi toolchain provided by Ubuntu and currently target ARMv7.
Work on Windows 64 is still stalled; we would welcome a volunteer to handle it.
Highlights
- JIT support for ARM, architecture versions 6 and 7, hard- and soft-float ABI
- Stacklet support for ARM
- Support for os.statvfs and os.fstatvfs on unix systems
- Improved logging performance
- Faster sets for objects
- Interpreter improvements
- During packaging, compile the CFFI-based Tk extension
- Pickling of numpy arrays and dtypes
- Subarrays for numpy
- Bugfixes to numpy
- Bugfixes to cffi and ctypes
- Bugfixes to the x86 stacklet support
- Fixed issue 1533: fix an RPython-level OverflowError for space.float_w(w_big_long_number).
- Fixed issue 1552: GreenletExit should inherit from BaseException.
- Fixed issue 1537: numpypy __array_interface__
- Fixed issue 1238: Writing to an SSL socket in PyPy sometimes failed with a "bad write retry" message.
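Why issue 1552 matters: GreenletExit is raised inside a greenlet to ask it to terminate, and if it derived from Exception, a broad "except Exception:" in user code would swallow it. The plain-Python sketch below defines a stand-in class with the same base as the greenlet library's GreenletExit (it does not import greenlet), plus a hypothetical variant derived from Exception, as the issue title implies the pre-fix version was, to show the difference.

```python
class GreenletExit(BaseException):
    """Stand-in with the same base class as greenlet.GreenletExit."""


class BuggyGreenletExit(Exception):
    """Hypothetical pre-fix variant that derives from Exception."""


def worker(exit_cls):
    try:
        # Simulate a kill request arriving while the worker runs.
        raise exit_cls()
    except Exception:
        # Generic error handling in user code.
        return "swallowed"


def run(exit_cls):
    try:
        return worker(exit_cls)
    except exit_cls:
        return "propagated"


print(run(GreenletExit))       # propagated
print(run(BuggyGreenletExit))  # swallowed
```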
Cheers,
David Schneider for the PyPy team.