Monday, February 11, 2019

PyPy v7.0.0: triple release of 2.7, 3.5 and 3.6-alpha


The PyPy team is proud to release version 7.0.0 of PyPy, which includes three different interpreters:
  • PyPy2.7, which is an interpreter supporting the syntax and the features of Python 2.7
  • PyPy3.5, which supports Python 3.5
  • PyPy3.6-alpha: this is the first official release of PyPy to support 3.6 features, although it is still considered alpha quality.
All the interpreters are based on much the same codebase, thus the triple release.
Until we can work with downstream providers to distribute package builds for PyPy, we have made wheels available for some common packages.
The GC hooks, which can be used to gain more insight into the GC's performance, have been improved, and it is now possible to manage the GC manually using a combination of gc.disable and gc.collect_step. See the GC blog post.
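As a quick, hedged illustration (PyPy only; the hook and attribute names below follow the PyPy documentation and may differ between versions), registering a GC hook looks roughly like this:

    import gc

    def log_gc_step(stats):
        # called after each incremental major-collection step;
        # 'duration' is assumed to be the elapsed time in seconds
        print("GC step took %.1f ms" % (stats.duration * 1000))

    if hasattr(gc, "hooks"):              # gc.hooks exists on PyPy, not on CPython
        gc.hooks.on_gc_collect_step = log_gc_step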
We updated the cffi module included in PyPy to version 1.12, and the cppyy backend to 1.4. Please use these to wrap your C and C++ code, respectively, for a JIT friendly experience.
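For example, a minimal cffi snippet in ABI mode (no compiler needed) that wraps a C library function might look like the sketch below; the library name and lookup are platform-dependent and only meant as an illustration:

    from cffi import FFI

    ffi = FFI()
    # Declare the C function we want to call.
    ffi.cdef("double hypot(double x, double y);")
    # "m" finds libm on Linux; the name differs on other platforms.
    libm = ffi.dlopen("m")
    print(libm.hypot(3.0, 4.0))           # -> 5.0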
As always, this release is 100% compatible with the previous one and fixes several issues and bugs raised by the growing community of PyPy users. We strongly recommend updating.
The PyPy3.6 release and the Windows PyPy3.5 release are still not production quality so your mileage may vary. There are open issues with incomplete compatibility and c-extension support.
The utf8 branch, which changes the internal representation of unicode to utf8, did not make it into the release, so there is still more goodness coming. You can download the v7.0 releases here:
http://pypy.org/download.html
We would like to thank our donors for the continued support of the PyPy project. If PyPy is not quite good enough for your needs, we are available for direct consulting work.
We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: PyPy and RPython documentation improvements, tweaking popular modules to run on pypy, or general help with making RPython's JIT even better.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7, 3.5 and 3.6. It's fast (PyPy and CPython 2.7.x performance comparison) due to its integrated tracing JIT compiler.
We also welcome developers of other dynamic languages to see what RPython can do for them.
The PyPy release supports:
  • x86 machines on most common operating systems (Linux 32/64 bits, Mac OS X 64 bits, Windows 32 bits, OpenBSD, FreeBSD)
  • big- and little-endian variants of PPC64 running Linux,
  • s390x running Linux
Unfortunately at the moment of writing our ARM buildbots are out of service, so for now we are not releasing any binary for the ARM architecture.

What else is new?

PyPy 6.0 was released in April 2018. There have been many incremental improvements to RPython and PyPy; the complete listing is here.

Please update, and continue to help us make PyPy better.


Cheers, The PyPy team

Saturday, February 9, 2019

Düsseldorf Sprint Report 2019

Hello everyone!

We are happy to report a successful and well attended sprint that is wrapping up in Düsseldorf, Germany. In the last week we had eighteen people sprinting at the Heinrich-Heine-Universität Düsseldorf on various topics.

Totally serious work going on here constantly.

A big chunk of the sprint was dedicated to various discussions, since we did not manage to gather the core developers in one room in quite a while. Discussion topics included:

  • Funding and general sustainability of open source.
  • Catching up with CPython 3.7/3.8 – we are planning to release 3.6 some time in the next few months and we will continue working on 3.7/3.8.
  • What to do with VMprof
  • How we can support Cython inside PyPy in a way that is understood by the JIT, and hence fast.
  • The future of supporting the numeric stack on pypy – we have made significant progress in the past few years and most of the numeric stack works out of the box, but deployment and performance remain problems. Improving on those problems remains a very important focus for PyPy as a project.
  • Taking advantage of the presence of a CPython developer (Łukasz Langa) and a Graal Python developer (Tim Felgentreff), we discussed ways to collaborate in order to improve the Python ecosystem across implementations.
  • Pierre-Yves David and Georges Racinet from octobus gave us an exciting demo on Heptapod, which adds mercurial support to gitlab.
  • Maciej and Armin gave demos of their current (non-PyPy-related) project VRSketch.

Visiting the Landschaftspark Duisburg Nord on the break day

Some highlights of the coding tasks worked on:

  • Aarch64 (ARM64) JIT backend work has been started; we are able to run the first test! Tobias Oberstein from Crossbar GmbH and Rodolph Perfetta from ARM joined the sprint to help kickstart the project.
  • The long running math-improvements branch that was started by Stian Andreassen got merged after bugfixes done by Alexander Schremmer. It should improve operations on large integers.
  • The arcane art of necromancy was used to revive the long-dormant regalloc branch started and nearly finished by Carl Friedrich Bolz-Tereick. The branch got merged and gives some modest speedups across the board.
  • Andrew Lawrence worked on an MSI installer for PyPy on Windows.
  • Łukasz worked on improving failing tests on the PyPy 3.6 branch. He knows very obscure details of CPython (e.g. how pickling works), hence we managed to progress very quickly.
  • Matti Picus set up a new benchmarking server for PyPy 3 branches.
  • The Utf8 branch, which changes the internal representation of unicode, might finally be merged at some point very soon. We discussed and improved upon the last few blockers. It gives significant speedups in many cases that involve string handling.
  • Zlib was missing a couple of methods, which were added by Ronan Lamy and Julian Berman.
  • Manuel Jacob fixed RevDB failures.
  • Antonio Cuni and Matti Picus worked on the 7.0 release, which should happen in a few days.

Now we are all quite exhausted, and are looking forward to catching up on sleep.

Best regards, Maciej Fijałkowski, Carl Friedrich Bolz-Tereick and the whole PyPy team.

Thursday, January 3, 2019

PyPy for low-latency systems

Recently I have merged the gc-disable branch, introducing a couple of features which are useful when you need to respond to certain events with the lowest possible latency. This work has been kindly sponsored by Gambit Research (which, by the way, is a very cool and geeky place to work, in case you are interested). Note also that this is a very specialized use case, so these features might not be useful for the average PyPy user, unless you have the same problems as described here.

The PyPy VM manages memory using a generational, moving Garbage Collector. Periodically, the GC scans the whole heap to find unreachable objects and frees the corresponding memory. Although at first glance this strategy might sound expensive, in practice the total cost of memory management is far less than e.g. on CPython, which is based on reference counting. While maybe counter-intuitive, the main advantage of a non-refcount strategy is that allocation is very fast (especially compared to malloc-based allocators), and deallocation of objects which die young is basically for free. More information about the PyPy GC is available here.

As we said, the total cost of memory management is less on PyPy than on CPython, and it's one of the reasons why PyPy is so fast. However, one big disadvantage is that while on CPython the cost of memory management is spread all over the execution of the program, on PyPy it is concentrated into GC runs, causing observable pauses which interrupt the execution of the user program.
To avoid excessively long pauses, the PyPy GC has been using an incremental strategy since 2013. The GC runs as a series of "steps", letting the user program progress between steps.

The following chart shows the behavior of a real-world, long-running process:


The orange line shows the total memory used by the program, which increases linearly while the program progresses. Every ~5 minutes, the GC kicks in and the memory usage drops from ~5.2GB to ~2.8GB (this ratio is controlled by the PYPY_GC_MAJOR_COLLECT env variable).
The purple line shows aggregated data about the GC timing: the whole collection takes ~1400 individual steps over the course of ~1 minute; each point represents the maximum time a single step took during the past 10 seconds. Most steps take ~10-20 ms, although we see a horrible peak of ~100 ms towards the end. We have not yet investigated what causes it, but we suspect it is related to the deallocation of raw objects.

These multi-millisecond pauses are a problem for systems where it is important to respond to certain events with a latency which is both low and consistent. If the GC kicks in at the wrong time, it might cause unacceptable pauses during the collection cycle.

Let's look again at our real-world example. This is a system which continuously monitors an external stream; when a certain event occurs, we want to take an action. The following chart shows the maximum time it takes to complete one such action, aggregated every minute:


You can clearly see that the baseline response time is around ~20-30 ms. However, we can also see periodic spikes around ~50-100 ms, with peaks up to ~350-450 ms! After a bit of investigation, we concluded that most (although not all) of the spikes were caused by the GC kicking in at the wrong time.

The work I did in the gc-disable branch aims to fix this problem by introducing two new features to the gc module:
  • gc.disable(), which previously only inhibited the execution of finalizers without actually touching the GC, now disables the GC major collections. After a call to it, you will see the memory usage grow indefinitely.
  • gc.collect_step() is a new function which you can use to manually execute a single incremental GC collection step.
It is worth noting that gc.disable() disables only the major collections, while minor collections still run. Moreover, thanks to the JIT's virtuals, many objects with a short and predictable lifetime are not allocated at all. The end result is that most objects with a short lifetime are still collected as usual, so the impact of gc.disable() on memory growth is not as bad as it might sound.
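Here is a minimal sketch of combining the two functions (PyPy only; the major_is_done attribute on the object returned by gc.collect_step() is assumed from the PyPy docs and may vary between versions):

    import gc

    gc.disable()                          # stop automatic major collections

    def run_gc_steps(max_steps=10):
        # Run a bounded number of incremental steps, e.g. from the idle
        # part of an event loop, when a pause is acceptable.
        for _ in range(max_steps):
            step = gc.collect_step()      # one incremental collection step
            if getattr(step, "major_is_done", False):
                break                     # the whole major collection finished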

Combining these two functions, it is possible to take control of the GC to make sure it runs only when it is acceptable to do so. For an example of usage, you can look at the implementation of a custom GC inside pypytools. The peculiarity is that it also defines a "with nogc():" context manager which you can use to mark performance-critical sections where the GC is not allowed to run.
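To give a rough idea of the shape of such a context manager, here is a hand-rolled sketch in the same spirit (not the actual pypytools implementation; handle_latency_critical_event is a hypothetical user function):

    import gc
    from contextlib import contextmanager

    @contextmanager
    def nogc(steps_after=5):
        gc.disable()                      # no major collections inside the block
        try:
            yield
        finally:
            # Catch up with some GC work once a pause is acceptable again;
            # the GC stays disabled overall and is only driven manually.
            for _ in range(steps_after):
                gc.collect_step()

    # usage:
    # with nogc():
    #     handle_latency_critical_event()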

The following chart compares the behavior of the default PyPy GC and the new custom GC, after a careful placing of nogc() sections:


The yellow line is the same as before, while the purple line shows the new system: almost all spikes have gone, and the baseline performance is about 10% better. There is still one spike towards the end, but after some investigation we concluded that it was not caused by the GC.

Note that this does not mean that the whole program became magically faster: we simply moved the GC pauses to some other place which is not shown in the graph. In this specific use case the technique was useful because it allowed us to shift the GC work to places where pauses are more acceptable.

All in all, a pretty big success, I think. These functionalities are already available in the nightly builds of PyPy, and will be included in the next release: take this as a New Year present :)

Antonio Cuni and the PyPy team