Saturday, November 12, 2016

PyPy2.7 v5.6 released - stdlib 2.7.12 support, C-API improvements, and more

We have released PyPy2.7 v5.6 [0], about two months after PyPy2.7 v5.4. This new PyPy2.7 release includes the upstream stdlib version 2.7.12.

We continue to make incremental improvements to our C-API compatibility layer (cpyext). We pass all but 12 of the over-6000 tests in the upstream NumPy test suite, and have begun examining what it would take to support Pandas and PyQt.

Work proceeds at a good pace on the PyPy3.5 version due to a grant from the Mozilla Foundation, and some of those changes have been backported to PyPy2.7 where relevant.

The PowerPC and s390x backends have been enhanced with the capability to use SIMD instructions for micronumpy loops.

We changed timeit to now report average +/- standard deviation, which is better than the misleading minimum value reported in CPython.
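To make the difference between the two reporting styles concrete, here is a minimal sketch using only the standard library. It is written for Python 3 (it relies on the statistics module) and is not PyPy's modified timeit; the helper name `summarize` and the sample statement are made up for illustration.

```python
import statistics
import timeit

def summarize(stmt, repeat=7, number=1000):
    """Time stmt `repeat` times; return (minimum, mean, stdev) per loop, in seconds."""
    totals = timeit.repeat(stmt, repeat=repeat, number=number)
    per_loop = [total / number for total in totals]
    return min(per_loop), statistics.mean(per_loop), statistics.stdev(per_loop)

best, mean, stdev = summarize("sum(range(100))")
# The minimum hides run-to-run variation; mean and deviation expose it.
print("minimum:            %.3g s per loop" % best)
print("average +/- stddev: %.3g +/- %.3g s per loop" % (mean, stdev))
```

On a JIT-compiled interpreter the early runs include warm-up, so the spread between runs carries information that a single minimum value discards.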

We now support building PyPy with OpenSSL 1.1 in our built-in _ssl module, as well as maintaining support for previous versions.
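To check which OpenSSL version a given interpreter build is linked against, the standard ssl module exposes the version both as a string and as a tuple. The snippet below is a small sketch using those two attributes; the variable name `uses_openssl_1_1_or_newer` is only illustrative.

```python
import ssl

# Human-readable version of the library the ssl module was built against.
print(ssl.OPENSSL_VERSION)

# The same information as a tuple of integers, convenient for comparisons.
major, minor = ssl.OPENSSL_VERSION_INFO[:2]
uses_openssl_1_1_or_newer = (major, minor) >= (1, 1)
print(uses_openssl_1_1_or_newer)
```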

CFFI has been updated to 1.9, improving an already great package for interfacing with C.

As always, this release fixed many issues and bugs raised by the growing community of PyPy users. We strongly recommend updating. You can download the PyPy2.7 v5.6 release here:

http://pypy.org/download.html

Downstream packagers have been hard at work. The Debian package is already available, and the portable PyPy versions are also ready, for those who wish to run PyPy on other Linux distributions like RHEL/CentOS 5.

We would like to thank our donors for the continued support of the PyPy project.

We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: PyPy and RPython documentation improvements, tweaking popular modules to run on PyPy, or general help with making RPython’s JIT even better.

What is PyPy?

x86 machines on most common operating systems (Linux 32/64 bits, Mac OS X 64 bits, Windows 32 bits, OpenBSD, FreeBSD)

newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux,

big- and little-endian variants of PPC64 running Linux,

s390x running Linux

What else is new?

(since the release of PyPy 5.4 in August, 2016)

There are many incremental improvements to RPython and PyPy; the complete listing is here.

Please update, and continue to help us make PyPy better.

Cheers, The PyPy team

[0] We skipped 5.5 since we share a code base with PyPy3, and PyPy3.3-v.5.5-alpha was released last month

Thursday, November 3, 2016

Vectorization extended. PowerPC and s390x

We are happy to announce that JIT support in both the PowerPC backend and the
s390x backend has been enhanced. Both can now vectorize loops via SIMD
instructions. Special thanks to IBM for funding this work.

If you are not familiar with this topic you can read more details here.

There are many more enhancements under the hood. Most notably, all pure operations are now delayed until the latest possible point. Previously, indices were in some cases calculated more than once, or they needed an additional register because the old value was still in use. Additionally, it is now possible to load quadword-aligned memory on both PPC and s390x (x86 currently cannot do that).

NumPy & CPyExt

The community and core developers have been moving CPyExt towards a complete, but emulated, layer for CPython C extensions. This is great, because the one restriction preventing the wider deployment of PyPy in several scenarios will hopefully be removed. However, we still recommend avoiding CPyExt where possible: either write no C code at all (and let PyPy speed up your Python code) or use cffi.

The work done here to support vectorization helps micronumpy (NumPyPy) to speed up operations for PPC and s390x. So why does PyPy support both NumPyPy and NumPy? Do we actually need both? Yes, there are places where gcc can beat the JIT, and places where the tight integration between NumPyPy and PyPy is more performant. We do have plans to integrate both, hijacking the C-extension method calls to use NumPyPy where we know NumPyPy can be faster.

Just to give you an idea why this is a benefit:

NumPy arrays can carry custom dtypes and apply user-defined Python functions on the arrays. How could one optimize this kind of scenario? In a traditional setup, you cannot. But as soon as NumPyPy is turned on, you can suddenly JIT compile this code and vectorize it.

Another example is element access that occurs frequently, or any other calls that cross between Python and the C level frequently.

Benchmarks

Let's have a look at some benchmarks reusing mikefc's numpy benchmark suite (find the forked version here). I only ran a subset of the microbenchmarks, to show that the core functionality works properly. Additionally, the suite has been rewritten to use perf instead of the timeit stdlib module.

Setup

x86 runs on an Intel i7-2600 clocked at 3.40GHz using 4 cores. PowerPC runs on a Power 8 clocked at 3.425GHz providing 160 cores. Last but not least, the mainframe machine is clocked at up to 4 GHz, but is fully virtualized (as is common for such machines). Note that the PowerPC machine is a non-private remote machine. It is used by many users and is crowded with processes, so it is hard to extract a stable benchmark there.

x86 ran on Fedora 24 (kernel version 4.8.4), PPC ran on Fedora 21 (kernel version 3.17.4) and s390x ran on Red Hat Linux 7.2 (kernel version 3.10.0). As for BLAS support, NumPy on CPython had OpenBLAS available on x86, no BLAS implementation was present on s390x, and PPC provided BLAS and LAPACK.

As you can see, all machines run very different configurations. It makes sense to compare implementations on the same platform, not results across platforms.

Blue shows CPython 2.7.10+ available on that platform using the latest NumPy (1.11). Micro NumPy is used for PyPy. PyPy+ indicates that the vectorization optimization is turned on.
All bar charts show the median value of all runs (5 samples, 100 loops, 10 inner loops; for the operations on vectors rather than matrices, the loops are set to 1000). PyPy additionally gets 3 extra executions to warm up the JIT.
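The measurement scheme just described (a few untimed warm-up executions, then the median over several timed samples) can be sketched with the standard library alone. This is not the code of the benchmark suite, which uses perf; the names `bench` and `add_vectors` and the workload are assumptions made for illustration, and the inner-loop level is omitted.

```python
import statistics
import time

A = list(range(1000))
B = list(range(1000))

def add_vectors():
    """Stand-in workload: element-wise addition of two 1000-element lists."""
    return [x + y for x, y in zip(A, B)]

def bench(func, samples=5, loops=100, warmup=3):
    """Return the median time in seconds of `samples` runs of `loops` calls each."""
    for _ in range(warmup):
        # Untimed executions, so that a JIT can compile the hot loop first.
        for _ in range(loops):
            func()
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        for _ in range(loops):
            func()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

median_seconds = bench(add_vectors)
print("median of 5 samples: %.6f s" % median_seconds)
```

The median is less sensitive than the mean to a single slow sample, which matters on a shared machine such as the PowerPC host mentioned above.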

The comparison is really comparing the speed of machine code. It compares PyPy's JIT output against GCC's output. It has little to do with the speed of the interpreter.

Both new SIMD backends speed up the numeric kernels. Sometimes the result is close to the speed of CPython, sometimes it is faster. The maximum parallelism very much depends on the extension emitted by the compiler. All three SIMD backends have the same vector register size (128 bits). This means that all three behave similarly, but PPC and s390x gain more because they can load 128 bits of memory from quadword-aligned memory.
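As a rough guide to what the 128-bit register size means, the number of array elements that fit into one vector register is the register width divided by the element width. The sketch below just performs that arithmetic; the function name `lanes` and the list of element types are illustrative, and the result is an upper bound on parallelism per instruction, not a measured speedup.

```python
VECTOR_REGISTER_BITS = 128  # register width stated above

def lanes(item_size_bytes, register_bits=VECTOR_REGISTER_BITS):
    """Number of elements of the given size that fit into one vector register."""
    return register_bits // (item_size_bytes * 8)

for name, size in [("int8", 1), ("int16", 2), ("float32", 4), ("float64", 8)]:
    print("%-8s %2d elements per vector register" % (name, lanes(size)))
```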

Future directions

Python is achieving rapid adoption in data science. This is currently a trend emerging in Europe, and Python is already heavily used for data science in the USA and many other places around the world.

PyPy can make a valuable contribution for data scientists, helping them to rapidly write scientific programs in Python and run them at near native speed. If you happen to be in that situation, we are eager to hear your feedback, resolve your issues, and work together to improve the performance of your code. Just get in touch!

Richard Plangger (plan_rich) and the PyPy team

Wednesday, October 12, 2016

PyPy3 5.5.0 released

We're pleased to announce the release of PyPy3 v5.5.0. Coming four months after PyPy3.3 v5.2, it improves compatibility with Python 3.3 (3.3.5). We strongly recommend updating from previous PyPy3 versions.

We would like to thank all of the people who donated to the py3k proposal for supporting the work that went into this release.

You can download the PyPy3.3 v5.5.0 release here: http://pypy.org/download.html

Improved Python 3.3.5 support.

os.get_terminal_size(), time.monotonic(), str.casefold()
faulthandler module
There are still some missing features, such as a PEP 393-like space-efficient string representation, and known performance regressions (e.g. issue #2305). The focus for this release has been updating to 3.3 compatibility. Windows is also not yet supported.

ensurepip is also included (it's only included in CPython 3 >= 3.4).
Buffer interface improvements (numpy on top of cpyext)
Several JIT improvements (force-virtual-state, residual calls)
Search path for libpypy-c.so has changed (helps with cffi embedding on linux distributions)
Improve the error message when the user forgot the "self" argument of a method
Many more small improvements, please head over to our documentation for more information
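The three Python 3.3 additions named in the list above are ordinary standard-library features, so a short script can exercise them. This is a minimal sketch; the variable names are illustrative, and the terminal-size call is wrapped because it fails when standard output is not a terminal.

```python
import os
import time

# time.monotonic() never goes backwards, so it is suitable for measuring intervals.
start = time.monotonic()
elapsed = time.monotonic() - start

# str.casefold() is a more aggressive lower(), intended for caseless comparison.
same = "Straße".casefold() == "STRASSE".casefold()

# os.get_terminal_size() raises OSError if stdout is not attached to a terminal.
try:
    columns = os.get_terminal_size().columns
except OSError:
    columns = None

print(elapsed >= 0, same, columns)
```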

Towards Python 3.5

We have started to work on Python 3.5, which is a version used by many software projects. It seems to get wide adoption. We are happy to be part of the Mozilla Open Source Support (MOSS) initiative.

Nevertheless we want to give our users the chance to use PyPy in their Python 3 projects, thus we have prepared this release.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7.10 and 3.3.5. It's fast due to its integrated tracing JIT compiler.

We also welcome developers of other dynamic languages to see what RPython can do for them.

This release supports:

x86 machines on most common operating systems except Windows
newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux
big- and little-endian variants of PPC64 running Linux
s390x running Linux

Please try it out and let us know what you think. We welcome feedback: we know
you are using PyPy, so please tell us about it!

Cheers

The PyPy Team

Saturday, September 10, 2016

RevDB released, v5.4.1

Hi all,

The first beta version of RevDB is out! Remember that RevDB is a reverse debugger for Python. The idea is that it is a debugger that can run forward and backward in time, letting you more easily understand your subtle bug in your big Python program.

RevDB should work on almost any Python program. Even if you are normally only using CPython, trying to reproduce the bug with RevDB is similar to trying to run the program on a regular PyPy---usually it just works, even if not quite always.

New since the alpha version described in the previous blog post is, notably, support for:

Threads.
CPyExt, the compatibility layer of PyPy that can run CPython C extension modules.

as well as many other improvements.

You need to build it yourself for now. It is tested on 64-bit Linux. 32-bit Linux, OS/X, and other POSIX platforms should all either work out of the box or be just a few fixes away (contributions welcome). Win32 support is a lot more involved but not impossible.

See https://bitbucket.org/pypy/revdb/ for more information!

Armin

Wednesday, September 7, 2016

PyPy 5.4.1 bugfix released

We have released a bugfix for PyPy2.7-v5.4.0, released last week, due to the following issues:

Update list of contributors in documentation and LICENSE file; this was unfortunately left out of 5.4.0. My apologies to the new contributors.
Allow tests run with -A to find libm.so even if it is a script not a dynamically loadable file
Bump sys.setrecursionlimit() when translating PyPy, for translating with CPython
Tweak a float comparison with 0 in backendopt.inline to avoid rounding errors
Fix for an issue for translating the sandbox
Fix for an issue where unicode.decode('utf8', 'custom_replace') messed up the last byte of a unicode string sometimes
Update built-in cffi to version 1.8.1
Explicitly detect that we found as-yet-unsupported OpenSSL 1.1, and crash translation with a message asking for help porting it
Fix a regression where a PyBytesObject was forced (converted to a RPython object) when not required, reported as issue #2395
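For readers unfamiliar with custom error handlers such as the 'custom_replace' one mentioned in the list above: a handler is registered by name with codecs.register_error and is then called for every undecodable byte sequence. The sketch below is written for Python 3 (where decoding starts from bytes), and the handler body is invented for illustration; the original report does not say what its handler did.

```python
import codecs

def custom_replace(error):
    """Replace each undecodable byte with '?' and resume decoding right after it."""
    if not isinstance(error, UnicodeDecodeError):
        raise error
    bad_bytes = error.object[error.start:error.end]
    return ("?" * len(bad_bytes), error.end)

# The name used here is what gets passed as the `errors` argument of decode().
codecs.register_error("custom_replace", custom_replace)

decoded = b"caf\xc3\xa9 \xff!".decode("utf8", "custom_replace")
trailing = b"abc\xff".decode("utf8", "custom_replace")
print(decoded)
print(trailing)
```

The second call places the invalid byte at the very end of the input, which is the position the fixed bug was about.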

Thanks to those who reported the issues.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (PyPy and CPython 2.7.x performance comparison) due to its integrated tracing JIT compiler.

We also welcome developers of other dynamic languages to see what RPython can do for them.

This release supports:

x86 machines on most common operating systems (Linux 32/64, Mac OS X 64, Windows 32, OpenBSD, FreeBSD),
newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux,
big- and little-endian variants of PPC64 running Linux,
s390x running Linux

Please update, and continue to help us make PyPy better.

Cheers

The PyPy Team

Wednesday, August 31, 2016

PyPy2 v5.4 released - incremental improvements and enhancements

We have released PyPy2.7 v5.4, a little under two months after PyPy2.7 v5.3. This new PyPy2.7 release includes incremental improvements to our C-API compatibility layer (cpyext), enabling us to pass over 99% of the upstream numpy test suite.

We updated built-in cffi support to version 1.8, which now supports the “limited API” mode for c-extensions on CPython >=3.2.

We improved tooling for the PyPy JIT, and expanded VMProf support to OpenBSD and DragonFly BSD.

As always, this release fixed many issues and bugs raised by the growing community of PyPy users. We strongly recommend updating.

You can download the PyPy2 v5.4 release here:

http://pypy.org/download.html

We would like to thank our donors for their continued support of the PyPy project. We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: PyPy and RPython documentation improvements, testing and adapting popular modules to run on PyPy, or general help with making RPython’s JIT even better.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It’s fast (PyPy and CPython 2.7 performance comparison) due to its integrated tracing JIT compiler.

We also welcome developers of other dynamic languages to see what RPython can do for them.

This release supports:

x86 machines on most common operating systems (Linux 32/64, Mac OS X 64, Windows 32, OpenBSD, FreeBSD)
newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux
big- and little-endian variants of PPC64 running Linux
s390x running Linux

What is New?

(since the release of PyPy 5.3 in June, 2016)

There are many incremental improvements to RPython and PyPy; the complete listing is here. Mozilla generously sponsored work toward Python 3.5 compatibility, and we are beginning to see some cross-over improvements of RPython and PyPy2.7 as a result.

Please update, and continue to help us make PyPy better. Cheers

The PyPy Team

Thursday, August 11, 2016

PyPy Tooling Upgrade: JitViewer and VMProf

We are happy to announce a major JitViewer (JV) update.
JV allows you to inspect the internal compiler representation of RPython (the language in which PyPy is implemented), including the generated machine code of your program. It can graphically show you details of the JIT compiled code and helps you pinpoint issues in your program.

VMProf is a statistical CPU profiler for Python that imposes very little overhead at runtime.

Both VMProf and JitViewer share a common goal: Present useful information for your Python program.
The combination of both can reveal more information than either alone.
That is the reason why they are now both packaged together.
We also updated vmprof.com with various bug fixes and changes including an all new interface to JV.

This work was done with the goal of improving tooling and libraries around the Python/PyPy/RPython ecosystem.
Some of the tools we have developed:

CFFI - Foreign Function Interface that avoids CPyExt (CFFI docs)
RevDB - A reverse debugger for python (RevDB blog post)

and of course the tools we discuss here:

VMProf - A statistical CPU profiler (VMProf docs)
JitViewer - Visualization of the log file produced by RPython (JitLog docs)

A "brand new" JitViewer

JitViewer has two pieces: you create a log file when running your program, and then use a graphic tool to view what happened.

The old logging format was a hard-to-maintain, plain-text-logging facility. Frequent changes often broke internal tools.
Additionally, the logging output of a long running program required a lot of disk space.

Our new binary format encodes data densely, makes use of some compression (gzip), and tries to remove repetition where possible.
It also supports versioning for future proofing and can be extended easily.
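To illustrate the three ideas mentioned here (dense binary encoding, gzip compression, and a version field), the following is a toy log format. It is emphatically not the real jitlog format: the magic bytes, field layout, version number, and function names are all hypothetical.

```python
import gzip
import struct

FORMAT_VERSION = 1               # hypothetical version number
HEADER = struct.Struct("<4sB")   # magic bytes + format version
RECORD = struct.Struct("<BI")    # record tag + payload length

def write_log(records, version=FORMAT_VERSION):
    """Encode (tag, payload_bytes) pairs into one versioned, gzip-compressed blob."""
    body = b"".join(RECORD.pack(tag, len(payload)) + payload
                    for tag, payload in records)
    return HEADER.pack(b"DEMO", version) + gzip.compress(body)

def read_log(blob):
    """Decode a blob produced by write_log; reject unknown or newer formats."""
    magic, version = HEADER.unpack_from(blob, 0)
    if magic != b"DEMO":
        raise ValueError("not a demo log")
    if version > FORMAT_VERSION:
        raise ValueError("log written by a newer format version")
    body = gzip.decompress(blob[HEADER.size:])
    records, offset = [], 0
    while offset < len(body):
        tag, length = RECORD.unpack_from(body, offset)
        offset += RECORD.size
        records.append((tag, body[offset:offset + length]))
        offset += length
    return version, records

sample = [(1, b"loop 42"), (2, b"guard_true" * 100)]
blob = write_log(sample)
print(len(blob), read_log(blob) == (FORMAT_VERSION, sample))
```

Keeping the version in an uncompressed header lets a reader reject a file it cannot parse before it touches the payload.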

And *drumroll* you no longer need to install a tool to view the log yourself!
The whole system has moved to vmprof.com and you can use it any time.

Sounds great. But what can you do with it? Here are two examples for a PyPy user:

PyPy crashed? Did you discover a bug?

For some hard-to-find bugs it is often necessary to look at the compiled code. The old
procedure often required you to upload a plain text file, which was hard to parse and to look through.

A better way to share a crash report is to install the ``vmprof`` module from PyPI and execute either of the two commands:

# this program does not crash, but has some weird behaviour
$ pypy -m jitlog --web <your program args>
...
PyPy Jitlog: http://vmprof.com/#/<hash>/traces
# this program segfaults
$ pypy -m jitlog -o /tmp/log <your program args>
...
<Segfault>
$ pypy -m jitlog --upload /tmp/log
PyPy Jitlog: http://vmprof.com/#/<hash>/traces

Providing the link in the bug report allows PyPy developers to browse and identify potential issues.

Speed issues

VMProf is a great tool to find hot spots that consume a lot of time in your program. As soon as you have identified code that runs slowly, you can switch to jitlog and maybe pinpoint certain aspects that do not behave as expected. You will find an overview, and are able to browse the generated code. If you cannot make sense of all that, you can just share the link with us and we can have a look too.

Future direction

We hope that the new release will help both PyPy developers and PyPy users resolve potential issues and easily point them out.

Here are a few ideas of what might come in the next few releases:

Combination of CPU profiles and the JITLOG (sadly did not make it into the current release).
Extend vmprof.com to be able to query vmprof/jitlog.
An example query for vmprof: 'methods.callsites() > 5' and
for the jitlog would be 'traces.contains('call_assembler').hasbridge('*my_func_name*')'.
Extend the jitlog to capture the information of the optimization stage.

Richard Plangger (plan_rich) and the PyPy team

Tuesday, August 9, 2016

PyPy gets funding from Mozilla for Python 3.5 support

"Python 2.x versus Python 3.x": this is by now an old question. In the eyes of some people Python 2 is here to stay, and in the eyes of others Python has long been 3 only.

PyPy's own position is that PyPy will support Python 2.7 forever---the RPython language in which PyPy is written is a subset of 2.7, and we have no plan to upgrade that. But at the same time, we want to support 3.x. This is particularly true now: a relatively recent development is that Python 3.5 seems to attract more and more people. The "switch" to Python 3.x might be starting to happen.

Correspondingly, PyPy has been searching for a while for a way to support a larger-scale development effort. The goal is to support not just any old version of Python 3.x, but Python 3.5, as this seems to be the version that people are switching to. PyPy is close to supporting all of Python 3.3 now; but the list of what is new in Python 3.4 and 3.5 is far, far longer than anyone imagines. The long-term goal is also to get a version of "PyPy3" that is as good as "PyPy2" is, including its performance and its cpyext layer (CPython C API interoperability), for example.

So, the end result: Mozilla recently decided to award $200,000 to Baroque Software to work on PyPy as part of its Mozilla Open Source Support (MOSS) initiative. This money will be used to implement the Python 3.5 features in PyPy. Within the next year, we plan to use the money to pay four core PyPy developers half-time to work on the missing features and on some of the big performance and cpyext issues. This should speed up the progress of catching up with Python 3.x significantly. We are extremely thankful to Mozilla for supporting us in this way, and will keep you updated on the progress via this blog.

Friday, July 8, 2016

Reverse debugging for Python

RevPDB

A "reverse debugger" is a debugger where you can go forward and backward in time. It is an uncommon feature, at least in the open source world, but I have no idea why. I have used undodb-gdb and rr, which are reverse debuggers for C code, and I can only say that they saved me many, many days of poking around blindly in gdb.

The PyPy team is pleased to give you "RevPDB", a reverse-debugger similar to rr but for Python.

An example is worth a thousand words. Let's say your big Python program has a bug that shows up inconsistently. You have nailed it down to something like:

start x.py, which does stuff (maybe involving processing files, answering some web requests that you simulate from another terminal, etc.);
sometimes, after a few minutes, your program's state becomes inconsistent and you get a failing assert or another exception.

This is the case where RevPDB is useful.

RevPDB is available only on 64-bit Linux and OS/X right now, but should not be too hard to port to other OSes. It is very much alpha-level! (It is a debugger full of bugs. Sorry about that.) I believe it is still useful---it helped me in one real use case already.

How to get RevPDB

The following demo was done with an alpha version for 64-bit Linux, compiled for Arch Linux. I won't provide the binary; it should be easy enough to retranslate (much faster than a regular PyPy because it contains neither a JIT nor a custom GC). Grab the PyPy sources from Mercurial, and then:

hg update reverse-debugger
# or "hg update ff376ccacb36" for exactly this demo
cd pypy/goal
../../rpython/bin/rpython -O2 --revdb targetpypystandalone.py  \
                  --withoutmod-cpyext --withoutmod-micronumpy

and possibly rename the final pypy-c to pypy-revdb to avoid confusion.

Other platforms than 64-bit Linux and OS/X need some fixes before they work.

Demo

For this demo, we're going to use this x.py as the "big program":

import os

class Foo(object):
    value = 5

lst1 = [Foo() for i in range(100)]
lst1[50].value += 1
for x in lst1:
    x.value += 1

for x in lst1:
    if x.value != 6:
        print 'oops!'
        os._exit(1)

Of course, it is clear what occurs in this small example: the check fails on item 50. For this demo, the check has been written with os._exit(1), because this exits the program immediately. If it was written with an assert, then its failure would execute things in the traceback module afterwards, to print the traceback; it would be a minor mess just to find the exact point of the failing assert. (This and other issues are supposed to be fixed in the future, but for now it is alpha-level.)

Anyway, with a regular assert and a regular post-mortem pdb, we could observe that x.value is indeed 7 instead of 6 when the assert fails. Imagine that the program is much bigger: how would we find the exact chain of events that caused this value 7 to show up on this particular Foo object? This is what RevPDB is for.

First, for now, we need to disable Address Space Layout Randomization (ASLR), otherwise replaying will not work. This is done once with the following command line, which changes the state until the next reboot:

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

UPDATE: the above is no longer necessary from revision ff376ccacb36.

Run x.py with RevPDB's version of PyPy instead of the regular interpreter (CPython or PyPy):

PYPYRDB=log.rdb ./pypy-revdb x.py

This pypy-revdb executable is like a slow PyPy executable, running (for now) without a JIT. This produces a file log.rdb which contains a complete log of this execution. (If the bug we are tracking occurs rarely, we need to re-run it several times until we get the failure. But once we got the failure, then we're done with this step.)

Start:

rpython/translator/revdb/revdb.py log.rdb

We get a pdb-style debugger. This revdb.py is a normal Python program, which you run with an unmodified Python; internally, it looks inside the log for the path to pypy-revdb and runs it as needed (as one forking subprocess, in a special mode).

Initially, we are at the start of the program---not at the end, like we'd get in a regular debugger:

File "<builtin>/app_main.py", line 787 in setup_bootstrap_path:
(1)$

The list of commands is available with help.

Go to the end with continue (or c):

(1)$ continue
File "/tmp/x.py", line 14 in <module>:
...
  lst1 = [Foo() for i in range(100)]
  lst1[50].value += 1
  for x in lst1:
      x.value += 1

  for x in lst1:
      if x.value != 6:
          print 'oops!'
>         os._exit(1)
(19727)$

We are now at the beginning of the last executed line. The number 19727 is the "time", measured in number of lines executed. We can go backward with the bstep command (backward step, or bs), line by line, and forward again with the step command. There are also commands bnext, bcontinue and bfinish and their forward equivalents. There is also "go TIME" to jump directly to the specified time. (Right now the debugger only stops at "line start" events, not at function entry or exit, which makes some cases a bit surprising: for example, a step from the return statement of function foo() will jump directly to the caller's caller, if the caller's current line was return foo() + 2, because no "line start" event occurs in the caller after foo() returns to it.)

We can print Python expressions and statements using the p command:

(19727)$ p x
$0 = <__main__.Foo object at 0xfffffffffffeab3e>
(19727)$ p x.value
$1 = 7
(19727)$ p x.value + 1
8

The "$NUM =" prefix is only shown when we print an object that really exists in the debugged program; that's why the last line does not contain it. Once a $NUM has been printed, then we can use it in further expressions---even at a different point in time. It becomes an anchor that always refers to the same object:

(19727)$ bstep

File "/tmp/x.py", line 13 in <module>:
...

  lst1 = [Foo() for i in range(100)]
  lst1[50].value += 1
  for x in lst1:
      x.value += 1

  for x in lst1:
      if x.value != 6:
>         print 'oops!'
          os._exit(1)
(19726)$ p $0.value
$1 = 7

In this case, we want to know when this value 7 was put in this attribute. This is the job of a watchpoint:

(19726)$ watch $0.value
Watchpoint 1 added
updating watchpoint value: $0.value => 7

This watchpoint means that $0.value will be evaluated at each line. When the repr() of this expression changes, the watchpoint activates and execution stops:

(19726)$ bcontinue
[searching 19629..19726]
[searching 19338..19629]

updating watchpoint value: $0.value => 6
Reverse-hit watchpoint 1: $0.value
File "/tmp/x.py", line 9 in <module>:
  import os

  class Foo(object):
      value = 5

  lst1 = [Foo() for i in range(100)]
  lst1[50].value += 1
  for x in lst1:
>     x.value += 1

  for x in lst1:
      if x.value != 6:
          print 'oops!'
          os._exit(1)
(19524)$
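The watchpoint rule described above (re-evaluate the expression at every line, stop when its repr() changes) can be imitated, going forward only, in an ordinary Python interpreter with sys.settrace. This is only a sketch of the idea, not how RevDB implements watchpoints; the helper name `run_with_watch` and the tiny program are made up for illustration.

```python
import sys

def run_with_watch(func, get_value):
    """Run func() and record each change of repr(get_value()), checked at line starts."""
    hits = []
    last = [repr(get_value())]

    def check(lineno):
        current = repr(get_value())
        if current != last[0]:
            hits.append((lineno, last[0], current))
            last[0] = current

    def tracer(frame, event, arg):
        # "line" events fire just before a line runs, like RevDB's "line start".
        if event == "line":
            check(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)
    check(None)  # catch a change made by the very last line executed
    return hits

class Foo(object):
    value = 5

watched = Foo()

def program():
    watched.value += 1
    for _ in range(1):
        watched.value += 1

hits = run_with_watch(program, lambda: watched.value)
for lineno, old, new in hits:
    print("watchpoint hit before line %s: %s => %s" % (lineno, old, new))
```

As in the demo, the change is only noticed at the start of the next line, which is why a hit is reported on the line after the one that did the modification.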

Note that using the $NUM syntax is essential in watchpoints. You can't say "watch x.value", because the variable x will go out of scope very soon when we move forward or backward in time. In fact the watchpoint expression is always evaluated inside an environment that contains the builtins but not the current locals and globals. But it also contains all the $NUM, which can be used to refer to known objects. It is thus common to watch $0.attribute if $0 is an object, or to watch len($1) if $1 is some list. The watch expression can also be a simple boolean: for example, "watch $2 in $3" where $3 is some dict and $2 is some object that you find now in the dict; you would use this to find out the time when $2 was put inside $3, or removed from it.

Use "info watchpoints" and "delete <watchpointnum>" to manage watchpoints.

There are also regular breakpoints, which you set with "b FUNCNAME". It breaks whenever there is a call to a function that happens to have the given name. (It might be annoying to use for a function like __init__() which has many homonyms. There is no support for breaking on a fully-qualified name or at a given line number for now.)

In our demo, we stop at the line x.value += 1, which is where the value was changed from 6 to 7. Use bcontinue again to stop at the line lst1[50].value += 1, which is where the value was changed from 5 to 6. Now we know how this value attribute ends up being 7.

(19524)$ bcontinue
[searching 19427..19524]
[searching 19136..19427]

updating watchpoint value: $0.value => 5
Reverse-hit watchpoint 1: $0.value
File "/tmp/x.py", line 7 in <module>:
  import os

  class Foo(object):
      value = 5

  lst1 = [Foo() for i in range(100)]
> lst1[50].value += 1
  for x in lst1:
      x.value += 1

  for x in lst1:
      if x.value != 6:
...
(19422)$

Try to use bcontinue yet another time. It will stop now just before $0 is created. At that point in time, $0 refers to an object that does not exist yet, so the watchpoint now evaluates to an error message (but it continues to work as before, with that error message as the string it currently evaluates to).

(19422)$ bcontinue
[searching 19325..19422]

updating watchpoint value: $0.value => RuntimeError:
               '$0' refers to an object created later in time
Reverse-hit watchpoint 1: $0.value
File "/tmp/x.py", line 6 in <module>:
  import os

  class Foo(object):
      value = 5

> lst1 = [Foo() for i in range(100)]
  lst1[50].value += 1
  for x in lst1:
      x.value += 1

  for x in lst1:
...
(19371)$

In big programs, the workflow is similar, just more complex. Usually it works this way: we find interesting points in time with some combination of watchpoints and some direct commands to move around. We write down on a piece of (real or virtual) paper these points in history, including most importantly their time, so that we can construct an ordered understanding of what is going on.

The current revdb can be annoying and sometimes even crash; but the history you reconstruct can be kept. All the times and expressions printed are still valid when you restart revdb. The only thing "lost" is the $NUM objects, which you need to print again. (Maybe instead of $0, $1, ... we should use $<big number>, where the big number identifies uniquely the object by its creation time. These numbers would continue to be valid even after revdb is restarted. They are more annoying to use than just $0 though.)

Screencast: Here's a (slightly typo-y) screencast of cfbolz using the reverse debugger:

Current issues

General issues:

If you are using revdb on a log that took more than a few minutes to record, then it can be painfully slow. This is because revdb needs to replay big parts of the log again for some operations.
The pypy-revdb is currently missing the following modules:
- thread (implementing multithreading is possible, but not done yet);
- cpyext (the CPython C API compatibility layer);
- micronumpy (minor issue only);
- _continuation (for greenlets).
Does not contain a JIT, and does not use our fast garbage collectors. You can expect pypy-revdb to be maybe 3 times slower than CPython.
Only works on Linux and OS/X. There is no fundamental reason for this restriction, but it is some work to fix.
Replaying a program uses a lot more memory; maybe 15x as much as during the recording. This is because it creates many forks. If you have a program that consumes 10% of your RAM or more, you will need to reduce MAX_SUBPROCESSES in process.py.
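The 10% threshold in the last item follows from simple arithmetic on the rough 15x figure: 10% of RAM while recording becomes about 150% while replaying. The sketch below only restates that calculation; the factor is the approximate number quoted above, and the function name is illustrative.

```python
REPLAY_MEMORY_FACTOR = 15  # rough figure quoted above ("maybe 15x")

def replay_ram_fraction(recording_ram_fraction, factor=REPLAY_MEMORY_FACTOR):
    """Estimate the fraction of RAM needed for replaying, given the recording fraction."""
    return recording_ram_fraction * factor

for fraction in (0.02, 0.05, 0.10):
    estimate = replay_ram_fraction(fraction)
    verdict = "fits" if estimate <= 1.0 else "exceeds RAM"
    print("%3.0f%% while recording -> about %3.0f%% while replaying (%s)"
          % (fraction * 100, estimate * 100, verdict))
```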

Replaying also comes with a bunch of user interface issues:

Attempted to do I/O or access raw memory: we get this whenever trying to print some expression that cannot be evaluated with only the GC memory---or which can, but then the __repr__() method of the result cannot. We need to reset the state with bstep + step before we can print anything else. However, if only the __repr__() crashes, you still see the $NUM = prefix, and you can use that $NUM afterwards.
id() is globally unique, returning a reproducible 64-bit number, so sometimes using id(x) is a workaround for when using x doesn't work because of Attempted to do I/O issues (e.g. p [id(x) for x in somelist]).
as explained in the demo, next/bnext/finish/bfinish might jump around a bit non-predictably.
similarly, breaks on watchpoints can stop at apparently unexpected places (when going backward, try to do "step" once). The issue is that it can only stop at the beginning of every line. In the extreme example, if a line is foo(somelist.pop(getindex())), then somelist is modified in the middle. Immediately before this modification occurs, we are in getindex(), and immediately afterwards we are in foo(). The watchpoint will stop the program at the end of getindex() if running backward, and at the start of foo() if running forward, but never actually on the line doing the change.
watchpoint expressions must not have any side-effect at all. If they do, the replaying will get out of sync and revdb.py will complain about that. Regular p expressions and statements can have side-effects; these effects are discarded as soon as you move in time again.
sometimes even "p import foo" will fail with Attempted to do I/O. Use instead "p import sys; foo = sys.modules['foo']".
use help to see all commands. backtrace can be useful. There is no up command; you have to move in time instead, e.g. using bfinish to go back to the point where the current function was called.
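
The watchpoint behaviour described above follows from the order in which Python evaluates foo(somelist.pop(getindex())). Here is a minimal sketch in plain Python, with no revdb involved; the events list and the function bodies are made up for illustration. It shows that the list is still unmodified while we are inside getindex(), and already modified by the time foo() starts:

```python
events = []
somelist = ["a", "b", "c"]


def getindex():
    # Runs first: the list still has all three items here.
    events.append(("in getindex", len(somelist)))
    return 1


def foo(item):
    # Runs last: pop() has already removed one item.
    events.append(("in foo", len(somelist)))
    return item


# The modification happens between the two calls, in the middle of the line.
result = foo(somelist.pop(getindex()))
```

No line boundary falls between the end of getindex() and the start of foo(), which is why a watchpoint on somelist can only stop on either side of the change.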

How RevPDB is done

If I had to pick the main advantage of PyPy over CPython, it is that with the RPython translation toolchain we have a real place for experimentation. Every now and then, we build inside RPython some feature that gives us an optionally tweaked version of the PyPy interpreter---tweaked in a way that would be hard to do with CPython, because it would require systematic changes everywhere. The most obvious and successful examples are the GC and the JIT. But there have been many other experiments along the same lines, from the so-called stackless transformation in the early days, to the STM version of PyPy.

RevPDB works in a similar way. It is a version of PyPy in which some operations are systematically replaced with other operations.

To keep the log file at a reasonable size, we duplicate the content of all GC objects during replaying---by repeating the same actions on them, without writing anything in the log file. So that means that in the pypy-revdb binary, the operations that do arithmetic or read/write GC-managed memory are not modified. Most operations are like that. However, the other operations, the ones that involve either non-GC memory or calls to external C functions, are tweaked. Each of these operations is replaced with code that works in two modes, based on a global flag:

in "recording" mode, we log the result of the operation (but not the arguments);
in "replaying" mode, we don't really do the operation at all, but instead just fetch the result from the log.

Hopefully, all remaining unmodified operations (arithmetic and GC load/store) are completely deterministic. So during replaying, every integer or non-GC pointer variable will have exactly the same value as it had during recording. Interestingly, it means that if the recording process had a big array in non-GC memory, then in the replaying process, the array is not allocated at all; it is just represented by the same address, but there is nothing there. When we record "read item 123 from the array", we record the result of the read (but not the "123"). When we replay, we're seeing again the same "read item 123 from the array" operation. At that point, we don't read anything; we just return the result from the log. Similarly, when recording a "write" to the array, we record nothing (this write operation has no result); so that when replaying, we redo nothing.

Note how that differs from anything managed by GC memory: GC objects (including GC arrays) are really allocated, writes really occur, and reads are redone. We don't touch the log in this case.
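
The two modes can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not the RPython implementation: the RecordReplay class, its external() method and the program() function are hypothetical names, and random.random stands in for "a call to an external C function".

```python
import random


class RecordReplay(object):
    """Toy stand-in for the global recording/replaying flag plus the log."""

    def __init__(self, log=None):
        self.replaying = log is not None
        self.log = list(log) if log is not None else []
        self.position = 0
        self.real_calls = 0

    def external(self, func, *args):
        """Wrap an operation whose result is not deterministic."""
        if self.replaying:
            # Replaying: do not perform the operation, just fetch its result.
            result = self.log[self.position]
            self.position += 1
            return result
        # Recording: perform the operation and log the result only,
        # not the arguments.
        self.real_calls += 1
        result = func(*args)
        self.log.append(result)
        return result


def program(rr):
    # Ordinary list operations play the role of GC memory: they are simply
    # redone during replay and never touch the log.
    values = []
    for _ in range(3):
        values.append(rr.external(random.random))
    return sum(values)


recorder = RecordReplay()
recorded = program(recorder)

replayer = RecordReplay(log=recorder.log)
replayed = program(replayer)
```

Both runs compute the same sum, but the replaying run never calls random.random: the list is rebuilt by redoing the deterministic operations, and only the three external results come from the log.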

Other reverse debuggers for Python

There are already some Python experiments about reverse debugging. This is also known as "omniscient debugging". However, I claim that the result they get to is not very useful (for the purpose presented here). How they work is typically by recording changes to some objects, like lists and dictionaries, in addition to recording the history of where your program passed through. However, the problem with Python is that lists and dictionaries are not the end of the story. There are many, many, many types of objects written in C which are mutable---in fact, the immutable ones are the exception. You can try to systematically record all changes, but it is a huge task and easy to forget a detail.

In other words it is a typical use case for tweaking the RPython translation toolchain, rather than tweaking the CPython (or PyPy) interpreter directly. The result that we get here with RevPDB is more similar to rr anyway, in that only a relatively small number of external events are recorded---not every single change to every single list and dictionary.

Some links:

epdb: https://github.com/native-human/epdb
pode: https://github.com/rodsenra/pode

For C:

rr: http://rr-project.org/
undodb-gdb: http://undo.io/

Future work

As mentioned above, it is alpha-level, and only works on Linux and OS/X. So the plans for the immediate future are to fix the various issues described above, and port to more operating systems. The core of the system is in the C file and headers in rpython/translator/revdb/src-revdb.

For interested people, there is also the Duhton interpreter and its reverse-debugger branch, which is where I prototyped the RPython concept before moving to PyPy. The basics should work for any interpreter written in RPython, but they require some specific code to interface with the language; in the case of PyPy, it is in pypy/interpreter/reverse_debugging.py.

In parallel, there are various user interface improvements that people could be interested in, like a more "pdb++" experience. (And the script at rpython/translator/revdb/revdb.py should be moved out into some more "official" place, and the reverse-debugger branch should be merged back to default.)

I would certainly welcome any help!

-+- Armin

Wednesday, June 8, 2016

PyPy2 v5.3 released - major C-extension support improvements

We have released PyPy2.7 v5.3, about six weeks after PyPy 5.1 and a week after PyPy3.3 v5.2 alpha 1, the first PyPy release targeting 3.3 compatibility. This new PyPy2.7 release includes major improvements for the C-API compatibility layer. In addition to complete support for lxml, we now pass most (more than 95%) of the upstream numpy test suite. We can build and run scipy and matplotlib as well. Most of the failures have to do with (ab)use of the C-API, for instance writing to a read-only pointer obtained from PyString_AsString().

Note that the C-API compatibility layer is significantly slower than CPython, as explained in the blog post about the new strategy for reflection of C objects into the PyPy interpreter.

We updated cffi to version 1.7 (incremental changes which provide a nicer developer experience, documented here). We would encourage developers to move their C-extension modules to cffi, but are willing to help you work through issues with existing code; come to #pypy on IRC and let us know how we can help you help us do better.

You can download the PyPy2 v5.3 release here:

http://pypy.org/download.html

We would like to thank our donors for their continued support of the PyPy project. We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: PyPy and RPython documentation improvements, tweaking popular modules to run on PyPy, or general help with making RPython’s JIT even better.

What is PyPy?

x86 machines on most common operating systems (Linux 32/64, Mac OS X 64, Windows 32, OpenBSD, FreeBSD)
newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux
big- and little-endian variants of PPC64 running Linux
s390x running Linux

Other Highlights

(since the release of PyPy 5.1 in April, 2016)

New features:
- Merge a major expansion of the C-API support in cpyext, also expand cpyext tests to allow running them after translation as well as untranslated
- Instead of “GIL not held when a CPython C extension module calls PyXxx”, we now silently acquire/release the GIL. Helps with C extension modules that call some PyXxx() functions without holding the GIL (arguably, they are theoretically buggy).
- Support command line -v to trace import statements
- Revive traceviewer, a tool to use pygame to view traces
Numpy via our internal _numpypy module:
- Implement ufunc.outer
- Move PyPy-specific numpypy headers to a subdirectory (also changed the repo accordingly)
Performance improvements:
- Use bitstrings to compress lists of descriptors that are attached to an EffectInfo
- Remove most of the _ovf, _zer and _val operations from RPython. Kills quite some code internally, and allows the JIT to do better optimizations: for example, app-level code like x / 2 or x % 2 can now be turned into x >> 1 or x & 1, even if x is possibly negative.
- Rework the way registers are moved/spilled in before_call()
Internal refactorings:
- Refactor code to better support Python3-compatible syntax
- Reduce the size of generated C sources during translation by eliminating many many unused struct declarations (Issue #2281)
- Reduce the size of generated code by using the same function objects in all generated subclasses
- Share cpyext Py* function wrappers according to the signature, shrinking the translated libpypy.so by about 10% (without the JIT)
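
The shift and mask rewrite mentioned in the performance items holds for negative values because Python's integer division rounds toward negative infinity, which is also what an arithmetic right shift does. A small sketch that checks this at the Python level (it uses // so that it behaves the same on Python 2 and 3; it says nothing about how the JIT implements the rewrite):

```python
def check(x):
    # Floor division by 2 and arithmetic right shift agree for every int.
    assert x // 2 == x >> 1
    # The floored remainder and the low bit agree as well.
    assert x % 2 == x & 1
    return True


# Covers negative, zero and positive values.
results = [check(x) for x in range(-1000, 1001)]
```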

Please update, and continue to help us make PyPy better. Cheers
The PyPy Team

Monday, May 30, 2016

PyPy3.3 v5.2 alpha 1 released

We're pleased to announce the first alpha release of PyPy3.3 v5.2. This is the
first release of PyPy which targets Python 3.3 (3.3.5) compatibility.

We would like to thank all of the people who donated to the py3k proposal
for supporting the work that went into this and future releases.

You can download the PyPy3.3 v5.2 alpha 1 release here:

http://pypy.org/download.html#python-3-3-5-compatible-pypy3-3-v5-2

Highlights

Python 3.3.5 support!
- Being an early alpha release, there are some missing features such as a
  PEP 393-like space efficient string representation and known issues
  including performance issues (e.g. issue #2305). The focus for this
  release has been updating to 3.3 compatibility. Windows is also not yet
  supported.
ensurepip is also included (it's only included in CPython 3 >= 3.4).

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for
CPython 2.7.10 and one day 3.3.5. It's fast due to its integrated tracing JIT
compiler.

We also welcome developers of other dynamic languages to see what RPython
can do for them.

This release supports:

x86 machines on most common operating systems except Windows
(Linux 32/64, Mac OS X 64, OpenBSD, FreeBSD),

newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux,

big- and little-endian variants of PPC64 running Linux,

s390x running Linux

Please try it out and let us know what you think. We welcome feedback, we know
you are using PyPy, please tell us about it!

We'd especially like to thank these people for their contributions to this
release:

Manuel Jacob, Ronan Lamy, Mark Young, Amaury Forgeot d'Arc, Philip Jenvey,
Martin Matusiak, Vasily Kuznetsov, Matti Picus, Armin Rigo and many others.

Cheers

The PyPy Team

Tuesday, May 3, 2016

PyPy 5.1.1 bugfix released

We have released a bugfix for PyPy 5.1, due to a regression in installing third-party packages depending on numpy (using our numpy fork available at https://bitbucket.org/pypy/numpy ).

Thanks to those who reported the issue. We also fixed a regression in translating PyPy which increased the memory required to translate. The improvement will be noticed by downstream packagers and those who translate rather than
download pre-built binaries.

What is PyPy?

x86 machines on most common operating systems (Linux 32/64, Mac OS X 64, Windows 32, OpenBSD, FreeBSD),
newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux,
big- and little-endian variants of PPC64 running Linux,
s390x running Linux

Please update, and continue to help us make PyPy better.

Cheers

The PyPy Team

Wednesday, April 20, 2016

PyPy 5.1 released

We have released PyPy 5.1, about a month after PyPy 5.0.

This release includes more improvements to warmup time and memory requirements, extending the work done on PyPy 5.0. We have seen an additional reduction of about 20% in memory requirements, and up to 30% warmup time improvement; more detail is in the blog post.

We also now have full support for the IBM s390x. Since this support is in RPython, any dynamic language written using RPython, like PyPy, will automagically be supported on that architecture.

We updated cffi to 1.6 (cffi 1.6 itself will be released shortly), and continue to improve support for the wider python ecosystem using the PyPy interpreter.

You can download the PyPy 5.1 release here:

http://pypy.org/download.html

We would like to thank our donors for the continued support of the PyPy project.
We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: PyPy and RPython documentation improvements, tweaking popular modules to run on pypy, or general help with making RPython’s JIT even better.

What is PyPy?

x86 machines on most common operating systems (Linux 32/64, Mac OS X 64, Windows 32, OpenBSD, FreeBSD),
newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux,
big- and little-endian variants of PPC64 running Linux,
s390x running Linux

Other Highlights

(since the release of PyPy 5.0 in March, 2016)

New features:
- A new jit backend for the IBM s390x, which was a large effort over the past few months.
- Add better support for PyUnicodeObject in the C-API compatibility layer
- Support GNU/kFreeBSD Debian ports in vmprof
- Add __pypy__._promote
- Make attrgetter a single type for CPython compatibility
Bug Fixes
- Catch exceptions raised in an exit function
- Fix a corner case in the JIT
- Fix edge cases in the cpyext refcounting-compatible semantics (more work on cpyext compatibility is coming in the cpyext-ext branch, but isn’t ready yet)
- Try harder to not emit NEON instructions on ARM processors without NEON support
- Improve the rpython posix module system interaction function calls
- Detect a missing class function implementation instead of calling a random function
- Check that PyTupleObjects do not contain any NULLs at the point of conversion to W_TupleObjects
- In ctypes, fix _anonymous_ fields of instances
- Fix JIT issue with unpack() on a Trace which contains half-written operations
- Fix sandbox startup (a regression in 5.0)
- Fix possible segfault for classes with mangled mro or __metaclass__
- Fix isinstance(deque(), Hashable) on the pure python deque
- Fix an issue with forkpty()
- Issues reported with our previous release were resolved after reports from users on our issue tracker at https://bitbucket.org/pypy/pypy/issues or on IRC at #pypy
Numpy:
- Implemented numpy.where for a single argument
- Indexing by a numpy scalar now returns a scalar
- Fix transpose(arg) when arg is a sequence
- Refactor include file handling, now all numpy ndarray, ufunc, and umath functions exported from libpypy.so are declared in pypy_numpy.h, which is included only when building our fork of numpy
- Add broadcast
Performance improvements:
- Improve str.endswith([tuple]) and str.startswith([tuple]) to allow JITting
- Merge another round of improvements to the warmup performance
- Cleanup history rewriting in pyjitpl
- Remove the forced minor collection that occurs when rewriting the assembler at the start of the JIT backend
- Port the resource module to cffi
Internal refactorings:
- Use a simpler logger to speed up translation
- Drop vestiges of Python 2.5 support in testing
- Update rpython functions with ones needed for py3k
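
As a reminder of the call form that the str.endswith([tuple]) and str.startswith([tuple]) item in the list above refers to, here is a small sketch; the classify() helper and the file names are made up for illustration:

```python
def classify(filename):
    # A tuple argument matches if any one of the alternatives matches.
    if filename.endswith((".py", ".pyc")):
        return "python"
    if filename.startswith(("test_", "bench_")):
        return "harness"
    return "other"


labels = [classify(name) for name in ("app.py", "test_io.txt", "README")]
```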

Please update, and continue to help us make PyPy better.
Cheers
The PyPy Team

Monday, April 18, 2016

PyPy Enterprise Edition

With the latest additions, PyPy's JIT now supports the Z architecture on Linux. The newest architecture revision (also known as s390x, or colloquially referred to as "big iron") is the 64-bit extension for IBM mainframes. Currently only Linux 64 bit is supported (not z/OS nor TPF).
This is the fourth assembler backend supported by PyPy, after x86 (32 and 64), ARM (32-bit only) and PPC64 (both little- and big-endian). It might seem that we are getting the hang of new architectures. Thanks to IBM for funding this work!

History

When I went to university one lecture covered the prediction of Thomas Watson in 1943. His famous quote "I think there is a world market for maybe five computers ...", turned out not to be true.

However, even 70 years later, mainframes are used more often than you think. They back critical tasks requiring a high level of stability/security and offer high hardware and computational utilization rates by virtualization.

With the new PyPy JIT backend we are happy to present a fast Python virtual machine for mainframes and contribute more free software running on s390x.

Meta tracing

Even though the JIT backend has been tested on PyPy, it is not restricted to the Python programming language. Do you have a great idea for a DSL, or another language that should run on mainframes? Go ahead and just implement your interpreter using RPython.

How do I get a copy?

PyPy can be built using the usual instructions found here. As soon as the next PyPy version has been released we will provide binaries. Until then you can just grab a nightly here. We are currently busy getting the next version of PyPy ready, so an official release will be rolled out soon.

Comparing s390x to x86

The goal of this comparison is not to scientifically evaluate the benefits/disadvantages on s390x, but rather to see that PyPy's architecture delivers the same benefits as it does on other platforms. Similar to the comparison done for PPC I ran the benchmarks using the same setup. The first column is the speedup of the PyPy JIT VM compared to the speedup of a pure PyPy interpreter 1). Note that the s390x's OS was virtualized.

Label               x86     s390x      s390x (run 2)

ai                 13.7      12.4       11.9
bm_chameleon        8.5       6.3        6.8
bm_dulwich_log      5.1       5.0        5.1
bm_krakatau         5.5       2.0        2.0
bm_mako             8.4       5.8        5.9
bm_mdp              2.0       3.8        3.8
chaos              56.9      52.6       53.4
crypto_pyaes       62.5      64.2       64.2
deltablue           3.3       3.9        3.6
django             28.8      22.6       21.7
eparse              2.3       2.5        2.6
fannkuch            9.1       9.9       10.1
float              13.8      12.8       13.8
genshi_text        16.4      10.5       10.9
genshi_xml          8.2       7.9        8.2
go                  6.7       6.2       11.2
hexiom2            24.3      23.8       23.5
html5lib            5.4       5.8        5.7
json_bench         28.8      27.8       28.1
meteor-contest      5.1       4.2        4.4
nbody_modified     20.6      19.3       19.4
pidigits            1.0      -1.1       -1.0
pyflate-fast        9.0       8.7        8.5
pypy_interp         3.3       4.2        4.4
raytrace-simple    69.0     100.9       93.4
richards           94.1      96.6       84.3
rietveld            3.2       2.5        2.7
slowspitfire        2.8       3.3        4.2
spambayes           5.0       4.8        4.8
spectral-norm      41.9      39.8       42.6
spitfire            3.8       3.9        4.3
spitfire_cstringio  7.6       7.9        8.2
sympy_expand        2.9       1.8        1.8
sympy_integrate     4.3       3.9        4.0
sympy_str           1.5       1.3        1.3
sympy_sum           6.2       5.8        5.9
telco              61.2      48.5       54.8
twisted_iteration  55.5      41.9       43.8
twisted_names       8.2       9.3        9.7
twisted_pb         12.1      10.4       10.2
twisted_tcp         4.9       4.8        5.2

Geometric mean:    9.31      9.10       9.43

As you can see the benefits are comparable on both platforms.
Of course this is scientifically not good enough, but it shows a tendency. s390x can achieve the same results as you can get on x86.
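
For readers who want to summarize speedups themselves, a geometric mean can be computed as in the sketch below. It only uses three rows copied from the table, so it does not reproduce the means reported above, and the geometric_mean() helper is an illustrative name. How the negative pidigits entries were folded into the reported means is not stated here, so they are left out: a logarithm-based mean is only defined for positive values.

```python
import math


def geometric_mean(values):
    # exp of the mean of the logs; only defined for strictly positive values.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# A few speedups copied from the table: ai, chaos, django.
x86 = [13.7, 56.9, 28.8]
s390x = [12.4, 52.6, 22.6]

summary = (geometric_mean(x86), geometric_mean(s390x))
```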

Are you running your business application on a mainframe? We would love to get some feedback. Join us on IRC and tell us if PyPy made your application faster!

plan_rich & the PyPy Team

1) PyPy revision for the benchmarks: 4b386bcfee54

Thursday, April 7, 2016

Warmup improvements: more efficient trace representation

Hello everyone.

I'm pleased to announce that we've finished another round of improvements to the warmup performance of PyPy. Before I go into details, I'll recap what we've achieved since we started working on warmup performance. I picked a random PyPy from November 2014 (which is definitely before we started the warmup work) and compared it with a recent one, after 5.0. The exact revisions are respectively ffce4c795283 and cfbb442ae368. First let's compare pure warmup benchmarks that can be found in our benchmarking suite. Out of those, pypy-graph-alloc-removal numbers should be taken with a grain of salt, since other work could have influenced the results. The rest of the benchmarks mentioned are bottlenecked purely by warmup times.

You can see how much time your program spends in warmup by running PYPYLOG=jit-summary:- pypy your-program.py and looking at the "Tracing" and "Backend" fields (in the first three lines). An example looks like this:

[e00c145a41] {jit-summary
Tracing:        71      0.053645 <- time spent tracing & optimizing
Backend:        71      0.028659 <- time spent compiling to assembler
TOTAL:                  0.252217 <- total run time of the program
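
To turn such a summary into a single number, one can add the two warmup fields and divide by the total. The sketch below does this for the sample above; it assumes the layout is exactly as shown (a name, a colon, and the time in seconds as the last number on the line), and warmup_fraction() is an illustrative helper, not part of PyPy.

```python
SAMPLE = """\
[e00c145a41] {jit-summary
Tracing:        71      0.053645 <- time spent tracing & optimizing
Backend:        71      0.028659 <- time spent compiling to assembler
TOTAL:                  0.252217 <- total run time of the program
"""


def warmup_fraction(text):
    seconds = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        # Drop the "<- ..." annotation, if any, before splitting into fields.
        fields = rest.split("<-")[0].split()
        if sep and name in ("Tracing", "Backend", "TOTAL") and fields:
            # The time in seconds is the last remaining field on the line.
            seconds[name] = float(fields[-1])
    return (seconds["Tracing"] + seconds["Backend"]) / seconds["TOTAL"]


fraction = warmup_fraction(SAMPLE)
```

For this sample, roughly a third of the run time is spent tracing and compiling.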

The results of the benchmarks

benchmark	time - old	time - new	speedup	JIT time - old	JIT time - new
function_call	1.86s	1.42s	1.3x	1.12s	0.57s
function_call2	5.17s	2.73s	1.9x	4.2s	1.6s
bridges	2.77s	2.07s	1.3x	1.5s	0.8s
pypy-graph-alloc-removal	2.06s	1.65s	1.25x	1.25s	0.79s

As we can see, the overall warmup benchmarks got up to 90% faster, with JIT time dropping by up to 2.5x. We have more optimizations in the pipeline, including an idea for how to turn some of the JIT gains into total program runtime gains by jitting earlier and more eagerly.
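
The ratios can be recomputed from the table with a few lines of Python. This is just arithmetic on the published numbers; the function_call times are assumed to be in seconds like the other rows.

```python
# (benchmark, old time, new time, old JIT time, new JIT time), in seconds
ROWS = [
    ("function_call", 1.86, 1.42, 1.12, 0.57),
    ("function_call2", 5.17, 2.73, 4.2, 1.6),
    ("bridges", 2.77, 2.07, 1.5, 0.8),
    ("pypy-graph-alloc-removal", 2.06, 1.65, 1.25, 0.79),
]

# Map each benchmark to (total speedup, JIT time ratio).
ratios = {
    name: (old / new, jit_old / jit_new)
    for name, old, new, jit_old, jit_new in ROWS
}

best_total = max(total for total, _ in ratios.values())
best_jit = max(jit for _, jit in ratios.values())
```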

Details of the last round of optimizations

Now the nitty-gritty details: what did we actually do? I covered a lot of warmup improvements in past blog posts, so I'm going to focus on the last change, the jit-leaner-frontend branch. This last change is simple: instead of using pointers to store the "operations" objects created during tracing, we use a compact list of 16-bit integers (with 16-bit pointers in between). On a 64-bit machine the memory wins are tremendous: 16-bit pointers take a quarter of the space of full 64-bit pointers. Additionally, the smaller representation has much better cache behavior and much less pointer chasing in memory. It also has a better-defined lifespan, so the GC does not need to track it, which also saves quite a bit of time.

The change sounds simple, but the details of the underlying data mean that everything in the JIT had to be changed, which took quite a bit of effort :-)
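
The size argument can be illustrated with Python's array module. This is only an analogy under stated assumptions: the 1000-entry trace is made up, and the real encoding used by the JIT is not shown here. The sketch stores the same small integers once in 64-bit slots and once in 16-bit slots and compares the payload sizes.

```python
from array import array

# Hypothetical trace of 1000 entries, each a small integer index.
entries = list(range(1000))

wide = array("Q", entries)     # 64-bit unsigned slots, pointer-sized
compact = array("H", entries)  # 16-bit unsigned slots

# Payload size in bytes of each representation.
wide_bytes = wide.itemsize * len(wide)
compact_bytes = compact.itemsize * len(compact)
ratio = wide_bytes // compact_bytes
```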

Going into the future on the JIT front, we have an exciting set of optimizations, ranging from faster loops through faster warmup to using better code generation techniques and broadening the kind of program that PyPy speeds up. Stay tuned for the updates.

We would like to thank our commercial partners for making all of this possible. The work has been performed by baroquesoftware and would not be possible without support from people using PyPy in production. If your company uses PyPy and wants it to do more, or does not use PyPy but has performance problems with its Python installation, feel free to get in touch with me. Trust me, using PyPy ends up being a lot cheaper than rewriting everything in Go :-)

Best regards,
Maciej Fijalkowski

Saturday, March 19, 2016

PyPy 5.0.1 bugfix released

PyPy 5.0.1

We have released a bugfix for PyPy 5.0, after reports that the newly released lxml 3.6.0, which now supports PyPy 5.0+, can crash on large files. Thanks to those who reported the crash. Please update; downloads are available at

pypy.org/download.html

The changes between PyPy 5.0 and 5.0.1 are only two bug fixes: one in cpyext, which fixes notably (but not only) lxml; and another for a corner case of the JIT.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It’s fast (PyPy and CPython 2.7.x performance comparison) due to its integrated tracing JIT compiler.
We also welcome developers of other dynamic languages to see what RPython can do for them.
This release supports x86 machines on most common operating systems (Linux 32/64, Mac OS X 64, Windows 32, OpenBSD, FreeBSD), newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux, and the big- and little-endian variants of PPC64 running Linux.

Please update, and continue to help us make PyPy better.

Cheers
The PyPy Team

Thursday, March 10, 2016

PyPy 5.0 released

PyPy 5.0

We have released PyPy 5.0, about three months after PyPy 4.0.1. We encourage all users of PyPy to update to this version.

You can download the PyPy 5.0 release here:

http://pypy.org/download.html

Faster and Leaner

We continue to improve the warmup time and memory usage of JIT-related metadata. The exact effects depend vastly on the program you’re running and can range from insignificant to warmup being up to 30% faster and memory dropping by about 30%.

C-API Upgrade

We also merged a major upgrade to our C-API layer (cpyext), simplifying the interaction between C-level objects and PyPy interpreter-level objects. As a result, lxml (prerelease) with its Cython-compiled component passes all tests on PyPy. The new cpyext is also much faster. This major refactoring will soon be followed by an expansion of our C-API compatibility.

Profiling with vmprof supported on more platforms

vmprof has been a go-to profiler for PyPy on Linux for a few releases and we’re happy to announce that thanks to the cooperation with JetBrains, vmprof now works on Linux, OS X and Windows on both PyPy and CPython.

CFFI

While not applicable only to PyPy, cffi is arguably our most significant contribution to the python ecosystem. PyPy 5.0 ships with cffi-1.5.2 which now allows embedding PyPy (or CPython) in a C program.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It’s fast (pypy and cpython 2.7.x performance comparison) due to its integrated tracing JIT compiler.
We also welcome developers of other dynamic languages to see what RPython can do for them.
This release supports x86 machines on most common operating systems (Linux 32/64, Mac OS X 64, Windows 32, OpenBSD, FreeBSD), newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux, and 64 bit PowerPC hardware, specifically Linux running the big- and little-endian variants of ppc64.

Other Highlights (since 4.0.1 released in November 2015)

New features:
- Support embedding PyPy in a C-program via cffi and static callbacks in cffi.
  This deprecates the old method of embedding PyPy
- Refactor vmprof to work cross-operating-system, deprecate using buggy
  libunwind on Linux platforms. Vmprof even works on Windows now.
- Support more of the C-API type slots, like tp_getattro, and fix C-API
  macros, functions, and structs such as _PyLong_FromByteArray(),
  PyString_GET_SIZE, f_locals in PyFrameObject, Py_NAN, co_filename in
  PyCodeObject
- Use a more stable approach for allocating PyObjects in cpyext. (see
  blog post). Once the PyObject corresponding to a PyPy object is created,
  it stays around at the same location until the death of the PyPy object.
  Done with a little bit of custom GC support. It allows us to kill the
  notion of “borrowing” inside cpyext, reduces 4 dictionaries down to 1, and
  significantly simplifies the whole approach (which is why it is a new
  feature while technically a refactoring) and allows PyPy to support the
  popular lxml module (as of the next release) with no PyPy specific
  patches needed
- Make the default filesystem encoding ASCII, like CPython
- Use hypothesis in test creation, which is great for randomizing tests
Bug Fixes
- Backport always using os.urandom for uuid4 from cpython and fix the JIT as well
  (issue #2202)
- More completely support datetime, optimize timedelta creation
- Fix for issue #2185 which caused an inconsistent list of operations to be
  generated by the unroller, appeared in a complicated Django app
- Fix an elusive issue with stacklets on shadowstack which showed up when
  forgetting stacklets without resuming them
- Fix entrypoint() which now acquires the GIL
- Fix direct_ffi_call() so failure does not bail out before setting CALL_MAY_FORCE
- Fix (de)pickling long values by simplifying the implementation
- Fix RPython rthread so that objects stored as threadlocal do not force minor
  GC collection and are kept alive automatically. This improves performance of
  short-running Python callbacks and prevents resetting such objects between
  calls
- Support floats as parameters to itertools.islice()
- Check for the existence of CODESET; ignoring it should have prevented PyPy
  from working on FreeBSD
- Fix for corner case (likely shown by Krakatau) for consecutive guards with
  interdependencies
- Fix applevel bare class method comparisons which should fix pretty printing
  in IPython
- Issues reported with our previous release were resolved after reports from users on our issue tracker at https://bitbucket.org/pypy/pypy/issues or on IRC at #pypy
Numpy:
- Updates to numpy 1.10.2 (incompatibilities and not-implemented features
  still exist)
- Support dtype=(('O', spec)) union while disallowing record arrays with
  mixed object, non-object values
- Remove all traces of micronumpy from cpyext if the --withoutmod-micronumpy option is used
- Support indexing filtering with a boolean ndarray
- Support partition() as an app-level function, together with a cffi wrapper
  in pypy/numpy, this now provides partial support for partition()
Performance improvements:
- Optimize global lookups
- Improve the memory signature of numbering instances in the JIT. This should
  massively decrease the amount of memory consumed by the JIT, which is
  significant for most programs. Also compress the numberings using variable-
  size encoding
- Optimize string concatenation
- Use INT_LSHIFT instead of INT_MUL when possible
- Improve struct.unpack by casting directly from the underlying buffer.
  Unpacking floats and doubles is about 15 times faster, and integer types
  about 50% faster (on 64 bit integers). This was then subsequently
  improved further in optimizeopt.py.
- Optimize two-tuple lookups in mapdict, which improves warmup of instance
  variable access somewhat
- Reduce all guards from int_floordiv_ovf if one of the arguments is constant
- Identify permutations of attributes at instance creation, reducing the
  number of bridges created
- Greatly improve re.sub() performance
Internal refactorings:
- Refactor and improve exception analysis in the annotator
- Remove unnecessary special handling of space.wrap().
- Support list-resizing setslice operations in RPython
- Tweak the trace-too-long heuristic for multiple jit drivers
- Refactor bookkeeping (such a cool word - three double letters) in the
  annotator
- Refactor wrappers for OS functions from rtyper to rlib and simplify them
- Simplify backend loading instructions to only use four variants
- Simplify GIL handling in non-jitted code
- Refactor naming in optimizeopt
- Change GraphAnalyzer to use a more precise way to recognize external
  functions and fix null pointer handling, generally clean up external
  function handling
- Remove pure variants of getfield_gc_* operations from the JIT by
  determining purity while tracing
- Refactor databasing
- Simplify bootstrapping in cpyext
- Refactor rtyper debug code into rpython.rtyper.debug
- Separate structmember.h from Python.h. Also enhance creating api functions
  to specify which header file they appear in (previously only pypy_decl.h)
- Fix tokenizer to enforce universal newlines, needed for Python 3 support
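
For readers unfamiliar with the term, a "list-resizing setslice" is a slice assignment whose right-hand side has a different length than the slice it replaces. The sketch below shows the behaviour in ordinary Python. The notes only say that RPython now supports such operations, so this illustrates the semantics and not how RPython implements them.

```python
items = [0, 1, 2, 3, 4]

# Replace two elements with four: the list grows by two.
items[1:3] = [10, 11, 12, 13]
grown = list(items)

# Replace four elements with none: the list shrinks by four.
items[1:5] = []
shrunk = list(items)
```
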

Please try it out and let us know what you think. We welcome feedback: we know you are using PyPy, so please tell us about it!
Cheers
The PyPy Team
