Showing posts with label pypy. Show all posts
Showing posts with label pypy. Show all posts

Wednesday, August 12, 2020

A new chapter for PyPy

PyPy winds down its membership in the Software Freedom Conservancy

Conservancy and PyPy's great work together

PyPy joined Conservancy in the second half of 2010, shortly after the release of PyPy 1.2, the first version to contain a fully functional JIT. In 2013, PyPy started supporting ARM, bringing its just-in-time speediness to many more devices and began working toward supporting NumPy to help scientists crunch their numbers faster. Together, PyPy and Conservancy ran successful fundraising drives and facilitated payment and oversight for contractors and code sprints.

Conservancy supported PyPy's impressive growth as it expanded support for different hardware platforms, greatly improved the performance of C extensions, and added support for Python 3 as the language itself evolved.

The road ahead

Conservancy provides a fiscal and organizational home for projects that find the freedoms and guardrails that come along with a charitable home advantageous for their community goals. While this framework was a great fit for the early PyPy community, times change and all good things must come to an end.

PyPy will remain a free and open source project, but the community's structure and organizational underpinnings will be changing and the PyPy community will be exploring options outside of the charitable realm for its next phase of growth ("charitable" in the legal sense -- PyPy will remain a community project).

During the last year PyPy and Conservancy have worked together to properly utilise the generous donations made by stalwart PyPy enthusiats over the years and to wrap up PyPy's remaining charitable obligations. PyPy is grateful for the Conservancy's help in shepherding the project toward its next chapter.

Thank yous

From Conservancy:

"We are happy that Conservancy was able to help PyPy bring important software for the public good during a critical time in its history. We wish the community well and look forward to seeing it develop and succeed in new ways."
— Karen Sandler, Conservancy's Executive Director

From PyPy:

"PyPy would like to thank Conservancy for their decade long support in building the community and wishes Conservancy continued success in their journey promoting, improving, developing and defending free and open source sofware."

— Simon Cross & Carl Friedrich Bolz-Tereick, on behalf of PyPy.

About

PyPy is a multi-layer python interpreter with a built-in JIT compiler that runs Python quickly across different computing environments. Software Freedom Conservancy (Conservancy) is a charity that provides a home to over forty free and open source software projects.

Friday, April 10, 2020

PyPy 7.3.1 released

The PyPy team is proud to release the version 7.3.1 of PyPy, which includes two different interpreters:
  • PyPy2.7, which is an interpreter supporting the syntax and the features of Python 2.7 including the stdlib for CPython 2.7.13
  • PyPy3.6: which is an interpreter supporting the syntax and the features of Python 3.6, including the stdlib for CPython 3.6.9.
The interpreters are based on much the same codebase, thus the multiple release. This is a micro release, no APIs have changed since the 7.3.0 release in December, but read on to find out what is new.

Conda Forge now supports PyPy as a Python interpreter. The support right now is being built out. After this release, many more c-extension-based packages can be successfully built and uploaded. This is the result of a lot of hard work and good will on the part of the Conda Forge team. A big shout out to them for taking this on.

We have worked with the Python packaging group to support tooling around building third party packages for Python, so this release updates the pip and setuptools installed when executing pypy -mensurepip to pip>=20. This completes the work done to update the PEP 425 python tag from pp373 to mean “PyPy 7.3 running python3” to pp36 meaning “PyPy running Python 3.6” (the format is recommended in the PEP). The tag itself was changed in 7.3.0, but older pip versions build their own tag without querying PyPy. This means that wheels built for the previous tag format will not be discovered by pip from this version, so library authors should update their PyPy-specific wheels on PyPI.

Development of PyPy is transitioning to https://foss.heptapod.net/pypy/pypy. This move was covered more extensively in the blog post from last month.

The CFFI backend has been updated to version 14.0. We recommend using CFFI rather than c-extensions to interact with C, and using cppyy for performant wrapping of C++ code for Python. The cppyy backend has been enabled experimentally for win32, try it out and let use know how it works.

Enabling cppyy requires a more modern C compiler, so win32 is now built with MSVC160 (Visual Studio 2019). This is true for PyPy 3.6 as well as for 2.7.

We have improved warmup time by up to 20%, performance of io.StringIO to match if not be faster than CPython, and improved JIT code generation for generators (and generator expressions in particular) when passing them to functions like sum, map, and map that consume them. Performance of closures has also be improved in certain situations.

As always, this release fixed several issues and bugs raised by the growing community of PyPy users. We strongly recommend updating. Many of the fixes are the direct result of end-user bug reports, so please continue reporting issues as they crop up.
You can find links to download the v7.3.1 releases here:
We would like to thank our donors for the continued support of the PyPy project. If PyPy is not quite good enough for your needs, we are available for direct consulting work.

We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: PyPy and RPython documentation improvements, tweaking popular modules to run on PyPy, or general help with making RPython’s JIT even better. Since the previous release, we have accepted contributions from 13 new contributors, thanks for pitching in.

If you are a Python library maintainer and use c-extensions, please consider making a cffi / cppyy version of your library that would be performant on PyPy. In any case both cibuildwheel and the multibuild system support building wheels for PyPy wheels.

 

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7, 3.6, and soon 3.7. It’s fast (PyPy and CPython 2.7.x performance comparison) due to its integrated tracing JIT compiler.

We also welcome developers of other dynamic languages to see what RPython can do for them.

This PyPy release supports:
  • x86 machines on most common operating systems (Linux 32/64 bits, Mac OS X 64 bits, Windows 32 bits, OpenBSD, FreeBSD)
  • big- and little-endian variants of PPC64 running Linux,
  • s390x running Linux
  • 64-bit ARM machines running Linux.

What else is new?

For more information about the 7.3.1 release, see the full changelog.

Please update, and continue to help us make PyPy better.

Cheers,
The PyPy team



The PyPy Team 

Tuesday, March 17, 2020

Leysin 2020 Sprint Report

At the end of February ten of us gathered in Leysin, Switzerland to work on
a variety of topics including HPy, PyPy Python 3.7 support and the PyPy
migration to Heptapod.


We had a fun and productive week. The snow was beautiful. There was skiing
and lunch at the top of Berneuse, cooking together, some late nights at
the pub next door, some even later nights coding, and of course the
obligatory cheese fondue outing.

There were a few of us participating in a PyPy sprint for the first time
and a few familiar faces who had attended many sprints. Many different
projects were represented including PyPy, HPy, GraalPython,
Heptapod, and rust-cpython. The atmosphere was relaxed and welcoming, so if
you're thinking of attending the next one -- please do!

Topics worked on:

HPy

HPy is a new project to design and implement a better API for extending
Python in C. If you're unfamiliar with it you can read more about it at
HPy.

A lot of attention was devoted to the Big HPy Design Discussion which
took up two full mornings. So much was decided that this will likely
get its own detailed write-up, but bigger topics included:
  • the HPy GetAttr, SetAttr, GetItem and SetItem methods,
  • HPy_FromVoidP and HPy_AsVoidP for passing HPy handles to C functions
    that pass void* pointers to callbacks,
  • avoiding having va_args as part of the ABI,
  • exception handling,
  • support for creating custom types.
Quite a few things got worked on too:
  • implemented support for writing methods that take keyword arguments with
    HPy_METH_KEYWORDS,
  • implemented HPy_GetAttr, HPy_SetAttr, HPy_GetItem, and HPy_SetItem,
  • started implementing support for adding custom types,
  • started implementing dumping JSON objects in ultrajson-hpy,
  • refactored the PyPy GIL to improve the interaction between HPy and
    PyPy's cpyext,
  • experimented with adding HPy support to rust-cpython.
And there was some discussion of the next steps of the HPy initiative
including writing documentation, setting up websites and funding, and
possibly organising another HPy gathering later in the year.

PyPy

  • Georges gave a presentation on the Heptapod topic and branch workflows
    and showed everyone how to use hg-evolve.
  • Work was done on improving the PyPy CI buildbot post the move to
    heptapod, including a light-weight pre-merge CI and restricting
    when the full CI is run to only branch commits.
  • A lot of work was done improving the -D tests.

Miscellaneous

  • Armin demoed VRSketch and NaN Industries in VR, including an implementation
    of the Game of Life within NaN Industries!
  • Skiing!

Aftermath

Immediately after the sprint large parts of Europe and the world were
hit by the COVID-19 epidemic. It was good to spend time together before
travelling ceased to be a sensible idea and many gatherings were cancelled.

Keep safe out there everyone.

The HPy & PyPy Team & Friends

In joke for those who attended the sprint: Please don't replace this blog post
with its Swedish translation (or indeed a translation to any other language :).

Wednesday, February 1, 2012

Almost There - PyPy's ARM Backend

In this post I want to give an update on the status of the ARM backend for PyPy's JIT and describe some of the issues and details of the backend.

Current Status

It has been a more than a year that I have been working on the ARM backend. Now it is in a shape, that we can measure meaningful numbers and also ask for some feedback. Since the last post about the backend we have added support floating point operations as well as for PyPy's framework GC's. Another area of work was to keep up with the constant improvements done in the main development branch, such as out-of-line guards, labels, etc. It has been possible for about a year to cross-translate the PyPy Python interpreter and other interpreters such as Pyrolog, with a JIT, to run benchmarks on ARM. Up until now there remained some hard to track bugs that would cause the interpreter to crash with a segmentation fault in certain cases when running with the JIT on ARM. Lately it was possible to run all benchmarks without problems, but when running the translation toolchain itself it would crash. During the last PyPy sprint in Leysin Armin and I managed to fix several of these hard to track bugs in the ARM backend with the result that, it is now possible to run the PyPy translator on ARM itself (at least unless until it runs out of memory), which is a kind of litmus test for the backend itself and used to crash before. Just to point it out, we are not able to complete a PyPy translation on ARM, because on the hardware we have currently available there is not enough memory. But up to the point we run out of memory the JIT does not hit any issues.

Implementation Details

The hardware requirements to run the JIT on ARM follow those for Ubuntu on ARM which targets ARMv7 with a VFP unit running in little endian mode. The JIT can be translated without floating point support, but there might be a few places that need to be fixed to fully work in this setting. We are targeting the ARM instruction set, because at least at the time we decided to use it seemed to be the best choice in terms of speed while having some size overhead compared to the Thumb2 instruction set. It appears that the Thumb2 instruction set should give comparable speed with better code density but has a few restriction on the number of registers available and the use of conditional execution. Also the implementation is a bit easier using a fixed width instruction set and we can use the full set of registers in the generated code when using the ARM instruction set.

The calling convention on ARM

The calling convention on ARM uses 4 of the general purpose registers to pass arguments to functions, further arguments are passed on the stack. The presence of a floating point unit is not required for ARM cores, for this reason there are different ways of handling floats with relation to the calling convention. There is a so called soft-float calling convention that is independent of the presence of a floating point unit. For this calling convention floating point arguments to functions are stored in the general purpose registers and on the stack. Passing floats around this way works with software and hardware floating point implementations. But in presence of a floating point unit it produces some overhead, because floating point numbers need to be moved from the floating point unit to the core registers to do a call and moved back to the floating point registers by the callee. The alternative calling convention is the so-called hard-float calling convention which requires the presence of a floating point unit but has the advantage of getting rid of the overhead of moving floating point values around when performing a call. Although it would be better in the long term to support the hard-float calling convention, we need to be able to interoperate with external code compiled for the operating system we are running on. For this reason at the moment we only support the soft-float to interoperate with external code. We implemented and tested the backend on a BeagleBoard-xM with a Cortex-A8 processor running Ubuntu 11.04 for ARM.

Translating for ARM

The toolchain used to translate PyPy currently is based on a Scratchbox2. Scratchbox2 is a cross-compiling environment. Development had stopped for a while, but it seems to have revived again. We run a 32-bit Python interpreter on the host system and perform all calls to the compiler using a Scratchbox2 based environment. A description on how to setup the cross translation toolchain can be found here.

Results

The current results on ARM, as shown in the graph below, show that the JIT currently gives a speedup of about 3.5 times compared to CPython on ARM. The benchmarks were run on the before mentioned BeagleBoard-xM with a 1GHz ARM Cortex-A8 processor and 512MB of memory. The operating system on the board is Ubuntu 11.04 for ARM. We measured the PyPy interpreter with the JIT enabled and disabled comparing each to CPython Python 2.7.1+ (r271:86832) for ARM. The graph shows the speedup or slowdown of both PyPy versions for the different benchmarks from our benchmark suite normalized to the runtime of CPython. The data used for the graph can be seen below.

The speedup is less than the speedup of 5.2 times we currently get on x86 on our own benchmark suite (see http://speed.pypy.org for details). There are several possible reasons for this. Comparing the results for the interpreter without the JIT on ARM and x86 suggests that the interpreter generated by PyPy, without the JIT, has a worse performance when compared to CPython that it does on x86. Also it is quite possible that the code we are generating with the JIT is not yet optimal. Also there are some architectural constraints produce some overhead. One of these differences is the handling of constants, most ARM instructions only support 8 bit (that can be shifted) immediate values, larger constants need to be loaded into a register, something that is not necessary on x86.

BenchmarkPyPy JITPyPy no JIT
ai0.4844397800473.72756749625
chaos0.08072916919342.2908692212
crypto_pyaes0.07111148322453.30112318509
django0.09777432455192.56779947601
fannkuch0.2104237356982.49163632938
float0.1542753346752.12053281495
go0.3304830342025.84628320479
html5lib0.6292643898623.60333138526
meteor-contest0.9847474269122.93838610037
nbody_modified0.2369695930821.40027234936
pyflate-fast0.3674471918072.72472422146
raytrace-simple0.02905274614371.97270054339
richards0.0345755735533.29767342015
slowspitfire0.7866425519083.7397367403
spambayes0.6603243794563.29059863111
spectral-norm0.0636107837314.01788986233
spitfire0.436171311652.72050579076
spitfire_cstringio0.2555387021341.7418593111
telco0.1029189304133.86388866047
twisted_iteration0.1227239868054.33632475491
twisted_names2.423677971352.99878698076
twisted_pb1.309918374314.48877805486
twisted_tcp0.9270333540552.8161624665
waf1.020598119321.03793427321


The next steps and call for help

Although there probably still are some remaining issues which have not surfaced yet, the JIT backend for ARM is working. Before we can merge the backend into the main development line there are some things that we would like to do first, in particular it we are looking for a way to run the all PyPy tests to verify that things work on ARM before we can merge. Additionally there are some other longterm ideas. To do this we are looking for people willing to help, either by contributing to implement the open features or that can help us with hardware to test.

The incomplete list of open topics:
  • We are looking for a better way to translate PyPy for ARM, than the one describe above. I am not sure if there currently is hardware with enough memory to directly translate PyPy on an ARM based system, this would require between 1.5 or 2 Gig of memory. A fully QEMU based approach could also work, instead of Scratchbox2 that uses QEMU under the hood.
  • Test the JIT on different hardware.
  • Experiment with the JIT settings to find the optimal thresholds for ARM.
  • Continuous integration: We are looking for a way to run the PyPy test suite to make sure everything works as expected on ARM, here QEMU also might provide an alternative.
  • A long term plan would be to port the backend to ARMv5 ISA and improve the support for systems without a floating point unit. This would require to implement the ISA and create different code paths and improve the instruction selection depending on the target architecture.
  • Review of the generated machine code the JIT generates on ARM to see if the instruction selection makes sense for ARM.
  • Build a version that runs on Android.
  • Improve the tools, i.e. integrate with jitviewer.
So if you are interested or willing to help in any way contact us.

Thursday, October 15, 2009

First pypy-cli-jit benchmarks

As the readers of this blog already know, I've been working on porting the JIT to CLI/.NET for the last months. Now that it's finally possible to get a working pypy-cli-jit, it's time to do some benchmarks.

Warning: as usual, all of this has to be considered to be a alpha version: don't be surprised if you get a crash when trying to run pypy-cli-jit. Of course, things are improving very quickly so it should become more and more stable as days pass.

For this time, I decided to run four benchmarks. Note that for all of them we run the main function once in advance, to let the JIT recoginizing the hot loops and emitting the corresponding code. Thus, the results reported do not include the time spent by the JIT compiler itself, but give a good measure of how good is the code generated by the JIT. At this point in time, I know that the CLI JIT backend spends way too much time compiling stuff, but this issue will be fixed soon.

  • f1.py: this is the classic PyPy JIT benchmark. It is just a function that does some computational intensive work with integers.
  • floatdemo.py: this is the same benchmark involving floating point numbers that have already been described in a previous blog post.
  • oodemo.py: this is just a microbenchmark doing object oriented stuff such as method calls and attribute access.
  • richards2.py: a modified version of the classic richards.py, with a warmup call before starting the real benchmark.

The benchmarks were run on a Windows machine with an Intel Pentium Dual Core E5200 2.5GHz and 2GB RAM, both with .NET (CLR 2.0) and Mono 2.4.2.3.

Because of a known mono bug, if you use a version older than 2.1 you need to pass the option -O=-branch to mono when running pypy-cli-jit, else it will just loop forever.

For comparison, we also run the same benchmarks with IronPython 2.0.1 and IronPython 2.6rc1. Note that IronPython 2.6rc1 does not work with mono.

So, here are the results (expressed in seconds) with Microsoft CLR:

Benchmark pypy-cli-jit ipy 2.0.1 ipy 2.6 ipy2.01/ pypy ipy2.6/ pypy
f1 0.028 0.145 0.136 5.18x 4.85x
floatdemo 0.671 0.765 0.812 1.14x 1.21x
oodemo 1.25 4.278 3.816 3.42x 3.05x
richards2 1228 442 670 0.36x 0.54x

And with Mono:

Benchmark pypy-cli-jit ipy 2.0.1 ipy2.01/ pypy
f1 0.042 0.695 16.54x
floatdemo 0.781 1.218 1.55x
oodemo 1.703 9.501 5.31x
richards2 720 862 1.20x

These results are very interesting: under the CLR, we are between 5x faster and 3x slower than IronPython 2.0.1, and between 4.8x faster and 1.8x slower than IronPython 2.6. On the other hand, on mono we are consistently faster than IronPython, up to 16x. Also, it is also interesting to note that pypy-cli runs faster on CLR than mono for all benchmarks except richards2.

I've not investigated yet, but I think that the culprit is the terrible behaviour of tail calls on CLR: as I already wrote in another blog post, tail calls are ~10x slower than normal calls on CLR, while being only ~2x slower than normal calls on mono. richads2 is probably the benchmark that makes most use of tail calls, thus explaining why we have a much better result on mono than CLR.

The next step is probably to find an alternative implementation that does not use tail calls: this probably will also improve the time spent by the JIT compiler itself, which is not reported in the numbers above but that so far it is surely too high to be acceptable. Stay tuned.

Tuesday, April 21, 2009

Roadmap for JIT

Hello.

First a disclaimer. This post is more about plans for future than current status. We usually try to write about things that we have done, because it's much much easier to promise things than to actually make it happen, but I think it's important enough to have some sort of roadmap.

In recent months we came to the point where the 5th generation of JIT prototype was working as nice or even a bit nicer than 1st one back in 2007. Someone might ask "so why did you spend all this time without going forward?". And indeed, we spend a lot of time moving sideways, but as posted, we also spent a lot of time doing some other things, which are important as well. The main advantage of current JIT incarnation is much much simpler than the first one. Even I can comprehend it, which is much of an improvement :-)

So, the prototype is working and gives very nice speedups in range of 20-30x over CPython. We're pretty confident this prototype will work and will produce fast python interpreter eventually. So we decided that now we'll work towards changing prototype into something stable and solid. This might sound easy, but in fact it's not. Having stable assembler backend and optimizations that keep semantics is not as easy as it might sound.

The current roadmap, as I see it, looks like as following:

  • Provide a JIT that does not speedup things, but produce assembler without optimizations turned on, that is correct and able to run CPython's library tests on a nightly basis.
  • Introduce simple optimizations, that should make above JIT a bit faster than CPython. With optimizations disabled JIT is producing incredibly dumb assembler, which is slower than correspoding C code, even with removal of interpretation overhead (which is not very surprising).
  • Backport optimizations from JIT prototype, one by one, keeping an eye on how they perform and making sure they don't break anything.
  • Create new optimizations, like speeding up attribute access.
  • Profit.

This way, we can hopefully provide a working JIT, which gives fast python interpreter, which is a bit harder than just a nice prototype.

Tell us what you think about this plan.

Cheers,
fijal & others.

Thursday, July 10, 2008

EP2008: PyPy meets Jython

One of the great events at EuroPython 2008 were our chats and meetings with the Jython and Sun people. The Jython people recently are pushing into releasing Python version 2.5 and they currently pursue many interesting sub projects. Coincidentally, PyPy also has tons of interesting areas and results :) So we eventually got into brainstorming a number of possible technical collab ideas. Further below is a first list as i wrote it down from our 10 people PyPy / Jython 30 minute close up meeting yesterday. It felt great to be able to talk to the Jython people this way - kudos to Sun for their clear commitments and open ways to go about things! I sense a genuine interest on fair collaboration with non-java developer communities. Seems like they are serious about not focusing on "Java this", "Java that" anymore but rather focus on the JVM platform. Good! And about language independent interest in ambitious technology. Even Better! I am tensed to see how things go from here. So here the list of technical collab ideas:
  • ctypes - try to create _rawffi module in Java for Jython, which will enable Jython to reuse our existing ctypes implementation (and have PyPy use the Jython-rawffi for its own for PyPy.JVM)
  • generally see to share work / (continue) collaborate regarding extension modules
  • Jython/PyPy (and eventually IronPython): document known differences to CPython, maybe in a PEP
  • Python Interpreter for Jython (in order to run CPython's .pyc files): re-use pypy's bytecode evaluator, implement a "Jython object space".
  • re-use rpython-extension modules for jython (e.g. SRE), by compiling them to Java and reusing as a native library.
  • collaborate on testing framework / benchmarking, have a common site to show test results
  • make py.test compatible with jython
  • come up with a set of "pure Python language" tests, which would gather and refactor tests from CPython, PyPy and Jython.
  • look into using java types / jython approaches for implementing free threading.
  • share knowledge regarding JIT / psyco
If you have any more ideas, comments or would like to join efforts, let us know! Cheers and thanks to Ted Leung, Frank Wierzbiki, Jim Baker and Tobias Ivarsson from Sun and Jython fame respectively, Holger