Hosting the blog posts on pypy.org seems like a natural extension of the web site
rather than outsourcing them to a third party. Since the site is generated using
the static site generator nikola from the GitHub repo
https://github.com/pypy/pypy.org, we now have good source control for the
content.
CI previews and GitHub
Those of you who follow PyPy may note something new in the URL for the repo:
until now PyPy has been using Mercurial as hosted
on https://foss.heptapod.net. While Heptapod (a
community-driven effort to bring Mercurial support to GitLab™) does provide a
GitLab CI runner for the open source offering, it is easier to use Netlify for
previews on GitHub. Hopefully the move to the more popular GitHub platform will
encourage new contributors to publish their success stories around using PyPy
and the RPython toolchain.
Comments
Comments on blog posts are generated via the utterances
JavaScript plugin. The comments appear as issues in the repo:
when viewing a post, a query is made to fetch the comments on the
corresponding issue. To comment, users must authorize the utterances app to
post on their behalf using the GitHub
OAuth flow.
Alternatively, users can comment on the GitHub issue directly. The interaction
with GitHub for authentication and moderation seems more natural than the
manual moderation required on Blogspot.
Please prove to us that the move is worth it
Help us with guest blog posts, and PRs to improve the styling of the site. One
issue already open is that the navbar needlessly uses JavaScript; help keeping
the responsive style in pure CSS is welcome.
Apple now ships Macs which are running on an arm64 variant chip, the M1, with
the latest version of macOS, Big Sur. We are getting requests for PyPy to
support this new architecture. Here is our position on this topic (or at least
mine, Armin Rigo's), and how you can help.
Porting PyPy is harder than just re-running the compiler, because PyPy contains
a few big architecture-dependent "details", like the JIT compiler and the
foreign function interfaces (CFFI and ctypes).
Fixing the JIT compiler should not be too much work: we already support arm64,
just the Linux one. But Apple made various details different (like the calling
conventions). A few other parts need to be fixed too, notably CFFI and ctypes,
again because of the calling conventions.
Fixing that would be a reasonable amount of work. I would do it myself for a
small amount of money. However, the story doesn't finish here. Obviously, the
start of the story would be to get ssh access to a Big Sur M1 machine. (If at
this point you're thinking "sure, I can give you ssh access for three months",
then please read on.) The next part of the story is that we need a machine
available long term. It can be either a machine provided and maintained by a
third party, or alternatively a pot of money big enough to support the
acquisition of a machine and ongoing work of one of us.
If we go with the provided-machine solution: What we need isn't a lot of
resources. Our CI requires maybe 10 GB of disk space, and a few hours of CPU
per run. It should fit into 8 GB of RAM. We normally do a run every night but
we can certainly lower the frequency a bit if that would help. However, we'd
ideally like some kind of assurance that you are invested in maintaining the
machine for the next 3-5 years (I guess, see below). We had far too many
machines that disappeared after a few months.
If we go with the money-supported solution: it's likely that after 3-5 years
the whole Mac base will have switched to arm64, we'll drop x86-64 support for
Mac, and we'll be back to the situation of the past where there was only one
kind of Mac machine to care about. In the meantime, we are looking at 3-5
years of lightweight extra maintenance. We have someone who has said he would
do it, but not for free.
If either of these two solutions occurs, we'll still have, I quote, "probably
some changes in distutils-type stuff to make python happy", and then some
packaging/deployment changes to support the "universal2" architecture, i.e.
including both versions inside a single executable (which will not be just an
extra switch to clang, because the two versions need a different JIT backend
and so must be translated separately).
So, now all the factors are on the table. We won't do the minimal "just the
JIT compiler fixes" if we don't have a plan that goes farther. Either we get
sufficient money, and maybe support, and then we can do it quickly; or PyPy
will just remain not natively available on M1 hardware for the next 3-5 years.
We are looking forward to supporting M1, and view resources contributed by
the community as a vote of confidence in assuring the future of PyPy on this
hardware. Contact us: pypy-dev@python.org, or our private mailing
list pypy-z@python.org.
The PyPy team is proud to release the version 7.3.3 of PyPy, which includes
three different interpreters:
PyPy2.7, which is an interpreter supporting the syntax and the features of
Python 2.7 including the stdlib for CPython 2.7.18 (updated from the
previous version)
PyPy3.6: which is an interpreter supporting the syntax and the features of
Python 3.6, including the stdlib for CPython 3.6.12 (updated from the
previous version).
PyPy3.7 beta: which is our second release of an interpreter supporting the
syntax and the features of Python 3.7, including the stdlib for CPython
3.7.9. We call this beta-quality software; there may be issues with
compatibility of new and changed features in CPython 3.7.
Please let us know what is broken or missing. We have not implemented the
documented changes in the re module, and a few other pieces are also
missing. For more information, see the PyPy 3.7 wiki page.
The interpreters are based on much the same codebase, thus the multiple
release. This is a micro release, all APIs are compatible with the 7.3
releases, but read on to find out what is new.
Several issues found in the 7.3.2 release were fixed. Many of them came from the
great work by conda-forge to ship PyPy binary packages. A big shout out
to them for taking this on.
Development of PyPy has moved to https://foss.heptapod.net/pypy/pypy.
This was covered more extensively in this blog post. We have seen an
increase in the number of drive-by contributors who are able to use gitlab +
mercurial to create merge requests.
The CFFI backend has been updated to version 1.14.3. We recommend using CFFI
rather than c-extensions to interact with C, and using cppyy for performant
wrapping of C++ code for Python.
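For readers who have not used CFFI, here is a minimal sketch of its ABI-level
mode, adapted from the example in the CFFI documentation; it assumes a POSIX
system, where ffi.dlopen(None) loads the standard C library:

from cffi import FFI

ffi = FFI()
ffi.cdef("""
    int printf(const char *format, ...);   // copy-pasted from the man page
""")
C = ffi.dlopen(None)                 # loads the standard C library (POSIX)
arg = ffi.new("char[]", b"world")    # equivalent to a C char*
C.printf(b"hi there, %s.\n", arg)    # call the C function through the FFI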
A new contributor took us up on the challenge to get Windows 64-bit support.
The work is proceeding on the win64 branch; more help in coding or
sponsorship is welcome. In anticipation of merging this large change, we fixed
many test failures on Windows.
As always, this release fixed several issues and bugs. We strongly recommend
updating. Many of the fixes are the direct result of end-user bug reports, so
please continue reporting issues as they crop up.
You can find links to download the v7.3.3 releases here:
We would like to thank our donors for the continued support of the PyPy
project. If PyPy is not quite good enough for your needs, we are available for
direct consulting work.
We would also like to thank our contributors and encourage new people to join
the project. PyPy has many layers and we need help with all of them: PyPy
and RPython documentation improvements, tweaking popular modules to run
on pypy, or general help with making RPython’s JIT even better. Since the
previous release, we have accepted contributions from 2 new contributors,
thanks for pitching in.
If you are a python library maintainer and use c-extensions, please consider
making a cffi / cppyy version of your library that would be performant on PyPy.
In any case both cibuildwheel and the multibuild system support
building wheels for PyPy.
What is PyPy?
PyPy is a Python interpreter, a drop-in replacement for CPython 2.7, 3.6, and
3.7. It’s fast (PyPy and CPython 3.7.4 performance
comparison) due to its integrated tracing JIT compiler.
We also welcome developers of other dynamic languages to see what RPython
can do for them.
This PyPy release supports:
x86 machines on most common operating systems
(Linux 32/64 bits, Mac OS X 64 bits, Windows 32 bits, OpenBSD, FreeBSD)
big- and little-endian variants of PPC64 running Linux,
s390x running Linux
64-bit ARM machines running Linux.
PyPy does support ARM 32 bit processors, but does not release binaries.
What else is new?
For more information about the 7.3.3 release, see the full changelog.
Please update, and continue to help us make PyPy better.
The PyPy team is proud to release version 7.3.2 of PyPy, which includes
three different interpreters:
PyPy2.7, which is an interpreter supporting the syntax and the features of
Python 2.7 including the stdlib for CPython 2.7.13
PyPy3.6: which is an interpreter supporting the syntax and the features of
Python 3.6, including the stdlib for CPython 3.6.9.
PyPy3.7 alpha: which is our first release of an interpreter supporting the
syntax and the features of Python 3.7, including the stdlib for CPython
3.7.9. We call this an alpha release since it is our first. It is based on
PyPy 3.6, so issues should be around compatibility and not stability. Please
try it out and let us know what is broken or missing. We have not implemented
some of the documented changes in the re module, and other pieces are also
missing. For more information, see the PyPy 3.7 wiki page.
The interpreters are based on much the same codebase, thus the multiple
release. This is a micro release, all APIs are compatible with the 7.3.0 (Dec
2019) and 7.3.1 (April 2020) releases, but read on to find out what is new.
Conda Forge now supports PyPy as a Python interpreter. The support is quite
complete for Linux and macOS. This is the result of a lot of
hard work and good will on the part of the Conda Forge team. A big shout out
to them for taking this on.
Development of PyPy has transitioned to https://foss.heptapod.net/pypy/pypy.
This move was covered more extensively in this blog post. We have seen an
increase in the number of drive-by contributors who are able to use gitlab +
mercurial to create merge requests.
The CFFI backend has been updated to version 1.14.2. We recommend using CFFI
rather than c-extensions to interact with C, and using cppyy for performant
wrapping of C++ code for Python.
NumPy has begun shipping wheels on PyPI for PyPy, currently for Linux 64-bit
only. Wheels for PyPy on Windows will be available from the next NumPy release.
Thanks to NumPy for their support.
A new contributor took us up on the challenge to get Windows 64-bit support.
The work is proceeding on the win64 branch; more help in coding or
sponsorship is welcome.
As always, this release fixed several issues and bugs. We strongly recommend
updating. Many of the fixes are the direct result of end-user bug reports, so
please continue reporting issues as they crop up.
You can find links to download the v7.3.2 releases here:
We would like to thank our donors for the continued support of the PyPy
project. Please help support us at Open Collective. If PyPy is not yet good enough for your needs, we are available for
direct consulting work.
We would also like to thank our contributors and encourage new people to join
the project. PyPy has many layers and we need help with all of them: PyPy
and RPython documentation improvements, tweaking popular modules to run
on pypy, or general help with making RPython’s JIT even better. Since the
previous release, we have accepted contributions from 8 new contributors,
thanks for pitching in.
If you are a python library maintainer and use c-extensions, please consider
making a cffi / cppyy version of your library that would be performant on PyPy.
In any case both cibuildwheel and the multibuild system support
building wheels for PyPy.
What is PyPy?
PyPy is a very compliant Python interpreter, almost a drop-in replacement for
CPython 2.7, 3.6, and 3.7. It’s fast (PyPy and CPython 2.7.x performance
comparison) due to its integrated tracing JIT compiler.
We also welcome developers of other dynamic languages to see what RPython
can do for them.
This PyPy release supports:
x86 machines on most common operating systems
(Linux 32/64 bits, Mac OS X 64 bits, Windows 32 bits, OpenBSD, FreeBSD)
big- and little-endian variants of PPC64 running Linux,
s390x running Linux
64-bit ARM machines running Linux.
PyPy does support ARM 32 bit processors, but does not release binaries.
What else is new?
For more information about the 7.3.2 release, see the full changelog.
Please update, and continue to help us make PyPy better.
PyPy is now a member of Open Collective, a fiscal host. We have been thinking about switching to this organization for a couple of years; we like it for various reasons, like the budget transparency and the lightweight touch. We can now officially announce our membership!
With this, we are now again free to use PyPy for all financial issues, like receiving funds professionally, paying parts of sprint budgets as we like, and so on. We will shortly be reintroducing buttons that link to Open Collective from the PyPy web site.
Although the old donation buttons were removed last year, we believe that there are still a few people who regularly send money to the SFC, the not-for-profit charity we were affiliated with. If you do, please stop doing it now (and, if you like to do so, please set up an equivalent donation to PyPy on Open Collective).
And by the way, sorry for all of you who were getting mixed feelings from the previous blog post (co-written with the SFC). PyPy is committed to continue being Open Source just like before. This was never in question. What these two blog posts mean is only that we switched to a different organization for our internal finances.
We're looking forward to how this new relationship will go!
PyPy winds down its membership in the Software Freedom Conservancy
Conservancy and PyPy's great work together
PyPy joined Conservancy in
the second half of 2010, shortly after the release of
PyPy 1.2, the first version to contain a fully functional JIT. In 2013, PyPy
started supporting ARM, bringing its just-in-time speediness to many more devices and began working toward supporting NumPy to help
scientists crunch their numbers faster. Together, PyPy and Conservancy ran successful fundraising drives and facilitated payment
and oversight for contractors and code sprints.
Conservancy supported PyPy's impressive growth as it expanded support for
different hardware platforms, greatly improved the performance of C extensions,
and added support for Python 3 as the language itself evolved.
The road ahead
Conservancy provides a fiscal and organizational home for projects that find the
freedoms and guardrails that come along with a charitable home advantageous for
their community goals. While this framework was a great fit for the early PyPy
community, times change and all good things must come to an end.
PyPy will remain a free and open source project, but the community's structure
and organizational underpinnings will be changing and the PyPy community will be
exploring options outside of the charitable realm for its next phase of growth
("charitable" in the legal sense -- PyPy will remain a community project).
During the last year PyPy and Conservancy have worked together to properly
utilise the generous donations made by stalwart PyPy enthusiasts over the years
and to wrap up PyPy's remaining charitable obligations. PyPy is grateful for
the Conservancy's help in shepherding the project toward its next chapter.
Thank yous
From Conservancy:
"We are happy that Conservancy was able to help PyPy bring important software
for the public good during a critical time in its history. We wish the
community well and look forward to seeing it develop and succeed in new ways."
— Karen Sandler, Conservancy's Executive Director
From PyPy:
"PyPy would like to thank Conservancy for their decade long support in
building the community and wishes Conservancy continued success in their
journey promoting, improving, developing and defending free and open source
software."
— Simon Cross & Carl Friedrich Bolz-Tereick, on behalf of PyPy.
About
PyPy is a multi-layer python interpreter with a built-in JIT compiler that runs
Python quickly across different computing environments.
Software Freedom Conservancy (Conservancy) is a charity that provides a home
to over forty free and open source software projects.
The PyPy team is proud to release the version 7.3.1 of PyPy, which includes
two different interpreters:
PyPy2.7, which is an interpreter supporting the syntax and the features of
Python 2.7 including the stdlib for CPython 2.7.13
PyPy3.6: which is an interpreter supporting the syntax and the features of
Python 3.6, including the stdlib for CPython 3.6.9.
The interpreters are based on much the same codebase, thus the multiple
release. This is a micro release, no APIs have changed since the 7.3.0 release
in December, but read on to find out what is new.
Conda Forge now supports PyPy as a Python interpreter. The support right now
is being built out. After this release, many more c-extension-based
packages can be successfully built and uploaded. This is the result of a lot of
hard work and good will on the part of the Conda Forge team. A big shout out
to them for taking this on.
We have worked with the Python packaging group to support tooling around
building third party packages for Python, so this release updates the pip and
setuptools installed when executing pypy -m ensurepip to pip>=20. This
completes the work done to update the PEP 425 python tag from pp373,
meaning “PyPy 7.3 running python3”, to pp36, meaning “PyPy running Python
3.6” (the format is recommended in the PEP). The tag itself was
changed in 7.3.0, but older pip versions build their own tag without querying
PyPy. This means that wheels built for the previous tag format will not be
discovered by pip from this version, so library authors should update their
PyPy-specific wheels on PyPI.
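As a quick way to see which tag your interpreter reports, the packaging
library (which pip uses internally) can list the supported tags; this is a
sketch, and the output shown is illustrative:

from packaging import tags   # pip install packaging

# The first, most specific tag is the one freshly built wheels are labeled with.
print(next(iter(tags.sys_tags())))
# On PyPy3.6 7.3.x under 64-bit Linux this prints something like:
#   pp36-pypy36_pp73-linux_x86_64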
The CFFI backend has been updated to version 1.14.0. We recommend using CFFI
rather than c-extensions to interact with C, and using cppyy for performant
wrapping of C++ code for Python. The cppyy backend has been enabled
experimentally for win32; try it out and let us know how it works.
Enabling cppyy requires a more modern C compiler, so win32 is now built
with MSVC 16.0 (Visual Studio 2019). This is true for PyPy 3.6 as well as for 2.7.
We have improved warmup time by up to 20%, brought the performance of
io.StringIO up to match, if not exceed, CPython's, and improved JIT code
generation for generators (and generator expressions in particular) when
passing them to functions like sum and map that consume them. Performance of
closures has also been improved in certain situations.
As always, this release fixed several issues and bugs raised by the growing
community of PyPy users. We strongly recommend updating. Many of the fixes are
the direct result of end-user bug reports, so please continue reporting issues
as they crop up.
You can find links to download the v7.3.1 releases here:
We would like to thank our donors for the continued support of the PyPy
project. If PyPy is not quite good enough for your needs, we are available for
direct consulting work.
We would also like to thank our contributors and encourage new people to join
the project. PyPy has many layers and we need help with all of them: PyPy
and RPython documentation improvements, tweaking popular modules to run
on PyPy, or general help with making RPython’s JIT even better. Since the
previous release, we have accepted contributions from 13 new contributors,
thanks for pitching in.
If you are a Python library maintainer and use c-extensions, please consider
making a cffi / cppyy version of your library that would be performant on PyPy.
In any case both cibuildwheel and the multibuild system support
building wheels for PyPy.
What is PyPy?
PyPy is a very compliant Python interpreter, almost a drop-in replacement for
CPython 2.7, 3.6, and soon 3.7. It’s fast (PyPy and CPython 2.7.x performance
comparison) due to its integrated tracing JIT compiler.
We also welcome developers of other dynamic languages to see what RPython
can do for them.
This PyPy release supports:
x86 machines on most common operating systems
(Linux 32/64 bits, Mac OS X 64 bits, Windows 32 bits, OpenBSD, FreeBSD)
big- and little-endian variants of PPC64 running Linux,
s390x running Linux
64-bit ARM machines running Linux.
What else is new?
For more information about the 7.3.1 release, see the full changelog.
Please update, and continue to help us make PyPy better.
At the end of February ten of us gathered in Leysin, Switzerland to work on
a variety of topics including HPy, PyPy Python 3.7 support and the PyPy
migration to Heptapod.
We had a fun and productive week. The snow was beautiful. There was skiing
and lunch at the top of Berneuse, cooking together, some late nights at
the pub next door, some even later nights coding, and of course the
obligatory cheese fondue outing.
There were a few of us participating in a PyPy sprint for the first time
and a few familiar faces who had attended many sprints. Many different
projects were represented including PyPy, HPy, GraalPython, Heptapod, and rust-cpython. The atmosphere was relaxed and welcoming, so if
you're thinking of attending the next one -- please do!
Topics worked on:
HPy
HPy is a new project to design and implement a better API for extending
Python in C. If you're unfamiliar with it you can read more about it at HPy.
A lot of attention was devoted to the Big HPy Design Discussion which
took up two full mornings. So much was decided that this will likely
get its own detailed write-up, but bigger topics included:
the HPy GetAttr, SetAttr, GetItem and SetItem methods,
HPy_FromVoidP and HPy_AsVoidP for passing HPy handles to C functions
that pass void* pointers to callbacks,
avoiding having va_args as part of the ABI,
exception handling,
support for creating custom types.
Quite a few things got worked on too:
implemented support for writing methods that take keyword arguments with
HPy_METH_KEYWORDS,
implemented HPy_GetAttr, HPy_SetAttr, HPy_GetItem, and HPy_SetItem,
started implementing support for adding custom types,
started implementing dumping JSON objects in ultrajson-hpy,
refactored the PyPy GIL to improve the interaction between HPy and
PyPy's cpyext,
experimented with adding HPy support to rust-cpython.
And there was some discussion of the next steps of the HPy initiative
including writing documentation, setting up websites and funding, and
possibly organising another HPy gathering later in the year.
PyPy
Georges gave a presentation on the Heptapod topic and branch workflows
and showed everyone how to use hg-evolve.
Work was done on improving the PyPy CI buildbot after the move to
Heptapod, including a light-weight pre-merge CI and restricting
when the full CI is run to branch commits only.
A lot of work was done improving the -D tests.
Miscellaneous
Armin demoed VRSketch and NaN Industries in VR, including an implementation
of the Game of Life within NaN Industries!
Skiing!
Aftermath
Immediately after the sprint large parts of Europe and the world were
hit by the COVID-19 epidemic. It was good to spend time together before
travelling ceased to be a sensible idea and many gatherings were cancelled.
Keep safe out there everyone.
The HPy & PyPy Team & Friends
In-joke for those who attended the sprint: please don't replace this blog post
with its Swedish translation (or indeed a translation to any other language :).
It has been a very busy month, not so much because of deep changes in the JIT of PyPy but more around the development, deployment, and packaging of the project.
Hosting
The biggest news is that we have moved the center of our development off Bitbucket and to the new https://foss.heptapod.net/pypy. Heptapod is a friendly fork of GitLab that understands Mercurial, and is hosted by Clever Cloud. When Atlassian decided to close down Mercurial hosting on bitbucket.org, PyPy debated what to do. Our development model is based on long-lived branches, and we want to keep the ability to immediately see which branch each commit came from. Mercurial has this, git does not (see our FAQ). Octobus, whose business is Mercurial, developed a way to use Mercurial with GitLab called Heptapod. The product is still under development, but quite usable (i.e., it doesn't get in the way). Octobus partnered with Clever Cloud to offer a new home to community FOSS projects hosted on Bitbucket that wish to remain with Mercurial. PyPy took them up on the offer, and migrated its repos to https://foss.heptapod.net/pypy. We were very happy with how smooth it was to import the repos to Heptapod/GitLab, and are learning the small differences between Bitbucket and GitLab. All the pull requests, issues, and commits kept the same ids, but work is still being done to attribute the issues, pull requests, and comments to the correct users. So from now on, when you want to contribute to PyPy, you do so at the new home.
Secondly, thanks to work by https://baroquesoftware.com/ in leading a redesign and updating the logo, the https://www.pypy.org website has undergone a facelift. It should now be easier to use on small-screen devices. Thanks also to the PSF for hosting the site.
Packaging
Also, building PyPy from source takes a fair amount of time. While we provide downloads in the form of tarballs or zipfiles, and some platforms such as Debian and Homebrew provide packages, traditionally the downloads have only worked on a specific flavor of operating system. A few years ago squeaky-pl started providing portable builds. We have adopted that build system for our Linux offerings, so the nightly downloads and release downloads should now work on any glibc platform that has not gone end-of-life. So there goes another excuse not to use PyPy. And the "but does it run scipy" excuse also no longer holds, although "does it speed up scipy" still has the wrong answer. For that we are working on HPy, and will be sprinting soon.
The latest versions of pip, wheel, and setuptools, together with the manylinux2010 standard for Linux wheels and tools such as multibuild or cibuildwheel (well, from the next version) make it easier for library developers to build binary wheels for PyPy. If you are having problems getting going with this, please reach out.
Give it a try
Thanks to all the folks who provide the infrastructure PyPy depends on. We hope the new look will encourage more involvement and engagement. Help prove us right!
The next PyPy sprint will be in Leysin, Switzerland, for the fourteenth
time. This is a fully public sprint: newcomers and topics other than
those proposed below are welcome.
Goals and topics of the sprint
The list of topics is open. For reference, we would like to work at least partially on the following topics:
As usual, the main side goal is to have fun in winter sports :-)
We can take a day off (for ski or anything else).
Times and accommodation
The sprint will run for one week, from Saturday, the 29th of February, to Sunday, the 8th of March 2020 (dates were pushed back one day!). It will take place in Les Airelles, a different bed-and-breakfast from the traditional one in Leysin. It is a nice old house at the top of the village.
We have a 4- or 5-person room as well as up to three double rooms. Please register early! These rooms are not booked for the sprint in advance, and might already be taken if you end up announcing yourself late. We have a big room for up to 7 people with a nice view, which might be split into two or three sub-rooms, plus possibly separately-booked double rooms if needed. (But it is of course always possible to book at a different place in Leysin.)
For more information, see our repository or write to me directly at armin.rigo@gmail.com.
The PyPy team is proud to release the version 7.3.0 of PyPy, which includes
two different interpreters:
PyPy2.7, which is an interpreter supporting the syntax and the features of
Python 2.7 including the stdlib for CPython 2.7.13
PyPy3.6: which is an interpreter supporting the syntax and the features of
Python 3.6, including the stdlib for CPython 3.6.9.
The interpreters are based on much the same codebase, thus the double
release.
We have worked with the Python packaging group to support tooling around
building third party packages for Python, so this release changes the ABI tag
for PyPy.
Based on the great work done in portable-pypy, the Linux downloads we
provide are now built on top of the manylinux2010 CentOS6 docker image.
The tarballs include the needed shared objects to run on any platform that
supports manylinux2010 wheels, which should include all supported versions of
debian- and RedHat-based distributions (including Ubuntu, CentOS, and Fedora).
The CFFI backend has been updated to version 1.13.1. We recommend using CFFI
rather than c-extensions to interact with C.
The built-in cppyy module was upgraded to 1.10.6, which
provides, among others, better template resolution, stricter enum handling,
anonymous struct/unions, cmake fragments for distribution, optimizations for
PODs, and faster wrapper calls. We recommend using cppyy for performant
wrapping of C++ code for Python.
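For readers who have not tried cppyy, here is a minimal sketch of what
wrapping C++ looks like; it assumes the cppyy package is installed:

import cppyy

# Define a small C++ function; cppyy compiles it on the fly via cling.
cppyy.cppdef("""
double norm2(double x, double y) { return x*x + y*y; }
""")

# C++ names become attributes of the cppyy.gbl (global) namespace.
print(cppyy.gbl.norm2(3.0, 4.0))   # prints 25.0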
The vendored pyrepl package for interaction inside the REPL was updated.
Support for codepage encoding and decoding was added for Windows.
As always, this release fixed several issues and bugs raised by the growing
community of PyPy users. We strongly recommend updating. Many of the fixes are
the direct result of end-user bug reports, so please continue reporting issues
as they crop up.
You can download the v7.3 releases here:
We would like to thank our donors for the continued support of the PyPy
project. If PyPy is not quite good enough for your needs, we are available for
direct consulting work.
We would also like to thank our contributors and encourage new people to join
the project. PyPy has many layers and we need help with all of them: PyPy
and RPython documentation improvements, tweaking popular packages to run
on pypy, or general help with making RPython’s JIT even better. Since the
previous release, we have accepted contributions from 3 new contributors,
thanks for pitching in.
If you are a python library maintainer and use c-extensions, please consider making a cffi / cppyy version of your library that would be performant on PyPy. If you are stuck with using the C-API, you can use docker images with PyPy built in or the multibuild system to build wheels.
What is PyPy?
PyPy is a very compliant Python interpreter, almost a drop-in replacement for
CPython 2.7, 3.6. It’s fast (PyPy and CPython 2.7.x performance
comparison) due to its integrated tracing JIT compiler.
We also welcome developers of other dynamic languages to see what RPython
can do for them.
This PyPy release supports:
x86 machines on most common operating systems
(Linux 32/64 bit, Mac OS X 64-bit, Windows 32-bit, OpenBSD, FreeBSD)
big- and little-endian variants of PPC64 running Linux
s390x running Linux
64-bit ARM machines running Linux
Unfortunately at the moment of writing our ARM buildbots are out of service,
so for now we are not releasing any binary for the ARM architecture (32-bit), although PyPy does support ARM 32-bit processors.
What else is new?
PyPy 7.2 was released in October, 2019.
There are many incremental improvements to RPython and PyPy. For more information about the 7.3.0 release, see the full changelog.
Please update, and continue to help us make PyPy better.
Recently Antonio, Armin and Ronan had a small internal sprint in the beautiful
city of Gdańsk to kick off the development of HPy. Here is a brief report of
what was accomplished during the sprint.
What is HPy?
The TL;DR answer is "a better way to write C extensions for Python".
The idea of HPy was born during EuroPython 2019 in Basel, where there was an
informal meeting which included core developers of PyPy, CPython (Victor
Stinner and Mark Shannon) and Cython (Stefan Behnel). The ideas were later also
discussed with Tim Felgentreff of GraalPython, to make sure they would also be
applicable to this very different implementation. Windel Bouwman of RustPython
is following the project as well.
All of us agreed that the current design of the CPython C API is problematic
for various reasons and, in particular, because it is too tied to the current
internal design of CPython. The end result is that:
alternative implementations of Python (such as PyPy, but not only) have a
hard time loading and executing existing C extensions;
CPython itself is unable to change some of its internal implementation
details without breaking the world. For example, as of today it would be
impossible to switch from using reference counting to using a real GC,
which in turn makes it hard, for example, to remove the GIL, as gilectomy
attempted.
HPy tries to address these issues by following two major design guidelines:
objects are referenced and passed around using opaque handles, which are
similar in spirit to e.g. file descriptors. Multiple different handles
can point to the same underlying object, handles can be duplicated, and
each handle must be released independently of any other duplicate.
The internal data structures and C-level layout of objects are not
visible nor accessible using the API, so each implementation is free to
use what fits best.
The other major design goal of HPy is to allow incremental transition and
porting, so existing modules can migrate their codebase one method at a time.
Moreover, Cython is considering optionally generating HPy code, so extension
modules written in Cython would be able to benefit from HPy automatically.
More details can be found in the README of the official HPy repository.
Target ABI
When compiling an HPy extension you can choose one of two different target ABIs:
HPy/CPython ABI: in this case, hpy.h contains a set of macros and
static inline functions. At compilation time this translates the HPy API
into the standard C-API. The compiled module will have no performance
penalty, and it will have a "standard" filename like
foo.cpython-37m-x86_64-linux-gnu.so.
Universal HPy ABI: as the name implies, extension modules compiled
this way are "universal" and can be loaded unmodified by multiple Python
interpreters and versions. Moreover, it will be possible to dynamically
enable a special debug mode which will make it easy to find e.g., open
handles or memory leaks, without having to recompile the extension.
Universal modules can also be loaded on CPython, thanks to the
hpy_universal module which is under development. An extra layer of
indirection enables loading extensions compiled with the universal ABI. Users
of hpy_universal will face a small performance penalty compared to the ones
using the HPy/CPython ABI.
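To illustrate the idea, loading a universal module on CPython might look
roughly like the sketch below; since hpy_universal is still under development,
the loader name and signature here are hypothetical:

import hpy_universal

# Hypothetical loader: pass the module name and the path to the
# universal binary, get back an ordinary Python module object.
pof = hpy_universal.load("pof", "./pof.hpy.so")
print(pof.add_ints(40, 2))   # -> 42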
This setup gives several benefits:
Extension developers can use the extra debug features given by the
Universal ABI with no need to use a special debug version of Python.
Projects which need the maximum level of performance can compile their
extension for each relevant version of CPython, as they are doing now.
Projects for which runtime speed is less important will have the choice of
distributing a single binary which will work on any version and
implementation of Python.
A simple example
The HPy repo contains a proof of concept module. Here is a simplified
version which illustrates what an HPy module looks like:
#include"hpy.h"HPy_DEF_METH_VARARGS(add_ints)staticHPyadd_ints_impl(HPyContextctx,HPyself,HPy*args,HPy_ssize_tnargs){longa,b;if(!HPyArg_Parse(ctx,args,nargs,"ll",&a,&b))returnHPy_NULL;returnHPyLong_FromLong(ctx,a+b);}staticHPyMethodDefPofMethods[]={{"add_ints",add_ints,HPy_METH_VARARGS,""},{NULL,NULL,0,NULL}};staticHPyModuleDefmoduledef={HPyModuleDef_HEAD_INIT,.m_name="pof",.m_doc="HPy Proof of Concept",.m_size=-1,.m_methods=PofMethods};HPy_MODINIT(pof)staticHPyinit_pof_impl(HPyContextctx){HPym;m=HPyModule_Create(ctx,&moduledef);if(HPy_IsNull(m))returnHPy_NULL;returnm;}
People who are familiar with the current C-API will surely notice many
similarities. The biggest differences are:
Instead of PyObject *, objects have the type HPy, which as
explained above represents a handle.
You need to explicitly pass an HPyContext around: the intent is
primarily to be future-proof and make it easier to implement things like
sub-interpreters.
HPy_METH_VARARGS is implemented differently than CPython's
METH_VARARGS: in particular, these methods receive an array of HPy
and its length, instead of a fully constructed tuple: passing a tuple
makes sense on CPython where you have it anyway, but it might be an
unnecessary burden for alternate implementations. Note that this is
similar to the new METH_FASTCALL which was introduced in CPython.
HPy relies a lot on C macros, which most of the time are needed to support
the HPy/CPython ABI compilation mode. For example, HPy_DEF_METH_VARARGS
expands into a trampoline which has the correct C signature that CPython
expects (i.e., PyObject *(*)(PyObject *self, PyObject *args)) and
which calls add_ints_impl.
Sprint report and current status
After this long preamble, here is a rough list of what we accomplished during
the week-long sprint and the days immediately after.
On the HPy side, we kicked off the code in the repo: at the moment of writing
the layout of the directories is a bit messy because we moved things around
several times, but we identified several main sections:
A specification of the API which serves both as documentation and as an
input for parts of the projects which are automatically
generated. Currently, this lives in public_api.h.
A set of header files which can be used to compile extension modules:
depending on whether the flag -DHPY_UNIVERSAL_ABI is passed to the
compiler, the extension can target the HPy/CPython ABI or the HPy
Universal ABI
A CPython extension module called hpy_universal which makes it
possible to import universal modules on CPython
A set of tests which are independent of the implementation and are meant
to be an "executable specification" of the semantics. Currently, these
tests are run against three different implementations of the HPy API:
the headers which implement the "HPy/CPython ABI"
the hpy_universal module for CPython
the hpy_universal module for PyPy (these tests are run in the PyPy repo)
Moreover, we started a PyPy branch in which to implement the
hpy_universal module: at the moment of writing PyPy can pass all the HPy
tests apart from the ones which allow conversion to and from PyObject *.
Among the other things, this means that it is already possible to load the
very same binary module in both CPython and PyPy, which is impressive on its
own :).
Finally, we wanted a real-life use case to show how to port a module to HPy
and to do benchmarks. After some searching, we chose ultrajson, for the
following reasons:
it is a real-world extension module which was written with performance in
mind
when parsing a JSON file it does a lot of calls to the Python API to
construct the various parts of the result message
ujson_hpy is also a very good example of incremental migration: so far
only ujson.loads is implemented using the HPy API, while ujson.dumps
is still implemented using the old C-API, and both can coexist nicely in the
same compiled module.
Benchmarks
Once we have a fully working ujson_hpy module, we can finally run
benchmarks! We tested several different versions of the module:
ujson: this is the vanilla implementation of ultrajson using the
C-API. On PyPy this is executed by the infamous cpyext compatibility
layer, so we expect it to be much slower than on CPython
ujson_hpy: our HPy port compiled to target the HPy/CPython ABI. We
expect it to be as fast as ujson
ujson_hpy_universal: same as above but compiled to target the
Universal HPy ABI. We expect it to be slightly slower than ujson on
CPython, and much faster on PyPy.
Finally, we also ran the benchmark using the builtin json module. This is
not really relevant to HPy, but it might still be interesting as a
reference data point.
The benchmark is very simple and consists of parsing a big JSON file 100
times. Here is the average time per iteration (in milliseconds) using the
various versions of the module, CPython 3.7 and the latest version of the hpy
PyPy branch:
                        CPython     PyPy
ujson                    154.32   633.97
ujson_hpy                152.19        -
ujson_hpy_universal      168.78   207.68
json                     224.59   135.43
As expected, the benchmark proves that when targeting the HPy/CPython ABI, HPy
doesn't impose any performance penalty on CPython. The universal version is
~10% slower on CPython, but gives an impressive 3x speedup on PyPy! It is
worth noting that the PyPy hpy module is not fully optimized yet, and we
expect to be able to reach the same performance as CPython for this particular
example (or even better, thanks to our better GC).
All in all, not a bad result for two weeks of intense hacking :)
It is also worth noting that PyPy's builtin json module does really
well in this benchmark, thanks to the recent optimizations that were described
in an earlier blog post.
Conclusion and future directions
We think we can be very satisfied about what we have got so far. The
development of HPy is quite new, but these early results seem to indicate that
we are on the right track to bring Python extensions into the future.
At the moment, we can anticipate some of the next steps in the development of
HPy:
Think about a proper API design: what we have done so far has
been a "dumb" translation of the API we needed to run ujson. However,
one of the declared goals of HPy is to improve the design of the API. There
will be a trade-off between the desire for a clean, fresh new API
and the need to not be too different from the old one, to make porting
easier. Finding the sweet spot will not be easy!
Implement the "debug" mode, which will help developers to find
bugs such as leaking handles or using invalid handles.
Instruct Cython to emit HPy code on request.
Eventually, we will also want to try to port parts of numpy to HPy to
finally solve the long-standing problem of sub-optimal numpy
performance in PyPy.
The PyPy team is proud to release the version 7.2.0 of PyPy, which includes
two different interpreters:
PyPy2.7, which is an interpreter supporting the syntax and the features of
Python 2.7 including the stdlib for CPython 2.7.13
PyPy3.6: which is an interpreter supporting the syntax and the features of
Python 3.6, including the stdlib for CPython 3.6.9.
The interpreters are based on much the same codebase, thus the double
release.
As always, this release is 100% compatible with the previous one and fixed
several issues and bugs raised by the growing community of PyPy users.
We strongly recommend updating. Many of the fixes are the direct result of
end-user bug reports, so please continue reporting issues as they crop up.
With the support of Arm Holdings Ltd. and Crossbar.io, this release supports
the 64-bit aarch64 ARM architecture. More about the work and the
performance data around this welcome development can be found in the blog
post.
This release removes the “beta” tag from PyPy3.6. While there may still be some
small corner-case incompatibilities (around the exact error messages in
exceptions and the handling of faulty codec errorhandlers) we are happy with
the quality of the 3.6 series and are looking forward to working on a Python
3.7 interpreter.
We updated our benchmark runner at https://speed.pypy.org to a more modern
machine and updated the baseline python to CPython 2.7.11. Thanks to Baroque
Software for maintaining the benchmark runner.
The CFFI-based _ssl module was backported to PyPy2.7 and updated to use
cryptography version 2.7. Additionally, the _hashlib and crypt (or
_crypt on Python 3) modules were converted to CFFI. This has two
consequences: end users and packagers can more easily update these libraries
for their platform by executing (cd lib_pypy; ../bin/pypy _*_build.py).
More significantly, since PyPy itself links to fewer system shared objects
(DLLs), on platforms with a single runtime namespace like Linux, different CFFI
and c-extension modules can load different versions of the same shared object
into PyPy without collision (issue 2617).
Until downstream providers begin to distribute c-extension builds with PyPy, we
have made packages for some common packages available as wheels.
The CFFI backend has been updated to version 1.13.0. We recommend using CFFI
rather than c-extensions to interact with C, and cppyy for interacting with
C++ code.
Thanks to Anvil, we revived the PyPy Sandbox (soon to be released), which
allows total control over a Python interpreter's interactions with the
external world.
We implemented a new JSON decoder that is much faster, uses less memory, and
uses a JIT-friendly specialized dictionary. More about that in the recent blog
post.
We would like to thank our donors for the continued support of the PyPy
project. If PyPy is not quite good enough for your needs, we are available for
direct consulting work.
We would also like to thank our contributors and encourage new people to join
the project. PyPy has many layers and we need help with all of them: PyPy
and RPython documentation improvements, tweaking popular modules to run
on PyPy, or general help with making RPython’s JIT even better. Since the
previous release, we have accepted contributions from 27 new contributors,
so thanks for pitching in.
What is PyPy?
PyPy is a very compliant Python interpreter, almost a drop-in replacement for
CPython 2.7, 3.6. It’s fast (PyPy and CPython 2.7.x performance
comparison) due to its integrated tracing JIT compiler.
We also welcome developers of other dynamic languages to see what RPython
can do for them.
This PyPy release supports:
x86 machines on most common operating systems
(Linux 32/64 bit, Mac OS X 64-bit, Windows 32-bit, OpenBSD, FreeBSD)
big- and little-endian variants of PPC64 running Linux
s390x running Linux
64-bit ARM machines running Linux
Unfortunately at the moment of writing our ARM buildbots are out of service,
so for now we are not releasing any binary for the ARM architecture (32-bit), although PyPy does support ARM 32-bit processors.
What else is new?
PyPy 7.1 was released in March, 2019.
There are many incremental improvements to RPython and PyPy. For more information about the 7.2.0 release, see the full changelog.
Please update, and continue to help us make PyPy better.
In the last year or two I have worked on and off on making PyPy's
JSON faster, particularly when parsing large
JSON files. In this post I am going to document those techniques and
measure their performance impact. Note that I am quite a lot more
constrained in what optimizations I can apply here, compared to some of
the much more advanced approaches like
Mison,
Sparser or
SimdJSON because I don't want to
change the json.loads API that Python programs expect, and because I
don't want to only support CPUs with wide SIMD extensions. With a more
expressive API, more optimizations would be possible.
There are a number of problems when working with huge JSON files:
deserialization takes a long time on the one hand, and the resulting
data structures often take a lot of memory (usually they can be many
times bigger than the size of the file they originated from). Of course
these problems are related, because allocating and initializing a big
data structure takes longer than a smaller data structure. Therefore I
always tried to attack both of these problems at the same time.
One common theme of the techniques I am describing is that of optimizing
the parser for how JSON files are typically used, not how they could
theoretically be used. This is a similar approach to the way dynamic
languages are optimized more generally: most JITs will optimize for
typical patterns of usage, at the cost of less common usage patterns,
which might even become slower as a result of the optimizations.
Maps
The first technique I investigated is to use maps in the JSON parser.
Maps, also called hidden classes or shapes, are a fairly common way to
(generally, not just in the context of JSON parsing) optimize instances
of
classes
in dynamic language VMs. Maps exploit the fact that while it is in
theory possible to add arbitrary fields to an instance, in practice most
instances of a class are going to have the same set of fields (or one of
a small number of different sets). Since JSON dictionaries or objects
often come from serialized instances of some kind, this property often
holds in JSON files as well: dictionaries often have the same fields in
the same order, within a JSON file.
This property can be exploited in two ways: on the one hand, it can be
used to again store the deserialized dictionaries in a more memory
efficient way by not using a hashmap in most cases, but instead
splitting the dictionary into a shared description of the set of keys
(the map) and an array of storage with the values. This makes the
deserialized dictionaries smaller if the same set of keys is repeated a
lot. This is completely transparent to the Python programmer, the
dictionary will look completely normal to the Python program but its
internal representation is different.
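A minimal pure-Python sketch of this representation (illustrative only; the
real implementation lives at the RPython level) could look like this:

class Map(object):
    # Shared, immutable description of one set of keys, plus the edges
    # of the map tree described in the next section.
    def __init__(self, keys=()):
        self.keys = keys
        self.index = {key: i for i, key in enumerate(keys)}
        self.transitions = {}            # next key -> child Map

    def add_key(self, key):
        # walk (or extend) the map tree when a new key is parsed
        if key not in self.transitions:
            self.transitions[key] = Map(self.keys + (key,))
        return self.transitions[key]

EMPTY_MAP = Map()

class MapDict(object):
    # One deserialized dictionary: a pointer to the shared Map plus a
    # flat list of values, instead of a full per-instance hashmap.
    def __init__(self, map, values):
        self.map = map
        self.values = values

    def __getitem__(self, key):
        return self.values[self.map.index[key]]

# Parsing {"a": 1, "b": 2} twice shares a single Map instance:
m = EMPTY_MAP.add_key("a").add_key("b")
d1 = MapDict(m, [1, 2])
d2 = MapDict(m, [3, 4])
assert d1["b"] == 2 and d2["a"] == 3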
One downside of using maps is that sometimes files will contain many
dictionaries that have unique key sets. Since maps themselves are quite
large data structures and since dictionaries that use maps contain an
extra level of indirection we want to fall back to using normal hashmaps
to represent the dictionaries where that is the case. To achieve this, we
collect statistics at runtime about how often each map (i.e. set of keys) is
used in the file. For uncommonly used maps, the map is discarded and the
dictionaries that used it are converted to regular hashmaps.
Using Maps to Speed up Parsing
Another benefit of using maps to store deserialized dictionaries is that
we can use them to speed up the parsing process itself. To see how this
works, we need to understand maps a bit better. All the maps produced as
a side-effect of parsing JSON form a tree. The tree root is a map that
describes the object without any attributes. From every tree node we
have a number of edges going to other nodes, each edge for a specific
new attribute added:
This map tree is the result of parsing a file that has dictionaries with
the keys a, b, c many times, the keys a, b, f less often, and also some
objects with the keys x, y.
When parsing a dictionary we traverse this tree from the root, according
to the keys that we see in the input file. While doing this, we
potentially add new nodes, if we get key combinations that we have never
seen before. The set of keys of a dictionary parsed so far are
represented by the current tree node, while we can store the values into
an array. We can use the tree of nodes to speed up parsing. A lot of the
nodes only have one child, because after reading the first few keys of
an object, the remaining ones are often uniquely determined in a given
file. If we have only one child map node, we can speculatively parse the
next key by doing a memcmp between the key that the map tree says is
likely to come next and the characters that follow the ',' that started
the next entry in the dictionary. If the memcmp succeeds, the
speculation paid off, and we can transition to the new map
that the edge points to, and parse the corresponding value. If not, we
fall back to general code that parses the string, handles escaping rules
etc. This trick was explained to me by some V8 engineers; the same trick
is supposedly used as part of the V8 JSON parser.
This scheme doesn't immediately work for map tree nodes that have more
than one child. However, since we keep statistics anyway about how often
each map is used as the map of a parsed dictionary, we can speculate
that the most common map transition is taken more often than the others
in the future, and use that as the speculated next node.
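Here is a self-contained sketch of that fast path (names and details are
illustrative, not PyPy internals; a real parser works on bytes and must handle
escapes in the slow path):

class MapNode(object):
    def __init__(self):
        self.edges = {}   # key -> [child MapNode, times taken]

    def most_likely_key(self):
        # the most frequently taken outgoing edge is the speculation target
        if not self.edges:
            return None
        return max(self.edges, key=lambda k: self.edges[k][1])

    def follow(self, key):
        # take (or create) the edge for this key and update its statistics
        entry = self.edges.setdefault(key, [MapNode(), 0])
        entry[1] += 1
        return entry[0]

def parse_key(input, pos, node):
    # pos points just after the opening '"' of the next key
    likely_key = node.most_likely_key()
    if likely_key is not None:
        end = pos + len(likely_key)
        # the memcmp-style fast path: one bulk comparison, no
        # per-character scanning or unescaping
        if input[pos:end] == likely_key and input[end:end + 1] == '"':
            return likely_key, node.follow(likely_key), end + 1
    # speculation failed: scan for the closing quote the general way
    end = input.index('"', pos)
    key = input[pos:end]
    return key, node.follow(key), end + 1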
So for the example transition tree shown in the figure above the key
speculation would succeed for objects with keys a, b, c. For objects
with keys a, b, f the speculation would succeed for the first two
keys, but not for the third key f. For objects with the keys
x, y the speculation would fail for the first key x but succeed
for the second key y.
For real-world datasets these transition trees can become a lot more
complicated, for example here is a visualization of a part of the
transition tree generated for parsing a New York Times dataset:
Caching Strings
A rather obvious observation we can use to improve performance of the
parser is the fact that string values repeat a lot in most JSON files.
For strings that are used as dictionary keys this is pretty obvious.
However it happens also for strings that are used as values in
dictionaries (or are stored in lists). We can use this fact to
intern/memoize strings and save memory. This is an approach that many
JSON parsers use, including
CPython's.
To do this, I keep a dictionary of strings that we have seen so far
during parsing and look up new strings that are deserialized. If we have
seen the string before, we can re-use the deserialized previous string.
Right now I only consider utf-8 strings for caching that do not contain
any escapes (whether simple ones like \", \n or escaped unicode chars).
This simple approach works extremely well for dictionary keys, but needs
a number of improvements to be a win in general. The first observation
is that computing the hash to look up the string in the dictionary of
strings we've seen so far is basically free. We can compute the hash
while scanning the input for the end of the string we are currently
deserializing. Computing the hash while scanning doesn't increase the
time spent scanning much. This is not a new idea, I am sure many other
parsers do the same thing (but CPython doesn't seem to).
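A sketch of a find_end_of_string along these lines (the same helper used by
the pseudocode further down; the hash function here is illustrative, not the
one PyPy actually uses):

def find_end_of_string(pos, input):
    # scan for the closing quote, note whether any backslash escapes
    # occur, and fold each character into a running hash as we go, so
    # the later cache lookup gets its hash essentially for free
    hash_ = 0
    escapes = False
    while pos < len(input):
        ch = input[pos]
        if ch == '"':
            return pos, escapes, hash_
        if ch == '\\':
            # escapes are handled by a separate slow path, which
            # re-parses the string carefully from the start
            escapes = True
        hash_ = (hash_ * 1000003 + ord(ch)) & 0xffffffff
        pos += 1
    return -1, escapes, hash_   # unterminated string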
Another improvement follows from the observation that inserting every
single deserialized non-key string into a hashmap is too expensive.
Instead, we insert strings into the cache more conservatively, by
keeping a small ring buffer of hashes of recently deserialized strings.
The hash is looked for in the ring buffer, and only if the hash is
present we insert the string into the memoization hashmap. This has the
effect of only inserting strings into the memoization hashmap that
re-occur a second time not too far into the file. This seems to give a
good trade-off between still re-using a lot of strings but keeping the
time spent updating and the size of the memoization hashmap low.
Another twist is that in a lot of situations caching strings is not
useful at all, because it will almost never succeed. Examples of this
are UUIDs (which are unique), or the content of a tweet in a JSON file
with many tweets (which is usually unique). However, in the same file it
might be useful to cache e.g. the user name of the Twitter user, because
many tweets from the same person could be in such a file. Therefore the
usefulness of the string cache depends on which field of an object the
value being deserialized belongs to. We therefore keep statistics per map
field and disable string memoization for an individual field if the cache hit
rate falls below a certain threshold. This gives the best of both
worlds: in the cases where string values repeat a lot in certain fields
we use the cache to save time and memory. But for those fields that
mostly contain unique strings we don't waste time looking up and adding
strings in the memoization table. Strings outside of dictionaries are
quite rare anyway, so we just always try to use the cache for them.
The following pseudocode sketches the code to deserialize a string in
the input at a given position. The function also takes a map, which is
the point in the map tree whose field we are currently deserializing (if
we are deserializing a string in another context, some kind of dummy map
can be used there).
def deserialize_string(pos, input, map):
    # input is the input string, pos is the position of the starting "
    # of the string

    # find the end of the string, check whether it contains escape
    # codes, and compute the hash, all in a single scan
    end, escapes, hash = find_end_of_string(pos + 1, input)
    if end == -1:
        raise ParseError
    if escapes:
        # need to be much more careful with escaping
        return deserialize_string_escapes(pos, input)

    # should we cache at all?
    if map.cache_disabled():
        return input[pos + 1:end]

    # if the string is in the cache, return the cached copy (the real
    # code also compares the strings, to guard against hash collisions)
    if hash in cache:
        map.cache_hit += 1
        return cache[hash]
    result = input[pos + 1:end]
    map.cache_miss += 1

    # if the hash is in the ring buffer of recently seen hashes, the
    # string re-occurred: promote it into the cache
    if hash in ring_buffer:
        cache[hash] = result
    else:
        ring_buffer.write(hash)
    return result
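The cache_disabled() check can be based on the per-field hit and miss
counters that the code above maintains. The following sketch uses made-up
thresholds; the values the real implementation uses differ:

# thresholds are hypothetical, for illustration only
MIN_SAMPLES = 1000
MIN_HIT_RATE = 0.2

def cache_disabled(map):
    total = map.cache_hit + map.cache_miss
    if total < MIN_SAMPLES:
        return False  # not enough data yet, keep trying the cache
    return map.cache_hit / float(total) < MIN_HIT_RATE

Note that once a field's cache is disabled, deserialize_string returns
before updating the counters, so the decision is effectively permanent for
that field.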
Evaluation
To find out how much the various techniques help, I implemented a number
of JSON parsers in PyPy with different combinations of the techniques
enabled. I compared the numbers with the JSON parser of CPython 3.7.3
(which is based on simplejson), with ujson for CPython, with the JSON
parser of Node 12.11.1 (V8), and with RapidJSON (in DOM mode).
I collected a number of medium-to-large JSON files to try the JSON
parsers on, among them:
Censys: A subset of the Censys port and protocol scan data for websites
in the Alexa top million domains
Gharchive: Github activity from January 15-23, 2015 from Github Archive
(The tables below also contain results for the NYTimes, Reddit, Rosie,
TPCH, Wikidata and Yelp datasets.)
I measured the times of each benchmark with a number of variations
of the improved PyPy algorithms:
PyPyBaseline: The PyPy JSON parser as it was before my work on JSON
parsing started (PyPy version 5.8)
PyPyKeyStringCaching: Memoizing the key strings of dictionaries, but
not the other strings in a JSON file, and not using maps to represent
dictionaries (this is the JSON parser that PyPy has been shipping since
version 5.9; in the benchmarks I used 7.1).
PyPyMapNoCache: Like PyPyKeyStringCaching, but using maps to
represent dictionaries. This includes speculatively parsing the next
key using memcmp, but does not use string caching of non-key strings.
PyPyFull: Like PyPyMapNoCache, but uses a string cache for all strings,
not just keys. This is equivalent to what will be released soon as part
of PyPy 7.2.
In addition to the wall-clock time of parsing, I also measured the increase
in memory use of each implementation after the input string has been
deserialized, i.e. the size of the in-memory representation of every
JSON file.
Contributions of Individual Optimizations
Let's first look at the contributions of the individual optimizations to the
overall performance and memory usage.
All the benchmarks were run 30 times in new processes; all the numbers
are normalized to PyPyFull.
The biggest individual improvement to both parsing time and memory used comes
from caching just the keys in parsed dictionaries. This is the optimization in
PyPy's JSON parser that has been implemented for a while already. To understand
why this optimization is so useful, let's look at some numbers about each
benchmark, namely the number of total keys across all dictionaries in each
file, as well as the number of unique keys. As we can see, for all benchmarks
the number of unique keys is significantly smaller than the number of keys in
total.
Benchmark   | Number of keys | Number of unique keys
Censys      | 14 404 234     | 163
Gharchive   | 6 637 881      | 169
NYTimes     | 417 337        | 60
Reddit      | 25 226 397     | 21
Rosie       | 28 500 101     | 5
TPCH        | 6 700 000      | 45
Wikidata    | 6 235 088      | 1 602
Yelp        | 5 133 914      | 61
The next big jump in deserialization time and memory comes from introducing
maps to represent deserialized dictionaries. With PyPyMapNoCache
deserialization time goes down because it's much cheaper to walk the tree
of maps and store all deserialized objects into an array of values than to
build hashmaps with the same keys again and again. Memory use goes down
for the same reason: it takes a lot less memory to store the shared
structure of each set of keys once in the map tree than to repeat it in
every single hashmap.
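As a rough illustration of this structure, a map tree could be sketched as
follows. The names are mine and the real maps carry more machinery (for
example the per-field cache statistics used earlier), but the sharing idea
is the same:

class Map(object):
    # one node in the tree of maps; a node stands for the ordered set of
    # keys on the path from the root to it
    def __init__(self, parent=None, key=None):
        self.parent = parent
        self.key = key          # the key this node adds to the set
        self.transitions = {}   # next key -> child node

    def transition(self, key):
        # walk, or extend, the tree with the next deserialized key
        child = self.transitions.get(key)
        if child is None:
            child = self.transitions[key] = Map(self, key)
        return child

A deserialized dictionary is then just a pointer to its map node plus a
flat array of values, one per key on the path from the root.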
We can look at some numbers about every benchmark again. The table shows how
many map-based dictionaries are deserialized for every benchmark, and how many
hashmap-backed dictionaries. We see that the number of hashmap-backed
dictionaries is often zero, or at most a small percentage of all dictionaries
in each benchmark. Yelp has the biggest number of hashmap-backed dictionaries.
The reason for this is that the input file contains hashmaps that store
combinations of various features of Yelp businesses, and a lot of these
combinations are totally unique to a business. Therefore the heuristics
determine that it's better to store these using hashmaps.
Benchmark   | Map Dicts | Regular Dicts | % Regular Dicts
Censys      | 4 049 235 | 1 042         | 0.03
Gharchive   | 955 301   | 0             | 0.00
NYTimes     | 80 393    | 0             | 0.00
Reddit      | 1 201 257 | 0             | 0.00
Rosie       | 6 248 966 | 0             | 0.00
TPCH        | 1 000 000 | 0             | 0.00
Wikidata    | 1 923 460 | 46 905        | 2.38
Yelp        | 443 140   | 52 051        | 10.51
We can also look at numbers about how often the memcmp-based speculative
parsing of the next key of a given map succeeds. Looking at statistics
about each benchmark, we can see that the speculation of what key we
expect next pays off in a significant percentage of cases, between 63% for
Wikidata where the dictionary structures are quite irregular, and 99% for
Reddit, where all the dictionaries have the same set of keys.
Benchmark   | Number of Keys | Map Transitions | % Successful Speculation
Censys      | 14 404 234     | 14 403 243      | 65.79
Gharchive   | 6 637 881      | 6 637 881       | 86.71
NYTimes     | 417 337        | 417 337         | 79.85
Reddit      | 25 226 397     | 25 226 397      | 100.00
Rosie       | 28 500 101     | 28 500 101      | 90.37
TPCH        | 6 700 000      | 6 700 000       | 86.57
Wikidata    | 6 235 088      | 5 267 744       | 63.68
Yelp        | 5 133 914      | 4 593 980       | 90.43
geomean     |                |                 | 82.04
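To make the speculation concrete, here is a sketch of the check in
Python-level pseudocode. likely_next_key is a hypothetical helper that
returns the key the map considers most probable next; in the RPython
implementation the comparison is a raw memcmp against the input buffer
rather than startswith:

def try_speculative_key(pos, input, map):
    # returns (key, new_pos, child_map) if the expected next key appears
    # verbatim (with quotes) at pos, otherwise None
    expected = map.likely_next_key()
    if expected is None:
        return None
    candidate = '"' + expected + '"'
    if input.startswith(candidate, pos):
        child = map.transition(expected)
        return expected, pos + len(candidate), child
    return None  # speculation failed; fall back to general string parsing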
General string caching is the least clear-cut optimization. On the one
hand its impact on memory usage is quite substantial, ranging from a 20%
reduction for Gharchive and Reddit up to a 2× improvement for Yelp. On the
other hand, the effect on performance is less clear: it even leads to a
slowdown on Gharchive and Reddit, and elsewhere yields only a small
improvement. Choosing the right heuristic for when to disable the cache
also has somewhat unclear effects and is definitely a topic worthy of
further investigation.
Comparison against other JSON Decoders
To get a more general feeling of the performance and memory usage of the
improved PyPy parser, we compare it against CPython's built-in json
parser, ujson for CPython, Node's (V8) JSON parser, and RapidJSON. To
give better context for the memory usage, I also show the file size of
the input files.
These benchmarks are not really an apples-to-apples comparison: all of
the implementations use different in-memory representations of strings in
the deserialized data structure (Node uses two bytes per character in a
string; in CPython it depends on the string contents, but is four bytes
per character on my machine for these benchmarks; PyPyBaseline uses four
bytes; PyPy and RapidJSON use UTF-8). But it's still interesting to get
some ballpark numbers. The results are as follows:
As we can see, PyPyFull handily beats CPython and ujson, with a geometric
mean speedup of about 2.5×. The memory improvement can be even more
extreme, at over 4× against CPython/ujson in some cases (CPython gives
better memory sizes, because its parser caches the keys of dictionaries
as well). Node is often more than 50% slower, whereas RapidJSON beats us
easily, by a factor of 2× on average.
Conclusions
While the speedup I managed to achieve over the course of this project is
nice and I am certainly happy to beat both CPython and Node, I am
ultimately still annoyed that RapidJSON manages to maintain such a clear
lead over PyPyFull, and would like to get closer to it. One problem that
PyPy suffers compared to RapidJSON is the overhead of garbage collection.
Deserializing large JSON files is pretty much the worst case for the
generational GC that PyPy uses, since none of the deserialized objects die
young (and the GC expects that most objects do). That means that a lot of
the deserialization time of PyPy is wasted allocating the resulting
objects in the nursery, and then copying them into the old generation.
This should be doable in a better way, but all my attempts to avoid
having to do the copy did not seem to help much. So maybe more
improvements are possible, if I can come up with more ideas.
On the memory side of things, Node/V8 is beating PyPy clearly, which might
indicate more general problems in how we represent Python objects in
memory. On the other hand, I think it's cool that we are competitive with
RapidJSON in terms of memory and often within 2× of the file size.
An effect that I didn't consider at all in this blog post is that
accessing the deserialized objects with constant strings is also faster
than with regular dictionaries, due to them being represented with maps.
More benchmarking work to do in the future!
If you have your own programs that run on PyPy and use the json parser
a lot, please measure them on the new code and let me know whether you see
any difference!