Wednesday, June 17, 2015

PyPy and ijson - a guest blog post

This gem was posted in the ijson issue tracker after some discussion on #pypy, and Dav1dde kindly allowed us to repost it here:

"So, I was playing around with parsing huge JSON files (19GiB, testfile is ~520MiB) and wanted to try a sample code with PyPy. It turns out PyPy needed ~1:30-2:00, whereas CPython 2.7 needed ~13 seconds (the pure Python implementation was equivalent on both Pythons at ~8 minutes).

"Apparently ctypes is really bad performance-wise, especially on PyPy. So I made a quick CFFI mockup: https://gist.github.com/Dav1dde/c509d472085f9374fc1d

Before:

CPython 2.7:
    python -m emfas.server size dumps/echoprint-dump-1.json
    11.89s user 0.36s system 98% cpu 12.390 total 

PyPy:
    python -m emfas.server size dumps/echoprint-dump-1.json
    117.19s user 2.36s system 99% cpu 1:59.95 total


After (CFFI):

CPython 2.7:
    python jsonsize.py ../dumps/echoprint-dump-1.json
    8.63s user 0.28s system 99% cpu 8.945 total

PyPy:
    python jsonsize.py ../dumps/echoprint-dump-1.json
    4.04s user 0.34s system 99% cpu 4.392 total

"



Dav1dde goes into more detail in the issue itself, but we just want to emphasize a few significant points from this brief exchange:
  • His CFFI implementation is faster than the ctypes one even on CPython 2.7.
  • PyPy + CFFI is faster than CPython even when using C code to do the heavy parsing.
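
For readers who have not used either interface, here is a minimal sketch of what calling the same C function through ctypes and through CFFI can look like. It is not taken from Dav1dde's gist: it wraps the C library's strlen() rather than a JSON parser, it assumes a POSIX system (Linux or macOS) where the C library is reachable from the running process, and the names strlen_ctypes, strlen_cffi, and time_calls are hypothetical helpers made up for this illustration. CFFI is bundled with PyPy but is a separate install on CPython, so the sketch skips that half if the import fails.

```python
import ctypes
import timeit

# --- ctypes (standard library) ---
# CDLL(None) opens the running process's own symbols, which on Linux and
# macOS include the C standard library (assumption: POSIX system).
libc = ctypes.CDLL(None)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def strlen_ctypes(data):
    """Call C strlen() through ctypes (hypothetical helper)."""
    return libc.strlen(data)


# --- CFFI in ABI mode (bundled with PyPy, third-party on CPython) ---
try:
    from cffi import FFI
except ImportError:
    strlen_cffi = None  # cffi is not installed; skip this half
else:
    ffi = FFI()
    # Declare the C signature exactly as it would appear in a header.
    ffi.cdef("size_t strlen(const char *s);")
    # dlopen(None) loads the standard C library on POSIX systems.
    clib = ffi.dlopen(None)

    def strlen_cffi(data):
        """Call C strlen() through CFFI (hypothetical helper)."""
        return clib.strlen(data)


def time_calls(func, data=b"hello", number=100000):
    """Return the seconds spent making `number` calls of func(data)."""
    return timeit.timeit(lambda: func(data), number=number)


if __name__ == "__main__":
    print("ctypes: %.3fs" % time_calls(strlen_ctypes))
    if strlen_cffi is not None:
        print("cffi:   %.3fs" % time_calls(strlen_cffi))
```

Running the script under both CPython and PyPy gives a rough feel for per-call overhead only. The numbers will vary by machine and interpreter version, and they say nothing about a real parsing workload like the one measured above.
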
 The PyPy Team

4 comments:

Alendit said...

Maybe it's time to discuss inclusion of CFFI into the standard library again?

Armin Rigo said...

If CPython decides to include it in its stdlib, I can make sure it is updated as needed. I don't have the energy to discuss its inclusion myself, so if it happens it will be "championed" by someone else. Nowadays, I personally think inclusion has as many drawbacks as advantages, even if CFFI 1.x shouldn't evolve a lot in the foreseeable future after the 1.0 step.

v3ss said...

The problem is converting existing libs to use cffi. Only a small percentage of libs are ready for Python 3.x, and with this trend, not even 1% of libs will be converted to work with CFFI.
That makes PyPy adoption a lot slower.

Is there really no chance of improving ctypes?

Maciej Fijalkowski said...

You would think so, but these days the vast majority of popular C bindings come with cffi equivalents. In fact, cffi is vastly more popular than ctypes ever was.