Friday, October 16, 2015

PowerPC backend for the JIT

Hi all,

PyPy's JIT now supports the 64-bit PowerPC architecture! This is the third architecture supported, in addition to x86 (32 and 64) and ARM (32-bit only). More precisely, we support Linux running the big- and the little-endian variants of ppc64. Thanks to IBM for funding this work!

The new JIT backend has been merged into "default". You should be able to translate PPC versions as usual directly on the machines. For the foreseeable future, I will compile and distribute binary versions corresponding to the official releases (for Fedora), but of course I'd welcome it if someone else could step in and do it. Also, it is unclear yet if we will run a buildbot.

To check that the result performs well, I logged in a ppc64le machine and ran the usual benchmark suite of PyPy (minus sqlitesynth: sqlite was not installed on that machine). I ran it twice at a difference of 12 hours, as an attempt to reduce risks caused by other users suddenly using the machine. The machine was overall relatively quiet. Of course, this is scientifically not good enough; it is what I could come up with given the limited resources.

Here are the results, where the numbers are speed-up factors between the non-jit and the jit version of PyPy. The first column is x86-64, for reference. The second and third columns are the two ppc64le runs. All are Linux. A few benchmarks are not reported here because the runner doesn't execute them on non-jit (however, apart from sqlitesynth, they all worked).

    ai                        13.7342        16.1659     14.9091
    bm_chameleon               8.5944         8.5858        8.66
    bm_dulwich_log             5.1256         5.4368      5.5928
    bm_krakatau                5.5201         2.3915      2.3452
    bm_mako                    8.4802         6.8937      6.9335
    bm_mdp                     2.0315         1.7162      1.9131
    chaos                     56.9705        57.2608     56.2374
    crypto_pyaes               62.505         80.149     79.7801
    deltablue                  3.3403         5.1199      4.7872
    django                    28.9829         23.206       23.47
    eparse                     2.3164         2.6281       2.589
    fannkuch                   9.1242        15.1768     11.3906
    float                     13.8145        17.2582     17.2451
    genshi_text               16.4608        13.9398     13.7998
    genshi_xml                 8.2782         8.0879      9.2315
    go                         6.7458        11.8226     15.4183
    hexiom2                   24.3612        34.7991     33.4734
    html5lib                   5.4515         5.5186       5.365
    json_bench                28.8774        29.5022     28.8897
    meteor-contest             5.1518         5.6567      5.7514
    nbody_modified            20.6138        22.5466     21.3992
    pidigits                   1.0118          1.022      1.0829
    pyflate-fast               9.0684        10.0168     10.3119
    pypy_interp                3.3977         3.9307      3.8798
    raytrace-simple           69.0114       108.8875    127.1518
    richards                  94.1863       118.1257    102.1906
    rietveld                   3.2421         3.0126      3.1592
    slowspitfire               2.8539         3.3924      3.5541
    spambayes                  5.0646         6.3446       6.237
    spectral-norm             41.9148        42.1831     43.2913
    spitfire                   3.8788         4.8214       4.701
    spitfire_cstringio          7.606         9.1809      9.1691
    sympy_expand               2.9537         2.0705      1.9299
    sympy_integrate            4.3805         4.3467      4.7052
    sympy_str                  1.5431         1.6248      1.5825
    sympy_sum                  6.2519          6.096      5.6643
    telco                     61.2416        54.7187     55.1705
    twisted_iteration         55.5019        51.5127     63.0592
    twisted_names              8.2262         9.0062      10.306
    twisted_pb                12.1134         13.644     12.1177
    twisted_tcp                4.9778          1.934      5.4931

    GEOMETRIC MEAN               9.31           9.70       10.01

The last line reports the geometric mean of each column. We see that the goal was reached: PyPy's JIT actually improves performance by a factor of around 9.7 to 10 times on ppc64le. By comparison, it "only" improves performance by a factor 9.3 on Intel x86-64. I don't know why, but I'd guess it mostly means that a non-jitted PyPy performs slightly better on Intel than it does on PowerPC.

Why is that? Actually, if we do the same comparison with an ARM column too, we also get higher numbers there than on Intel. When we discovered that a few years ago, we guessed that on ARM running the whole interpreter in PyPy takes up a lot of resources, e.g. of instruction cache, which the JIT's assembler doesn't need any more after the process is warmed up. And caches are much bigger on Intel. However, PowerPC is much closer to Intel, so this argument doesn't work for PowerPC. But there are other more subtle variants of it. Notably, Intel is doing crazy things about branch prediction, which likely helps a big interpreter---both the non-JITted PyPy and CPython, and both for the interpreter's main loop itself and for the numerous indirect branches that depend on the types of the objects. Maybe the PowerPC is as good as Intel, and so this argument doesn't work either. Another one would be: on PowerPC I did notice that gcc itself is not perfect at optimization. During development of this backend, I often looked at assembler produced by gcc, and there are a number of small inefficiencies there. All these are factors that slow down the non-JITted version of PyPy, but don't influence the speed of the assembler produced just-in-time.

Anyway, this is just guessing. The fact remains that PyPy can now be used on PowerPC machines. Have fun!

A bientôt,


No comments: