Monday, March 1, 2010

Benchmarking twisted

Hello.

I recently did some benchmarking of twisted on top of PyPy. For the very impatient: PyPy is up to 285% faster than CPython. For more patient people, there is a full explanation of what I did and how I performed the measurements, so they can judge for themselves.

The benchmarks live in twisted-benchmarks and were mostly written by Jean Paul Calderone. Even though he called them "initial exploratory investigation into a potential direction for future development resulting in performance oriented metrics guiding the process of optimization and avoidance of complexity regressions", they're still much, much better than the average benchmarks found out there.

The methodology was to run each benchmark for quite some time (about 1 minute), measuring the number of requests every 5s. Then I looked at the dump of data and discarded the time it took for JIT-capable interpreters to warm up (up to 15s), averaging everything after that. Averages of requests per second are in the table below (the higher the better):

benchname    CPython   Unladen swallow       PyPy
names        10930     11940 (9% faster)     15429 (40% faster)
pb           1705      2280 (34% faster)     3029 (78% faster)
iterations   75569     94554 (25% faster)    291066 (285% faster)
accept       2176      2166 (same speed)     2290 (5% faster)
web          879       854 (3% slower)       1040 (18% faster)
tcp          105M      119M (7% faster)      60M (46% slower)
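
For readers who want to check the arithmetic, here is a minimal sketch of how a "% faster" figure can be derived from two requests-per-second averages. The function name is made up for illustration and is not part of twisted-benchmarks; the inputs are the iterations and web rows of the table.

```python
def relative_change_percent(baseline, candidate):
    """Percentage by which candidate differs from baseline.

    Positive means the candidate handled more requests per second
    (faster); negative means fewer (slower).
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate / baseline - 1.0) * 100.0


# Values copied from the table (requests per second).
cpython_iterations = 75569
pypy_iterations = 291066
cpython_web = 879
unladen_web = 854

print(round(relative_change_percent(cpython_iterations, pypy_iterations)))
print(round(relative_change_percent(cpython_web, unladen_web)))
```

Note that "285% faster" means a ratio of roughly 3.85x, not 2.85x. The tcp row lists values rounded to millions, so recomputing from those rounded figures will not necessarily reproduce the published percentages.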

To reproduce, run each benchmark with:

benchname.py -n 12 -d 5

WARNING: running tcp-based benchmarks that open a new connection for each request (web & accept) can exhaust some kernel structures. Limit n or wait before the next run if you see drops in requests per second.
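
The averaging step from the methodology can be sketched roughly as follows. This is not the code used for the measurements. It assumes that -n is the number of samples and -d the length of each sample in seconds (which matches the one-minute runs described above), and the sample counts are invented for illustration.

```python
def steady_state_rate(counts, sample_seconds=5.0, warmup_seconds=15.0):
    """Average requests per second after discarding warm-up samples.

    counts: requests completed in each consecutive sample window.
    """
    if sample_seconds <= 0:
        raise ValueError("sample_seconds must be positive")
    # Whole sample windows that fall inside the warm-up period.
    skipped = int(warmup_seconds // sample_seconds)
    steady = counts[skipped:]
    if not steady:
        raise ValueError("no samples left after discarding warm-up")
    return sum(steady) / (len(steady) * sample_seconds)


# Hypothetical counts for a 12 x 5s run; the first three windows are
# slower because the JIT is still warming up.
samples = [1000, 3000, 4500] + [5000] * 9
print(steady_state_rate(samples))                      # warm-up discarded
print(steady_state_rate(samples, warmup_seconds=0.0))  # warm-up included
```

The warm-up cut-off was "up to 15s", so the right value of warmup_seconds depends on the interpreter; for CPython, which has no JIT, 0 would be the natural choice.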

The first obvious thing is that the various benchmarks are more or less amenable to speedups by JIT compilation, with accept and tcp getting the smallest speedups, if any. This is understandable, since a JIT is mostly about reducing interpretation and frame overhead, which is probably not large when it comes to accepting connections. However, if you actually loop around doing something, a JIT can give you a lot of speedup.
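
One way to put numbers on this intuition is Amdahl's law: if only a fraction of the run time is interpretation overhead, the overall gain is capped no matter how much the JIT speeds up that fraction. The fractions and the factor below are made-up illustrations, not measurements from these benchmarks.

```python
def overall_speedup(jit_fraction, jit_factor):
    """Amdahl's law: overall speedup when only part of the work gets faster.

    jit_fraction: share of run time spent in code the JIT can speed up (0..1).
    jit_factor:   how many times faster that share becomes.
    """
    if not 0.0 <= jit_fraction <= 1.0:
        raise ValueError("jit_fraction must be between 0 and 1")
    if jit_factor <= 0:
        raise ValueError("jit_factor must be positive")
    return 1.0 / ((1.0 - jit_fraction) + jit_fraction / jit_factor)


# Hypothetical: a benchmark dominated by kernel work (accepting
# connections) versus one dominated by Python-level looping.
print(overall_speedup(0.1, 10.0))  # mostly kernel time
print(overall_speedup(0.9, 10.0))  # mostly interpreter time
```

With the same hypothetical 10x gain on interpreted code, the first case ends up about 10% faster overall while the second is more than 5x faster.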

The other obvious thing is that PyPy is the fastest Python interpreter here, almost across the board (Jython and IronPython won't run twisted), except for raw tcp throughput. However, speedups can vary, and I expect this to improve after the release, as there are places where PyPy can be improved. Regarding raw tcp throughput: this can be a problem for some applications, and we're looking forward to improving this particular bit.

The main reason to use twisted for this comparison is the amount of support from the twisted team, and JP Calderone in particular, especially when it comes to providing benchmarks. If some open source project wants to be looked at by the PyPy team, please provide a reasonable set of benchmarks and infrastructure.

If, however, you're a closed source project fighting performance problems with Python, we offer contracting to investigate how PyPy, and not only PyPy, can speed up your project.

Cheers,
fijal


Benchmark descriptions:

  • names - simple DNS server
  • web - simple http hello world server
  • pb - perspective broker, RPC mechanism for twisted
  • iterations - empty twisted loop
  • accept - number of tcp connections accepted per second
  • tcp - raw socket transfer throughput

Used interpreters:

  • CPython 2.6.2 - as packaged by ubuntu
  • Unladen swallow svn trunk, revision 1109
  • PyPy svn trunk, revision 71439

Twisted version used: svn trunk, revision 28580

Machine: unfortunately a 32-bit virtual machine under qemu, running Ubuntu Karmic, on top of a quad-core Intel Q9550 with 6M cache. Courtesy of Michael Schneider.

12 comments:

piranha said...

Would be nice to see at least rough approximation of amount of RAM used by each implementation. :-)

Anonymous said...

Great as always.

I'm looking forward to using PyPy in production with the next stable release in March. =)

Yuri Baburov said...

Is it possible to run the same tests with CPython+Psyco?
That would be really interesting to see!

Tim Parkin said...

Congrats... things continue to look interesting :-)

Maciej Fijalkowski said...

@Yuri

No, psyco has limitations on frames that break zope.interface which twisted depends on.

Doc Button said...

I agree with Yuri, it would be of interest to record memory stats for each benchmark run.

KoObz said...

Awesome results Maciej.

Question: what's it gonna take for pypy to supplant Cpython?

You're faster and I'm guessing you have nowhere near the manpower of Cpython. Plus, you're written in Python so future work will be much easier. Seems like a no brainer to embrace pypy.

Luis said...

Question: After having read many comments and posts from pypy's developers lately, I got the impression (I might be wrong though) that you are betting everything on tracing for getting speedups (that the slow interpreter will eventually be compensated for by the magic of tracing).
However, other projects that rely on tracing seem to favor a dual approach, which is a traditional method-at-a-time JIT (which can evenly speed up all kinds of code) plus tracing for getting the most out of highly numerical code (LuaJIT 2.0, Mozilla's JaegerMonkey, for example).

Is this accurate, or am I wrong? Do you think that the current tracing strategy will eventually get speedups for those benchmarks that are currently on par with or way below CPython? Or will you have to add a more traditional approach for the baseline?

Maciej Fijalkowski said...

Hey Luis.

That's a very interesting question. I will try to answer a couple of your points, but feel free to move to the pypy-dev mailing list if you want to continue the discussion.

We indeed bet on tracing (or jitting in general) to compensate for slower interpretation than CPython. However, our tracing is far more general than SpiderMonkey's: for example, we can trace a whole function from the start and do not require an actual loop. We hope to generalize tracing so it can eventually trace all constructs.

The main difference between ahead-of-time and tracing is that tracing requires an actual run, while ahead-of-time tries to predict what will happen. Results are generally in favor of tracing, although the variation will be larger (tracing does statistically correct branch prediction, not necessarily always the correct one).

Regarding benchmarks, most of the benchmarks where we're slower than CPython show that our tracing is slow (they don't include warmup). For some of those we'll just include warmup (like twisted.web, which is a web server, so that makes sense in my opinion); for others we'll try to make tracing faster. And again, the speed of tracing is not a property of tracing itself, but rather pypy's limitation right now.

Some other benchmarks are slow because we don't JIT regular expressions (spambayes). This should be fixed, but it's again unrelated to tracing.

To summarize: I don't expect us to try a dual approach (one jit is enough fun, believe me), but instead to generalize tracing and make it more efficient. How this will go, we'll see; I hope pretty well.

Cheers,
fijal

Antonio Cuni said...

@Luis

other than Maciek's points, which I subscribe to, it should be said
that, since each language has a different semantics, the
efficiency of a traditional "method-at-a-time" JIT can vary
dramatically. In particular, the dynamism of Python is so deep
that a traditional JIT cannot win much: Jython and IronPython do
exactly that, but for most use cases are slower than CPython. If
you are interested, Chapter 2 of my PhD thesis explores these
topics :-)
http://codespeak.net/svn/user/antocuni/phd/thesis/thesis.pdf

Anonymous said...

great results!
As for the warm-up, would it be possible to save some of the tracing decisions in some file (.pyt?) to help on next startup?
-shai

Maciej Fijalkowski said...

@Anonymous

Saving the results is hard, but not impossible. There are other possibilities (like keeping the process around) though.

Cheers,
fijal