Tuesday, August 17, 2010

Call for Benchmarks

As you know, a lot of PyPy's recent development effort has gone into speeding up execution of Python programs. However, an additional good property of PyPy's Python interpreter is that most objects are represented in a much more compact way than in CPython. We would like to investigate some more advanced techniques to reduce the memory usage of Python programs further.

To do this it is necessary to investigate the memory behaviour of real programs with large heaps. For speed measurements there are standard benchmarks, but for memory improvements there is nothing comparable; the memory behaviour of large programs is not that well understood. Therefore we are looking for programs that we can study and use as benchmarks.

Specifically we are looking for Python programs with the following properties:

  • large heaps of about 10MB-1GB
  • should have non-trivial runtime as well (in the range of a few seconds), to judge the speed impact of optimizations (see the measurement sketch after this list)
  • ideally pure-Python programs that don't use extension modules so that they run under both CPython and PyPy (this is optional, but makes my life much easier).
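
To give a rough idea of how a candidate could be checked against these numbers, runtime and peak memory can be measured with a small script along these lines (just a sketch: "candidate.py" is a placeholder for whatever program is being proposed, and the units and availability of ru_maxrss vary by platform):

    # Rough sketch: run a candidate program and report wall-clock time and
    # peak memory of the child process.  On Linux ru_maxrss is reported in
    # kilobytes; "candidate.py" is a placeholder name.
    import resource
    import subprocess
    import time

    start = time.time()
    subprocess.call(["python", "candidate.py"])
    elapsed = time.time() - start
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    print("runtime: %.2f s, peak RSS: %d KB" % (elapsed, peak_kb))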

We are also rather interested in programs that do a lot of string/unicode processing.

We would be grateful for all ideas. Telling us about a program also has the advantage that we will work on optimizing PyPy for it :-).

26 comments:

lasi said...

I haven't thought about it very much, but ZODB, Durus or Dobbin could be useful.

Zeev said...

portage, the official Gentoo Linux package manager, does package dependency resolution and can take a few seconds for large updates. It parses package metadata from text files.

Peter Goodman said...

You could run a program that determinizes a large NFA. Given an existing Python program that can determinize an NFA, you could give it an expanded version of the NFA on page 15 here: http://www.springerlink.com/content/cq16j1uv511g793g/fulltext.pdf. Another way is to take some complex NFAs, concatenate them, and determinize the result.
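
For context, determinization is typically done with the classic subset construction, which can blow up exponentially in the number of states; that blow-up is exactly what makes it a memory-hungry workload. A minimal, hypothetical sketch (not taken from any particular project), representing the NFA as a dict mapping (state, symbol) pairs to sets of successor states and ignoring epsilon transitions:

    # Hypothetical subset-construction sketch: the NFA is a dict mapping
    # (state, symbol) pairs to sets of successor states; epsilon
    # transitions are not handled.
    def determinize(nfa, start, alphabet):
        start_set = frozenset([start])
        dfa = {}                  # (set of NFA states, symbol) -> set of NFA states
        todo = [start_set]
        seen = set([start_set])
        while todo:
            current = todo.pop()
            for symbol in alphabet:
                successors = set()
                for state in current:
                    successors.update(nfa.get((state, symbol), set()))
                target = frozenset(successors)
                dfa[(current, symbol)] = target
                if target not in seen:
                    seen.add(target)
                    todo.append(target)
        return dfa

    # example: NFA for binary strings ending in "01"
    nfa = {(0, "0"): set([0, 1]), (0, "1"): set([0]), (1, "1"): set([2])}
    dfa = determinize(nfa, 0, ["0", "1"])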

Anonymous said...

Bazaar and Mercurial take a lot of memory (and time) when updating, merging, etc. on large repositories, especially if they contain large files.

Anonymous said...

Pylint (http://www.logilab.org/project/pylint) could be a nice target. It's pure Python, and the heap size and run time depend on what kind of code you throw at it.

VanL said...

You could try loading and manipulating a large graph with NetworkX. Pure Python, and the size and runtime could be tuned by varying the size of the graph and the algorithms that are run.

Unknown said...

Whoosh comes to mind. People will always be grateful if you speed up search for them :)

Anonymous said...

The CDPedia creates and manipulates its index with a pure-python inverted index implementation.

It could be extracted and made into a benchmark; there are other pure-Python inverted indices around that could also work.

They do tend to use lots and lots of memory. CDPedia's implementation uses the builtin array module for byte-sequence manipulation and bare strings as the data store (it's highly optimized to lower CPython's memory usage), but there are still a few dict-heavy places.
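
To give an idea of the general shape of such a structure, here is a toy sketch (not CDPedia's actual code) of an inverted index that keeps its postings in compact arrays:

    # Toy inverted index sketch: maps each token to an array of document
    # ids, using the builtin array module to keep the postings more
    # compact than a plain list of ints.
    from array import array

    class InvertedIndex(object):
        def __init__(self):
            self.postings = {}    # token -> array of doc ids

        def add_document(self, doc_id, text):
            for token in set(text.lower().split()):
                self.postings.setdefault(token, array("l")).append(doc_id)

        def search(self, token):
            return list(self.postings.get(token.lower(), array("l")))

    index = InvertedIndex()
    index.add_document(1, "PyPy speeds up Python programs")
    index.add_document(2, "memory usage of Python programs")
    print(index.search("python"))   # [1, 2]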

Anonymous said...

Agreed that Bazaar and Mercurial would be interesting use cases, especially for projects with large revision history graphs.

Memory usage analysis has come up recently on the bzr list:
https://lists.ubuntu.com/archives/bazaar/2010q3/069549.html

Carl Friedrich Bolz-Tereick said...

All great ideas, thanks a lot!

Anonymous said...

Python Natural Language Toolkit
http://www.nltk.org/

Give it a huge corpus (Wikipedia?) and run any operation on it -- NLTK will use huge amounts of memory in all kinds of custom objects, lists and tuples.

Pingveno said...

From what I understand, PyExcelerator, a writer/reader for Excel files, takes huge amounts of memory for very large files. It uses pure Python objects for each cell, which kills memory use when you're writing many millions of cells.

Dan Stromberg said...

A couple of possibilities from my own OSS code:

http://stromberg.dnsalias.org/~strombrg/treap

http://stromberg.dnsalias.org/~strombrg/pyindex.html

I'd most likely be happy to relicense the treap code as needed to facilitate inclusion. The pyindex code is under a UCI (I believe it's BSDish) license, and would probably need to remain so.

Anonymous said...

I really didn't think about it much; I'm just trying to chew through my RSS backlog and ran into a post about pkgcore dealing with memory issues just a few minutes after I read this call for benchmarks.

Maybe you could use that.

Anonymous said...

You might want to look at MiniLight:

http://www.hxa.name/minilight/#comparison

none said...

I'm the author of a scientific application that might be suited to your needs. It runs under both Python 2.x and PyPy, so I bundled a distribution with some example benchmarks in case this interests you: http://dl.dropbox.com/u/7931953/pypy-bench.tar.bz2 (see bench.README)

An interesting observation, in my opinion, is that on small runs CPython outperforms PyPy, but this progressively reverses on longer runs.

Carl Friedrich Bolz-Tereick said...

@all: thanks for the proposals, I am looking at them.

@Franck: This is probably due to the JIT, which needs some time to compile at the beginning. Later, the assembler exists and executes quickly. Will look at your code, thanks for providing it.

Anonymous said...

Hello, I am the author of a chess program written entirely in Python. I haven't published it yet, because I am a bit ashamed of its poor quality. However, it should suffice for the sole purpose of benchmarking. Please drop me a note if you are interested. My email address is: larudwer at freenet dot de

Some notes:
The program is console mode only (UCI), no GUI.

It eats up all the memory you have.

CPython is almost twice as fast as pypy-1.3 on this program, and Psyco accelerates it by another factor of two.

Unknown said...

You could consider Tahoe-LAFS. A good reason to use it is that it is a practicality-oriented, widely deployed tool with significant memory usage that we routinely spend engineering effort to track and manage.

Here are some graphs of the memory usage of different versions of Tahoe-LAFS over time:

32-bit machine:
http://tahoe-lafs.org/tahoe-figleaf-graph/hanford.allmydata.com-tahoe_memstats.html

64-bit machine:
http://tahoe-lafs.org/tahoe-figleaf-graph/hanford.allmydata.com-tahoe_memstats_64.html

Here are some open tickets about memory usage in our issue tracker:

http://tahoe-lafs.org/trac/tahoe-lafs/query?status=!closed&keywords=~memory&order=priority

The reason not to use Tahoe-LAFS as a subject is that it uses several native-code libraries for the CPU-intensive inner loops (cryptography, erasure coding). I really want those libraries, and hence Tahoe-LAFS, to be usable with cpyext as soon as possible, but I haven't tried it and I assume that cpyext isn't 100% there yet.

By the way, the easiest way to measure the performance of Tahoe-LAFS would be to run its unit tests and measure the memory usage and runtime. This is not only the easiest way, but it is also a pressing issue for us! Tahoe-LAFS unit tests take too long to run, which causes problems for us, and we would very much like it if they could run to completion faster.

http://tahoe-lafs.org/trac/tahoe-lafs/ticket/20 (unit tests take too long)

Here are our buildbots showing unit test runtime among other things:

http://tahoe-lafs.org/buildbot/waterfall?show_events=true&builder=Kyle+OpenBSD-4.6+amd64&builder=hardy-amd64&builder=Arthur+lenny+c7+32bit&builder=Eugen+lenny-amd64&builder=David+A.+OpenSolaris+i386&builder=Ruben+Fedora&builder=Zooko+zomp+Mac-amd64+10.6+py2.6&builder=FreeStorm+WinXP-x86+py2.6&builder=tarballs

Adam Sampson said...

rawdog (disclosure of bias: I wrote it) sounds like it might be of use. It's an RSS aggregator that generates static HTML. Pure Python 2, with lots of string processing, mostly in the feedparser module. Memory usage and runtime depend on how many feeds it's reading and how much history it keeps, since it does everything in memory at the moment, using pickle for persistent state. (With my 800-odd feeds and two-month history, writing the entire store to HTML will use a few hundred megabytes of memory and run for several minutes.)

A future redesign will use a more sensible database-backed approach...

Bob Ziuchkovski said...

Scapy would be a great one to benchmark. Depending on the size of the packet capture, it can consume quite a bit of processor and memory when loading and dissecting large captures. I run it at work on CPython and would love to see it running/optimized under PyPy. The only problem is that I believe it uses some Python 2.6-isms.

Carl Friedrich Bolz-Tereick said...

Thanks again for all the additional pointers. Still investigating all of them.

Anonymous said...

How about Nucular, a search engine written in Python by Aaron Watters?

http://nucular.sourceforge.net/

Anonymous said...

In my view, the natural competitors to PyPy (in the domain of fast interpreters for dynamic languages) are TraceMonkey and V8. Therefore, translations of the SunSpider, V8, and Dromaeo benchmarks would be appropriate.

Anonymous said...

BitBake looks like a good candidate. It's a derivative of Portage, and it is used to cross-compile Linux distributions for embedded devices.

With a non-trivial distro, it can use up to 400MB. It already uses Psyco if available, and it could be interesting to compare speed and memory usage with PyPy.