As you know, a lot of PyPy's recent development effort has gone into speeding up
execution of Python programs. However, another nice property of PyPy's
Python interpreter is that most objects are represented much more compactly
than in CPython. We would like to investigate some more advanced techniques
to reduce the memory usage of Python programs further.
To do this, we need to study the memory behaviour of real programs with large
heaps. For speed measurements there are standard benchmarks, but for memory
usage there is nothing comparable; the memory behaviour of large programs is
not well understood. Therefore we are looking for programs that we can study
and use as benchmarks.
Specifically, we are looking for Python programs with the following properties:
- large heaps of about 10 MB to 1 GB (a quick way to measure heap size and
  runtime is sketched after this list)
- a non-trivial runtime as well (in the range of a few seconds), so that we
  can judge the speed impact of optimizations
- ideally pure-Python programs that don't use extension modules, so that they
  run under both CPython and PyPy (this is optional, but makes our lives much
  easier)
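If you are unsure whether a candidate program falls into these ranges, here is
a minimal sketch of how to check, assuming a Unix-like system: it measures
wall-clock time with the stdlib time module and peak memory usage via
resource.getrusage. The workload() function is a hypothetical placeholder for
the program you would tell us about.

    import resource
    import time

    def workload():
        # hypothetical placeholder: replace with the actual program,
        # e.g. building a large, string-heavy data structure
        return ["some string %d" % i for i in range(10 ** 6)]

    start = time.time()
    data = workload()
    elapsed = time.time() - start

    # ru_maxrss is reported in kilobytes on Linux (bytes on Mac OS X)
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print("runtime: %.2f seconds" % elapsed)
    print("peak RSS: %.1f MB" % (peak_kb / 1024.0))

Note that ru_maxrss reports the resident set size of the whole process, which
over-approximates the heap itself, but it is good enough to tell whether a
program is in the 10 MB to 1 GB range.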
We are also rather interested in programs that do a lot of string/unicode
processing.
We would be grateful for any ideas. Telling us about a program also has the
advantage that we will work on optimizing PyPy for it :-).