PyPy Status Blog: Sprint Discussions: Releases, Testing

Sunday, November 25, 2007

Sprint Discussions: Releases, Testing

During the sprint we had various discussions about technical issues as well as planning discussions about how we want to go about things. One of them was about the stability of PyPy, how to ensure stability, how to handle releases and approaches to being more "usable". I will describe this discussion in this post (there are also minutes of the meeting).

The Meetings whiteboard

Testing

First we discussed the current situation in terms of testing. PyPy has been extremely testing-oriented from the start, it is being developed almost exclusively in test-driven-development style. To deal with the large number of tests we already have some infrastructure in place:

we run all of PyPy's tests nightly on a Linux machine

we translate a PyPy Python interpreter every night and use that to run the CPython compliance tests against it, also on a Linux machine

we translate several Python interpreters every night and run benchmarks against them on a PowerPC running Mac OS X

As you can see, we are lacking in the Windows testing area, which is an even worse problem because none of the currently active developers has Windows as his primary OS. We should improve this by finding a Windows machine where the tests are run nightly and where we can log in to try bug-fixes quickly. The latter bit is important, we had a nightly windows test run before (thanks to Scott Dial) but it didn't help, because even if you tried to fix a bug you would have to wait until the next night to see whether it worked.

Another very serious problem is that of aggregation: we have these various test runs that all have a web interface to check for errors but there is no easy way to find out which tests failed. You have to go to each page and even some sub-pages to see what needs fixing, which is a tedious process. The idea for solving this is aggregate all the available information into some sort of testing-entry-point page that gives a quick overview of the regressions that happened during the night. It's not clear whether we can achieve that with existing tools (buildbots or whatever), but we will investigate that.

Releases

The discussion about releases was more on a fundamental and less on a concrete level (especially when it comes to time-frames). We discussed what it means to make a release, because obviously it is more than just taking an SVN revision and putting a tarball of it onto the webpage. During the EU period we were required to make several releases, but those were not really meant to be more than technology previews for the brave adventurers to try. In the future we have the goal to release things that are more stable and hopefully more practically useful. The plan is to use medium-sized Python applications that have a chance to run on top of PyPy because they don't use too many extension modules (web apps being likely candidates) and that have good unit-tests themselves. The first step would be to find some applications that fit this description, fix the bugs that prevents PyPy from running them and from then on run them nightly on one of the testing machines to check for regressions. This would allow us to be more confident when stating that "PyPy works".

Another thing to keep in mind for releases is the special features that our Python interpreter provides (e.g. the thunk and the taint object space, our stackless features, transparent proxies, sandboxing, special object implementations). Those features are neither tested by the CPython tests nor by any existing applications. Therefore we cannot really be confident that these features work and don't have too many bugs (in fact, the first time somebody really use the become feature of the thunk space in earnest he found a serious bug that is not fixed so far). To get around this problem, we plan to write small-to-medium sized example applications for each of these features (for stackless we can maybe use one of the existing stackless examples). This will hopefully find bugs and will also make it possible to evaluate whether the features make sense from a language design point of view.

A minor thing to make releases easier is to be able to not only have the tests be run once a night but also be able to trigger them manually on the release branch before doing the release.

Publishing Cool Things

Since we decided that the releases we make should be stable and usable, we also discussed how we would go about making new "cool things" like features, experiments etc. better known. The consensus was that this blog is probably the best forum for doing this. In addition we discussed having a stabler snapshot of the trunk made to ensure that people wanting to play around with these features don't accidentally get a broken version.

Helping Out

Right now we are still in cleanup mode (the cleanup sprint is nearly done, but we haven't finished all the cleanups yet), so we won't be able to start on the above things right now. However, they will have a strong focus soon. So if you are interested in trying out to run programs on top of PyPy or writing new ones that use the new features you are most welcome to do so and we will try to fix the bugs or help you doing it (of course some tolerance against frustration is needed when you do that, because the bugs that turn up tend to be obscure). We have not been perfect at this in the past, but this will have to change.