Friday, January 25, 2008

Buildbots and Better Platform Support

In the last days we improved platform-support of PyPy's Python interpreter. Jean-Paul Calderone has been tirelessly working for some time now on setting up a buildbot for translating and testing PyPy. So far the basic mechanisms are working and the buildbot is running on various machines, including some that Michael Schneider (bigdog) lets us use, one of them being a Windows machine, the other one with a 64bit Linux (lots of thanks to those two, you are awesome!).

What is still missing is a nice way to visualize the test results to quickly see which tests have started failing on which platforms. There is a prototype already, which still needs some tweaking.

The availability of these machines has triggered some much-needed bug-fixing in PyPy to make our Python interpreter work better on Windows and on 64 bit Linux. Maciek and Michael Schneider worked on this quite a bit last week, with the result that PyPy supports many more extension modules now on Windows and 64 bit Linux. Since we now have the buildbot the hope is that the support also won't disappear soon :-).

Thursday, January 24, 2008

PyPy Keyboard Heatmap

Today I saw the keyboard heatmap generator on the Blended Technologies blog. I threw all the PyPy code at it to see whether the heatmap looks any different than normal Python code. It doesn't:

So now the excuse "I can't contribute to PyPy because it needs all those special PyPy-keys" isn't working anymore :-).

Monday, January 21, 2008

RPython can be faster than C

(yes, C as in language, not c as in speed of light). I looked recently at the great computer language shootout, for some benchmarks and to make some speed comparisons. I use this benchmark, modified it to be rpythonic-enough and compared speeds. The code is here (the only change from the Python version was to create a class instead of tuple, so actually this version is more OO). Also the benchmark is very likely flawed because it favours better GCs :).
So, here we go:
Language:Time of run (for N=14):
Python version running on Python 2.5.1, distribution25.5s
Python version running on PyPy with generational GC45.5
Python with psyco20s
RPython translated to C using PyPy's generational GC0.42s
compiling the Haskell version with GHC
compiling the C version with gcc 4.1.2 -O3 -fomit-frame-pointer0.6s

Also worth noticing is that when using psyco with the original version (with tuples) it is very fast (2s).

So, PyPy's Python interpreter is 80% slower than CPython on this (not too horrible), but RPython is 40% faster than gcc here. Cool. The result is mostly due to our GC, which also proves that manual memory-management can be slower than garbage collection in some situations. Please note that this result does not mean that RPython is meant for you. It requires a completely different mindset than the one used to program in Python. Don't say you weren't warned! :-)

Saturday, January 19, 2008

PyPy.NET goes Windows Forms

After having spent the last few days on understanding PyPy's JIT, today I went back hacking the clr module. As a result, it is now possible to import and use external assemblies from pypy-cli, including Windows Forms

Here is a screenshot of the result you get by typing the following at the pypy-cli interactive prompt:

>>>> import clr
>>>> clr.AddReferenceByPartialName("System.Windows.Forms")
>>>> clr.AddReferenceByPartialName("System.Drawing")
>>>> from System.Windows.Forms import Application, Form, Label
>>>> from System.Drawing import Point
>>>> frm = Form()
>>>> frm.Text = "The first pypy-cli Windows Forms app ever"
>>>> lbl = Label()
>>>> lbl.Text = "Hello World!"
>>>> lbl.AutoSize = True
>>>> lbl.Location = Point(100, 100)
>>>> frm.Controls.Add(lbl)
>>>> Application.Run(frm)

Unfortunately at the moment you can't do much more than this, because we still miss support for delegates and so it's not possibile to handle events. Still, it's a step in the right direction :-).

Improve .NET Integration

A while ago Amit Regmi, a student from Canada, started working on the clr module improvements branch as a university project.

During the sprint Carl Friedrich, Paul and me worked more on it and brought it to a mergeable state.

It adds a lot of new features to the clr module, which is the module that allows integration between pypy-cli (aka PyPy.NET) and the surrounding .NET environment:

  • full support to generic classes;
  • a new importer hook, allowing things like from System import Math and so on;
  • .NET classes that implements IEnumerator are treated as Python iterators; e.g. it's is possile to iterate over them with a for loop.

This is an example of a pypy-cli session:

>>>> from System import Math
>>>> Math.Abs(-42)
>>>> from System.Collections.Generic import List
>>>> mylist = List[int]()
>>>> mylist.Add(42)
>>>> mylist.Add(43)
>>>> mylist.Add("foo")
Traceback (most recent call last):
  File "<console>", line 1, in <interactive>
TypeError: No overloads for Add could match
>>>> mylist[0]
>>>> for item in mylist: print item

This is still to be considered an alpha version; there are few known bugs and probably a lot of unknown ones :-), so don't expect it to work in every occasion. Still, it's a considerable step towards real world :-).

Friday, January 18, 2008

Crashing Other People's Compilers

Over the years PyPy has (ab?)used various external software for different purposes, and we've discovered bugs in nearly all of them, mostly by pushing them to their limits. For example, many compilers are not happy with 200MB of source in one file. The Microsoft C compiler has a limit of 65536 lines of code per file and the CLI was raising "System.InvalidProgramException: Method pypy.runtime.Constants:.cctor () is too complex.", where too complex probably means "too long". Just for fun, today we collected all projects we could think of in which we found bugs:

So one could say that PyPy is really just the most expensive debugging tool ever :-).

Monday, January 14, 2008

Leysin Winter Sport Sprint Started

The Leysin sprint has started since yesterday morning in the usual location. The view is spectacular (see photo) the weather mostly sunny. The following people are sprinting:

  • Maciej Fijalkowski
  • Armin Rigo
  • Toby Watson
  • Paul deGrandis
  • Antonio Cuni
  • Carl Friedrich Bolz
So it is a rather small sprint.

We started working on various features and performance improvements for the high level backends (JVM and .NET) and on implementing ctypes for PyPy. Later this week we plan to spend a few days on the JIT, because Anto and I both need to get into it for our respective university projects.

Friday, January 11, 2008

Finding GC roots: using LLVM or parsing assembler files from GCC

PyPy contains a framework for writing custom Garbage Collectors, and a few simple GCs have been written in this framework. A common issue with all these GCs is how to find all the stack roots, i.e. all the pointers to live GC-managed objects currently stored in local variables, in all the callers of the current function. The current solution is to maintain a custom shadow stack of roots, where all functions push and pop copies of their local variables of type "GC pointer". Clearly this is an overhead. Can we remove it?

LLVM has recently grown some support for this. By emitting markers in the LLVM source and with the help of a bit of custom C++ code, we can generate stack maps for the functions compiled by LLVM. Then, with 100% non-portable code in our framework GC's root finding algorithm, we can walk the machine stack and locate where in each stack frame LLVM stores the GC pointers. (Yes, I mean non-portable: LLVM offers no help for doing that. Maybe it will at some point, though I didn't manage to explain why this is an issue to people working on this in LLVM so far...). I've tried that approach in the llvmgcroot branch. Over the manually-managed shadow stack, this gives speed improvements which are, very roughly, on the order of 5%.

Note that this prevents some optimizations in LLVM, because it forces it to allocate all local variables of type "GC pointer" in the stack; it cannot keep them in registers and it must assume that they can be changed more or less at any time (as moving GCs do). Can we do better?

Actually, yes. We can even do better in the C backend, using a GCC hack. GCC has this nice extension:

asm("bla", constrains);
This is meant to generate assembler instructions directly from C. Internally, GCC considers the whole asm() as a single regular instruction of its intermediate language; the constrains are expressed in the same way as the constrains for all the prebuilt intermediate language instructions. They express things like input and output operands of the instruction, whether they can live in memory or in registers, whether the whole instruction has side-effects, etc. The nice thing about asm() is that it doesn't kill any optimization whatsoever in GCC - it's your job to make sure that you use the correct constrains.

So what I've tried in the asmgcroot branch is to use asm() as markers. In this branch, the C backend produces code like this after each function call, for each local variable containing a live GC pointer:

asm("/* GCROOT %0 */" : "=g"(localvar) : "0"(localvar) : "memory");

This causes GCC to emit the following line in the assembler file it generates:

/* GCROOT register-or-memory-containing-localvar */

I won't go in the details of the asm() line above - the constrains are just enough to make sure that GCC doesn't optimize too much, but don't prevent most optimizations from occurring. For example, the localvar can be in a register.

The assembler will just ignore the line above; it is a comment. But what we can do is write our own tool parsing the assembler files. This tool locates the /* GCROOT */ comments and follows where the register or memory location in the comment comes from (to do this it must follow the control flow and data flow of the function). This allows it to build a stack map: for each call instruction it knows exactly which registers and frame stack locations contain a live GC pointer. The stack map is then emitted in an extra assembler file that we link with the rest. As with LLVM above, the stack map is then used at run-time by non-portable code written in our GC's stack root tracker.

Yes, that's rather insane. But at least, we don't need to modify the assembler file - just read it. If GCC is too clever in its optimizations, the custom parser will get lost and complain cleanly; but I think that it is relatively safe in the sense that GCC optimizations should not be able to make the custom parser produce wrong results.

The branch is not merged because it's probably too insane to merge (not to mention, it's probably not portable to non-GCC compilers, and it is completely platform-specific). Still, it gives good results, better that the pure LLVM approach - on the order of 10% to 25% speed-ups for pypy-c.

Tuesday, January 8, 2008

Visualizing a Python tokenizer

Armin and me have been working on PyPy's parser and bytecode compiler for the Python language in the last days. Armin implemented several bytecode optimizations that CPython has since a while whereas I tried to refactor our tokenizer and parser (because our existing parser is rather slow and also not very nice code). Armin is mostly done whereas the new parser is not very far yet.

What is done, however, is the Python tokenizer. It is implemented in the usual way, by using a set of regular expressions to generate a deterministic finite automaton (DFA). This automaton is then turned into a big function which does the actual tokenization. Of course the picture is not quite as simple for Python, because it is not possible to tokenize Python using only regular expressions. To generate the proper "indent" and "dedent" tokens it would be necessary to keep state (the previous indentation levels) which a DFA cannot do. This is solved by postprocessing the tokens that the tokenizer produces to turn whitespace tokens into the proper indent and dedent tokens.

For debugging purposes I implemented a visualization tool for DFAs using PyPy's pygame-based graph viewer. The graph viewer is able to visualize interactively any graph given in the graph-description language of Graphviz. Looking at the tokenizing DFA for Python is rather instructive, both for understanding how tokenizing works and (maybe) for understanding the Python language. To try it, download the dot file of the DFA and run from a pypy checkout:

$ python pypy/bin/

The following is a screenshot of the graphviewer:

For people who don't want do checkout PyPy I generated a (rather big) png for the DFA.

Next thing I would like to do (apart from actually finishing the parser, of course :-) ) is visualize the Python grammar itself using syntax diagrams or something similar. So far I couldn't really find a program to do that, though.