Friday, November 7, 2008

Porting the JIT to CLI (part 2)

In my previous post, we saw that PyPy JIT generator can produce huge speedups when applied to the tlc toy language.

In this post we will dive a bit into the internals of PyPy JIT, to see how it manages to do so. Note that this is a very high level overview of how the JIT works, and applies to all backends. Then, in the third post of this series, we will look closer at the CLI JIT backend, seeing how it works around some .NET limitations and how the generated code looks like.

PyPy JIT for dummies

As you surely know, the key idea of PyPy is that we are too lazy to write a JIT of our own: so, instead of passing nights writing a JIT, we pass years coding a JIT generator that writes the JIT for us :-).

I'm not going to explain how the JIT generator does its job, (perhaps this will be the subject of another blog post), but how the generated JIT works.

There are values that, if known at compile-time (i.e., when the JIT compiler runs), let the JIT to produce very efficient code. In a dynamic language, types are the primary example: for instance, suppose you are a compiler and you have to compile to following Python function:

def mysum(a):
  return a + 1

At compile time, you don't have any knowledge about the type of the parameter: it could be integer, float, an user defined object, etc. In this situation, the only safe choice is to emit code which does the usual, slow, full lookup to know how to perform the operations.

On the other hand, suppose that you knew in advance that the parameter is an integer: this time, you could emit code that exploits this extra knowledge, by performing directly a fast integer addition.

The idea behind PyPy JIT is that if you don't have enough knowledge to generate efficient code, you stop compiling and wait until you know exactly what you need. Concretely, you emit code that runs until the point where you stopped the compilation, then it triggers a special procedure that restarts the compiler. This time the JIT compiler knows everything you need, because you can inspect the state of the running program.

Let's see an example: the first time the JIT compiles mysum, it produces something like this pseudo-code:

PyObject mysum_compiled(PyObject a)
{
  Type a_type = a.GetType();
  switch(a_type) {
      default: continue_compilation(a_type, <position>);
  }
}

If you call mysum(41), the execution goes in the default branch of the switch, thus calling continue_compilation: its job is to restart the JIT compiler, which now can emit fast code because it knows the exact type of a; then, it modifies the original mysum_compiled function, in order to make it executing the newly generated code the next time it encounters an integer at that point:

PyObject mysum_compiled(PyObject a)
{
  Type a_type = a.GetType();
  switch(a_type) {
      PyInteger: return new PyInteger(a.value+1); // fast path!
      default: continue_compilation(a_type, <position>);
  }
}

From now on, every time we call mysum with an integer argument, the JIT compiler is not called anymore and the fast path is directly executed; if we happen to call mysum with a float arguments, the switch goes again in the default branch, and the JIT compiler is started once more to produce fast code also for this case. What happens in practice is that compile-time and runtime are continuously intermixed, until the switches are stable enough and the compiler is not needed anymore.

In PyPy jargon, this kind of "growable switch" is called flexswitch, and it's one of the most important concept of our JIT generator.

Promotion

How can the JIT generator know which values are useful to know to generate efficient code and which aren't? Unfortunately it can't, or at least our JIT generator is not smart enough at the moment.

To get the best from it, the developers of the VM need to instruct it by annotating the variables on which we want the JIT to stop until it knows the actual values; this is done by using particular hints, called promote and promote_class; variables annotated with such hints are said to be promoted. If something is promoted, a flexswitch is used to gain information about it, as seen in the last section.

For an example, let's look at an excerpt from main dispatch loop of the tlc virtual machine:

elif opcode == ADD:
  a, b = stack.pop(), stack.pop()
  hint(a, promote_class=True)
  hint(b, promote_class=True)
  stack.append(b.add(a))

This the implementation of the ADD opcode: first, it pops two values from the stack; then, it computes the result; finally, it push the result to the stack again. In between, both the classes of a and b have been promoted: this means that when the JIT emits the code for b.add(a), it knows exactly what is happening: if it sees that both are instances of the IntObj class, it inlines the method call and emits a fast integer addition instead.

Virtuals

The other important concept of the JIT is the presence of virtual structures, virtual lists, and virtual dictionaries. Again, I'm not going to explain in depth how they work, but only why they are so important for generating highly efficient code.

The essence of virtuals is that you don't allocate objects until you really need to do it, e.g. because they are being passed as an argument to some external function. Instead, we store all the informations we need as local variables; e.g., in the case of a virtual structure, we create as many local variables as the number of its fields: if the structure escapes the local scope, we force it to a real object, by allocating memory on the heap and initializing it after the current value of the local variables.

This technique allows the JIT to avoid the allocation of many temporary objects that hold intermediate results; consider for example the following Python loop:

result = 0
for i in range(N):
  result += i
return result

Without the JIT, at each iteration, a new int object is created and bound to the result variable, while the previous one is discarded and not needed anymore. By combining virtuals and promotion, the JIT can emit code that does the whole computation locally, and allocates a real object only at the end, when it escapes from the local scope because it is returned from the function.

Putting it all together

This is, essentially, how PyPy's generated JITs work. To summarize, our JITs emit multiple versions of each chunk of code: each version is specialized and optimized for one particular case.

The cost of selecting the right specialization to use (through flexswitches) is almost always negligible compared to how much time you save by running the fast version instead of the more-general-but-slow one. Moreover, each specialized version knows the exact shape of the objects it's dealing with, so they can be virtualized to make the generated code even more efficient.

At the end, the actual code generation is done by one of the JIT backends: the backends exploit all the knowledge gathered by the previous steps to produce highly efficient code, but this will be the subject of the next blog post.

Tuesday, November 4, 2008

Porting the JIT to CLI (part 1)

As the readers of this blog already know, I have been working on the CLI JIT backend for some months: last Friday, it reached an important milestone, as it is now able to produce huge speedups for a little dynamic language. To know how huge the speedup is, read on :-).

The goal of PyPy JIT generator is to take an interpreter and, with the help of few annotations, automatically generate a JIT compiler for it. In this post, we will talk about the tlc virtual machine: while tlc it is just a toy language, it contains some features that make it an interesting target for our JIT generator.

The tlc virtual machine

tlc is executed by a stack based, dynamically typed virtual machine (for those who knows a bit about the Python VM: does it sound familiar? :-)).

There are three types of objects: integers, nil, and cons cells (i.e. lisp-like pairs of objects).

As the VM is very simple, it provides only few opcodes:

  • opcodes to manipulate the stack, like PUSH, POP, etc.
  • integer operations, like ADD, MUL, all the comparisons, etc.: these operations can only be applied to integers;
  • list operations, like CONS, CAR, CDR: these operations can only be applied to lists;
  • other operations, including jumps and conditional jumps.

The VM is interesting for our purposes because it has a lot of similarities with Python (though on a smaller scale, of course):

  1. it has to do type-checks at runtime before doing most of the operations;
  2. every time you do an arithmetic operation, it has to unbox the operand, do the computation, and the box the result again.

This means that even if you have a program which only uses integers, you are paying a lot of overhead.

To know more about this toy VM, look at its source code: the interesting bits are the classes used to represent objects, and the interp_eval function, which contains the main loop of the virtual machine. As you can see, the implementation is quite straightforward; all the hint calls you see are the special annotations needed by the JIT generator to produce better code.

Let's JIT it!

So, the whole point is to generate a JIT compiler from it, isn't it?

First, checkout a fresh copy of the oo-jit branch:

$ svn co http://codespeak.net/svn/pypy/branch/oo-jit

Then, go to the oo-jit/pypy/jit/tl directory, and compile the tlc VM with the CLI backend and JIT enabled:

$ cd oo-jit/pypy/jit/tl/
$ ../../translator/goal/translate.py -b cli --jit --batch targettlc
...
lot of texts
...

If everything went OK, you now have a targettlc-cli executable, which accepts two arguments: the name of the file containing the tlc program we want to run, and an integer to be passed to it.

Luckily, in the same directory we have a factorial.tlc file that contains the bytecode for a function that -- guess? -- computes the factorial of a given integer; let's try it:

$ ./targettlc-cli factorial.tlc 5
Non jitted:    120 (0.009371 seconds)
Warmup jitted: 120 (0.208954 seconds)
Warmed jitted: 120 (0.000323999999999991 seconds)

Cool, it seems that the result was computed correcly :-). As you can see from the output, we ran the program three times:

  1. by plain interpretation, without any jitting;
  2. with the jit enabled: this run includes the time spent by doing the compilation itself, plus the time spent by running the produced code;
  3. again with the jit enabled, but this time the compilation has already been done, so we are actually measuring how good is the code we produced.

So, it's time to run a benchmark: let's try to compute the factorial of a very big number; the result will be 0, because obviously after a while we overflow, but after all we are interested in the time spent, not in the result:

$ ./targettlc-cli factorial.tlc 5000000
Non jitted:    0 (19.93247 seconds)
Warmup jitted: 0 (0.293229999999998 seconds)
Warmed jitted: 0 (0.0494239999999984 seconds)

$ python -c 'print 19.93247/0.0494239999999984'
403.295362577

And no, I didn't make any mistake in copying&pasting: the jitted version is really 400 times faster that the non jitted one!

Warning: my laptop seems to be not very well suited for benchmarks, as the results vary a lot from run to run; I've run the benchmarks a lot of times, and I got speedup factors up to 500 times, so your results may be different.

More benchmarks

It's also interesting to compare the result with a manual written C# version of the factorial, to see how good is code we produced; to get reasonable results, we need to compute a larger factorial, to let to code to run a bit more:

$ ./targettlc-cli --onlyjit factorial.tlc 100000000
Warmup jitted: 0 (0.980856 seconds)
Warmed jitted: 0 (0.769716 seconds)

$ mono factorial.exe 100000000
C#:            0 (0.153777 seconds)

$ python -c 'print 0.769716/0.153777'
5.00540392907

We know that the generated code is far from being optimal, but probably the factor of five is at least partially due to the fact that Mono's own JIT is optimized for C#-like code, and our code has a completely different shape.

All the benchmarks above were run under Linux, with Mono 1.9.1. Here are the results for the same benchmarks, but run with Microsoft CLR (on a different machine, so the absolute values are not comparable):

$ ./targettlc-cli factorial.tlc 5000000
Non jitted:    0 (15,640625 seconds)
Warmup jitted: 0 (0,4375 seconds)
Warmed jitted: 0 (0,03125 seconds)

$ python -c 'print 15.640625/0.03125'
500.5

$ ./targettlc-cli --onlyjit factorial.tlc 100000000
Warmup jitted: 0 (0,90625 seconds)
Warmed jitted: 0 (0,515625 seconds)

$ ./factorial.exe 100000000
C#:            0 (0,34375 seconds)

$ python -c 'print 0.515625/0.34375'
1.5

The results are even better than before; this is probably thanks to CLR's JIT, that does a better job than Mono when faced to something which is different than the usual C#-like code.

Conclusions (for now)

This is a very important result, because it proves that PyPy's approach to JIT compilers can be applied effectively also to OO virtual machines; the result is even better than what I expected, because when generating code for .NET we have much less freedom than when generating assembly code, and I had to play some tricks to work around some .NET limitations.

Moreover, it worked at the first try :-). I tried to compile the tlc virtual machine as soon as all the related JIT tests were passing, and surprisingly everything worked just fine, even if it was the very first time I was trying to apply some features of the JIT to something bigger than a test: I think this is yet another prove that Test Driven Development just works!

Even if this is a major milestone, the CLI JIT backend is not yet completed: as a consequence it can't still be used for the full PyPy, but all the hardest problems should have been solved now.

Since a lot of readers asked for more technical details, especially about the JIT, I will try to soon write a second blog post explaining how the CLI backend works internally, with a brief look to the generated code to see how it looks like.

Sunday, November 2, 2008

One year PyPy Blog

Last Friday the PyPy Status Blog had its first anniversary. Yay! After not really buying into any of this new-fangled "blog" stuff for a long time we just bit the bullet and got started. Totally surprisingly it even worked. We posted 76 post in the last year, more than one per week. By now we have more than 800 subscribers (according to feedburner), which is quite cool for a rather niche blog.

To make our blog even more interesting, I would like to ask for some feedback via the comments:

  • Which posts did you like in particular?
  • What sort of posts would you be interested in getting more of?
  • Any other improvements we could make?

Tuesday, October 14, 2008

Sprint Discussions: JIT Generator Planning

Background

Finally, the JIT post :-). First some background: Despite our plans at the end of the EU period, PyPy's Python interpreter didn't get a good and widely applicable JIT in the last year. The reason for that was that we discovered that although the basic idea to generate JIT compilers is good, the concrete prototype made during the EU period is basically flawed. It could have been pushed a bit farther, but would have run into deep troubles eventually. One of the problems would have been performance instability: change a seemingly unrelated bit in your source program, and the performance changes in unexpected ways, which is clearly not desirable. Another problem with that old approach is that too much assembler code is generated, leading to memory problems, and also that the generated assembler is bad in various ways, e.g. it is hard in that approach to do proper register allocation.

Therefore we decided that it would be worthless to pursue this direction much further. Instead we tried to research approaches to fixing the inherent problems. This research was largely done in Prolog and I eventually wrote my Master's thesis about it. From the Prolog work we got some good insights into what needs to be done and what sorts of techniques are needed. Also, it inspired Armin to do some more exploration on a small Python prototype which used the lessons learned from Prolog and also some additional ideas from tracing JITs. So far, however, the prototype is neither in RPython, nor much integrated with PyPy.

This research is not the only thing happening in the JIT-area. During the last year, Antonio Cuni was working on bringing the JIT to pypy-cli. This consisted mostly of writing a .NET backend for the old JIT-generator. Some further work is being done since August by John Witulski, who is writing an AMD64 backend for the JIT-generator for his Bachelor's thesis.

Where to go from there

During the sprint we discussed in which directions we should continue now. We plan to work quite a bit on the JIT in the coming months. Both Armin and Anto are in Düsseldorf for four months, and them and me plan to mostly work on the JIT (as well as giving a lecture on "Dynamic Programming Languages", trying to ensnare some more students).

The first step will be to experiment a bit more with Armin's prototype. So far it looks rather promising, but there are some unsolved issues that we need to look into first. The first issue is to think a bit about how to efficiently do profiling to compile only important code paths. The other large issue are so-called "virtualizables". Roughly speaking, they are the frame objects of the interpreter from which the JIT is generated. They need special treatment, because on the one hand it is important that they get optimized away to make the code fast, since the frames are accessed all the time for the local variables; on the other hand they should still be usable for introspection if code is around that is trying to look into them.

When this is done, the prototype needs to be ported to RPython, which is a non-trivial task, since it is rather dynamic so far (it is rather important that the unresolved issues are done before the porting, because once the prototype is in RPython, experimentation will be harder). The porting has the potential to be tedious, but in a sense it is "just work", as opposed to unclear research.

At this point it will become important to think about the backend interface. The interface that the old frontend used to produce assembler code won't be usable for the new approach, so things need to be rearranged slightly. Afterwards the backends will have more information and be invoked at a slightly higher level, which should allow them to produce better code.

When all this is done, the JIT generator will be in a rather good state and it should become possible (modulo a lot of details, of course), to use it on the Python interpreter.

Conclusion

I am intentionally not attaching any time estimates to this blog post. So far our time estimates have not been very accurate when it comes to the JIT, which only lead to disappointment when the JIT failed to materialize. We hope that we will progress in interesting ways in the next four months, but who knows. Note that we are really quite disappointed ourselves that it took so much longer than we planned and hoped. The reason for this is mostly that this work really is research and sometimes it is just hard to predict what sort of problems turn up. Partial evaluation (the basis for our JIT generator) is a 30 years old technique that was always just promising and never really successful, so the fact that we think we can solve its problems in a few years is very much hubris anyway :-). On the positive side, we think that we now know these problems much better than ever before and that we have a plan that has a chance to succeed.

Also we are still convinced that our approach has huge potential, despite the difficulties. If we manage to pull it off, it should be significantly simpler to support new language features in the JIT and also to get speedups on some rather interesting bits of the language. Some ideas we are having include generating a JIT for the regex engine or speed up ctypes-bindings to be nearly as fast as an extension module (or faster?). Also the JIT will be such that by construction the JIT-generated code behaves identical to the original code, which isn't always true for Psyco, for example.

Sprint Discussions: C++ Library Bindings

At the beginning of this year, PyPy grew ctypes support, thanks to generous support by Google. This made it possible to interface with C libraries from our Python interpreter, something that was possible but rather tedious before. What we are lacking so far is a way to interface to large C++ libraries (like GUI libraries). During the sprint we had a brainstorming session about possible approaches for fixing this shortcoming.

For CPython there are a number of approaches in common use:

Those all have the property that they produce some code that is then compiled with a compiler to produce a CPython extension. The produced code also uses functions from CPython's C-API. This model is not simple to use for PyPy in its current state. Since PyPy generates C code automatically, a fixed C-level API does not exist (it is not unlikely that at one point in the future we might have to provide one, but not yet). At the moment, PyPy very much has a "Don't call us, we call you"-approach.

A very different approach is followed by the Reflex package, which is developed at CERN (which has an incredible amount of C++ libraries). It is not mainly intended for writing Python bindings for C++ libraries but instead provides reflection capabilities for C++. The idea is that for every C++ shared library, an additional shared library is produced, which allows together with Reflex to introspect properties of C++ classes, methods, etc. at runtime. These facilities are then used for writing a small generic CPython extension module, that allows CPython to use any C++ library for which this reflection information was generated.

This approach is a bit similar to the ctypes module, apart from the fact that ctypes does not use any reflection information, but the user has to specify the data structures that occur in the C code herself. This makes it sometimes rather burdensome to write cross-platform library bindings.

For PyPy the approach seems rather fitting: We would need to implement only the generic extension module and could then use any number of C++ libraries. Of course some more evaluation is needed (e.g. to find out whether there are any restrictions for the C++ code that the library can use and how bothersome it is to get this reflection information for a large library) but so far it seems promising.

Sunday, October 12, 2008

Sprint Discussions: Release Planning

One of the discussions that happened during the sprint was about how to approach the next PyPy release. There hasn't been a release since the end of the EU period, which is not an optimal situation. Therefore we plan to make a 1.1 release at the beginning of next year, ideally before Pycon US. We'd also like to move towards time-based releases. This will be greatly helped by the new buildbot infrastructure, which allows us to decide when the state of the codebase is stable enough to release.

Another goal of the release is to involve more people from the wider PyPy community by having bugdays and generally asking for more support. This will be particularly useful for bugs on platforms that no one of the core developers group is using.

Feature-wise the release will mostly contain CPython 2.5 language support, including some new extension modules (like ctypes, expat, sqlite). In addition we plan to make it easier to actually install and use the PyPy Python interpreter, which means some sort of proper installation procedure and supporting distutils on top of PyPy. Another part of the release will be support for fully sand-boxing an interpreter.

Additionally there were also a large number of improvements on several levels since the last release, like optimizations, faster oldstyle-classes, better GCs, correct finalization behaviour, lots and lots of bugfixes, better threading support (still with the GIL), some work on improving memory behaviour, ...

In contrast to our last release, we will focus mainly on PyPy's Python Intepreter and more particularly its C-version. There are also various experimental interpreters that PyPy contains, like for Prolog, Smalltalk, JavaScript and Scheme. We also don't intend to put the LLVM and Javascipt backends in the release, since they are essentially unmaintained and at least partially broken. If anybody is particularly interested in one of these components, please feel free to step up and take responsibility for them. Another thing that the release won't contain is a JIT. I plan to make another blog-post about this soon, stay tuned.

Saturday, October 11, 2008

Düsseldorf Sprint Report Days 1-3

The Düsseldorf sprint is currently in full progress and this post will try to summarize what progress has been made in the last days. We are (again) sprinting at the STUPS group of the Düsseldorf University. You can find the sprint announcement and the daily planning file.

Holger and Samuele put quite some effort over several days into setting up and improving PyPy's testing infrastructure. PyPy has a variety of tests. On the one hand, there are of course our own tests. But then we also have the CPython tests that should be run on top of pypy-c. Up to now we used a custom-made pile of hacks, held together by lots of duct-tape. It consisted of a variety of different machines running different things with different reporting solutions. Some of the old test-results can still be found on wyvern. Now we are moving to a buildbot based solution together with a custom reporter to have a view similar to the old one. Some details are not quite finished yet, but most of the things are already working rather well (currently all the results displayed are from the 2.5-merge branch).

Another large (and ongoing) topic of work is the 2.5 branch. It contains the work done by our Summer-of-Code student, Bruno Gola, of adding CPython 2.5 features to PyPy's Python interpreter. While Bruno implemented most language features and imported the 2.5 stdlib into PyPy, a lot of details were still missing. In the last days nearly everybody worked on fixing small issues and failing stdlib tests. While doing that we tried to categorize some CPython tests as implementation dependant so that we can skip them when running on PyPy.

Memory Improvements

One goal of the sprint is to measure and to reduce the memory behaviour of our Python interpreter. The idea is to make pypy-c a realistic option for use on embedded devices. By memory behaviour we mean both the dynamic memory usage (how much bytes does a dict or an instance take) as well as the size of the executable and details of the GC strategy.

Alexander, Carl Friedrich and Antonio did some work on analyzing the static data that a pypy-c executable contains. Our executables have the tendency to be rather large, both due to a lot of code and due to a large amount of static data. The analysis didn't give any really surprising results, the problem is mostly that we have a lot of static data originating from a bit everywhere in our program. Two big offenders are the unicodedata-module with about 750 KB of static data and the multimethod-tables with about 150 KB of data.

Armin, Iko, Anto and Maciek worked on a new approach to malloc-removal. This is (for PyPy) a crucial optimization of the translation toolchain that performs escape analysis to find out which objects don't outlive the frame they were allocated in. Since RPython is garbage-collected we usually have a lot of allocations, so it is important to statically get rid of many of them. To successfully do that, some inlining is needed to give the analysis more context. This leads to the fact that we have rather aggressive inlining-settings to allow as much malloc-removal as possible. The new approach tries to inline functions only if this actually leads to the successful removal of a malloc operation. The code is not finished quite yet, so it remains to be seen how successful it will be.

Before the sprint Maciek had started to work on a mark-compact GC for PyPy. The idea is that it is better for memory-constrained-environments because it does not double the memory-requirements during collections. During the sprint Armin and Maciek worked on cleaning up the code a bit and then merging the branch. An interesting property of the mark-compact GC is that after a collection all the memory that is not currently used by the program is returned to the operating system. Right now the GC is not as fast as our more mature ones, but it probably will be the basis for future tweaking.

A small thing that was done by Alexander and Carl Friedrich to make objects smaller is to enable shared instance dictionaries also for instances of old-style classes. Before it worked only for instances of new-style classes. Shared instance dictionaries are a way to reduce the memory-usage of instances. In the optimal case, it gives the same memory-savings that __slots__ are giving, but without any behavioural changes. Conceptually it is very similar e.g. to the notion of "map" in the Self project, or the hidden classes that Google Chrome's V8 is using (click on the link, there are nice graphics). The difference is that for them it is mostly a way to get faster attribute access, and PyPy is so far only using it form memory savings (but that might change in the future).

In parallel to all the other work, John Witulski worked tirelessly on advancing the AMD64-JIT-backend. John has the implementation of this backend as the topic of his Bachelor's thesis. He is progressing quite well (especially also considering that this is his first sizeable Python project ever), just sometimes being impaired by such annoyances as errors in the official Intel documentation. By now the backend is supporting many integer operations and control flow.

Friday, October 10, 2008

Prolog-JIT Master's-Thesis Finished

As we already blogged, in the last half-year or so, Michael Leuschel, Armin and me did a lot of JIT generator work on a Prolog prototype. The idea was to experiment more quickly with some techniques than what would have been possible with RPython. These experiments were quite successful in themselves. With very little code we managed to get a JIT that is not doing too badly when compared to existing projects for Prolog.

This Prolog work was also the subject of my Master's thesis. I finished the thesis about two weeks ago (and since then have been mostly sleeping and then sprinting). The thesis should be self-contained when it comes to explaining the JIT concepts but needs knowledge of Prolog to be understandable.

Saturday, September 20, 2008

PyPy/Python at the Maemo summit

Maciej and me visited the Maemo Summit in Berlin - a community meetup around Nokia's Linux based mobile platform. We spontaneously did a lightning talk about a first running pypy-c on Maemo and got nice feedback.

We also had a nice lunch with guys from the INDT in Brazil, including Marcio Marcedo and Marcelo Eduardo. It turns out that Python is used a lot on Maemo, for example the nice Canola UI is done with it. Will be interesting to see how this shapes up in relation to the iPhone and Android.

A lot of Nokia engineers were around and they announced that from October on they are going for weekly new releases of their SDK for the new Fremantle (Maemo-5) debian-based platform until the SDK becomes final - if we got this right.

Funnily enough, we met Marius Gedminas from the Programmers of Vilnius - he gave a lightning talk on his impressions as a community member. We think python programmers really should go much more to non-Python centric conferences.

The whole event took place at the C-Base - was a bit crammed in some of the sessions with something like 200 people attending.
cheers, Maciej and Holger

Wednesday, September 17, 2008

Pycon UK, Javascript and the GIL

Just got back from Pycon UK 2008 - here are some impressions.

Both the keynote speakers Mark Shuttleworth (Canonical) and Ted Leung (Sun Microsystems) expressed their concerns about Javascript becoming so fast and prominent that it could displace Python in the future. They also highlighted the fact that Multi-core systems get cheaper and more popular also on desktop computers or notebooks. They challenged the community to advance Python implementations to exploit it. Question was up what PyPy can do here. As it stands, PyPy still uses the good old Global Interpreter Lock (GIL) but our approaches should indeed lend itself well to do experimentation with free threading.

During the 2-day conference we met many interesting people, most notably the guys from Resolver, among them William Reade who is working on IronClad -- which implements a fake python25.dll on top of IronPython. He presented some good results for Numpy in his lightning talk. This approach is surely something to follow closely and potentially use for PyPy.

We also had lunch and a couple of chats with Jacob Kaplan-Moss from Django fame - he is apparently up to try use PyPy's sandboxing features for one of his projects, cool!

Conference itself was well organized for the 230 attending people - although the venue might be a bit small for next year's EuroPython. Ah, and we gave three well attended talks, find the slides here:

cheers,
Holger, Maciej, Anto (associated through merlinux, btw)

Friday, September 5, 2008

Düsseldorf PyPy sprint 5-13th October, 2008

The PyPy team is happy to announce the next sprint, which will take place in the Computer Science Department of the University of Düsseldorf, Germany. Sprinting will start on the 6th of October and go on till the 12th. Please arrive on the day before if you want to come.

Topics of the sprint will be aiming at a 1.1 release and to work on integrating PyPy better with small devices. Other topics are also welcome!

We will try to find a hotel with group rates, so if you are interested, please sign up soon! See the announcement for more details.

Friday, August 22, 2008

pylib/py.test 0.9.2 released

PyPy and its 14638 automated tests use the py.test tool which is also used by many other projects. PyPy developers have actually driven and contributed a lot to its development. I just released version 0.9.2 of the py lib mainly fixing Windows issues and providing better packaging and integration with setuptools. It's usable completely independently from PyPy - "easy_install py" gives you the py.test command line. Of course you can run py.test on top of a translated PyPy version as well. Here is a quick summary of what the py lib provides besides py.test:
  • py.execnet: ad-hoc code distribution to SSH, Socket and local sub processes
  • py.magic.greenlet: micro-threads on standard CPython ("stackless-light") and PyPy
  • py.path: path abstractions over local and subversion files
  • py.code: dynamic code compile and traceback printing support
  • tested against Linux, Win32, OSX, works on python 2.3-2.6
Good general entry points for installation and documentation: have fun, holger krekel

Tuesday, August 19, 2008

New translation option: --opt

Hi all,

A few command-line options for translate.py have changed. Most interesting is that optimization levels are selected with the option --opt, or -O for short. This replaces --allopts, which was also called --faassen in reference to a person who is actually not involved in PyPy (so that was a bit of a strange joke). Also, --allworkingmodules is the default nowadays, and can be cancelled with --no-allworkingmodules. Threads are also included in --allworkingmodules now.

Examples:

  • translate.py (reasonable default, corresponds to --opt=2)
  • translate.py --opt=3 (best, maybe 10-20% faster)
  • translate.py --opt=1 (translation is faster and less RAM-hungry)

For more information, see:

PyPy runs unmodified django 1.0 beta

This is just a quick update post to previous post - django folks commited all outstanding tickets and we are able to run unmodified django on top of pypy-c. Instructions how to do it are well explained on django wiki entry

enjoy,
fijal

Monday, July 14, 2008

Europython 2008 PyPy talks and sprint sum up

The EuroPython 2008 conference and sprints have finished - it certainly was a very eventful and successful conference for PyPy. And many very interesting non-PyPy talks as well. PyPy presentations are available online: PyPy status talk PyPy for the rest of us, PyPy behind the scenes. Armin and Maciej also did a well-attended talk about PyPy's garbage collection, but that was quite interactive, no slides.

The talks were all well visited and we got good questions. However, we still need to work on sorting out the "PyPy technology cloud" and how to present it to different audiences. Anyway, we are happy to hear feedback or questions about the talks!

After the conference there was a three-day PyPy sprint. Despite the fact that most PyPy core developers were zombies, we made good progress. Particularly our newcomers did very well. Here are some results:
  • itertools rewritten in RPython for performance by Jakub Gustak and Andrew Durdin
  • a new ctypes based dbm and hashlib module, both by Gasper Zejn with support from Henrik Vendelbo, they also got ctypes to nicely work on OSX. (sorry for lack of proper letters in names :)
  • implement builtin function call profiling by Stephan Diehl, Antonio and Armin.
  • running Pinax on top of pypy-c, by Henrik, Holger, Gasper.
  • Jim Baker started a _rawffi.py for Jython using JNA aiming to provide support to run PyPy's ctypes on top of Jython. When Jython gets this to run, PyPy's JVM backend should be able to use it. Talk about Code Reuse :)
  • oldstyle classes are now the default, this makes PyPy mimick very closely cpython's 2.5 object model.
  • Andrew started a port of the Malbolge interpreter written in Python to RPython (obviously the only missing link for PyPy to take over the world).
  • various cleanups (a new option "--lonepycfiles" helps with saner imports, remove int-float comparison shortcuts, ...)
At the end of the sprint we also discussed initial plans for a 1.1 release which we'd like to make happen this year. So we are generally looking forward to a busy rest of 2008 and luckily this starts by many of us taking a good vacation first :)

Cheers,
fijal & holger

Sunday, July 13, 2008

Finding Bugs in PyPy with a Fuzzer

Last week I played a bit with Fusil, which is a fuzzing framework. The idea is to feed the interpreter code that calls the functions of a module with random values of various types as arguments in the hope that one hits an unchecked case. This is done until a problem is hit , the most common problem being a segfault. Victor Stinner, the author of Fusil, is a regular in the PyPy IRC channel and thankfully helped me getting started with Fusil. I used his project description for CPython as a starting point and tweaked it a bit. Reason is that PyPy is harder to segfault and so I tweaked Fusil to also count uncaught RPython-level exceptions as such a problem. (RPython has full exception support, and if an RPython-exception escapes to the top level, the Python interpreter aborts. One should not be able to exploit this but but for a user it is bad enough, because such exceptions cannot be caught from Python code.)

Using Fusil I found a number of cases where such exceptions happened (in some pickle support-code, in the expat parser, in the os and in the termios module) and also one or two segfaults (in the parser module, of all places). I fixed all these problems so that by now the fuzzer just runs for a very long time and only finds things that take too long (so they count as a way to do a DoS attack) like pow(12355123123L, 12351512123121L) or round(1, 1000000000) (the latter should probably be fixed). This probably just means that the fuzzer is not good enough, because there are certainly segfaults left in PyPy. However, the fact that it is rather hard to find them validates our approach of using a high-level memory-managed language for our interpreter. Victor tells me that it is rather easy to find segfaults in CPython this way, he already found quite some problems.

Saturday, July 12, 2008

PyPy's Python runs Pinax / Django

During the EP2008 sprint we got Pinax running on top of PyPy. At our play1 server we have it running on top of pypy-c. Not that you'll notice many differences to the original site but that's the point, isn't it? ... Well, in fact i am too lazy to customize our play1 version now - i rather spent a nice evening with the other sprint guys :) Pinax integrates numerous reusable Django apps to take care of the things that many sites have in common. Many thanks particularly to Henrik Vendelbo who sorted out various Pinax and PyPy issues, and wrote up a nice DjangoAndPyPy wiki page describing the installation process. greetings from Vilnius (Lithunia), Holger

Thursday, July 10, 2008

EP2008: PyPy meets Jython

One of the great events at EuroPython 2008 were our chats and meetings with the Jython and Sun people. The Jython people recently are pushing into releasing Python version 2.5 and they currently pursue many interesting sub projects. Coincidentally, PyPy also has tons of interesting areas and results :) So we eventually got into brainstorming a number of possible technical collab ideas. Further below is a first list as i wrote it down from our 10 people PyPy / Jython 30 minute close up meeting yesterday. It felt great to be able to talk to the Jython people this way - kudos to Sun for their clear commitments and open ways to go about things! I sense a genuine interest on fair collaboration with non-java developer communities. Seems like they are serious about not focusing on "Java this", "Java that" anymore but rather focus on the JVM platform. Good! And about language independent interest in ambitious technology. Even Better! I am tensed to see how things go from here. So here the list of technical collab ideas:
  • ctypes - try to create _rawffi module in Java for Jython, which will enable Jython to reuse our existing ctypes implementation (and have PyPy use the Jython-rawffi for its own for PyPy.JVM)
  • generally see to share work / (continue) collaborate regarding extension modules
  • Jython/PyPy (and eventually IronPython): document known differences to CPython, maybe in a PEP
  • Python Interpreter for Jython (in order to run CPython's .pyc files): re-use pypy's bytecode evaluator, implement a "Jython object space".
  • re-use rpython-extension modules for jython (e.g. SRE), by compiling them to Java and reusing as a native library.
  • collaborate on testing framework / benchmarking, have a common site to show test results
  • make py.test compatible with jython
  • come up with a set of "pure Python language" tests, which would gather and refactor tests from CPython, PyPy and Jython.
  • look into using java types / jython approaches for implementing free threading.
  • share knowledge regarding JIT / psyco
If you have any more ideas, comments or would like to join efforts, let us know! Cheers and thanks to Ted Leung, Frank Wierzbiki, Jim Baker and Tobias Ivarsson from Sun and Jython fame respectively, Holger

Monday, July 7, 2008

PyPy at the EuroPython 2008

Greetings from Vilnius, Lithuania. There were already two pypy talks, one performed by Jacob Hallen PyPy for the rest of us and second by Maciej Fijalkowski PyPy status talk. The thing that we forgotten to tell is that PyPy sandboxing feature can also easily limit CPU and RAM usage as well as any other possible resource (like network transfer). For anyone who would like to join, there is a PyPy sprint after the conference.

Cheers,
arigo & fijal

Monday, June 30, 2008

JIT in Prolog

Hi all,

Some news from the JIT front. Progress on the JIT has been low-profile in the past few months. No big results to announce yet, but we have played with some new ideas, and they are now documented as a draft research paper: Towards Just-In-Time Compilation and Specialisation of Prolog.

Prolog? Yes. To understand this slightly unusual choice of programming language, here is first some background about our JIT.

PyPy contains not a JIT but a JIT generator, which means that we only write an interpreter for a language (say, the complete Python language), and we get a JIT "for free". More precisely, it's not for free: we had to write the JIT generator, of course, as well as some amount of subtle generic support code. The JIT generator preprocesses the (complete Python) interpreter that we wrote and links the result with the generic support code; the result is a (complete Python) JIT.

The way that this works so far gives us a generated JIT that is very similar to Psyco in the way it works. But Psyco has issues (and so the current PyPy JITs have the same issues): it can sometimes produce too much machine code, e.g. by failing to notice that two versions of the machine code are close enough that they should really be one; and it can also sometimes fail in the opposite way, by making a single sub-efficient version of the machine code instead of several efficient specialized versions.

A few months ago we have chosen to experiment with improving this instead of finishing and polishing what we had so far. The choice was mostly because we were (and still are) busy finishing and polishing everything else in PyPy, so it was more fun to keep at least the JIT on the experimental side. Besides, PyPy is now getting to a rather good and complete state, and it is quite usable without the JIT already.

Anyway, enough excuses. Why is this about Prolog?

In PyPy, both the (complete Python) interpreter and the JIT support code are in RPython. Now RPython is not an extremely complicated language, but still, it is far from the top on a minimalism scale. In general, this is a good in practice (or at least I think so): it gives a reasonable balance because it is convenient to write interpreters in RPython, while not being so bloated that it makes our translation toolchain horribly complicated (e.g. writing garbage collectors for RPython - or even JIT generators - is reasonable). Still, it is not the best choice for early research-level experimentation.

So what we did instead recently is hand-write, in Prolog, a JIT that looks similar to what we would like to achieve for RPython with our JIT generator. This gave much quicker turnaround times than we were used to when we played around directly with RPython. We wrote tiny example interpreters in Prolog (of course not a complete Python interpreter). Self-inspection is trivial in Prolog, and generating Prolog code at runtime is very easy too. Moreover, many other issues are also easier in Prolog: for example, all data structures are immutable "terms". Other languages than Prolog would have worked, too, but it happens to be one that we (Carl Friderich, Michael Leuschel and myself) are familiar with -- not to mention that it's basically a nice small dynamic language.

Of course, all this is closely related to what we want to do in PyPy. The fundamental issues are the same. Indeed, in PyPy, the major goals of the JIT are to remove, first, the overhead of allocating objects all the time (e.g. integers), and second, the overhead of dynamic dispatch (e.g. finding out that it's integers we are adding). The equivalent goals in Prolog are, first, to avoid creating short-lived terms, and second, to remove the overhead of dispatch (typically, the dispatching to multiple clauses). If you are familiar with Prolog you can find more details about this in the paper. So far we already played with many possible solutions in the Prolog JIT, and the paper describes the most mature one; we have more experimentation in mind. The main point here is that these are mostly language-independent techniques (anything that works both in Prolog and in RPython has to be language-independent, right? :-)

In summary, besides the nice goal of speeding up Prolog, we are trying to focus our Prolog JIT on the issues and goals that have equivalents in the PyPy JIT generator. So in the end we are pretty convinced that it will give us something that we can backport to PyPy -- good ideas about what works and what doesn't, as well as some concrete algorithms.