Saturday, July 3, 2010

Comparing SPUR to PyPy

Recently, I've become aware of the SPUR project of Microsoft Research and read some of their papers (the tech report "SPUR: A Trace-Based JIT Compiler for CIL" is very cool). I found the project to be very interesting and since their approach is in many ways related to what PyPy is doing, I now want to compare and contrast the two projects.

A Tracing JIT for .NET

SPUR consist of two parts: On the one hand it is a VM for CIL, the bytecode of the .NET VM. This VM uses a tracing JIT compiler to compile the programs it is running to machine code. As opposed to most existing VMs that have a tracing JIT it does not use an interpreter at all. Instead it contains various variants of a JIT compiler that produce different versions of each method. Those are:

  • a profiling JIT which produces code that does lightweight profiling when running the compiled method
  • a tracing JIT which produces code that produces a trace when running the compiled method
  • a transfer-tail JIT which is used to produce code which is run to get from a failing guard back to the normal profiling version of a method
  • an optimizing JIT that actually optimizes traces and turns them into machine code

Optimizations Done by the Optimizing JIT

SPUR's optimizing JIT does a number of powerful optimizations on the traces before it turns them into machine code. Among them are usual compiler optimizations such as register allocation, common subexpression elimination, loop invariant code motion, etc.

It also performs some optimizations that are specific to the tracing context and are thus not commonly found in "normal" compilers:

  • guard implication: if a guard is implied by an earlier guard, it is removed
  • guard strengthening: if there is a sequence of guards that become stronger and stronger (i.e. each guard implies the previous one), the first guard in the sequence is replaced by the last one, and all others are removed. This can greatly reduce the number of guards and is generally safe. It can shift a guard failure to an earlier point in the trace, but the failure would have occurred at some point in the trace anyway.
  • load/store optimizations: this is an optimization for memory reads/writes. If several loads from the same memory location occur without writes in between, all but the first one are removed. Similarly, if a write to a memory location is performed, this write is delayed as much as possible. If there is a write to the same location soon afterwards, the first write can be removed.
  • escape analysis: for allocations that occur in a loop, the optimizer checks whether the resulting object escapes the loop. If not, the allocation is moved before the loop, so that only one object needs to be allocated, instead of one every loop iteration.
  • user-controlled loop unrolling: not exactly an optimization, but an interesting feature anyway. It is possible to annotate a CIL method with a special decorator [TraceUnfold] and then the tracing JIT will fully unroll the loops it contains. This can be useful for loops than are known to run a small and fixed number of iterations for each call-site.
  • user controlled tracing: The user can also control tracing up to a point. Methods can be annotated with [NativeCall] to tell the tracer to never trace their execution. Instead they appear as a direct call in the trace.

A JavaScript Implementation

In addition to the tracing JIT I just described, SPUR also contains a JavaScript implementation for .NET. The approach of this implementation is to translate JavaScript to CIL bytecode, doing some amount of type inference to detect variables that have fixed types. All operations where no precise type could be determined are implemented with calls to a JavaScript runtime system, which does the necessary type dispatching. The JavaScript runtime is implemented in C#.

The JavaScript implementation and the CLI VM with a tracing JIT sound quite unrelated at first, but together they amplify each other. The tracing JIT traces the JavaScript functions that have been translated to CLI bytecode. Since the JavaScript runtime is in C#, it exists as CLI bytecode too. Thus it can be inlined into the JavaScript functions by the tracer. This is highly beneficial, since it exposes the runtime type dispatching of the JavaScript operations to the optimizations of the tracing JIT. Particularly the common expression elimination helps the JavaScript code. If a series of operations is performed on the same object, the operations will all do the same type checks. All but the type checks of the first operation can be removed by the optimizer.

Performance Results

The speed results of the combined JavaScript implementation and tracing JIT are quite impressive. It beats TraceMonkey for most benchmarks in SunSpider (apart from some string-heavy benchmarks that are quite slow) and can compete with V8 in many of them. However, all this is steady-state performance and it seems SPUR's compile time is rather bad currently.

Further Possibilities

A further (so far still hypothetical) advantage of SPUR is that the approach can optimize cases where execution crosses the border of two different systems. If somebody wrote an HTML layout engine and a DOM in C# to get a web browser and integrated it with the JavaScript implementation described above, the tracing JIT could optimize DOM manipulations performed by JavaScript code as well as callbacks from the browser into JavaScript code.

Of course the approach SPUR takes to implement JavaScript is completely generalizable. It should be possible to implement other dynamic languages in the same way as JavaScript using SPUR. One would have to write a runtime system for the language in C#, as well as a compiler from the language into CIL bytecode. Given these two elements, SPUR's tracing JIT compiler would probably do a reasonable job at optimizing this other language (of course in practise, the language implementation would need some tweaking and annotations to make it really fast).

Comparison With PyPy

The goals of PyPy and SPUR are very similar. Both projects want to implement dynamic languages in an efficient way by using a tracing JIT. Both apply the tracing JIT "one level down", i.e. the runtime system of the dynamic language is visible to the tracing JIT. This is the crucial point of the approach of both projects. Since the runtime system of the dynamic language is visible to the tracing JIT, the JIT can optimize programs in that dynamic language. It does not itself need to know about the semantics of the dynamic language. This makes the tracing JIT usable for a variety of dynamic languages. It also means that the two halves can be implemented and debugged independently.

In SPUR, C# (or another language that is compilable to CIL) plays the role of RPython, and CIL is equivalent to the intermediate format that PyPy's translation toolchain uses. Both formats operate on a similar abstraction level, they are quite close to C, but still have support for the object system of their respective language and are garbage-collected.

SPUR supports only a JavaScript implementation so far, which could maybe change in the future. Thus JavaScript in SPUR corresponds to Python in PyPy, which was the first dynamic language implemented in PyPy (and is also the reason for PyPy's existence).

There are obviously also differences between the two projects, although many of them are only skin-deep. The largest difference is the reliance of SPUR on compilers on all levels. PyPy takes the opposite approach of using interpreters almost everywhere. The parts of PyPy that correspond to SPUR's compilers are (I will use the Python implementation of PyPy as an example):

  • the JavaScript-to-CIL compiler corresponds to the Python interpreter of PyPy
  • the profiling JIT corresponds to a part of PyPy's translation toolchain which adds some profiling support in the process of turning RPython code into C code,
  • the tracing JIT corresponds to a special interpreter in the PyPy JIT which executes an RPython program and produces a trace of the execution
  • the transfer-tail JIT corresponds to PyPy's blackhole interpreter, also called fallback interpreter
  • the optimizing JIT corresponds to the optimizers and backends of PyPy's JIT

PyPy's Optimizations

Comparing the optimizations that the two projects perform, the biggest difference is that PyPy does "trace stitching" instead of fully supporting trace trees. The difference between the two concerns what happens when a new trace gets added to an existing loop. The new trace starts from a guard in the existing loop that was observed to fail often. Trace stitching means that the loop is just patched with a jump to the new trace. SPUR instead recompiles the whole trace tree, which gives the optimizers more opportunities, but also makes compilation a lot slower. Another difference is that PyPy does not perform loop-invariant code motion yet.

Many of the remaining optimizations are very similar. PyPy supports guard implication as well as guard strengthening. It has some load/store optimizations, but PyPy's alias analysis is quite rudimentary. On the other hand, PyPy's escape analysis is very powerful. PyPy also has support for the annotations that SPUR supports, using some decorators in the pypy.rlib.jit module. User-controlled loop unrolling is performed using the unroll_safe decorator, tracing of a function can be disabled with the dont_look_inside decorator.

PyPy has a few more annotations that were not mentioned in the SPUR tech report. Most importantly, it is possible to declare a function as pure, using the purefunction decorator. PyPy's optimizers will remove calls to a function decorated that way if the arguments to the call are all constant. In addition it is possible to declare instances of classes to be immutable, which means that field accesses on constant instances can be folded away. Furthermore there is the promote hint, which is spelled x = hint(x, promote=True). This will produce a guard in the trace, to turn x into a constant after the guard.

Summary

Given the similarity between the projects' goals, it is perhaps not so surprising to see that PyPy and SPUR have co-evolved and reached many similar design decisions. It is still very good to see another project that does many things in the same way as PyPy.

2 comments:

Anonymous said...

Besides being similar projects, is it possible to cross the streams? Could PyPy's CLI backend take the place of the JavaScript-to-CIL compiler (or is that the wrong parallel)?

Carl Friedrich Bolz-Tereick said...

@Anonymous: I guess you could program stuff in RPython, compile to CIL and get it jitted by SPUR. However, this wouldn't work well for our main RPython programs, which are all interpreters. Using a tracing JIT on an interpreter without doing anything special is not helping much.