Over the last few months, I've been actively working on the CLI backend
for PyPy's JIT generator, whose goal is to automatically generate JIT
compilers that produce .NET bytecode on the fly.
The CLI JIT backend is far from complete, and there is still a lot of
work to be done before it can handle PyPy's full Python interpreter;
nevertheless, yesterday I finally got the first .NET executable that
contains a JIT for a very simple toy language called tlr, which
implements an interpreter for a minimal register-based virtual machine
with only 8 operations.
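To make the eight operations concrete, here is a minimal Python sketch of a tlr-style interpreter. It is not PyPy's tlr.py: the opcode names are the ones used in the square program listed below, but the numeric opcode values and the exact semantics (an accumulator `a` plus a list of registers, inferred from the comments in that listing) are assumptions made for this illustration.

```python
# Hypothetical opcode values: only the mnemonics come from the tlr listing.
MOV_A_R, MOV_R_A, JUMP_IF_A, SET_A, ADD_R_TO_A, RETURN_A, ALLOCATE, NEG_A = range(1, 9)


def interpret(bytecode, a):
    """Run a tlr-style program; `a` is the accumulator's initial value."""
    regs = []
    pc = 0
    while True:
        opcode = bytecode[pc]
        pc += 1
        if opcode == ALLOCATE:        # make space for N registers
            regs = [0] * bytecode[pc]
            pc += 1
        elif opcode == MOV_A_R:       # regs[n] = a
            regs[bytecode[pc]] = a
            pc += 1
        elif opcode == MOV_R_A:       # a = regs[n]
            a = regs[bytecode[pc]]
            pc += 1
        elif opcode == SET_A:         # a = constant operand
            a = bytecode[pc]
            pc += 1
        elif opcode == NEG_A:         # a = -a
            a = -a
        elif opcode == ADD_R_TO_A:    # a += regs[n]
            a += regs[bytecode[pc]]
            pc += 1
        elif opcode == JUMP_IF_A:     # if a != 0: goto target
            target = bytecode[pc]
            pc += 1
            if a:
                pc = target
        elif opcode == RETURN_A:      # return the accumulator
            return a
        else:
            raise ValueError("unknown opcode %d at %d" % (opcode, pc - 1))
```

For example, `interpret([ALLOCATE, 1, MOV_A_R, 0, ADD_R_TO_A, 0, RETURN_A], 7)` doubles its input and returns 14. Dispatching on one opcode at a time like this is the kind of overhead a JIT compiler aims to eliminate.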
To compile the tlr VM, follow these steps:
1. Get a fresh checkout of the oo-jit branch, i.e. the branch where
   the CLI JIT development takes place:
   $ svn co http://codespeak.net/svn/pypy/branch/oo-jit
2. Go to the oo-jit/pypy/jit/tl directory and compile the tlr VM with
   the CLI backend and the JIT enabled:
   $ cd oo-jit/pypy/jit/tl/
   $ ../../translator/goal/translate.py -b cli --jit --batch targettlr
The goal of our test program is to compute the square of a given
number. Since the only arithmetic operations supported by the VM are
addition and negation, we compute the result by repeated addition. I
won't describe the exact meaning of every tlr bytecode here, as they
are quite self-documenting:
ALLOCATE, 3, # make space for three registers
MOV_A_R, 0, # i = a
MOV_A_R, 1, # copy of 'a'
SET_A, 0,
MOV_A_R, 2, # res = 0
# 10:
SET_A, 1,
NEG_A,
ADD_R_TO_A, 0,
MOV_A_R, 0, # i--
MOV_R_A, 2,
ADD_R_TO_A, 1,
MOV_A_R, 2, # res += a
MOV_R_A, 0,
JUMP_IF_A, 10, # if i!=0: goto 10
MOV_R_A, 2,
RETURN_A # return res
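For readers who prefer to follow the bytecode in a higher-level language, here is our own rough Python transliteration of the program above, with one variable per register. It is only a reading aid, not code taken from PyPy; the guard against inputs below 1 is an addition of this sketch, because the bytecode runs the loop body once before its first test and therefore only makes sense for a >= 1.

```python
def square_by_addition(a):
    """Compute a * a using only addition and negation, like the tlr program.

    Each variable stands for one VM register (an interpretation of the
    listing's comments, not code taken from PyPy).
    """
    if a < 1:
        # Added for this sketch: the bytecode runs the loop body once before
        # testing i, so it only makes sense for a >= 1.
        raise ValueError("a must be >= 1")
    i = a        # register 0: loop counter (MOV_A_R 0)
    copy_a = a   # register 1: copy of 'a'  (MOV_A_R 1)
    res = 0      # register 2: result       (SET_A 0; MOV_A_R 2)
    while True:  # offset 10
        i = -1 + i            # SET_A 1; NEG_A; ADD_R_TO_A 0; MOV_A_R 0
        res = res + copy_a    # MOV_R_A 2; ADD_R_TO_A 1; MOV_A_R 2
        if i == 0:            # MOV_R_A 0; JUMP_IF_A 10 jumps back while i != 0
            break
    return res                # MOV_R_A 2; RETURN_A
```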
You can also find the square program at the end of the tlr module; to
get an assembled version of the bytecode, ready to be interpreted, run
this command:
$ python tlr.py assemble > square.tlr
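As an illustration of what "assembled" can mean here, the sketch below packs each opcode and operand into a single byte. This encoding, like the numeric opcode values, is an assumption and not necessarily the format tlr.py produces; it does show why the `# 10:` label in the listing matters, since the operand of JUMP_IF_A is a position in this flat sequence.

```python
# Hypothetical opcode values: only the mnemonics come from the tlr listing.
MOV_A_R, MOV_R_A, JUMP_IF_A, SET_A, ADD_R_TO_A, RETURN_A, ALLOCATE, NEG_A = range(1, 9)

SQUARE = [
    ALLOCATE, 3,
    MOV_A_R, 0,
    MOV_A_R, 1,
    SET_A, 0,
    MOV_A_R, 2,
    SET_A, 1,          # offset 10: the target of JUMP_IF_A below
    NEG_A,
    ADD_R_TO_A, 0,
    MOV_A_R, 0,
    MOV_R_A, 2,
    ADD_R_TO_A, 1,
    MOV_A_R, 2,
    MOV_R_A, 0,
    JUMP_IF_A, 10,
    MOV_R_A, 2,
    RETURN_A,
]


def assemble(code):
    """Pack opcodes and operands into one byte each (assumed encoding)."""
    return bytes(code)  # raises ValueError for values outside 0..255


def disassemble(blob):
    """Recover the list of integers from an assembled blob."""
    return list(blob)
```

Under these assumptions, writing `assemble(SQUARE)` to a file opened in binary mode would give a 30-byte, square.tlr-like file.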
Now we are ready to execute the code through the tlr VM. If you are
using Linux/Mono, you can simply run the targettlr-cli script that has
been created for you; if you are on Windows, you have to fish the
executable out of the targettlr-cli-data directory manually:
# Linux
$ ./targettlr-cli square.tlr 16
256
# Windows
> targettlr-cli-data\main.exe square.tlr 16
256
Cool, our program computed the result correctly! But how can we be
sure that it really JIT-compiled our code instead of interpreting it?
To inspect the code generated by our JIT compiler, we simply set the
PYPYJITLOG environment variable to a filename; the JIT will then
create a .NET assembly containing all the code it has generated:
$ PYPYJITLOG=generated.dll ./targettlr-cli square.tlr 16
256
$ file generated.dll
generated.dll: MS-DOS executable PE for MS Windows (DLL) (console) Intel 80386 32-bit
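The `PYPYJITLOG=generated.dll ./targettlr-cli ...` form relies on POSIX shell syntax, which Windows' cmd.exe does not support. A small Python helper can set the variable for the child process on either platform; `run_with_jit_log` is a hypothetical name, and the executable paths in the usage note are the ones from the transcripts above.

```python
import os
import subprocess


def run_with_jit_log(cmd, log_path):
    """Run `cmd` with PYPYJITLOG set, returning its stripped stdout.

    Unlike the `VAR=value command` shell syntax, this also works on
    Windows. Hypothetical helper, not part of PyPy.
    """
    # Copy the current environment so our own process is left untouched.
    env = dict(os.environ, PYPYJITLOG=log_path)
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout.strip()
```

On Linux you would call `run_with_jit_log(["./targettlr-cli", "square.tlr", "16"], "generated.dll")`; on Windows, pass `[r"targettlr-cli-data\main.exe", "square.tlr", "16"]` instead.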
Now we can inspect the DLL with any IL disassembler, such as ildasm or
monodis; here is an excerpt of the disassembled code, showing how our
square.tlr bytecode has been compiled to .NET bytecode:
.method public static hidebysig default int32 invoke (object[] A_0, int32 A_1) cil managed
{
.maxstack 3
.locals init (int32 V_0, int32 V_1, int32 V_2, int32 V_3, int32 V_4, int32 V_5)
ldc.i4 -1
ldarg.1
add
stloc.1
ldc.i4 0
ldarg.1
add
stloc.2
IL_0010: ldloc.1
ldc.i4.0
cgt.un
stloc.3
ldloc.3
brfalse IL_003b
ldc.i4 -1
ldloc.1
add
stloc.s 4
ldloc.2
ldarg.1
add
stloc.s 5
ldloc.s 5
stloc.2
ldloc.s 4
stloc.1
ldarg.1
starg 1
nop
nop
br IL_0010
IL_003b: ldloc.2
stloc.0
br IL_0042
IL_0042: ldloc.0
ret
}
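Reading IL is easier with a high-level equivalent at hand. Below is our own Python transliteration of the `invoke` method above, using the V_n locals and the A_1 argument as variable names (the object[] argument A_0 is not used in this excerpt, so it is omitted). It is a reading aid only and ignores 32-bit overflow.

```python
def jitted_square(a):
    """Line-by-line Python reading of the generated `invoke` method.

    `a` plays the role of the int32 argument A_1; v0..v5 are V_0..V_5.
    """
    v1 = -1 + a          # ldc.i4 -1; ldarg.1; add; stloc.1
    v2 = 0 + a           # ldc.i4 0;  ldarg.1; add; stloc.2
    while True:          # IL_0010
        # ldloc.1; ldc.i4.0; cgt.un; stloc.3: an unsigned compare with 0,
        # which for 32-bit values is the same as v1 != 0
        v3 = (v1 & 0xFFFFFFFF) > 0
        if not v3:       # ldloc.3; brfalse IL_003b
            break
        v4 = -1 + v1     # ldc.i4 -1; ldloc.1; add; stloc.s 4
        v5 = v2 + a      # ldloc.2; ldarg.1; add; stloc.s 5
        v2 = v5          # ldloc.s 5; stloc.2 (redundant shuffle)
        v1 = v4          # ldloc.s 4; stloc.1 (redundant shuffle)
        # ldarg.1; starg 1; nop; nop: A_1 is stored back into itself
    v0 = v2              # IL_003b: ldloc.2; stloc.0
    return v0            # IL_0042: ldloc.0; ret
```

Written this way, it appears that the first pass through the tlr loop has already been done before the loop starts: `v1` (the counter) begins at `a - 1` and `v2` (the result) at `a`, so the loop body runs `a - 1` more times.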
If you know a bit of IL, you can see that the generated code is not
optimal, as it contains some redundant operations, such as all those
stloc/ldloc pairs, the ldarg.1/starg 1 pair that stores an argument
back into itself, and the two nops; still, it is quite good code, not
much different from what you would get by writing the square algorithm
directly in, e.g., C#.
As I said before, all of this is still a work in progress, and much
remains to be done. Stay tuned :-).