Wednesday, May 28, 2008

Progress on the CLI JIT backend front

Over the last few months, I have been actively working on the CLI backend for PyPy's JIT generator, whose goal is to automatically generate JIT compilers that produce .NET bytecode on the fly.

The CLI JIT backend is far from complete, and there is still a lot of work to do before it can handle PyPy's full Python interpreter. Nevertheless, yesterday I finally got the first .NET executable that contains a JIT for a very simple toy language called tlr, which implements an interpreter for a minimal register-based virtual machine with only 8 operations.

To compile the tlr VM, follow these steps:

  1. get a fresh checkout of the oo-jit branch, i.e. the branch where CLI JIT development takes place:

    $ svn co http://codespeak.net/svn/pypy/branch/oo-jit
    
  2. go to the oo-jit/pypy/jit/tl directory, and compile the tlr VM with the CLI backend and JIT enabled:

    $ cd oo-jit/pypy/jit/tl/
    $ ../../translator/goal/translate.py -b cli --jit --batch targettlr
    

The goal of our test program is to compute the square of a given number. Since the only arithmetic operations supported by the VM are addition and negation, we compute the result by repeated addition. I won't describe the exact meaning of all the tlr bytecodes here, as they are quite self-documenting:

ALLOCATE,    3,   # make space for three registers
MOV_A_R,     0,   # i = a
MOV_A_R,     1,   # copy of 'a'

SET_A,       0,
MOV_A_R,     2,   # res = 0

# 10:
SET_A,       1,
NEG_A,
ADD_R_TO_A,  0,
MOV_A_R,     0,   # i--

MOV_R_A,     2,
ADD_R_TO_A,  1,
MOV_A_R,     2,   # res += a

MOV_R_A,     0,
JUMP_IF_A,  10,   # if i!=0: goto 10

MOV_R_A,     2,
RETURN_A          # return res
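
To make the semantics concrete, here is a minimal sketch of an interpreter for this program in plain Python. It is not the code from the tlr module. The opcode names come from the listing above, but their behaviour (an accumulator `a`, a list of registers, and jump targets that are offsets into the flat bytecode list) is inferred from the listing and its comments, and opcodes are represented as strings instead of whatever numeric encoding tlr.py really uses:

```python
# Hypothetical sketch of a tlr-like interpreter; semantics are assumed
# from the listing above, not copied from the real tlr module.
SQUARE = [
    "ALLOCATE",   3,    # make space for three registers
    "MOV_A_R",    0,    # i = a
    "MOV_A_R",    1,    # copy of 'a'
    "SET_A",      0,
    "MOV_A_R",    2,    # res = 0
    # offset 10:
    "SET_A",      1,
    "NEG_A",
    "ADD_R_TO_A", 0,
    "MOV_A_R",    0,    # i--
    "MOV_R_A",    2,
    "ADD_R_TO_A", 1,
    "MOV_A_R",    2,    # res += a
    "MOV_R_A",    0,
    "JUMP_IF_A", 10,    # if i != 0: goto offset 10
    "MOV_R_A",    2,
    "RETURN_A",         # return res
]


def interpret(bytecode, a):
    """Run `bytecode` with the accumulator initialised to `a`."""
    regs = []
    pc = 0
    while True:
        opcode = bytecode[pc]
        pc += 1
        if opcode == "ALLOCATE":
            regs = [0] * bytecode[pc]
            pc += 1
        elif opcode == "MOV_A_R":      # register <- accumulator
            regs[bytecode[pc]] = a
            pc += 1
        elif opcode == "MOV_R_A":      # accumulator <- register
            a = regs[bytecode[pc]]
            pc += 1
        elif opcode == "SET_A":        # accumulator <- constant
            a = bytecode[pc]
            pc += 1
        elif opcode == "NEG_A":
            a = -a
        elif opcode == "ADD_R_TO_A":
            a += regs[bytecode[pc]]
            pc += 1
        elif opcode == "JUMP_IF_A":    # jump if accumulator is non-zero
            target = bytecode[pc]
            pc += 1
            if a:
                pc = target
        elif opcode == "RETURN_A":
            return a
        else:
            raise ValueError("unknown opcode: %r" % (opcode,))


print(interpret(SQUARE, 16))  # 256
```

Note that the loop only terminates for a positive argument: the counter is decremented before it is tested, so starting from zero it never reaches zero again.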

You can also find the program at the end of the tlr module. To get an assembled version of the bytecode, ready to be interpreted, run this command:

$ python tlr.py assemble > square.tlr

Now we are ready to execute the code through the tlr VM. If you are using Linux/Mono, you can simply execute the targettlr-cli script that has been created for you; on Windows, however, you have to manually fish the executable out of the targettlr-cli-data directory:

# Linux
$ ./targettlr-cli square.tlr 16
256

# Windows
> targettlr-cli-data\main.exe square.tlr 16
256

Cool, our program computed the result correctly! But how can we be sure that it really JIT-compiled our code instead of interpreting it? To inspect the code generated by our JIT compiler, we simply set the PYPYJITLOG environment variable to a filename, so that the JIT will create a .NET assembly containing all the code it has generated:

$ PYPYJITLOG=generated.dll ./targettlr-cli square.tlr 16
256
$ file generated.dll
generated.dll: MS-DOS executable PE  for MS Windows (DLL) (console) Intel 80386 32-bit

Now we can inspect the DLL with any IL disassembler, such as ildasm or monodis. Here is an excerpt of the disassembled code, which shows how our square.tlr bytecode has been compiled to .NET bytecode:

.method public static  hidebysig default int32 invoke (object[] A_0, int32 A_1)  cil managed
{
    .maxstack 3
    .locals init (int32 V_0, int32 V_1, int32 V_2, int32 V_3, int32 V_4, int32 V_5)

    ldc.i4 -1
    ldarg.1
    add
    stloc.1
    ldc.i4 0
    ldarg.1
    add
    stloc.2
    IL_0010:  ldloc.1
    ldc.i4.0
    cgt.un
    stloc.3
    ldloc.3
    brfalse IL_003b

    ldc.i4 -1
    ldloc.1
    add
    stloc.s 4
    ldloc.2
    ldarg.1
    add
    stloc.s 5
    ldloc.s 5
    stloc.2
    ldloc.s 4
    stloc.1
    ldarg.1
    starg 1

    nop
    nop
    br IL_0010

    IL_003b:  ldloc.2
    stloc.0
    br IL_0042

    IL_0042:  ldloc.0
    ret
}

If you know a bit of IL, you can see that the generated code is not optimal, as there are some redundant operations such as all those stloc/ldloc pairs. However, while not optimal, it is still quite good code, not much different from what you would get by writing the square algorithm directly in e.g. C#.
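
For readers less familiar with IL, the method above can be transliterated by hand into Python roughly as follows. This is only a sketch based on a manual reading of the excerpt, not the output of any tool. The function and variable names are made up for illustration, and `cgt.un` against zero is treated as a "not zero" test (an unsigned value is greater than zero exactly when it is non-zero):

```python
def invoke(a):
    """Hand-written Python rendering of the generated IL method (assumed reading)."""
    # ldc.i4 -1 / ldarg.1 / add / stloc.1
    i = -1 + a
    # ldc.i4 0 / ldarg.1 / add / stloc.2
    res = 0 + a
    # IL_0010: ldloc.1 / ldc.i4.0 / cgt.un / brfalse IL_003b
    while i != 0:
        new_i = -1 + i       # stloc.s 4
        new_res = res + a    # stloc.s 5
        res = new_res        # ldloc.s 5 / stloc.2
        i = new_i            # ldloc.s 4 / stloc.1
    # IL_003b: ldloc.2 / stloc.0 / ... / ldloc.0 / ret
    return res


print(invoke(16))  # 256
```

Read this way, the first loop iteration appears to have been folded into the initialisation (`i = a - 1`, `res = a`), and the temporaries `new_i` and `new_res` correspond to the redundant stloc/ldloc pairs mentioned above. Unlike the IL, which works on 32-bit integers, Python integers do not wrap around.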

As I said before, all of this is still work in progress and there is still much to be done. Stay tuned :-).

3 comments:

Lucian said...

So the mono JIT would pick up that bytecode and further compile it to native code?

Also, what would be needed for doing the same thing for the JVM?

Antonio Cuni said...

Yes, that's exactly the idea; in fact, programs run by virtual machines generated this way are double JIT-ed.

Doing the same for the JVM won't be too hard, since most of the work we've done can be shared between the two JIT backends; unfortunately, at the moment the JVM backend is not as advanced as the CLI one, so before working on the JIT we would need more work on it. But indeed, having a JIT backend for the JVM is in our plans.

Lucian said...

Great. Can't wait for advanced piggybacking :)