PyPy Status Blog: A Simple Tracer for the Flow Graph Language

Tuesday, January 31, 2012

A Simple Tracer for the Flow Graph Language

Part 2 of Comparing Partial Evaluation to Tracing

This is the second blog post in a series about comparing partial evaluation and tracing. In the first post of the series I introduced a small flow-graph language together with an interpreter for it. Then I showed a partial evaluator for the language. In this post I will show how a tracer for the same language works and how it relates to both execution and to partial evaluation. The code from this post can be found here: http://paste.pocoo.org/show/543542/

Tracing Execution

The idea of a tracer (for the described language and also in general) is to do completely normal interpretation but at the same time keep a log of all the normal operations (i.e. non-control-flow operations) that were performed. This continues until the tracer executes the code block where it started at, in which case the trace corresponds to a closed loop. Then tracing stops and the last operation is replaced by a jump to the start. After tracing has ended, the trace can be executed, optionally optimizing it before that.

To write a tracer, we start from the rules of the interpreter, rename the predicate to trace and add some extra arguments. Thus, the following rules in the interpreter:

interp(op1(ResultVar, Op, Arg, Rest), Env) :-
    resolve(Arg, Env, RArg),
    do_op(Op, RArg, Res),
    write_env(Env, ResultVar, Res, NEnv),
    interp(Rest, NEnv).

interp(op2(ResultVar, Op, Arg1, Arg2, Rest), Env) :-
    resolve(Arg1, Env, RArg1),
    resolve(Arg2, Env, RArg2),
    do_op(Op, RArg1, RArg2, Res),
    write_env(Env, ResultVar, Res, NEnv),
    interp(Rest, NEnv).

become the following rules in the tracer:

trace(op1(ResultVar, Op, Arg, Rest), Env, op1(ResultVar, Op, Arg, T), TraceAnchor) :-
    resolve(Arg, Env, RArg),
    do_op(Op, RArg, Res),
    write_env(Env, ResultVar, Res, NEnv),
    trace(Rest, NEnv, T, TraceAnchor).

trace(op2(ResultVar, Op, Arg1, Arg2, Rest), Env, op2(ResultVar, Op, Arg1, Arg2, T), TraceAnchor) :-
    resolve(Arg1, Env, RArg1),
    resolve(Arg2, Env, RArg2),
    do_op(Op, RArg1, RArg2, Res),
    write_env(Env, ResultVar, Res, NEnv),
    trace(Rest, NEnv, T, TraceAnchor).

Note how the bodies of the trace rules correspond exactly to the bodies of the interp rules, the only difference is the recursive call to trace. The meaning of the arguments of trace is as follows: The first and second argument are the operation currently executed and the environment, like in the interpreter. The argument after that is an output argument that collects the currently traced operation, in the example above it is exactly like the operation that was executed. TraceAnchor is additional information about the trace that is being built right now, most of the time it is just handed on to the recursive call of trace. We will see later what it contains.

The rule for print_and_stop is very simple, as execution (and therefore also tracing) simply stops there:

trace(print_and_stop(V), Env, print_and_stop(V), _) :-
    resolve(V, Env, Val),
    print(Val), nl.

Left are the rules for the control operations jump and if. A trace linearizes one execution path, it contains no jumps. However, when a jump to the starting label is reached, tracing should stop. Therefore, the implementation of jump contains two cases:

trace(jump(L), Env, T, TraceAnchor) :-
    (TraceAnchor = traceanchor(L, FullTrace) ->
        T = loop,
        write(trace), nl, write(FullTrace), nl,
        do_optimize(FullTrace, OptTrace),
        write(opttrace), nl, write(OptTrace), nl,
        runtrace(OptTrace, Env, OptTrace)
    ;
        block(L, Block),
        trace(Block, Env, T, TraceAnchor)
    ).

Let's disect this code in small increments. First, we see what TraceAnchor is. It is a term of the form traceanchor(StartLabel, FullTrace). StartLabel is a label in the program where tracing started (and where it should end as well, when the loop is closed). The argument FullTrace is an accumulator which contains the full trace that is being built right now.

The condition at the start of the rule checks whether the taget-label L is the same as the one stored in the trace anchor. If that is the case, we can stop tracing. The rest of the trace T is assigned the operation loop, which jumps back to the beginning of the trace. Afterwards we print and optimize the trace, then run it, using the FullTrace part of the traceanchor.

If the label we jump to is not the StartLabel we simply continue tracing without recording any operation. This part of the rule is again extremely similar to the interpretation of jump.

For now, we will not use any interesting optimizations, just return the unoptimized trace unchanged:

do_optimize(FullTrace, FullTrace).

The missing operation now is if. An if statement needs special treatment, because it is a way where control flow can diverge from the trace. The trace is linear, therefore it can only record one of the two possible paths. When executing the trace it is possible for the other path to be taken. Therefore we need to make sure that the same conditions that were true or false during tracing are still true or false during the execution of the trace. This is done with a guard operation, which checks for this condition. The following rule implements it:

trace(if(V, L1, L2), Env, T, TraceAnchor) :-
    lookup(V, Env, Val),
    (Val == 0 ->
        L = L2, T = guard_false(V, [], L1, NT)
    ;
        L = L1, T = guard_true(V, [], L2, NT)
    ),
    trace(jump(L), Env, NT, TraceAnchor).

It is very similar to the interp rule of if. The rule inserts a guard_true into the case, if the condition is true, and a guard_false if the condition is false. The arguments of the guard are: The variable that is being guarded, an empty list (the reason for that will be explained in a later post), the label where execution needs to continue when the guard fails and the rest of the trace.

Let's also add a small helper predicate that can be used to conveniently start tracing:

do_trace(L, Env) :-
    block(L, StartBlock),
    trace(StartBlock, Env, ProducedTrace, traceanchor(L, ProducedTrace)).

The predicate takes a label and an environment and executes the label with the given environment by first producing a trace, then executing the trace and eventually jumping back to interpretation, if a guard fails. It does this by reading the code at label L with the block statement, and then calling trace with an unbound variable ProducedTrace to hold the trace and a trace anchor that contains the label where tracing started and the produced trace variable.

With that predicate and the trace so far we can already trace the power implementation from the last blog post, just not execute the trace (which we will do in the next section):

?- do_trace(power_rec, [res/1, x/10, y/20]).
trace
op2(res,mul,var(res),var(x),op2(y,sub,var(y),const(1),guard_true(y,[],power_done,loop)))
opttrace
op2(res,mul,var(res),var(x),op2(y,sub,var(y),const(1),guard_true(y,[],power_done,loop)))
...

The computed trace is:

op2(res,mul,var(res),var(x),
op2(y,sub,var(y),const(1),
guard_true(y,[],power_done,
loop)))

which is exactly the content of the loop from power_rec. Note how the if is turned into a guard_true which jumps to power_done if the guard fails.

A real tracing system would need a way for the tracer to get started, e.g. by doing profiling in an interpreter and starting the tracer for labels that are jumped to often. Also, traces for the same label are usually cached in some way. These details are left out in this simple model.

Executing Traces

In a real tracing system, the traces would be turned into machine code and executed by the CPU. In our small model, we will simply write another interpreter for them. This interpreter is very simple and looks again very similar to interp.

runtrace(op1(ResultVar, Op, Arg, Rest), Env, TraceFromStart) :-
    resolve(Arg, Env, RArg),
    do_op(Op, RArg, Res),
    write_env(Env, ResultVar, Res, NEnv),
    runtrace(Rest, NEnv, TraceFromStart).

runtrace(op2(ResultVar, Op, Arg1, Arg2, Rest), Env, TraceFromStart) :-
    resolve(Arg1, Env, RArg1),
    resolve(Arg2, Env, RArg2),
    do_op(Op, RArg1, RArg2, Res),
    write_env(Env, ResultVar, Res, NEnv),
    runtrace(Rest, NEnv, TraceFromStart).

These rules are completely equivalent to the interp rules for op1 and op2. runtrace needs an extra argument, TraceFromStart, which is always just handed over to the recursive call of runtrace.

When the end of the trace is reached and the loop statement is encountered, we simply start from the beginning:

runtrace(loop, Env, TraceFromStart) :-
    runtrace(TraceFromStart, Env, TraceFromStart).

The remaining question is what to do when encountering guards. In that case the guard condition needs to be checked. If the guard succeeds, executing the trace can continue. Otherwise the trace is aborted and the interpreter resumes execution:

runtrace(guard_true(V, ResumeVars, L, Rest), Env, TraceFromStart) :-
    lookup(V, Env, Val),
    (Val == 0 ->
        resume_interp(Env, ResumeVars, L)
    ;
        runtrace(Rest, Env, TraceFromStart)
    ).

runtrace(guard_false(V, ResumeVars, L, Rest), Env, TraceFromStart) :-
    lookup(V, Env, Val),
    (Val == 0 ->
        runtrace(Rest, Env, TraceFromStart)
    ;
        resume_interp(Env, ResumeVars, L)
    ).


resume_interp(Env, [], L) :-
    block(L, Block),
    interp(Block, Env).

Note how the execution is handed over to the interpreter at the label that was encoded as the third argument in the guard operation. What the ResumeVars are for we will see in a later post. For now we assume that it is always an empty list.

With this interpreter for traces we can now trace and then execute the example:

:- do_trace(power_rec, [res/1, x/10, y/20]).
trace
op2(res,mul,var(res),var(x),op2(y,sub,var(y),const(1),guard_true(y,[],power_done,loop)))
opttrace
op2(res,mul,var(res),var(x),op2(y,sub,var(y),const(1),guard_true(y,[],power_done,loop)))
100000000000000000000

Of course this is example is not very exciting, because the trace looks more or less exactly like the original code as well. There will be more exciting examples in a later post.

Extension: Promotion

As it is, the tracer does not actually add much to the interpreter. It linearizes control flow, but nothing deeply advanced happens. In this section I will add a crucial but simple to implement extension to the control flow language that allows the tracer to do more interesting things. This extension is called promotion.

Promotion is basically a hint that the programmer can add to her control flow graph program. A promotion is an operation promote(V, L) that takes a variable V and a label L. When the interpreter runs this statement, it simply jumps to the label L and ignores the variable:

interp(promote(_, L), Env) :-
    interp(jump(L), Env).

However, the tracer does something much more interesting. For the tracer, the promote statement is a hint that it would be very useful to know the value of V and that the rest of the trace should keep that value as a constant. Therefore, when the tracer encounters a promotion, it inserts a special kind of guard called guard_value

trace(promote(V, L), Env, guard_value(V, Val, [], L, T), TraceAnchor) :-
    lookup(V, Env, Val),
    trace(jump(L), Env, T, TraceAnchor).

The guard_value is an interesting operation, because it freezes the current value FVal of variable V into the trace. When the trace is executed, the guard checks that the current value of the variable and the frozen value are the same. If yes, execution continues, if not, the trace is aborted:

runtrace(guard_value(V, FVal, ResumeVars, L, Rest), Env, TraceFromStart) :-
    lookup(V, Env, Val),
    (Val == FVal ->
        runtrace(Rest, Env, TraceFromStart)
    ;
        resume_interp(Env, ResumeVars, L)
    ).

What can this operation be used for? It's a way to communicate to the tracer that variable V is not changing very often and that it is therefore useful to freeze the current value into the trace. This can be done even without knowing the value of V in advance.

Let's look at a (slightly contrived) example:

l:
    c = i >= 0
    if c goto b else goto l_done

l_done:
    print_and_stop(var(i))

b:
    promote(x, b2)

b2:
    x2 = x * 2
    x3 = x2 + 1
    i = i - x3
    goto l

Encoded in Prolog syntax:

block(l, op2(c, ge, var(i), const(0),
         if(c, b, l_done))).
block(l_done, print_and_stop(var(i))).

block(b, promote(x, b2)).
block(b2, op2(x2, mul, var(x), const(2),
          op2(x3, add, var(x2), const(1),
          op2(i, sub, var(i), var(x3),
          jump(l))))).

This is a simple loop that counts down in steps of x * 2 + 1, whatever x might be, until i >= 0 is no longer true. Assuming that x doesn't change often, it is worth to promote it to be able to constant-fold x * 2 + 1 to not have to redo it every iteration. This is done with the promotion of x (of course optimizing this loop with loop invariant code motion would work as well, because x doesn't actually change during the loop).

To trace this, we can run the following query:

?- do_trace(b, [i/100, x/5]).
trace
guard_value(x,5,[],b2,op2(x2,mul,var(x),const(2),op2(x3,add,var(x2),const(1),op2(i,sub,var(i),var(x3),op2(c,ge,var(i),const(0),guard_true(c,[],l_done,loop))))))
opttrace
guard_value(x,5,[],b2,op2(x2,mul,var(x),const(2),op2(x3,add,var(x2),const(1),op2(i,sub,var(i),var(x3),op2(c,ge,var(i),const(0),guard_true(c,[],l_done,loop))))))
-10

Writing the trace in a more readable way:

guard_value(x,3,[],b2,
op2(x2,mul,var(x),const(2),
op2(x3,add,var(x2),const(1),
op2(i,sub,var(i),var(x3),
op2(c,ge,var(i),const(0),
guard_true(c,[],l_done,
loop))))))

After the guard_value the operations performed on x could be constant-folded away, because the guard ensures that x is 5 before execution continues. To actually do the constant-folding we would need some optimization component that optimizes traces. This will be done in the next blog post.

In this section I mostly talked about how promotion is realized in the tracer, not what and how to use to use it for. Promotion is one of the most important ingredients that's responsible for the success of PyPy's tracing approach. How this works is discussed in detail in the paper "Runtime feedback in a meta-tracing JIT for efficient dynamic languages".

Conclusion

In this blog post we have seen a very minimalistic tracer and an interpreter for the produced traces. The tracer is very much like the original interpreter, it just also keeps track of which operations were executed, in addition to executing the program. Tracing stops when a loop is closed, then the trace can be optimized and run. Running a trace continues until a failing guard is hit. At that point, execution goes back to the normal interpreter (and stays there, in this very simple implementation).

I also presented an extension of tracing that makes it possible to add a hint called promote to the original program that tells the tracer to feed back a runtime value into the trace and freeze it there. This extension would be impossible to do in the partial evaluator from the last post, because partial evaluation is done strictly before run time, so if a variable isn't already known, its likely runtime value cannot be found out.

In the next post I will show how to optimize traces before executing them and how the optimizer for traces is related to partial evaluation.