This is part 2 of a series on how to speed up an interpreter written with PyPy
by adding JIT hints to the interpreter. Part 1 described how to control the
extent of tracing. In this post I will describe how to add hints that
influence the optimizer. If applied correctly these techniques can give
really big speedups by pre-computing parts of what happens at runtime. On the other
hand, if applied incorrectly they might lead to code bloat, thus making the
resulting program actually slower.
Background
Before sending the trace to the backend to produce actual machine code, it is
optimized. The optimizer applies a number of techniques to remove or reduce
the number of operations: most of these are well known compiler optimization
techniques, with the difference that it is easier to apply them in a tracing
JIT because it only has to deal with linear traces. Among the techniques:
- constant folding
- common subexpression elimination
- allocation removal
In some places it turns out that, if the interpreter author rewrites parts of
the interpreter with these optimizations in mind, the traces the optimizer
produces can be vastly improved.
In this post I will describe two hints that allow the interpreter author to
increase the optimization opportunities for constant folding. For constant
folding to work, two conditions need
to be met:
- the arguments of an operation actually need to all be constant,
i.e. statically known by the optimizer
- the operation needs to be pure, i.e. always yield the same result given
the same arguments.
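To illustrate, in the trace notation used throughout this post (the comments are mine), only the first of the following operations could be folded:

v1 = 3 * 2    # both arguments constant, * is pure: folds to v1 = 6
v2 = x * 2    # * is pure, but x is not statically known: cannot fold
v3 = a.x      # even if a is constant, the read is not pure, since x could be mutated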
The PyPy JIT generator automatically detects most cases where these conditions
are met. However, when the automatic detection does not work, the
interpreter author can apply hints to improve the optimization
opportunities. There is one kind of hint for each of the two conditions above.
Note: These hints are written by an interpreter developer and applied to the
RPython source of the interpreter. Normal Python users will never see them.
Where Do All the Constants Come From
It is worth clarifying what a "constant" is in this context. A variable of
the trace is said to be constant if its value is statically known by the
optimizer.
The simplest example of constants are literal values. For example, if in the
RPython source code we have a line like y = x + 1, the second operand will
be a constant in the trace.
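In a trace, this line would show up as an operation whose second argument is already a constant (using the same trace notation as the rest of this post):

v1 = x + 1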
However, the optimizer can statically know the value of a variable even if it
is not a constant in the original source code. For example, consider the
following fragment of RPython code:

if x == 4:
    y = y + x

If the fragment is traced with x being 4, the following trace is
produced:
guard(x == 4)
y = y + x
In the trace above, the value of x is statically known thanks to the
guard. Remember that a guard is a runtime check. The above trace will run to
completion when x == 4. If the check fails, execution of the trace is
stopped and the interpreter continues to run.
There are cases in which it is useful to turn an arbitrary variable
into a constant value. This process is called promotion and it is an old idea
in partial evaluation (it's called "the trick" there). Promotion is also heavily
used by Psyco and by all older versions of PyPy's JIT. Promotion is a technique
that only works well in JIT compilers; in static compilers it is significantly
less applicable.
Promotion is essentially a tool for trace specialization. In some places in the
interpreter it would be very useful if a variable were constant, even though it
could have different values in practice. In such a place, promotion is used. The
typical reason to do that is if there is
a lot of computation depending on the value of that variable.
Let's make this more concrete. If we trace a call to the following function:
def f1(x, y):
    z = x * 2 + 1
    return z + y
we get a trace that looks like this:
v1 = x * 2
z = v1 + 1
v2 = z + y
return(v2)
Observe how the first two operations could be constant-folded if the value of
x were known. Let's assume that the value of x can vary, but does so
rarely, i.e. only takes a few different values at runtime. If this is the
case, we can add a hint to promote x, like this:
def f2(x, y):
    x = hint(x, promote=True)
    z = x * 2 + 1
    return z + y
The meaning of this hint is that the tracer should pretend that x is a
constant in the code that follows. When the code is simply run, the hint
function has no effect, as it just returns its first argument. When tracing,
some extra work
is done. Let's assume that this changed function is traced with
the arguments 4 and 8. The trace will be the same, except for one
operation at the beginning:
guard(x == 4)
v1 = x * 2
z = v1 + 1
v2 = z + y
return(v2)
The promotion is turned into a guard operation in the trace. The guard
captures the value of x as it was at runtime. From the point of view of the
optimizer, this guard is no different from the one produced by the if
statement in the example above. After the guard, the rest of the trace can
assume that x is equal to 4, meaning that the optimizer will turn this
trace into:
guard(x == 4)
v2 = 9 + y
return(v2)
Notice how the first two arithmetic operations were constant folded. The hope is
that the guard executes more quickly than the multiplication and the addition
that were optimized away.
If this trace is executed with values of x other than 4, the guard will
fail, and execution will continue in the interpreter. If the guard fails often
enough, a new trace will be started from the guard. This other trace will
capture a different value of x. If it is e.g. 2, then the optimized
trace looks like this:
guard(x == 2)
v2 = 5 + y
return(v2)
This new trace will be attached to the guard instruction of the first trace. If
x takes on even more values, a new trace will eventually be made for all of them,
linking them into a chain. This is clearly not desirable, so we should promote
only variables that don't vary much. However, adding a promotion hint will never produce wrong
results. It might just lead to too much assembler code.
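Conceptually, the chain of traces for f2 then behaves like the following plain-Python dispatch. This is only a sketch to illustrate the control flow; the JIT emits machine code, and f2_interpreted stands in for falling back to the interpreter:

def f2_interpreted(x, y):
    # the unoptimized computation, run when no specialized trace matches
    return x * 2 + 1 + y

def f2_specialized(x, y):
    if x == 4:                       # guard of the first trace
        return 9 + y
    elif x == 2:                     # guard of the trace attached to the failing guard
        return 5 + y
    else:
        return f2_interpreted(x, y)  # no specialized trace yet: fall back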
Promoting integers, as in the examples above, is not used that often.
However, the internals of dynamic language interpreters often
have values that are variable but vary little in the context of parts of a user
program. An example would be the types of variables in a user function. Even
though in principle the argument to a Python function could be any Python type,
in practice the argument types tend not to vary much. Therefore it is possible to
promote the types. In the next blog post I will give a complete example for how
this works.
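To give a first idea of what that looks like, here is a minimal, hypothetical sketch; the names W_IntObject, intval and generic_add are invented for illustration and do not come from a real interpreter:

def interp_add(w_left, w_right):
    # promote the class: the rest of the trace can then treat the type as constant
    cls = hint(w_left.__class__, promote=True)
    if cls is W_IntObject:
        return W_IntObject(w_left.intval + w_right.intval)
    return generic_add(w_left, w_right)  # hypothetical slow path for other types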
Declaring New Pure Operations
In the last section we saw a way to turn arbitrary variables into constants. All
pure operations on these constants can be constant-folded. This works great for
constant folding of simple types, e.g. integers. Unfortunately, in the context of an
interpreter for a dynamic
language, most operations actually manipulate objects, not simple types. The
operations on objects are often not pure and might even have side effects. If
one reads a field out of a constant reference to an object, the read cannot
necessarily be folded away, because the object can be mutated. Therefore, another
hint is needed.
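To see the problem, consider the following snippet, where obj is some arbitrary object with a field x (invented for illustration):

obj.x = 4
v1 = obj.x    # reads 4
obj.x = 23    # the object is mutated
v2 = obj.x    # the same read now yields 23

Even if obj itself is a constant, the two reads of obj.x yield different values, so neither can be replaced by a constant.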
As an example, take the following class:
class A(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def f(self, val):
        self.y = self.compute() + val

    def compute(self):
        return self.x * 2 + 1
Tracing the call a.f(10) of some instance of A yields the following
trace (note how the call to compute is inlined):
x = a.x
v1 = x * 2
v2 = v1 + 1
v3 = v2 + val
a.y = v3
In this case, promoting self in the f method does not by itself get rid of
the first few operations. Even if a is a
constant reference to an object, reading the x field does not necessarily
always yield the same value. To solve this problem, there is another annotation,
which lets the interpreter author communicate invariants to the optimizer. In
this case, she could decide that the x field of instances of A is
immutable, and therefore compute
is a pure function. To communicate this, there is a purefunction decorator.
If the code in compute should be constant-folded away, we would change the
class as follows:
class A(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def f(self, val):
        self = hint(self, promote=True)
        self.y = self.compute() + val

    @purefunction
    def compute(self):
        return self.x * 2 + 1
Now the trace will look like this:
guard(a == 0xb73984a8)
v1 = compute(a)
v2 = v1 + val
a.y = v2
Here, 0xb73984a8 is the address of the instance of A that was used
during tracing. The call to compute is not inlined, so that the optimizer
has a chance to see it. Since the compute function is marked as pure and its
argument is a constant reference, the call will be removed by the optimizer.
The final
trace looks like this:
guard(a == 0xb73984a8)
v2 = 9 + val
a.y = v2
(assuming that the x field's value is 4).
On the one hand, the purefunction annotation is very powerful. It can be
used to constant-fold arbitrary parts of the computation in the interpreter.
However, the annotation also gives you ample opportunity to mess things up. If a
function is annotated to be pure, but is not really, the optimizer can produce
subtly wrong code. Therefore, a lot of care has to be taken when using this
annotation.
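For instance, with the class A above it would be wrong to mark a getter for the mutable y field as pure (the getter is invented for illustration):

@purefunction
def get_y(self):
    return self.y    # WRONG: f mutates self.y, so the result can change

The optimizer would feel free to reuse the first result of get_y even after y has been changed in the meantime.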
Observably Pure Functions
Why can't we simply write an analysis to find out that the x field of A
instances is immutable, and deduce that compute is a pure function,
since it only reads the x field and does not have side effects? This might
be possible in this particular case, but in practice the functions that are
annotated with the purefunction decorator are usually more complex.
The easiest example for this is that of a function that uses memoization to
cache its results. If you analyze this function, it looks like the function has
side effects, because it changes the memoizing dictionary. However, because this
side effect is not externally visible, the function is pure when seen from the
outside. This is a property that is not easily detectable by analysis.
Therefore, the purity of this function needs to be annotated.
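A minimal sketch of such a memoizing function (the function and its cache are invented for illustration):

_fib_cache = {}

@purefunction
def fib(n):
    # writing to _fib_cache is a side effect, but it is not externally
    # visible: the result depends only on n
    if n not in _fib_cache:
        if n < 2:
            _fib_cache[n] = n
        else:
            _fib_cache[n] = fib(n - 1) + fib(n - 2)
    return _fib_cache[n]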
Immutable Fields
One of the most common cases of pure functions is reading immutable
values out of objects. Since this is so common, we have special syntactic sugar
for it. An RPython class can have a class attribute _immutable_fields_ set to
a list of strings, listing the fields that cannot be changed. This is equivalent
to using getters and annotating them with purefunction.
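Applied to the class from the previous section, this looks as follows (a sketch; with the field declared immutable, the read of self.x can be folded after the promotion, so compute no longer needs the purefunction decorator):

class A(object):
    _immutable_fields_ = ['x']

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def f(self, val):
        self = hint(self, promote=True)
        self.y = self.compute() + val

    def compute(self):
        return self.x * 2 + 1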
Conclusion
In this blog post I explained two more hints that can be used in the source code
of the interpreter. They are used to influence what the optimizer does with the
trace. I realize the examples given here are a bit too small; in the next
installment I will give a worked-out example that puts all the pieces together.