A Simple Object Model

To implement a dynamic language efficiently, the operations on its objects need
to be fast. Most dynamic languages have object models that are made by using
dictionaries everywhere. Let's look at an example of how the JIT can be made to
optimize such operations.

For the purpose of this blog post we will use a very simple and bare-bones
object model that just supports very simple classes and instances, without any
inheritance or any fancy features. The model has classes, which contain methods.
Instances have a class. Instances have their own attributes. When looking up an
attribute on an instance, the instances attributes are searched. If the
attribute is not found there, the class' attributes are searched.

To implement this object model, we could use the following RPython code as part
of the interpreter source code:

In this straightforward implementation the methods and attributes are just
stored in dictionaries on the classes/instances. While this object model is very
simple it already contains all the hard parts of Python's object model. Both
instances and classes can have arbitrary fields, and they are changeable at
any time. Moreover, instances can change their class after they have been
created.

When using this object model in
an interpreter, a huge amount of time will be spent doing lookups in these
dictionaries. To make the language efficient using a tracing JIT, we need to
find a way to get rid of these dictionary lookups somehow.

Let's assume we trace through code that sums three attributes, such as:

In this example, the attribute a is found on the instance, but the
attributes b and c are found on the class. The trace indeed contains
five calls to dict.get, which is slow.

Making Instance Attributes Faster Using Maps

The first step in making getattr faster in our object model is to optimize
away the dictionary lookups on the instances. The hints we have looked at in the
two earlier blog posts don't seem to help with the current object model. There is
no pure function to be seen, and the instance is not a candidate for promotion,
because there tend to be many instances.

This is a common problem when trying to apply hints. Often, the interpreter
needs a small rewrite to expose the pure functions and nearly-constant objects
that are implicitly there. In the case of instance fields this rewrite is not
entirely obvious. The basic idea is as follows. In theory instances can have
arbitrary fields. In practice however many instances share their layout (i.e.
their set of keys) with many other instances.

Therefore it makes sense to factor the layout information out of the instance
implementation into a shared object. This shared layout object is called a
map. Maps are an old idea that comes originally from the SELF language. They are
also used by many JavaScript implementations such as V8. I've written about maps
before, so I won't explain them fully again.

Instances no longer use dictionaries to store their fields. Instead, they have a
reference to a map, which maps field names to indexes into a storage list. The
storage list contains the actual field values. The maps are shared between
objects with the same layout. Therefore they have to be immutable, which means
that their getindex method is a pure function. When a new attribute is added
to an instance, a new map needs to be chosen, which is done with the
new_map_with_additional_attribute method on the previous map. Now that we have
introduced maps, it is safe to promote the map everywhere, because we assume
that the number of different instance layouts is small.

With this changed instance implementation, the trace we had above changes to the
following, where 0xb74af4a8 is the memory address of the Map instance that
has been promoted:

The calls to Map.getindex can be optimized away, because they are calls to
a pure function and they have constant arguments. That means that index1/2/3
are constant and the guards on them can be removed. All but the first guard on
the map will be optimized away too, because the map cannot have changed in
between. The optimized trace looks like this:

The index 0 that is used to read out of the storage array is the result
of the constant-folded getindex call. This trace is already much better than
the original one. Now we are down from five dictionary lookups to just two.

Versioning of Classes

Instances were optimized making the assumption that the total number of
Instance layouts is small compared to the number of instances. For classes we
will make an even stronger assumption. We simply assume that it is rare for
classes to change at all. This is not totally reasonable (sometimes classes contain
counters or similar things) but for this simple example it is good enough.

What we would really like is if the Class.find_method method were pure.
But it cannot be, because it is always possible to change the class itself.
Every time the class changes, find_method can potentially return a
new value.

Therefore, we give every class a version number, which is increased every time a
class gets changed (i.e., the content of the methods dictionary changes).
This means that the result of methods.get() for a given (name,
version) pair will always be the same, i.e. it is a pure operation. To help
the JIT to detect this case, we factor it out in a helper method which is
explicitly marked as @purefunction. The refactored Class looks like
this:

What is interesting here is that _find_method takes the version
argument but it does not use it at all. Its only purpose is to make the call
pure (because when the version number changes, the result of the call might be
different than the previous one).

The constants 41 and 17 are the results of the folding of the
_find_method` calls. This final trace is now very good. It no longer performs any
dictionary lookups. Instead it contains several guards. The first guard
checks that the map is still the same. This guard will fail if the same
code is executed with an instance that has another layout. The second guard
checks that the class of inst is still the same. It will fail if trace is
executed with an instance of another class. The third guard checks that the
class did not change since the trace was produced. It will fail if somebody
calls the change_method method on the class.

Real-World Considerations

The techniques used above for the simple object model are used for the object
model of PyPy's Python interpreter too. Since Python's object model is
considerably more complex, some additional work needs to be done.

The first problem that needs to be solved is that Python supports (multiple)
inheritance. Therefore looking up a method in a class needs to consider the
whole method resolution order. This makes the versioning of classes more
complex. If a class is changed its version changes. At the same time, the
versions of all the classes inheriting from it need to be changed as well,
recursively. This makes class changes expensive, but they should be rare. On the
other hand, a method lookup in a complex class hierarchy is as optimized in the
trace as in our object model here.

A downside of the versioning of classes that we haven't yet fixed in PyPy, is
that some classes do change a lot. An example would be a class that keeps a
counter of how many instances have been created so far. This is very slow right
now, but we have ideas about how to fix it in the future.

Another optimization is that in practice the shape of an instance is correlated
with its class. In our code above, we allow both to vary independently.
In PyPy's Python interpreter we act somewhat more cleverly. The class of
an instance is not stored on the instance itself, but on the map. This means
that we get one fewer promotion (and thus one fewer guard) in the trace, because the class doesn't need to
be promoted after the map has been.

More General Patterns

The techniques we used above to make instance and class lookups faster are
applicable in more general cases than the one we developed them for. A more
abstract view of maps is that of splitting a data-structure into a part that
changes slowly, and a part that changes quickly. In the concrete example of maps
we split the original dictionary into the map (the slow-changing part) and the
storage array (the quick-changing part). All the computation on the
slow-changing part can be constant-folded during tracing so that only the
manipulation of the quick-changing part remains.

Similarly, versions can be used to constant-fold arbitrary functions of large data
structures. The version needs to be updated carefully every time the result of
this function can change. Therefore this is useful only if the data structure is
expected to change slowly.

Conclusion

In this post I showed how to use purefunction and promote to make a
small but still relevant dynamic object model no longer use any dictionary lookups
after tracing. Instead a number of guards are inserted into the
trace to check whether the assumptions about the objects are still true. This
makes operations on objects seriously faster. I plan to write another small post
that shows the speed benefits for PyPy's Python interpreter for exactly these
operations.

A Simple Object Model

To implement a dynamic language efficiently, the operations on its objects need
to be fast. Most dynamic languages have object models that are made by using
dictionaries everywhere. Let's look at an example of how the JIT can be made to
optimize such operations.

For the purpose of this blog post we will use a very simple and bare-bones
object model that just supports very simple classes and instances, without any
inheritance or any fancy features. The model has classes, which contain methods.
Instances have a class. Instances have their own attributes. When looking up an
attribute on an instance, the instances attributes are searched. If the
attribute is not found there, the class' attributes are searched.

To implement this object model, we could use the following RPython code as part
of the interpreter source code:

In this straightforward implementation the methods and attributes are just
stored in dictionaries on the classes/instances. While this object model is very
simple it already contains all the hard parts of Python's object model. Both
instances and classes can have arbitrary fields, and they are changeable at
any time. Moreover, instances can change their class after they have been
created.

When using this object model in
an interpreter, a huge amount of time will be spent doing lookups in these
dictionaries. To make the language efficient using a tracing JIT, we need to
find a way to get rid of these dictionary lookups somehow.

Let's assume we trace through code that sums three attributes, such as:

In this example, the attribute a is found on the instance, but the
attributes b and c are found on the class. The trace indeed contains
five calls to dict.get, which is slow.

Making Instance Attributes Faster Using Maps

The first step in making getattr faster in our object model is to optimize
away the dictionary lookups on the instances. The hints we have looked at in the
two earlier blog posts don't seem to help with the current object model. There is
no pure function to be seen, and the instance is not a candidate for promotion,
because there tend to be many instances.

This is a common problem when trying to apply hints. Often, the interpreter
needs a small rewrite to expose the pure functions and nearly-constant objects
that are implicitly there. In the case of instance fields this rewrite is not
entirely obvious. The basic idea is as follows. In theory instances can have
arbitrary fields. In practice however many instances share their layout (i.e.
their set of keys) with many other instances.

Therefore it makes sense to factor the layout information out of the instance
implementation into a shared object. This shared layout object is called a
map. Maps are an old idea that comes originally from the SELF language. They are
also used by many JavaScript implementations such as V8. I've written about maps
before, so I won't explain them fully again.

Instances no longer use dictionaries to store their fields. Instead, they have a
reference to a map, which maps field names to indexes into a storage list. The
storage list contains the actual field values. The maps are shared between
objects with the same layout. Therefore they have to be immutable, which means
that their getindex method is a pure function. When a new attribute is added
to an instance, a new map needs to be chosen, which is done with the
new_map_with_additional_attribute method on the previous map. Now that we have
introduced maps, it is safe to promote the map everywhere, because we assume
that the number of different instance layouts is small.

With this changed instance implementation, the trace we had above changes to the
following, where 0xb74af4a8 is the memory address of the Map instance that
has been promoted:

The calls to Map.getindex can be optimized away, because they are calls to
a pure function and they have constant arguments. That means that index1/2/3
are constant and the guards on them can be removed. All but the first guard on
the map will be optimized away too, because the map cannot have changed in
between. The optimized trace looks like this:

The index 0 that is used to read out of the storage array is the result
of the constant-folded getindex call. This trace is already much better than
the original one. Now we are down from five dictionary lookups to just two.

Versioning of Classes

Instances were optimized making the assumption that the total number of
Instance layouts is small compared to the number of instances. For classes we
will make an even stronger assumption. We simply assume that it is rare for
classes to change at all. This is not totally reasonable (sometimes classes contain
counters or similar things) but for this simple example it is good enough.

What we would really like is if the Class.find_method method were pure.
But it cannot be, because it is always possible to change the class itself.
Every time the class changes, find_method can potentially return a
new value.

Therefore, we give every class a version number, which is increased every time a
class gets changed (i.e., the content of the methods dictionary changes).
This means that the result of methods.get() for a given (name,
version) pair will always be the same, i.e. it is a pure operation. To help
the JIT to detect this case, we factor it out in a helper method which is
explicitly marked as @purefunction. The refactored Class looks like
this:

What is interesting here is that _find_method takes the version
argument but it does not use it at all. Its only purpose is to make the call
pure (because when the version number changes, the result of the call might be
different than the previous one).

The constants 41 and 17 are the results of the folding of the
_find_method` calls. This final trace is now very good. It no longer performs any
dictionary lookups. Instead it contains several guards. The first guard
checks that the map is still the same. This guard will fail if the same
code is executed with an instance that has another layout. The second guard
checks that the class of inst is still the same. It will fail if trace is
executed with an instance of another class. The third guard checks that the
class did not change since the trace was produced. It will fail if somebody
calls the change_method method on the class.

Real-World Considerations

The techniques used above for the simple object model are used for the object
model of PyPy's Python interpreter too. Since Python's object model is
considerably more complex, some additional work needs to be done.

The first problem that needs to be solved is that Python supports (multiple)
inheritance. Therefore looking up a method in a class needs to consider the
whole method resolution order. This makes the versioning of classes more
complex. If a class is changed its version changes. At the same time, the
versions of all the classes inheriting from it need to be changed as well,
recursively. This makes class changes expensive, but they should be rare. On the
other hand, a method lookup in a complex class hierarchy is as optimized in the
trace as in our object model here.

A downside of the versioning of classes that we haven't yet fixed in PyPy, is
that some classes do change a lot. An example would be a class that keeps a
counter of how many instances have been created so far. This is very slow right
now, but we have ideas about how to fix it in the future.

Another optimization is that in practice the shape of an instance is correlated
with its class. In our code above, we allow both to vary independently.
In PyPy's Python interpreter we act somewhat more cleverly. The class of
an instance is not stored on the instance itself, but on the map. This means
that we get one fewer promotion (and thus one fewer guard) in the trace, because the class doesn't need to
be promoted after the map has been.

More General Patterns

The techniques we used above to make instance and class lookups faster are
applicable in more general cases than the one we developed them for. A more
abstract view of maps is that of splitting a data-structure into a part that
changes slowly, and a part that changes quickly. In the concrete example of maps
we split the original dictionary into the map (the slow-changing part) and the
storage array (the quick-changing part). All the computation on the
slow-changing part can be constant-folded during tracing so that only the
manipulation of the quick-changing part remains.

Similarly, versions can be used to constant-fold arbitrary functions of large data
structures. The version needs to be updated carefully every time the result of
this function can change. Therefore this is useful only if the data structure is
expected to change slowly.

Conclusion

In this post I showed how to use purefunction and promote to make a
small but still relevant dynamic object model no longer use any dictionary lookups
after tracing. Instead a number of guards are inserted into the
trace to check whether the assumptions about the objects are still true. This
makes operations on objects seriously faster. I plan to write another small post
that shows the speed benefits for PyPy's Python interpreter for exactly these
operations.

Controlling the Tracing of an Interpreter With Hints, Part 3: Putting it All Together

7 comments:

Very clever indeed.I think and additional speedup can be achievedby using a technique from smalltalk intrepters: Method lookup cache.The cache is organized so that functioncache(class, method) returns a pointer to the method.The early Smalltalk implementors reported pretty spectacular speedups when this cache was implemented.

Hmm. And now I think I know why twisted isn't any faster in pypy. I remember looking at the source a few years ago and being horrified to see that they were changing class methods during runtime. I guessed to avoid one layer of dispatch in state machines. Anyway, it's an "optimisation" that will hurt pypy.