Author: arigo
Date: Mon Mar 26 16:14:50 2007
New Revision: 41362
Modified:
pypy/dist/pypy/doc/jit.txt
pypy/dist/pypy/jit/tl/targettiny1.py
pypy/dist/pypy/jit/tl/targettiny2.py
pypy/dist/pypy/jit/tl/tiny2.py
Log:
Mostly finished jit.txt. Added long comments in tiny2.py.
Modified: pypy/dist/pypy/doc/jit.txt
==============================================================================
--- pypy/dist/pypy/doc/jit.txt (original)
+++ pypy/dist/pypy/doc/jit.txt Mon Mar 26 16:14:50 2007
@@ -323,10 +323,92 @@
examples.
-A (slightly less) tiny interpreter
-==================================
+A slightly less tiny interpreter
+================================
-`pypy/jit/tl/tiny2.py`_ XXX
+The interpreter in `pypy/jit/tl/tiny2.py`_ is a reasonably good example
+of the difficulties that we meet when scaling up this approach, and how
+we solve them - or work around them. For more details, see the comments
+in the source code. With more work on the JIT generator, we hope
+eventually to remove the need for the workarounds.
+
+Promotion
+---------
+
+The most powerful hint introduced in this example is ``promote=True``.
+It is applied to a value that is usually not a compile-time constant,
+but which we would like to become a compile-time constant "just in
+time". Its meaning is to instruct the JIT compiler to stop compiling at
+this point, wait until the runtime actually reaches that point, grab the
+value that arrived here at runtime, and go on compiling with the value
+now considered as a compile-time constant. If the same point is reached
+at runtime several times with several different values, the compiler
+will produce one code path for each, with a switch in the generated
+code. This is a process that is never "finished": in general, new
+values can always show up later during runtime, causing more code paths
+to be compiled and the switch in the generated code to be extended.
+
+Promotion is the essential new feature introduced in PyPy when compared
+to existing partial evaluation techniques (it was actually first
+introduced in Psyco [JITSPEC]_, which is strictly speaking not a partial
+evaluator).
+
+Another way to understand the effect of promotion is to consider it as a
+complement to the ``concrete=True`` hint. The latter tells the
+hint-annotator that the value that arrives here is required to be a
+compile-time constant (i.e. green). In general, this is a very strong
+constraint, because it forces "backwards" a potentially large number of
+values to be green as well - all the values that this one depends on.
+Often it does not work at all, because the value ultimately
+depends on an operation that the JIT compiler cannot constant-fold,
+e.g. because it depends on external input or reads from
+non-immutable memory.
+
+The ``promote=True`` hint takes an arbitrary red value and returns it
+as a green variable, so it can be used to bound the set of values that
+need to be forced to green. A common idiom is to put a
+``concrete=True`` hint at the precise point where a compile-time
+constant would be useful (e.g. on the value on which a complex switch
+dispatches), and then put a few ``promote=True`` hints to copy specific
+values into green variables *before* the ``concrete=True``.
+
+The ``promote=True`` hints should be applied where we expect not too
+many different values to arrive at runtime; here are typical examples:
+
+* Where we expect a small integer, the integer can be promoted if each
+ specialized version can be optimized (e.g. lists of known length can
+ be optimized by the JIT compiler).
+
+* The interpreter-level class of an object can be promoted before an
+ indirect method call, if it is useful for the JIT compiler to look
+ inside the called method. If the method call is indirect, the JIT
+ compiler merely produces a similar indirect method call in the
+ generated code. But if the class is a compile-time constant, then it
+ knows which method is called, and compiles its operations (effectively
 inlining it from the point of view of the generated code).
+
+* Whole objects can occasionally be promoted, with care. For example,
+ in an interpreter for a language which has function calls, it might be
+ useful to know exactly which Function object is called (as opposed to
+ just the fact that we call an object of class Function).
+
+Other hints
+-----------
+
+The other hints mentioned in `pypy/jit/tl/tiny2.py`_ are "global merge
+points" and "deepfreeze". For more information, please refer to the
+explanations there.
+
+We should also mention a technique not used in ``tiny2.py``, which is
+the notion of *virtualizable* objects. In PyPy, the Python frame
+objects are virtualizable. Such objects assume that they will be mostly
+read and mutated by the JIT'ed code - this is typical of frame objects
+in most interpreters: they are either not visible at all to the
+interpreted programs, or (as in Python) you have to access them using
+some reflection API. The ``_virtualizable_`` hint allows the object to
+escape (e.g. in PyPy, the Python frame object is pushed on the
+globally-accessible frame stack) while still remaining efficient to
+access from JIT'ed code.
------------------------------------------------------------------------
@@ -499,6 +581,8 @@
.. _`expanded version of the present document`: discussion/jit-draft.html
+---------------
+
.. _VMC: http://codespeak.net/svn/pypy/extradoc/talk/dls2006/pypy-vm-construction.pdf
.. _`RPython`: coding-guide.html#rpython
@@ -510,5 +594,9 @@
.. _Psyco: http://psyco.sourceforge.net
.. _`PyPy Standard Interpreter`: architecture.html#standard-interpreter
.. _`exception transformer`: translation.html#making-exception-handling-explicit
+.. [JITSPEC] Representation-Based Just-In-Time Specialization and the
+ Psyco Prototype for Python, ACM SIGPLAN PEPM'04, August 24-26, 2004,
+ Verona, Italy.
+ http://psyco.sourceforge.net/psyco-pepm-a.ps.gz
.. include:: _ref.txt
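
Editorial illustration (not part of the commit): the promotion mechanism described in the jit.txt changes above can be modelled in plain Python. This is a minimal sketch under stated assumptions, not PyPy's JIT machinery: the names ``PromotionSwitch``, ``specialize_power`` and ``power_of`` are hypothetical, and "compiling" is stood in for by building a closure. It only shows the shape of the idea: one specialized code path per value seen at runtime, behind a switch that can keep growing.

```python
# Illustrative model only: this is NOT how PyPy implements promotion.

class PromotionSwitch(object):
    """Models the growing switch that a promotion point leaves behind."""

    def __init__(self, compile_for_value):
        # compile_for_value(v) returns a function specialized for constant v
        self.compile_for_value = compile_for_value
        self.cases = {}         # runtime value -> specialized code path
        self.compilations = 0   # how many times "compilation" was resumed

    def __call__(self, value, *args):
        path = self.cases.get(value)
        if path is None:
            # Unseen value: resume "compilation" with `value` treated as a
            # compile-time constant, then extend the switch with the new path.
            path = self.compile_for_value(value)
            self.cases[value] = path
            self.compilations += 1
        return path(*args)


def specialize_power(n):
    """Hypothetical 'compiler': builds x**n for a known small integer n."""
    def power(x):
        result = 1
        for _ in range(n):      # n is fixed here; a real JIT could unroll this
            result *= x
        return result
    return power


power_of = PromotionSwitch(specialize_power)
# The first argument is the promoted value, the second stays a runtime value.
results = [power_of(2, 3), power_of(3, 3), power_of(2, 5)]
```

In this model the exponent 2 is seen twice but "compiled" only once, so ``power_of.compilations`` ends up at 2. A value that first shows up later would simply add a third case, which mirrors the remark that the process is never finished.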
Modified: pypy/dist/pypy/jit/tl/targettiny1.py
==============================================================================
--- pypy/dist/pypy/jit/tl/targettiny1.py (original)
+++ pypy/dist/pypy/jit/tl/targettiny1.py Mon Mar 26 16:14:50 2007
@@ -3,6 +3,11 @@
def entry_point(args):
+ """Main entry point of the stand-alone executable:
+ takes a list of strings and returns the exit code.
+ """
+ # store args[0] in a place where the JIT log can find it (used by
+ # viewcode.py to know the executable whose symbols it should display)
highleveljitinfo.sys_executable = args[0]
if len(args) < 4:
print "Usage: %s bytecode x y" % (args[0],)
@@ -26,4 +31,8 @@
oopspec = True
def portal(driver):
+ """Return the 'portal' function, and the hint-annotator policy.
+ The portal is the function that gets patched with a call to the JIT
+ compiler.
+ """
return tiny1.ll_plus_minus, MyHintAnnotatorPolicy()
Modified: pypy/dist/pypy/jit/tl/targettiny2.py
==============================================================================
--- pypy/dist/pypy/jit/tl/targettiny2.py (original)
+++ pypy/dist/pypy/jit/tl/targettiny2.py Mon Mar 26 16:14:50 2007
@@ -3,15 +3,20 @@
def entry_point(args):
+ """Main entry point of the stand-alone executable:
+ takes a list of strings and returns the exit code.
+ """
+ # store args[0] in a place where the JIT log can find it (used by
+ # viewcode.py to know the executable whose symbols it should display)
highleveljitinfo.sys_executable = args[0]
- if len(args) < 3:
+ if len(args) < 2:
print "Invalid command line arguments."
print args[0] + " 'tiny2 program string' arg0 [arg1 [arg2 [...]]]"
return 1
bytecode = [s for s in args[1].split(' ') if s != '']
args = [tiny2.StrBox(arg) for arg in args[2:]]
res = tiny2.interpret(bytecode, args)
- print res.as_str()
+ print tiny2.repr(res)
return 0
def target(driver, args):
@@ -26,7 +31,12 @@
oopspec = True
def look_inside_graph(self, graph):
+ # temporary workaround
return getattr(graph, 'func', None) is not tiny2.myint_internal
def portal(driver):
+ """Return the 'portal' function, and the hint-annotator policy.
+ The portal is the function that gets patched with a call to the JIT
+ compiler.
+ """
return tiny2.interpret, MyHintAnnotatorPolicy()
Modified: pypy/dist/pypy/jit/tl/tiny2.py
==============================================================================
--- pypy/dist/pypy/jit/tl/tiny2.py (original)
+++ pypy/dist/pypy/jit/tl/tiny2.py Mon Mar 26 16:14:50 2007
@@ -1,7 +1,45 @@
+"""
+An interpreter for a strange word-based language: the program is a list
+of space-separated words. Most words push themselves on a stack; some
+words have another action. The result is the space-separated words
+from the stack.
+
+ Hello World => 'Hello World'
+ 6 7 ADD => '13' 'ADD' is a special word
+ 7 * 5 = 7 5 MUL => '7 * 5 = 35' '*' and '=' are not special words
+
+Arithmetic on non-integers gives a 'symbolic' result:
+
+ X 2 MUL => 'X*2'
+
+Input arguments can be passed on the command-line, and used as #1, #2, etc.:
+
+ #1 1 ADD => one more than the argument on the command-line,
+ or if it was not an integer, concatenates '+1'
+
+You can store back into an (existing) argument index with ->#N:
+
+ #1 5 ADD ->#1
+
+Braces { } delimit a loop. Don't forget spaces around each one.
+The '}' pops an integer value off the stack and loops if it is not zero:
+
+ { #1 #1 1 SUB ->#1 #1 } => when called with 5, gives '5 4 3 2 1'
+
+"""
from pypy.rlib.objectmodel import hint, _is_early_constant
+#
+# See pypy/doc/jit.txt for a higher-level overview of the JIT techniques
+# detailed in the following comments.
+#
+
class Box:
+ # Although all words are in theory strings, we use two subclasses
+ # to represent the strings differently from the words known to be integers.
+ # This is an optimization that is essential for the JIT and merely
+ # useful for the basic interpreter.
pass
class IntBox(Box):
@@ -25,11 +63,17 @@
def func_sub_int(ix, iy): return ix - iy
def func_mul_int(ix, iy): return ix * iy
-def func_add_str(sx, sy): return sx + ' ' + sy
+def func_add_str(sx, sy): return sx + '+' + sy
def func_sub_str(sx, sy): return sx + '-' + sy
def func_mul_str(sx, sy): return sx + '*' + sy
def op2(stack, func_int, func_str):
+ # Operate on the top two stack items. The promotion hints force the
+ # class of each argument (IntBox or StrBox) to turn into a compile-time
+ # constant if it wasn't already. The effect we seek is to make the
+ # calls to as_int() direct calls at compile-time, instead of indirect
+ # ones. The JIT compiler cannot look into indirect calls, but it
+ # can analyze and inline the code in directly-called functions.
y = stack.pop()
hint(y.__class__, promote=True)
x = stack.pop()
@@ -42,9 +86,27 @@
def interpret(bytecode, args):
+ """The interpreter's entry point and portal function.
+ """
+ # ------------------------------
+ # First a lot of JIT hints...
+ #
+ # A portal needs a "global merge point" at the beginning, for
+ # technical reasons, if it uses promotion hints:
hint(None, global_merge_point=True)
+
+ # An important hint: 'bytecode' is a list, which is in theory
+ # mutable. Let's tell the JIT compiler that it can assume that the
+ # list is entirely frozen, i.e. immutable and only containing immutable
+ # objects. Otherwise, it cannot do anything - it would have to assume
+ # that the list can unpredictably change at runtime.
bytecode = hint(bytecode, deepfreeze=True)
- # ------------------------------
+
+ # Now some strange code that makes a copy of the 'args' list in
+ # a complicated way... this is a workaround forcing the whole 'args'
+ # list to be virtual. It is a way to tell the JIT compiler that it
+ # doesn't have to worry about the 'args' list being unpredictably
+ # modified.
oldargs = args
argcount = hint(len(oldargs), promote=True)
args = []
@@ -54,13 +116,21 @@
args.append(oldargs[n])
n += 1
# ------------------------------
+ # the real code starts here
loops = []
stack = []
pos = 0
while pos < len(bytecode):
+ # It is a good idea to put another 'global merge point' at the
+ # start of each iteration in the interpreter's main loop. The
+ # JIT compiler keeps a table of all the times it passed through
+ # the global merge point. This table lets it detect when it can
+ # stop compiling and generate a jump back to some machine code
+ # that was already generated earlier.
hint(None, global_merge_point=True)
+
opcode = bytecode[pos]
- hint(opcode, concrete=True)
+ hint(opcode, concrete=True) # same as in tiny1.py
pos += 1
if opcode == 'ADD': op2(stack, func_add_int, func_add_str)
elif opcode == 'SUB': op2(stack, func_sub_int, func_sub_str)
@@ -70,6 +140,8 @@
stack.append(args[n-1])
elif opcode.startswith('->#'):
n = myint(opcode, start=3)
+ if n > len(args):
+ raise IndexError
args[n-1] = stack.pop()
elif opcode == '{':
loops.append(pos)
@@ -78,14 +150,32 @@
loops.pop()
else:
pos = loops[-1]
+ # A common problem when interpreting loops or jumps: the 'pos'
+ # above is read out of a list, so the hint-annotator thinks
+ # it must be red (not a compile-time constant). But the
+ # hint(opcode, concrete=True) in the next iteration of the
+ # loop requires all variables the 'opcode' depends on to be
+ # green, including this 'pos'. We promote 'pos' to green
+ # here, as early as possible. Note that in practice the 'pos'
+ # read out of the 'loops' list will be a compile-time constant
+ # because it was pushed as a compile-time constant by the '{'
+ # case above into 'loops', which is a virtual list, so the
+ # promotion below is just a way to make the colors match.
pos = hint(pos, promote=True)
else:
stack.append(StrBox(opcode))
- while len(stack) > 1:
- op2(stack, func_add_int, func_add_str)
- return stack.pop()
+ return stack
+
+def repr(stack):
+ # this bit moved out of the portal function because JIT'ing it is not
+ # very useful, and the JIT generator is confused by the 'for' right now...
+ return ' '.join([x.as_str() for x in stack])
+# ------------------------------
+# Pure workaround code! It will eventually be unnecessary.
+# For now, myint(s, n) is a JIT-friendly way to spell int(s[n:]).
+# We don't support negative numbers, though.
def myint_internal(s, start=0):
if start >= len(s):
return -1
@@ -98,7 +188,6 @@
res = res * 10 + n
start += 1
return res
-
def myint(s, start=0):
if _is_early_constant(s):
s = hint(s, promote=True)
@@ -111,18 +200,26 @@
if n < 0:
raise ValueError
return n
+# ------------------------------
def test_main():
main = """#1 5 ADD""".split()
res = interpret(main, [IntBox(20)])
- assert res.as_int() == 25
+ assert repr(res) == '25'
res = interpret(main, [StrBox('foo')])
- assert res.as_str() == 'foo 5'
+ assert repr(res) == 'foo+5'
FACTORIAL = """The factorial of #1 is
1 { #1 MUL #1 1 SUB ->#1 #1 }""".split()
def test_factorial():
res = interpret(FACTORIAL, [IntBox(5)])
- assert res.as_str() == 'The factorial of 5 is 120'
+ assert repr(res) == 'The factorial of 5 is 120'
+
+FIBONACCI = """Fibonacci numbers:
+ { #1 #2 #1 #2 ADD ->#2 ->#1 #3 1 SUB ->#3 #3 }""".split()
+
+def test_fibonacci():
+ res = interpret(FIBONACCI, [IntBox(1), IntBox(1), IntBox(10)])
+ assert repr(res) == "Fibonacci numbers: 1 1 2 3 5 8 13 21 34 55"
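
Editorial illustration (not part of the commit): the word language described in the tiny2.py docstring can be tried out without any JIT hints. The sketch below is a simplified stand-in, not the code from tiny2.py. It assumes plain Python ints and strings instead of the IntBox/StrBox classes, and the names ``as_int``, ``op2`` and ``run`` are chosen here for illustration (``op2`` takes a symbol rather than a string function, unlike the original).

```python
# Simplified sketch of the word language; no JIT hints, no Box classes.

def as_int(item):
    """Return item as an int, or raise ValueError (no negative literals)."""
    if isinstance(item, int):
        return item
    if item.isdigit():
        return int(item)
    raise ValueError(item)

def op2(stack, func_int, symbol):
    """Combine the top two stack items, numerically if both are integers."""
    y = stack.pop()
    x = stack.pop()
    try:
        stack.append(func_int(as_int(x), as_int(y)))
    except ValueError:
        stack.append(str(x) + symbol + str(y))   # 'symbolic' result

def run(program, args=()):
    """Interpret a space-separated program; return the stack as one string."""
    words = program.split()
    args = list(args)
    stack, loops, pos = [], [], 0
    while pos < len(words):
        word = words[pos]
        pos += 1
        if word == 'ADD':
            op2(stack, lambda a, b: a + b, '+')
        elif word == 'SUB':
            op2(stack, lambda a, b: a - b, '-')
        elif word == 'MUL':
            op2(stack, lambda a, b: a * b, '*')
        elif word.startswith('->#'):
            n = int(word[3:])
            if not 1 <= n <= len(args):
                raise IndexError(word)
            args[n - 1] = stack.pop()
        elif word.startswith('#'):
            n = int(word[1:])
            if not 1 <= n <= len(args):
                raise IndexError(word)
            stack.append(args[n - 1])
        elif word == '{':
            loops.append(pos)        # remember where the loop body starts
        elif word == '}':
            if as_int(stack.pop()) == 0:
                loops.pop()          # leave the loop
            else:
                pos = loops[-1]      # jump back to the loop body
        else:
            stack.append(word)       # ordinary words push themselves
    return ' '.join([str(item) for item in stack])
```

Under these assumptions, ``run('6 7 ADD')`` gives ``'13'``, ``run('X 2 MUL')`` gives ``'X*2'``, and ``run('{ #1 #1 1 SUB ->#1 #1 }', ['5'])`` gives ``'5 4 3 2 1'``, matching the examples in the docstring.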