Multi-Primitive Dispatch Autoboxing Obsession

Parrot has four types of
primitives: integers, floats, strings, and PMCs. Parrot also has four types of
registers: I, N (for Numeric), S, and P. (Parrot would have an Artisan
personality if we used F for float, but that's a bad pun for psychology
students.) Many people have explained PMCs elsewhere, but think of them as
"everything else that isn't just an integer, float, or a string." That's not
entirely true, for reasons that will become obvious in a few paragraphs, but
computer programming, unlike computer science, is all about finding the most
appropriate lies to tell.

Parrot's calling conventions govern how you pass parameters to and receive
values from Parrot functions. In particular, they dictate which registers you
use. For example, you may pass two integers into a division function and get a
float back. The two incoming registers may be I0 and
I1, and the outgoing register may be N0 (I don't
believe they are, but it's all documented, and if you write at the level of PIR
or a higher-level language, you don't have to worry about that). That's all
well and good for single dispatch.

Multi dispatch makes life much more complicated, and programming much easier.

Suppose you have a function called divide, which takes two
arguments and returns one value. If you're used to Perl 5, you know that the
division operator is already polymorphic; you can give it two SVs and get back
one SV and treat the arguments as numeric or integral or however they are.
Imagine that you have to worry about the type of the argument,
however. If that mattered, you'd have to check the argument types as they came
in and perform different kinds of comparisons based on the results of that
check. If you've written much Perl, you've written this kind of code. That's
one drawback of a single dispatch system.

In a multi dispatch system, you can define multiple functions with the same
name but different signatures. (If that sounds like signature-based function
overloading in Java or C++, it should -- but static manifest typing means that
the compiler does its best to resolve those dispatches at the point of the call
at compile time. If it can't do that, C++ will give you horrible error
messages and Java will vomit stack traces. At least, I recall that Java has a
weak stomach full of stack traces and loves to read while someone else drives
it fast down narrow mountain roads full of hairpin turns.) At
runtime, the types of the parameters passed to the call determine
which variant function to call.

That is, you can define divide:multi(Integer, _) and
divide:multi(Float, _), and Parrot's MMD system will call the
first one when you write divide( 10, 2 ) and the second when you
write divide( 5.0, 2.0 ). (You may ask "What about dispatching
based on the expected return value?", but that's a much bigger
flophouse full of monkeys.)

I mean, you can say that in Parrot now, as of a few minutes ago.

I mentioned primitives before. In PIR (Parrot's native language), the
numbers I used as parameters before are actually integer and float constants.
Parrot's PIR compiler stores them in I and N registers, respectively. I
haven't yet explained that the type signatures on the PIR functions are tuples,
nor that uppercase names represent PMC types and the underscore is the
Prolog/Haskell placeholder, which means any type will do.

If you've read closely, you have realized that the arguments get passed in
completely different registers than Parrot expects. Fortunately, Parrot's
calling conventions account for this. We have PMC types which represent the
other primitives: Integer, Float, and String. It's easy to convert from an
integer primitive into an Integer PMC and so on. (The opposite isn't as easy,
but that's again a different story.) This process is autoboxing, and
Parrot's calling conventions handle it already automatically -- at least they
do, in the single dispatch case.

The multiple dispatch case is more complex. It's not as easy as looking up
a symbol in a namespace, checking to see if it's invokable, stuffing the
parameters into the proper registers, and calling it. In between the "Alright,
time for parameter passing!" and "Go ahead and call it!" steps, something has
to say "Wow, there are a lot of variants! Which one is best, if any?"

There are several algorithms for finding the best match between multi
variants and the passed parameters. Parrot uses a Manhattan distance,
which calculates the number of steps between the actual parameter and the
expected parameter for each possible parameter and each possible variant. How
far is it to the store from my house? It's one block north, one block east,
one block north, eight blocks east, and one block north. (As usual, this is a
tiny lie, but pedagogical lies are acceptable in computer programming.) The
multi variant tuple with the smallest distance for the given parameter tuple is
the best fit.

Autoboxing isn't a problem for Parrot's calling conventions, but Parrot's
Manhattan distance algorithm knew nothing about it until Parrot r32211.
As far as it cared, the distance between an I register and a PMC register
containing an Integer was way too far ever to think about walking. My change
refines the algorithm to say "If the parameter is a primitive type -- integer,
float, or string -- and if the expected argument is the corresponding PMC type,
add one point of distance to account for the conversion, then continue looking
at the rest of the tuple."

For as simple as that code looks, it almost had a bug where it preferred to
autobox primitives to meet a variant such as foo:multi( Integer )
even when a variant foo:multi( int ) existed. This depended on
the order of evaluation of compilation. I had forgotten to increase the
distance of this parameter to account for autoboxing. Any variant which
accepts the primitive as it is should take priority over one which requires
autoboxing.

As with many of these patches I've explained, writing the explanation took longer than writing the patch -- even with that bugfix.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Without JavaScript enabled, you might want to
use the classic discussion system instead. If you login, you can remember this preference.