Thursday, July 2, 2009

In this last installment of the Matrices series, we’ll look into making the Matrix class parameterized. What does that means? Basically, we want to have Matrix declared like this:

class Matrix[T]

So that we can create matrices of Ints, Doubles, or whatever other numeric type we need. This is not a trivial exercise, because of the way numeric types are defined in Java, and Scala’s backward compatibility with it. Basically, there is no common superclass to them.

Paul Chisuano offered two alternatives around that. One would involve the “Enrich My Library” pattern, but I do not think it would offer any gains over MatrixInt as it was built, and I see a problem or two implementing it.

We’ll discuss the other solution he proposes, which is, indeed, very clever. If we were building this in Java, would idea would be creating an interface with the operations we need, and a concrete class for each type we need. For instance, let’s call our interface Scalar, and our concrete classes implementing it IntScalar, DoubleScalar, etc. We would then define a Matrix class parameterized with an “upper bound” of Scalar, created from an Array of Arrays of Scalar. That would work, but can be quite slow.

Instead of that, we’ll use one of the more wizardry of Scala’s features, the implicit type conversion argument. You have probably seen things like this before:

implicit def xToY(x: X): Y = Y(x)

This definition makes it possible to convert an object of class X to an object of class Y without explicitly doing so. If I ever need an object of class Y and I have an object of class X instead, Scala will apply that definition to object X without any explicit statement on my part to do so.

Even if you haven’t seen that before, you almost certainly have used it. See the two examples below:

In each case, there is an implicit conversion at work. These definitions can, in fact, be found inside the object scala.Predef, which is imported automatically into scope of every Scala program. The first one is obvious, the second one, not so much. To understand the second, you have to realize that class Symbol does not define the method “->”, and that “->” is not a Scala keyword. I’ll leave as an exercise to find what implicit conversion is being used here.

What that means for us is that we can create a Scalar class, as well as implicit conversions from our desired types to it. Scalar would look like this:

One thing to note here is that none of the operations defined for Scalar take another Scalar as argument, nor do they return a Scalar. The reason for this is that we do not want to store Scalars. They are bulky and inefficient. We just want to quickly convert an Int or Double into Scalar, and then produce, through an Int or Double operation, and Int or Double result to be stored in an Array, or returned as result of determinant or similar methods.

Even without implicits, we could define Matrix like this:

class Matrix[T](self: Array[Array[T]], toScalar: T => Scalar[T])

We could then use toScalar everywhere we do an operation over a type T. With implicit, we could just add an implicit definition to the top of class Matrix, so we don’t need bother explicitly using toScalar. But we don’t need to do even that. Look at the following definition:

This definition of Matrix constructor, with two sets of () for arguments, is said to be curried. I won’t be explaining what that means here, because what interests us is the implicit keyword in there. The important thing here is that when you define something like that, then two things useful for us happen:

1. If there is an implicit definition of the type “T => Scalar[T]” in scope when we call Matrix, then it will be automatically passed to the constructor. Or, in other words, you can call “new Matrix(Array(Array(1)))” anywhere there is an implicit definition from Int to Scalar[Int], without having to pass that implicit definition as well.
2. There is no need to define an implicit conversion inside Matrix itself. Any implicit conversion passed as an argument will already be used inside the class.

This rule is also valid for implicit conversions arguments on method definitions, by the way.

Anyway, this is so useful and powerful that Scala has another way of saying the same thing:

class Matrix[T <% Scalar[T]](self: Array[Array[T]])

The “<%” keyword means there should be an implicit conversion between the first type and the second in scope where Matrix constructor is called, or that one such conversion must be explicitly passed.

So, we search&replace every instance of MatrixInt for Matrix[T], define our constructor as above (plus minor details such as “private” and “val”), replace Int with T where appropriate (remember that row and column coordinates are still Ints!), create the Scalar class, the subclasses for the types we want, the implicit conversions, and we are done, right?

Not so. Scalar, alas, is not enough. If you look at the definitions of sign, isDivisibleBy and invert, you will see we mix “T” with -1, 0 and 1. Since we do not know that -1, 0 and 1 can be mixed with “T”, we still need something else. Alas, what we need is actually very simple: an implicit definition between Int and T, so that when we use -1, it can be converted to the proper type, be it Int, Double or whatever.

On the first line there was at most one implicit conversion for each parameter. On the second line, though, since T doesn’t have the method “*”, and Int doesn’t have the method “*(T)”, two implicit conversions would need to be applied.

Scala won’t apply two implicit conversions on the same element. In this case, we can make “sign” return T, so the extra implicit conversion will be applied inside “sign”, thus avoiding this rule. We need to be careful, however, about such situations. As it happens, MatrixInt, as it was written, won’t present any such problems.

Anyway, we need, thus, two implicit conversions. The “<%” syntax isn’t enough for our particular case, so we have to do it the other way. Here’s our Matrix declaration:

Note that I write “implicit” only once in the declaration. All parameters in these last parentheses will be defined as implicit. Also, please note that implicit parameters must come last in such a declaration.

This does take care of almost everything, with a bit of judicious editing on various declarations to make the signatures match. There are some places, though, where that is not the case. Look at these lines inside the method “toString”:

We obviously can’t use “%d” for Doubles, for one thing. Also, maxNumSize doesn’t quite cut it, since we would like numbers to be aligned on the decimal point (or comma, depending on your locale), so we need to know at least two sizes. And then there are complex numbers, and who knows what else, so we don’t really know what information might be needed.

Therefore, I feel it is better to leave the choice inside Scalar itself. I propose to add the method below to Scalar, define it for Int as shown next, and then use as show last:

It would have been more appropriate to define that method inside a companion object, but this way is simpler. Note, as well, that “numFormat” became a visible property, so we can re-use all this logic inside LinearEquations.

Another place that needs fixing is inverse. We use the following rule:

Now, isDivisibleBy is implemented in terms of “%”, but that is just plain wrong for Double numbers. We could “fix” it by defining “%” to return 0, but that would make “%” act in a very unexpected way. So, instead, I’ll just define an “isDivisibleBy” operator inside Scalar:

def isDivisibleBy(other: T): Boolean

The last problem, as far as I can see, is with the equality methods. You’ll get erasure warnings in them if you use “Matrix[T]” on the case statement of equals or inside asInstanceOf on canEqual. Instead, use “Matrix[_]”, to indicate you don’t care what type T is. Since we are doing a deepEquals, T will sort itself out.

Note that we transform “xs” into a List inside DoubleScalar. That’s because we are iterating through it twice, which isn’t possible with an Iterator. Also note we added “absolute” and “numberSign”, which are used by LinearEquations.

The method “absolute” is not called “abs”, though, because “abs” is defined on RichInt, RichDouble, etc. So if we defined it for IntScalar, and then tried to use “abs” on an Int, the compiler would not know whether to use RichInt’s abs or IntScala’s abs.

We’ll also add the following two lines to the Matrix companion-object:

This way these two types, Int and Double, will automatically become available after importing “Matrix._”. The user can arbitrarily extend Matrix to cover any type, though, by creating a subclass of Scalar and defining an implicit conversion from the desired type to it.

Performance-wise, the boxing of the numeric types for each operation is detrimental, but it can probably be optimized by either the compiler or the JIT. In Scala 2.8 there will also be the option of marking a class “specialized”, which is creates specially optimized versions of a class for Java’s “primitive” types. I’m not sure it applies in this situation, though.

I’ll finish with the revised code for Matrix. If you have any questions or suggestions, please send them to me and I’ll try to address them. I hope you found this series useful!

3 comments:

You mention that there is a performance penalty associated with boxing. Have you done any benchmarks? It would be especially interesting to see if the optimizer and the @specialized keyword of Scala 2.8 can bring the performance to the level of non-boxing code.

Ittay, that was rather arbitrary of me to say. Without a benchmark, I really can't state that with certainty.

At any rate, there is a better solution, using type class-like constructs such as Scala 2.8's Numeric. That is better because it avoids creating new objects _entirely_. It still suffers from auto-boxing, though that could be made better with Scala 2.8's @specialized. I really should return to this subject once Scala 2.8 is out.

About Me

I'm an IT guy, a gamer and someone curious about my profession, politics, philosophy and anything that engages the mind. I like movies, books and cats. I'm a former FreeBSD committer, and a present (small) contributor to Scala.