400x faster Matrix multiplication for Ruby

A friend, Allan de Medeiros Martins, has made me lose some time playing with Restricted Boltzmann Machines just for fun!
Matrix multiplication is a critical operation for the performance of the algorithm we’ve been discussing. Ruby has a Matrix class in the standard library, and its Matrix#* method does the job!
But the whole thing was really slow compared to the MATLAB version of the code.
Then I implemented a simple version of the matrix multiplication using an Array of Arrays, and I was surprised that it was around 2.5x faster than the specialized Matrix#* method. Unfortunately, that was still not acceptable.
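The Array of Arrays code itself is not shown here, so what follows is only a minimal sketch of what such a multiplication might look like, not the exact code I timed. The method name multiply_arrays and the 60x60 size are made up for illustration, and the timings printed will vary by machine and Ruby version.

```ruby
require 'matrix'
require 'benchmark'

# Hypothetical helper: naive matrix product for Arrays of Arrays,
# c[i][j] = sum over k of a[i][k] * b[k][j].
def multiply_arrays(a, b)
  raise ArgumentError, 'inner dimensions must match' unless a.first.size == b.size

  b_columns = b.transpose # columns of b become rows, so they are easy to walk
  a.map do |row|
    b_columns.map do |col|
      sum = 0.0
      row.each_index { |k| sum += row[k] * col[k] }
      sum
    end
  end
end

n = 60 # arbitrary size, chosen only to keep the run short
a = Array.new(n) { Array.new(n) { rand } }
b = Array.new(n) { Array.new(n) { rand } }
ma = Matrix[*a]
mb = Matrix[*b]

# Prints timings for both approaches; no particular ratio is guaranteed.
Benchmark.bm(16) do |x|
  x.report('Matrix#*')        { ma * mb }
  x.report('Array of Arrays') { multiply_arrays(a, b) }
end
```

Transposing b once up front keeps the inner loop walking two plain Ruby Arrays side by side, which is about as simple as this can get without leaving pure Ruby.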
Searching around, I (re)discovered the SciRuby project and its NMatrix library. AMAZING project! The NMatrix#dot method does the actual matrix multiplication (dot product), while NMatrix#* is just an element-by-element multiplication. Using NMatrix#dot I got to about 3x faster than Matrix#* and around 1.2x faster than my Array version. But something was not right: I was expecting a much more significant improvement in speed. After digging around, gotcha! In one part of the code I was using NMatrix#map, and this method returns an “untyped” NMatrix object, as noted in the documentation:

“Note that map will always return an :object matrix, because it has no way of knowing how to handle operations on the different dtypes.”

Well, with an untyped matrix all the specialized C algorithms are disabled, so we can’t get a good speed boost. So I changed the code to guarantee that a :float64 NMatrix is used in every step of the algorithm. Boom! NMatrix#dot with dtype: :float64 is more than 400x faster than Matrix#*.
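As a rough sketch of that fix (not the actual code from my project), one way to keep everything typed is to cast the result of NMatrix#map back to :float64 before the next multiplication. The helper typed_sigmoid and the NMATRIX_LOADED guard are hypothetical names for this example, and I am assuming the nmatrix gem is installed and that NMatrix#cast accepts a dtype: option as described in its documentation.

```ruby
# The nmatrix gem is not part of the standard library, so the require is guarded.
begin
  require 'nmatrix'
  NMATRIX_LOADED = true
rescue LoadError
  NMATRIX_LOADED = false
end

# Hypothetical helper: element-wise logistic function that hands back a
# :float64 matrix instead of the :object matrix produced by NMatrix#map.
def typed_sigmoid(matrix)
  mapped = matrix.map { |v| 1.0 / (1.0 + Math.exp(-v)) } # :object matrix, per the docs
  mapped.cast(dtype: :float64)                            # cast back before the next #dot
end

if NMATRIX_LOADED
  # Declare the dtype explicitly when building the matrices.
  a = NMatrix.new([2, 2], [1.0, 2.0, 3.0, 4.0], dtype: :float64)
  b = NMatrix.new([2, 2], [5.0, 6.0, 7.0, 8.0], dtype: :float64)

  product = a.dot(b)                  # matrix product of two :float64 matrices
  activated = typed_sigmoid(product)  # element-wise step, cast back to :float64

  puts "product dtype:   #{product.dtype}"
  puts "activated dtype: #{activated.dtype}"
else
  puts 'nmatrix gem not installed; skipping the NMatrix example'
end
```

Printing the dtype after each step is a cheap way to spot the place where a matrix silently turns into :object.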