jpbMatrices

Note

Please do not use this code in production. Matrix-matrix multiplication is part
of every BLAS implementation. Please look into ATLAS, Math.Net, Apple's Accelerate,
cuBLAS, LAPACK, Intel's MKL, or another open-source or commercial BLAS library.

Summary

This project explores how different data structures and code optimizations affect
matrix-matrix multiplication performance. Both simple and complex multiplication
methods are included, and comparing the various methods also exercises the
compiler's optimizations.

Performance Notes

MatrixFM's MultiplyBlockTransposeIndexerAccumulator method achieves at least 99%
of the theoretical peak non-SIMD GFLOPS in both its single-threaded and
multi-threaded implementations. Performance was measured with double-precision
(64-bit) floating-point elements and with Turbo Boost and hyper-threading disabled.
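
For reference, the theoretical non-SIMD peak can be computed as

    peak GFLOPS = cores x clock (GHz) x scalar FLOPs per cycle per core

For example, a hypothetical 4-core, 3.0 GHz CPU whose cores each retire one
scalar fused multiply-add (2 FLOPs) per cycle would peak at 1 x 3.0 x 2 = 6
GFLOPS single-threaded and 4 x 3.0 x 2 = 24 GFLOPS multi-threaded. These
figures are illustrative only, not the machine the measurements were taken on.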

Matrix Classes

The following classes are available:

Matrix1D: Uses a 1-dimensional array of doubles to store elements.

Matrix2D: Uses a 2-dimensional array of doubles to store elements.

MatrixAA: Uses an array of arrays of doubles to store elements.

MatrixMN: A wrapper around Math.Net's matrix.

MatrixFM: Uses an array of arrays of one-dimensional arrays of doubles to store
elements. This class was built specifically for cache optimization, and its
multiplication code has been heavily optimized. See jpbMatrices' README.TXT for
the meaning of the multiplication method names.
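
To make the two less obvious layouts concrete, here is a minimal C# sketch of a
Matrix1D-style flat layout and a MatrixFM-style tiled layout. The class and
member names, the indexing scheme, and the block handling are illustrative
assumptions, not the library's actual code.

    // Flat layout (Matrix1D-style sketch): one contiguous array,
    // with element (i, j) at offset i * Cols + j (row-major).
    public sealed class FlatMatrix
    {
        private readonly double[] data;
        public readonly int Rows, Cols;

        public FlatMatrix(int rows, int cols)
        {
            Rows = rows; Cols = cols;
            data = new double[rows * cols];
        }

        public double this[int i, int j]
        {
            get { return data[i * Cols + j]; }
            set { data[i * Cols + j] = value; }
        }
    }

    // Tiled layout (MatrixFM-style sketch): blocks[bi][bj] is one
    // BlockSize x BlockSize tile stored contiguously in a 1-D array,
    // so a whole tile stays resident in cache during a block multiply.
    // Edge tiles are padded to full size for simplicity.
    public sealed class TiledMatrix
    {
        private readonly double[][][] blocks;
        public readonly int Rows, Cols, BlockSize;

        public TiledMatrix(int rows, int cols, int blockSize)
        {
            Rows = rows; Cols = cols; BlockSize = blockSize;
            int bRows = (rows + blockSize - 1) / blockSize;
            int bCols = (cols + blockSize - 1) / blockSize;
            blocks = new double[bRows][][];
            for (int bi = 0; bi < bRows; bi++)
            {
                blocks[bi] = new double[bCols][];
                for (int bj = 0; bj < bCols; bj++)
                    blocks[bi][bj] = new double[blockSize * blockSize];
            }
        }

        public double this[int i, int j]
        {
            get
            {
                return blocks[i / BlockSize][j / BlockSize]
                             [(i % BlockSize) * BlockSize + j % BlockSize];
            }
            set
            {
                blocks[i / BlockSize][j / BlockSize]
                      [(i % BlockSize) * BlockSize + j % BlockSize] = value;
            }
        }
    }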

Multiplication Methods

Each of these methods is implemented with and without accumulators, with and
without indexers, and in single- and multi-threaded versions for Matrix1D,
Matrix2D, and MatrixAA.
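
To make two of these axes concrete, here is a sketch (hypothetical code, not
one of the library's actual methods; "indexer" is assumed to mean a C# indexer
property like those sketched above) of a single-threaded variant that skips the
class indexer in favor of raw access to a flat backing array, and that keeps a
local accumulator instead of writing to the result on every iteration:

    // Single-threaded, no indexer, with accumulator: C = A * B for
    // n x n matrices stored flat in row-major order.
    static void MultiplyFlatAccumulator(double[] a, double[] b,
                                        double[] c, int n)
    {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
            {
                double sum = 0.0;                    // local accumulator
                for (int k = 0; k < n; k++)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;                  // one store per element
            }
    }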

Basic: the standard 3-loop multiply.
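
A minimal sketch of this method (a hypothetical helper on plain 2-D arrays, not
one of the library's methods), shown here without an accumulator:

    // Standard 3-loop multiply: C = A * B for n x n matrices.
    static double[,] MultiplyBasic(double[,] a, double[,] b)
    {
        int n = a.GetLength(0);
        var c = new double[n, n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    c[i, j] += a[i, k] * b[k, j];  // b is read with stride n
        return c;
    }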

Transpose: the second matrix is transposed into column-major order
and a modified basic multiply is performed.
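
Under the same assumptions as the sketch above, the idea looks roughly like
this: copying B into transposed order lets the inner loop walk both operands
sequentially in memory, which is far friendlier to the cache than the strided
reads of B in the basic multiply.

    // Transpose multiply: C = A * B for n x n matrices.
    static double[,] MultiplyTranspose(double[,] a, double[,] b)
    {
        int n = a.GetLength(0);

        // bt[j, k] == b[k, j]: B's columns become contiguous rows.
        var bt = new double[n, n];
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                bt[j, k] = b[k, j];

        var c = new double[n, n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
            {
                double sum = 0.0;
                for (int k = 0; k < n; k++)
                    sum += a[i, k] * bt[j, k];  // both rows walked in order
                c[i, j] = sum;
            }
        return c;
    }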