Ways of multiplying matrices

Forward propagation and backpropagation both reduce to a handful of operations on matrices, and the most common one is matrix multiplication. To perform matrix multiplication in reasonable time, you will need to optimise your algorithm.
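As a baseline, here is a minimal sketch of the naive triple-loop algorithm in Swift (the function name and the flat row-major layout are illustrative assumptions). It runs in O(n³) time for square matrices and is exactly what an optimised routine has to beat:

```swift
// Naive matrix multiplication over flat row-major arrays:
// c[i][j] = sum over k of a[i][k] * b[k][j]
// a is rows x inner, b is inner x cols, result is rows x cols.
func naiveMatMul(_ a: [Float], _ b: [Float],
                 rows: Int, inner: Int, cols: Int) -> [Float] {
    var c = [Float](repeating: 0, count: rows * cols)
    for i in 0..<rows {
        for k in 0..<inner {
            let aik = a[i * inner + k]
            for j in 0..<cols {
                c[i * cols + j] += aik * b[k * cols + j]
            }
        }
    }
    return c
}
```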

On macOS there is a simple way to do this: Apple's Accelerate Framework. It is an umbrella framework for vector-optimised operations:

cblas.h and vblas.h are the interfaces to Apple's implementations of BLAS. You can find reference documentation in the BLAS reference. Additional documentation on the BLAS standard, including reference implementations, can be found on the web starting from the BLAS FAQ page (http://www.netlib.org/blas/faq.html) and the BLAST forum (http://www.netlib.org/blas/blast-forum/blast-forum.html).
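As a hedged sketch, this is what a single-precision multiplication looks like through the CBLAS interface from Swift, assuming the classic (32-bit integer) cblas_sgemm entry point and row-major storage; the matrix sizes are made up for illustration:

```swift
import Accelerate

// C = alpha * A * B + beta * C via single-precision GEMM.
// A is 2x3, B is 3x2, C is 2x2; all stored as flat row-major [Float].
let m: Int32 = 2, n: Int32 = 2, k: Int32 = 3
let a: [Float] = [1, 2, 3,
                  4, 5, 6]
let b: [Float] = [ 7,  8,
                   9, 10,
                  11, 12]
var c = [Float](repeating: 0, count: Int(m * n))

cblas_sgemm(CblasRowMajor,   // memory layout of the matrices
            CblasNoTrans,    // use A as-is
            CblasNoTrans,    // use B as-is
            m, n, k,
            1.0,             // alpha
            a, k,            // A and its leading dimension
            b, n,            // B and its leading dimension
            0.0,             // beta
            &c, n)           // C and its leading dimension

print(c)   // [58.0, 64.0, 139.0, 154.0]
```

A single sgemm call replaces the whole triple loop above, and the library picks a blocked, vectorised implementation tuned for the CPU.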

clapack.h is the interface to Apple's implementation of LAPACK. Documentation of the LAPACK interfaces, including reference implementations, can be found on the web starting from the LAPACK FAQ page at http://netlib.org/lapack/faq.html.
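For illustration, a sketch of solving a linear system A·x = b through the clapack interface (dgesv_, an LU-factorisation solver). Note that LAPACK expects column-major storage; the tiny symmetric matrix here is chosen so the layout question doesn't arise:

```swift
import Accelerate

// Solve A * x = b for x using LAPACK's dgesv.
var n: __CLPK_integer = 2
var nrhs: __CLPK_integer = 1
var a: [__CLPK_doublereal] = [2, 1,
                              1, 3]   // column-major 2x2, symmetric
var lda = n
var ipiv = [__CLPK_integer](repeating: 0, count: Int(n))
var b: [__CLPK_doublereal] = [3, 5]   // right-hand side
var ldb = n
var info: __CLPK_integer = 0

dgesv_(&n, &nrhs, &a, &lda, &ipiv, &b, &ldb, &info)

// On success, info == 0 and b now holds the solution x.
print(b)   // [0.8, 1.4]
```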

Every modern CPU relies on processing data in parallel chunks. A SIMD instruction operates on all loaded data in a single operation: if the SIMD unit loads eight data points at once, an add applied to that data happens to all eight values at the same time. This parallelism is separate from the parallelism provided by a superscalar processor; the eight values are processed in parallel even on a non-superscalar processor, and a superscalar processor may be able to perform multiple SIMD operations in parallel.
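Swift's standard-library SIMD types make the idea concrete. In this minimal sketch (the values are arbitrary), the single + below adds all eight lanes at once rather than performing eight separate scalar additions:

```swift
// SIMD8<Float> maps to the CPU's vector registers where available.
let x = SIMD8<Float>(1, 2, 3, 4, 5, 6, 7, 8)
let y = SIMD8<Float>(10, 20, 30, 40, 50, 60, 70, 80)

// One vectorized operation over all eight lanes.
let sum = x + y
print(sum)   // lanes: 11, 22, 33, 44, 55, 66, 77, 88
```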