When the leading dimension of matrix A is not equal to the number of rows or columns, the MKL_?GEMM_COMPACT functions can return incorrect results when executed on a processor that does not support Intel® AVX2 or Intel® AVX-512 instructions.

I remember seeing a post about taking advantage of MKL when I want to multiply many matrices by the same matrix.
It showed that in this case I can get the performance of a large matrix multiplication even though the individual matrices are small.
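
I can't find the post, so this is only my guess at the idea it described: if the shared matrix A is on the left, the small matrices B_1, ..., B_n can be placed side by side into one wide matrix, because A * [B_1 B_2 ... B_n] = [A*B_1 A*B_2 ... A*B_n]. That turns many small products into one large one. Below is a minimal pure-Python sketch of that identity. The helper names (`matmul`, `hstack`, `hsplit`, `multiply_many`) are made up for illustration and are not MKL functions; with MKL the single wide product would presumably be one ordinary GEMM call such as `cblas_dgemm`.

```python
def matmul(a, b):
    """Naive product of two matrices stored as lists of rows (stand-in for a GEMM call)."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def hstack(blocks):
    """Concatenate matrices that share a row count side by side."""
    rows = len(blocks[0])
    return [[x for blk in blocks for x in blk[r]] for r in range(rows)]


def hsplit(m, widths):
    """Cut a wide matrix back into column blocks of the given widths."""
    out, start = [], 0
    for w in widths:
        out.append([row[start:start + w] for row in m])
        start += w
    return out


def multiply_many(a, blocks):
    """Compute a*B for every B in blocks using a single wide product."""
    widths = [len(blk[0]) for blk in blocks]
    wide = matmul(a, hstack(blocks))  # the one "large" multiplication
    return hsplit(wide, widths)
```

If the shared matrix is on the right instead (B_i * A), the same trick would apply with the small matrices stacked vertically rather than horizontally.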