Saturday, January 25, 2014

Data structure for matrix in VBA

A large part of programming financial models has to do with handling and operating on matrices. However, VBA has no ready-made data structure for handling matrix data. The only way to get around this limitation is to write your own custom data structure on top of existing data structures, such as arrays. In my previous posting on the Gaussian Copula implementation, I created one such custom data structure, a Matrix class. Technically speaking, this class just wraps arrays into a manageable object with setters and getters, and provides some of the most common matrix operations for the user.
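The Matrix class itself is not reproduced here, so the following is only a rough C++ analogue of the idea (C++ being the language the post later uses for comparison), not the author's VBA class. It shows a flat array hidden behind a getter (`at`) and a setter (`set`), plus one common operation, multiplication. The original class exposes accessors named push and at; `set` is a hypothetical stand-in for push, whose exact semantics are an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C++ analogue of a matrix wrapper class (not the author's VBA
// Matrix class): a flat, row-major std::vector<double> behind accessors.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Getter: read element (i, j), zero-based; bounds checked in debug builds.
    double at(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

    // Setter: write element (i, j). Hypothetical stand-in for a "push" accessor.
    void set(std::size_t i, std::size_t j, double value) {
        assert(i < rows_ && j < cols_);
        data_[i * cols_ + j] = value;
    }

    // One common operation: returns the product (*this) * other.
    Matrix multiply(const Matrix& other) const {
        assert(cols_ == other.rows_);
        Matrix result(rows_, other.cols_);
        for (std::size_t i = 0; i < rows_; ++i) {
            for (std::size_t j = 0; j < other.cols_; ++j) {
                double sum = 0.0;
                for (std::size_t k = 0; k < cols_; ++k) {
                    sum += at(i, k) * other.at(k, j);  // every access is a method call
                }
                result.set(i, j, sum);
            }
        }
        return result;
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;  // element (i, j) lives at index i * cols_ + j
};
```

Every element access goes through a method call. In compiled C++, such small accessors are often inlined by an optimizing compiler. In VBA, the post found that the equivalent accessor calls add noticeable overhead for large matrices.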

The idea was very noble, and I was quite happy with the first development candidate. I especially liked the idea of the data structure being an object, for both schematic and semantic reasons. I also thought that I would finally get rid of all those boring extra lines of code needed when using plain arrays, and that only a small performance penalty would be paid for all the function calls made through the class accessors (push, at). However, as I was prototyping Monte Carlo basket equity option pricing and doing matrix operations with large matrices, I quickly realized that the cost of processing matrix operations was definitely way too high.

Testing processing times

The issue bothered me so much that I finally wanted to get some hard facts on handling and operating on matrix structures. For this reason, I prepared test cases for three different matrix schemes. Within these test cases, the matrix data structure was implemented by using

In each test case, a simple procedural program performs the following operations for all of the matrix schemes described above:

Creates three matrices (A, B, C)

Fills two matrices (A, B) with random numbers

Performs matrix multiplication (A, B) and returns the result into matrix (C)

The time elapsed was recorded only for the actual matrix multiplication. For each matrix scheme, the number of rows in B (columns in A) ranged from 100 000 to 2 000 000. The number of columns in B (rows in A) was held constant at 10. For example, in the first calculation, we multiplied matrix A (10 * 100 000) by matrix B (100 000 * 10) and received matrix C (10 * 10). In the second calculation, the matrix dimensions were A (10 * 200 000) and B (200 000 * 10), again giving matrix C (10 * 10), and so on.
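As a rough guide to how the workload scales, let n be the number of columns in A (rows in B). Each of the 10 × 10 = 100 entries of C is then a sum of n products:

```latex
\[ C_{ij} = \sum_{k=1}^{n} A_{ik}\,B_{kj}, \qquad i, j = 1, \dots, 10 \]
\[ \text{multiply-adds} = 10 \times 10 \times n = 100\,n \]
```

The first large case (n = 100 000) therefore needs 10^7 multiply-add operations, and the last (n = 2 000 000) needs 2 × 10^8. The work grows linearly with n.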

Cold shower

The following chart presents the findings of this experiment.

We can clearly see that a simple two-dimensional array is the most efficient data structure for handling large matrix operations in VBA. There is just no way out of this, period. The testing program is presented below. You can just copy-paste it into a new standard VBA module if you are interested in running it on your own laptop. Remember to add a reference to the Microsoft Scripting Runtime library.

Additionally, I also tested using the MMULT worksheet function with simple two-dimensional arrays. This method is only marginally more efficient than using two-dimensional arrays with the code provided above (testRun_2DArray). Moreover, the size of the matrices we can feed into this worksheet function is limited, and the limit is surprisingly low. For example, trying to multiply A (10 * 100 000) by B (100 000 * 10) leads to a runtime error.

The chart below presents the results for the test cases with small matrices, including a test case using the MMULT worksheet function. For each matrix scheme, the number of rows in B (columns in A) ranged from 1 000 to 65 000. The number of columns in B (rows in A) was held constant at 10. For example, in the first calculation, we multiplied matrix A (10 * 1 000) by matrix B (1 000 * 10) and received matrix C (10 * 10). In the second calculation, the matrix dimensions were A (10 * 2 000) and B (2 000 * 10), again giving matrix C (10 * 10), and so on.

The direction of the results is the same as with large matrices. Using the MMULT worksheet function is the most efficient choice, but it is only marginally better than using simple two-dimensional arrays. Using the Matrix wrapper class for small matrix operations can still be seen as a reasonable choice, since the time lost compared with the more efficient choices is, after all, relatively small.

Final run

Just out of curiosity, I wanted to compare the efficiency of the VBA matrix operations with the corresponding C++ results. For the C++ version, I used dynamically allocated arrays. Otherwise, the testing program was basically the same as in the VBA cases: allocate memory for the arrays, fill them with random numbers, perform the matrix multiplication and finally release the allocated memory. Again, the time elapsed was recorded only for the actual matrix multiplication. The chart below presents the results.
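For readers who want to reproduce the C++ side, here is a minimal sketch of one such test run. It is not the author's original program. The names `multiply`, `timeRun` and `RunResult`, the fixed random seed, and the use of `std::mt19937` with `std::chrono::steady_clock` are assumptions made for illustration. Each matrix is stored row-major in a single block allocated with `new[]`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>

// Result of one timed run (hypothetical names, for illustration only).
struct RunResult {
    double seconds;   // wall-clock time spent in multiply() only
    double checksum;  // sum of C's entries, read after timing so the work is used
};

// Naive triple-loop product of row-major arrays:
// a is rows x inner, b is inner x cols, c receives rows x cols.
void multiply(const double* a, const double* b, double* c,
              std::size_t rows, std::size_t inner, std::size_t cols) {
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < inner; ++k) {
                sum += a[i * inner + k] * b[k * cols + j];
            }
            c[i * cols + j] = sum;
        }
    }
}

// One test case: A is m x n, B is n x m, C is m x m (m = 10 in the post).
RunResult timeRun(std::size_t n, std::size_t m = 10, unsigned seed = 42) {
    assert(n > 0 && m > 0);
    double* a = new double[m * n];  // dynamically allocated (heap)
    double* b = new double[n * m];
    double* c = new double[m * m];

    // Fill A and B with uniform random numbers in [0, 1).
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (std::size_t i = 0; i < m * n; ++i) a[i] = dist(gen);
    for (std::size_t i = 0; i < n * m; ++i) b[i] = dist(gen);

    // Time only the multiplication.
    auto start = std::chrono::steady_clock::now();
    multiply(a, b, c, m, n, m);
    auto stop = std::chrono::steady_clock::now();

    double checksum = 0.0;
    for (std::size_t i = 0; i < m * m; ++i) checksum += c[i];

    // Release the allocated memory.
    delete[] a;
    delete[] b;
    delete[] c;
    return {std::chrono::duration<double>(stop - start).count(), checksum};
}
```

For example, `timeRun(100000)` corresponds to the first large-matrix case (A is 10 × 100 000, B is 100 000 × 10). The returned `seconds` covers the multiplication alone. The checksum is computed after the timer stops so that the product is actually used, which makes it harder for an optimizing compiler to drop the multiplication as dead code. Raw `new[]`/`delete[]` mirrors the "dynamically allocated arrays" described above; idiomatic modern C++ would more likely use `std::vector`.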

In a nutshell, the average efficiency ratio (VBA processing time / C++ processing time) was 5.24 for this experiment sample. Moreover, larger arrays can be handled in C++ than in VBA, since the memory is allocated from the heap instead of the stack.
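For a sense of scale, here is a back-of-the-envelope estimate of the raw storage per large matrix. It assumes 8-byte double-precision elements (the size of a VBA Double) and ignores any per-array overhead:

```latex
\[ \text{bytes per matrix} = 10 \times n \times 8 = 80\,n \]
\[ n = 2\,000\,000: \quad 80 \times 2\,000\,000 = 1.6 \times 10^{8}\ \text{bytes} \approx 160\ \text{MB} \]
```

At the largest size tested, A and B together therefore occupy roughly 320 MB before any working storage is counted.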

Afterthoughts

So, for any large and time-critical matrix operations performed in VBA, a simple two-dimensional array is the most efficient data structure VBA can offer. For small matrix operations, arrays wrapped in a class can still be used. For really hardcore calculations (very large matrices, fast processing times), VBA is unfortunately not an efficient tool.

The programs presented in this blog can be freely used, but without warranty or support of any kind. By using the programs presented in this blog, you agree to bear the entire risk concerning the quality or performance of any programs used. In no event will I be liable to you for any damages, including any general, special, incidental or consequential damages arising out of the use or inability to use the programs presented in this blog. By using the programs presented in this blog, you are accepting the content of this disclaimer.