To tune the performance of some noticeably slow code of mine, I had a look at many excellent posts on this topic here on M.SE, e.g. here, to name the most interesting one.

Now I ask myself whether I understood everything correctly, or whether (perhaps) there is even more tuning potential (for now and for future use cases). As the Latin proverb says, "Verba docent, exempla trahunt" (= words teach, examples lead the way), I am going to describe both what I learned and what I did accordingly, and to copy the optimized code here. Note that sol is the solution of NDSolve, a vector of interpolating functions, and W is a list of indices of the form i or {i,j,k} that are paired up for use inside ExpVal (for mathematical reasons {i, {k,l,m}} = {{k,l,m}, i} holds inside ExpVal, so we only need to calculate one half of the pairings).

I would greatly appreciate hints regarding further optimization potential, as well as criticism of the code.

Optimization ideas

Vectorization: Try to use machine numbers as early as possible. I did this by creating a vector of plain machine numbers inside n0[t] and working with it afterwards. Moreover, I have to pair each entry of h with each entry of h's conjugate, so I used Abs for pairings of identical indices in h and hAdj.
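To sketch what I mean (with a hypothetical machine-number prefactor vector standing in for my actual h at a fixed time; the names and sizes here are made up for illustration):

```mathematica
(* hypothetical machine-number prefactor vector, standing in for h at a fixed t *)
h = RandomComplex[{-1 - I, 1 + I}, 1000];
hAdj = Conjugate[h];

(* all pairwise products h_i * Conjugate[h_j] in one vectorized step,
   instead of an explicit double loop *)
pairs = Outer[Times, h, hAdj];

(* for a pairing of an index with itself, the product reduces to Abs[h_i]^2 *)
diag = Abs[h]^2;
```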

Memoisation: Trade memory for performance. I did this by adding memoisation to the calculation of ExpVal.
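In Mathematica this is the well-known f[x_] := f[x] = ... idiom: the first call computes and stores the result as a down value, and every later call with the same arguments is a plain lookup. A minimal sketch (the body here is a dummy stand-in, not my real ExpVal):

```mathematica
(* memoised dummy; the real ExpVal body is much more involved *)
ClearAll[expVal];
expVal[i_, j_] := expVal[i, j] = (Pause[0.01]; N[i + j])

AbsoluteTiming[expVal[1, 2]]  (* first call: computed, takes at least the Pause *)
AbsoluteTiming[expVal[1, 2]]  (* second call: looked up, essentially free *)
```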

Compilation: As ExpVal is evaluated extremely often, it is a major bottleneck of the code. I compiled it to gain further performance.
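A minimal sketch of what such a compiled kernel can look like (the body is a made-up stand-in, not my actual ExpVal):

```mathematica
(* compiled stand-in for ExpVal; Listable lets it run over whole index lists *)
expValC = Compile[{{i, _Integer}, {j, _Integer}, {kF, _Real}},
   Sin[kF (i - j)]/(i + j + 1.),
   RuntimeAttributes -> {Listable}, Parallelization -> True];
(* with a C compiler installed, adding CompilationTarget -> "C" can help further *)
```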

Scoping constructs: With is faster than Module because it doesn't create new local variables but only substitutes the given values into their occurrences, so I used it instead of Module.
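For example:

```mathematica
(* With substitutes the value into the body; Module creates a fresh local symbol *)
fWith[x_] := With[{c = 2. Pi}, c Sin[c x]]
fModule[x_] := Module[{c = 2. Pi}, c Sin[c x]]
(* both return the same numbers, but With skips the symbol-creation overhead *)
```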

Detailed code description

On request, I will describe the code in more detail here. The code numerically simulates nonequilibrium physics via iterated equations of motion. Assume we have an operator $A(t) = \sum_i h_i(t) W_i$ whose time dependence is completely carried by the prefactors $h_i(t)$. Those prefactors are calculated via NDSolve, whose solution is saved in sol. The $W_i$ are (quantum mechanical) basis operators consisting of one (i) or three ({k,l,m}) elementary fermion operators. A typical BasisSize is about 5000 different operators $W_i$.

The code has to build every possible pairing of two of the $W$ (say $W_i$ and $W_j$) and calculate their expectation value ExpVal. These results are multiplied by the individual prefactors ($h_i$ and $h_j^*$) and summed up for the final result.
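Schematically, if I denote the pairing expectation value by $\langle W_i W_j \rangle$, the final quantity has the form

$$\sum_{i,j} h_i(t)\, h_j^*(t)\, \langle W_i W_j \rangle,$$

where the symmetry mentioned above means only one half of the $(i,j)$ pairings needs to be evaluated explicitly.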

$\begingroup$It would help quite a bit if you gave us a tractable example of sol, W and some values for kF and BasisSize. But first question: can you explain why W[[i]] is sometimes one and sometimes three indices? This will prevent W from being a packed array. If this can be modified, ExpVal could probably be made completely vectorized, i.e. we could do ExpVal[W,W,kF]. Finally, can you explain in words what the code is doing?$\endgroup$
– Marius Ladegård Meyer, Dec 14 '16 at 19:23

$\begingroup$@MariusLadegårdMeyer Good remarks. You can assume kF to be $\pi/2$, but in principle it doesn't matter. The rest of your questions are answered in the initial question under Detailed code description now. The fact that sometimes one and sometimes three indices participate is due to the commutator in the Heisenberg equations of motion and cannot be changed, physically speaking (but maybe there is a possibility in the code that I don't see).$\endgroup$
– pbx, Dec 14 '16 at 21:29

$\begingroup$Agree :) Unfortunately I have been caught out too many times going down the rabbit hole of making my code more succinct or beautiful, only to find it takes days to evaluate. Now, if I have a large dataset, where possible I try to prototype in MMA and move quickly to number crunching in something else.$\endgroup$
– Ramble, Dec 15 '16 at 15:52

$\begingroup$The trouble is that even with vectorisation you only need one or two code elements that work inefficiently within MMA and most of your effort is hosed. You usually don't discover these until late in the process, and by that time your optimized MMA code starts looking a lot like C (i.e. simple loops and iteration, albeit vectorised). Things like Map and Parallel are black boxes and don't behave predictably when it comes to speed, and Compile doesn't work with most built-in functions.$\endgroup$
– Ramble, Dec 15 '16 at 16:00

$\begingroup$The key is identifying early on how big the datasets your problem might be tackling are, as everything is always wonderful with the dummy test set. You also start hitting walls when trying to manipulate or copy results. Then the only answer is cutting everything up into smaller chunks... You start off with something beautiful and end up slicing and contorting it into something like... a conventional programming language, only slower :)$\endgroup$
– Ramble, Dec 15 '16 at 16:04
