I am re-writing some code of mine that computes Feynman(-like) diagrams on a graph. (It is a linked-cluster expansion, if you are curious, which is sort of like a Feynman diagram, but where the background system has no coupling between points, and the points themselves are discrete -- the underlying system is a crystal, say. Questions on the physics are quite welcome!)

I learned ruby last year, and I am curious to see if I can actually do the things ruby is meant to help us do. I have working code that does what I describe below, but it is very slow, and I would love to learn from some ruby (or "FP") gurus how to take my relationship with execution time "to the next level."

In the end, it seems to be a question of how to "pre-compile" a lambda?

Here is the essential problem. You have a large graph, V, which you can represent as a (symmetric) matrix, V[i,j], telling you which points are connected to each other. It might be, for example, a 10^3 lattice. You also have a (much smaller) "diagram", call it D, which you can also represent as a (symmetric) matrix, D[a,b]. A diagram might be, for example, a three-vertex loop, whose value is the sum of

V[i,j] V[j,k] V[k,i]

over all values of i, j and k. In other words, D gives the pattern of the indices in the sum.
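In code, a brute-force evaluation of that three-vertex sum might look like the following (a sketch, assuming V is handed in as a plain nested array; `three_loop_sum` is an illustrative name, not from the original code):

```ruby
# Brute-force evaluation of the three-vertex loop:
# sum over all i, j, k of V[i,j] * V[j,k] * V[k,i].
# Here v is a plain nested array standing in for the graph matrix V.
def three_loop_sum(v)
  m = v.size
  total = 0
  (0...m).each do |i|
    (0...m).each do |j|
      (0...m).each do |k|
        total += v[i][j] * v[j][k] * v[k][i]
      end
    end
  end
  total
end
```

For a fully connected three-site graph this counts every closed three-step walk, i.e. each triangle six times (three starting points, two directions).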

I thought a long time about how to do this in the general case. Here was my solution.

First, we need a good way to sum over an arbitrary number of indices (the dimension of the diagram D); you want to be able to pass a block to the center of all the loops. I built the following (m_inner is the dimension of the graph V):

def all_sum(n_left, m_inner, block, running=[])
  # sums from 0 to m_inner-1, n_left times (pass n at the top level,
  # where n is the number of vertices in the diagram D)
  # (returns an *object* -- a lambda -- that can be executed by call)
  # at the center of the loops, passes the index list through
  # (i.e., on the 3rd iteration of the first loop, 2nd iteration of
  # the second, the block is passed [2,1])
  if n_left == 0
    lambda { block.call(running) }
  else
    lambda do
      (0...m_inner).inject(0) do |sum, i|
        sum + all_sum(n_left - 1, m_inner, block, running + [i]).call
      end
    end
  end
end

Where @n is the dimension of the diagram D (e.g., in the three-loop above, it's 3), @g is the matrix of the diagram D, and v.net is the matrix associated with the graph V.
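For concreteness, the inner block can be sketched like this (purely illustrative: `make_inner_summand` is my name, and passing @n, @g, and v.net as plain arguments `n`, `g`, `net` is my assumption, not the original code):

```ruby
# A sketch of the block handed to the center of the loops: given the
# index assignment `running` (one graph site per diagram vertex),
# multiply V[i_a, i_b] over every edge (a, b) of the diagram D.
# n is the diagram dimension, g its matrix, net the graph matrix
# (all given here as nested arrays for simplicity).
def make_inner_summand(n, g, net)
  lambda do |running|
    prod = 1
    (0...n).each do |a|
      (a + 1...n).each do |b|
        prod *= net[running[a]][running[b]] if g[a][b] != 0
      end
    end
    prod
  end
end
```

Since both matrices are symmetric, running over the pairs a &lt; b once picks up each diagram edge exactly once.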

If all that makes sense, here is the issue. This is extremely slow. Is there an obvious way to speed it up? It seems like my inner block is being re-interpreted over and over again, but surely there is a way to make each execution cheap?

I poked around a little more, trying to see how to do the thing I thought I knew how to do in LISP -- just make the program write the program! You can pass a string to instance_eval, which lets your code write code.

This speeds things up until it is as fast as if you had written the code out yourself. Here is a time test of the first thing I tried (in the post above); then of using instance_eval to write the inner_summand and passing that as a block to some clever lambdas; then of having instance_eval write the whole thing; and then of just cutting and pasting the particular function.

This is for a 5x5x5 cube (with sort of a torus topology), with a three-vertex loop.
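For the special case of a single n-vertex loop, the "write the whole thing" idea can be sketched as below (`build_loop_sum` is an illustrative name, and I use plain eval rather than instance_eval for simplicity; the real code handles arbitrary diagrams, this sketch only generates a cycle):

```ruby
# Build the nested loops as a string, then eval it once into a lambda.
# After that, calling the lambda runs straight-line Ruby with no
# recursive lambda dispatch.
# n is the number of diagram vertices, m the number of graph sites;
# the generated body multiplies v[i0][i1]*v[i1][i2]*... around the loop.
def build_loop_sum(n, m)
  vars = (0...n).map { |a| "i#{a}" }
  code = "lambda do |v|\n  total = 0\n"
  vars.each { |x| code << "  (0...#{m}).each do |#{x}|\n" }
  body = (0...n).map { |a| "v[#{vars[a]}][#{vars[(a + 1) % n]}]" }.join(" * ")
  code << "  total += #{body}\n"
  vars.each { code << "  end\n" }
  code << "  total\nend"
  eval(code)
end
```

The string is built once, eval'd once, and the resulting lambda can then be called as many times as you like at full speed.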

And, the final code tweak -- the most significant of all. I ran the profiler and noticed a lot of Kernel#kind_of? calls, as well as some calls to GSL. I converted all of the objects in the function to NArray objects (from GSL ones). Still, there were many calls that seemed to reference the GSL libraries; somehow the duck typing of ruby was worried I would pass GSL objects in.

I re-wrote the rest of the code to avoid GSL references, and now the entire computation runs in 3 seconds.

So, a final question would be: is it possible to "unload" a library for a while, or at least to make ruby pretend it does not exist? I would like to use GSL later, but it creates overhead even when the functionality of GSL is not needed.

The issue you are seeing comes from initially writing for the wrong goal. Traditional coding is about maintainable code -- stuff that is readable; however, you're stating specifically that the domain you want is speed. Those call for two different techniques.

Ruby is an interpreted, garbage-collected language. Every call, method dispatch, and array access carries a performance cost.

The point of lambdas is the flexibility of binding code dynamically, at the cost of all the lookups necessary to find and execute it.

In multiple-dimension arrays, the variable and subscripts have to be interpreted and resolved. Notably, these are dynamic constructs: internally they are arrays of references to further arrays. They are not a flat memory allocation with a simple offset lookup.

The repeated use of temporary objects (especially strings) floods the garbage collector. Every `+` concatenation allocates a fresh string. If you really want to create one long big string from a bunch of short strings, throw the short ones into an array and join them at the end. Array#join is optimized to do this: it creates one big string long enough for it all and copies everything into place.
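A minimal illustration of the join advice (the names and the 1000-piece size are just for the example):

```ruby
# Collecting pieces and joining once allocates the final string a
# single time, instead of a new temporary on every pass.
pieces = []
1000.times { |i| pieces << "term_#{i}" }
fast_way = pieces.join

# For comparison: the allocation-heavy version, which churns the GC
# by building a new string on each +=.
slow_way = ""
1000.times { |i| slow_way += "term_#{i}" }
```

Both produce the same string; only the allocation pattern differs.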

So, what we find here is that coding for a specific domain can require specific knowledge of the development environment's design. If you want real speed, in this case you can really take advantage of Ruby's ability to build code. Your final algorithm is deterministic: you know exactly every input variable, and the final matrix has a specific definition, so use meta-code to def a method in a class that does exactly what you want. The code would generate something like this to be evaluated:

class MyFastMatrix
  def do_matrix_d_3_by_3_v_10_by_10(vec_d, vec_v)
    # Loop through and grab all references into local variables
    # so we never have to look them up again
    d_0_by_0 = vec_d[0, 0]
    d_0_by_1 = vec_d[0, 1]
    # . . .
    v_9_by_9 = vec_v[9, 9]
    # Output specific code for each result dynamically into result local variables
    r_0_by_0 = d_0_by_0 + d_0_by_1 # ... blah, blah, blah
    # . . .
    # Assemble result into an array and return it
    return [ [r_0_by_0], ... ]
  end
end

All distant lookups are done once, nothing goes to the garbage collector until it's over, each calculation relies on the simplest lookup, there are no loops, and the routine, once created, can be reused, since it's now actually in the class definition. This is trading code space for speed.
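One way to install such generated code (a sketch: `do_matrix` and its trivial body are stand-ins for the real generated straight-line code):

```ruby
# Installing generated code as a real method: once class_eval runs,
# do_matrix is an ordinary method and pays no further eval cost.
class MyFastMatrix
end

generated = <<-RUBY
  def do_matrix(vec_d, vec_v)
    d_0_by_0 = vec_d[0][0]   # hoist lookups into locals
    v_0_by_0 = vec_v[0][0]
    d_0_by_0 + v_0_by_0      # stand-in for the real straight-line body
  end
RUBY

MyFastMatrix.class_eval(generated)
```

After this runs, `MyFastMatrix.new.do_matrix(d, v)` dispatches like any hand-written method.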

We had a lot of fun working on this, and it was interesting to try to use ruby to accomplish a task that had, many years ago, been done in restricted cases by hand; an example is http://www.springerlink.com/content/n664krj72h632354/. It took perhaps two weeks to debug the code, and what was perhaps most amazing was to find, at the end, that the group in 1963 had, indeed, done the calculations without error.

In that case, yes. In other cases, the summations become more complicated -- e.g., a two-loop graph might be

V[i,j] V[j,k] V[k,p] V[p, i] V[k,i]

I seem to remember that graphs like these, with multiple loops, could not be transformed into "ordinary" matrix multiplication problems (but would be curious to hear if you or others had interesting suggestions -- it would indeed allow one to piggy-back on the standard, more parallelizable algorithms.)
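For the record, the single three-vertex loop does map onto ordinary matrix multiplication: the sum of V[i,j] V[j,k] V[k,i] is just the trace of V cubed. A quick check with the stdlib Matrix class (the 3x3 example matrix is mine):

```ruby
require 'matrix'

# The single three-vertex loop, sum over i,j,k of V[i,j]*V[j,k]*V[k,i],
# equals trace(V**3), so the standard (and more parallelizable)
# matrix-multiplication machinery applies in that special case.
v = Matrix[[0, 1, 1],
           [1, 0, 1],
           [1, 1, 0]]
loop_sum = (v ** 3).trace
```

It is the extra "chord" factors like V[k,i] in the multi-loop graphs that break this reduction to a chain of matrix products.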

More generally, it would be useful to use the transposed arrays where you're going with the stride of the array rather than against it. No matter how you slice it, it's an N^k problem for a k-product summation. But if you can utilize the ordering of the matrix to your advantage then you'll get the benefit of spatial locality.
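A minimal illustration in Ruby, where a "matrix" is a nested array, so walking a column hops between row objects on every step (the 3x3 example is mine):

```ruby
# Walking a "column" of a nested array touches a different row object
# at each step; transposing once up front turns every subsequent
# column walk into a row walk along a single contiguous array.
v = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
vt = v.transpose
column_1 = vt[1]   # the second column of v, now read as one row
```

The transpose costs one pass over the data, which pays off when the same columns are traversed many times in the inner loops.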
