What took up all of that memory? The matrix $J$, the vector $r$, and the set of observations $\{x_i\}$.

It turns out that we don't really need to keep any of these around. In this post I'll describe how to implement the algorithm without materializing $J$ and $r$. This will reduce the memory footprint enough to make a practical algorithm that can handle ~100 observations. This is the algorithm used in the accelerometer calibration sketch I posted last year.

In the next post, I’ll show how to accumulate 25 statistics about the observations that have all of the information we need to run Gauss-Newton in constant space and time.

Let’s start by getting rid of the big arrays of statistics.

1. Only materialize what you need

When carrying out the Gauss-Newton algorithm as it was sketched in the last post, we never need direct access to the matrix entries of $J$ or $r$. We just need to work with $J^TJ$ and $J^Tr$ — two arrays with sizes that do not depend on the number of samples collected.

We will materialize these small matrices, $J^TJ$ and $J^Tr$, instead of the big matrices $J$ and $r$.

This is possible because each matrix entry in $J^TJ$ and $J^Tr$ can be written as a sum of functions of individual observations. This means we can look at an observation, compute the function of the observation, add it to the matrix entry, then move on to the next observation.

Let's make this more concrete and look at the 00 entry of $J^TJ$. Recall Equation 4 of part 1:

$$J_{i0} = \frac{\partial r_i}{\partial \beta_0} = 2\beta_3^2\,(x_{i0} - \beta_0),$$

from which the definition of matrix multiplication gives

$$(J^TJ)_{00} = \sum_i J_{i0}\,J_{i0} = \sum_i 4\beta_3^4\,(x_{i0} - \beta_0)^2.$$

So to compute $(J^TJ)_{00}$ we can just accumulate this sum observation by observation. There is no need to compute the big matrices $J$ and $J^T$.

The general matrix entries are as follows. (Here $r_i = 1 - \sum_{k=0}^{2} \beta_{k+3}^2 (x_{ik} - \beta_k)^2$ is the residual from part 1.) For $0 \le j, k \le 2$:

$$(J^TJ)_{jk} = \sum_i 4\,\beta_{j+3}^2\,\beta_{k+3}^2\,(x_{ij} - \beta_j)(x_{ik} - \beta_k), \tag{1}$$

for $0 \le j \le 2 < k \le 5$:

$$(J^TJ)_{jk} = -\sum_i 4\,\beta_{j+3}^2\,\beta_k\,(x_{ij} - \beta_j)(x_{i,k-3} - \beta_{k-3})^2, \tag{2}$$

and for $3 \le j, k \le 5$:

$$(J^TJ)_{jk} = \sum_i 4\,\beta_j\,\beta_k\,(x_{i,j-3} - \beta_{j-3})^2\,(x_{i,k-3} - \beta_{k-3})^2. \tag{3}$$

$J^Tr$ has a similar expression. Recall that

$$(J^Tr)_j = \sum_i J_{ij}\, r_i.$$

So for $0 \le j \le 2$:

$$(J^Tr)_j = \sum_i 2\,\beta_{j+3}^2\,(x_{ij} - \beta_j)\, r_i, \tag{4}$$

and for $3 \le j \le 5$:

$$(J^Tr)_j = -\sum_i 2\,\beta_j\,(x_{i,j-3} - \beta_{j-3})^2\, r_i. \tag{5}$$
So all of the matrix entries we need to compute can be accumulated one observation at a time. We can store the observations, then at each iteration step in the Gauss-Newton algorithm run through these observations and use the formulas above to accumulate the matrix entries. Once we have done that, it is simple linear algebra to solve the matrix equation $J^TJ\,\delta = J^Tr$, find $\delta$, and adjust $\beta$. (Review the algorithm sketch here if needed.) This is essentially what is done in the sketch I posted last October, and I'll use a slightly modified version of that code in the rest of this post.

2. Putting it into code

Now let’s translate this into C++. We won’t try to engineer the code here, just outline a simple Arduino sketch that does the job.

To start, we will need to declare some arrays to hold $J^TJ$, $J^Tr$, $\beta$, and the observations $x_i$. I'll add an array for $\delta$ too even though it isn't strictly needed.
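A minimal set of declarations along these lines might look like the following (the array names and the N_SAMPLES budget are my choices, not necessarily the original sketch's):

```cpp
#define N_SAMPLES 100  // assumed sample budget; ~100 is practical on an Arduino

float JtJ[6][6];           // the 6x6 matrix J^T J
float JtR[6];              // the 6-vector J^T r
float beta[6];             // the current parameter estimate
float delta[6];            // the step found at each iteration (not strictly needed)
float data[N_SAMPLES][3];  // the stored observations x_i
```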

This amounts to $4(54 + 3N) = 216 + 12N$ bytes (with 4-byte floats), where $N$ is the number of samples. If we only stored the upper triangular part of $J^TJ$, we could eliminate 15 floats, reducing the cost to $156 + 12N$ bytes at the expense of making the code below a bit tougher to read. I will leave the modification as an exercise for the interested reader (but I'll perform a similar one in the next post). The cost of the naïve algorithm was $216 + 40N$ bytes (had we declared arrays for $J$ and $r$ as well), so this method uses significantly less memory for any number of samples. Importantly, the dependency on $N$ is dramatically reduced and this algorithm can now be used on an Arduino on data sets with ~100 samples.
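As a quick sanity check on the arithmetic (assuming 4-byte floats, and taking the Arduino Uno's 2048 bytes of SRAM as a reference point):

```cpp
// Fixed overhead: 54 floats (JtJ, JtR, beta, delta); then 3 floats per sample.
int bytes_needed(int n_samples) {
  return 4 * (54 + 3 * n_samples);
}
```

At $N = 100$ this is 1416 bytes, which fits in 2 KB of SRAM with room left over for the rest of the sketch.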

With those declared, let's look at the calibrate() routine. It starts by setting up some general parameters — how small a change is small enough to stop? (eps), after how many iterations should we give up? (num_iterations), how big was the last change? (change) — then enters the main loop. There is nothing subtle about this loop: compute the matrices at the current $\beta$, solve for $\delta$, apply the change, and repeat until the change is small enough or we run out of iterations.

But up to this point, this is really just a template for an algorithm. As we've just been discussing, the real work of the algorithm all falls to the routine compute_matrices(). Everything else is simple. In this implementation, we compute the matrices by scanning the samples and updating the matrix entries as required by Equations 1–5. Here is an implementation.
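The following is a self-contained sketch of such an implementation, assuming the residual model from part 1, $r_i = 1 - \sum_k \beta_{k+3}^2 (x_{ik} - \beta_k)^2$. The helper names (compute_matrices, solve, calibrate) and the small Gaussian-elimination solver are my own choices rather than the original sketch's code; the declarations are repeated so the sketch stands alone:

```cpp
#include <math.h>

#define N_SAMPLES 14  // however many samples were collected; ~100 fits on an Arduino

float JtJ[6][6], JtR[6], beta[6], delta[6];
float data[N_SAMPLES][3];

// Accumulate JtJ and JtR from the stored samples at the current beta,
// one observation at a time, as in Equations 1-5.
void compute_matrices() {
  for (int j = 0; j < 6; ++j) {
    JtR[j] = 0.0f;
    for (int k = 0; k < 6; ++k) JtJ[j][k] = 0.0f;
  }
  for (int i = 0; i < N_SAMPLES; ++i) {
    float jac[6];
    float r = 1.0f;  // residual r_i = 1 - sum_k beta_{k+3}^2 (x_ik - beta_k)^2
    for (int k = 0; k < 3; ++k) {
      float d = data[i][k] - beta[k];
      float b = beta[3 + k];
      r -= b * b * d * d;
      jac[k]     =  2.0f * b * b * d;  // dr_i/dbeta_k
      jac[3 + k] = -2.0f * b * d * d;  // dr_i/dbeta_{k+3}
    }
    for (int j = 0; j < 6; ++j) {
      JtR[j] += jac[j] * r;
      for (int k = 0; k < 6; ++k) JtJ[j][k] += jac[j] * jac[k];
    }
  }
}

// Solve JtJ * delta = JtR by Gaussian elimination with partial pivoting.
void solve() {
  float A[6][7];
  for (int j = 0; j < 6; ++j) {
    for (int k = 0; k < 6; ++k) A[j][k] = JtJ[j][k];
    A[j][6] = JtR[j];
  }
  for (int c = 0; c < 6; ++c) {
    int p = c;  // pick the largest pivot in column c
    for (int j = c + 1; j < 6; ++j)
      if (fabsf(A[j][c]) > fabsf(A[p][c])) p = j;
    for (int k = c; k < 7; ++k) { float t = A[c][k]; A[c][k] = A[p][k]; A[p][k] = t; }
    for (int j = c + 1; j < 6; ++j) {
      float f = A[j][c] / A[c][c];
      for (int k = c; k < 7; ++k) A[j][k] -= f * A[c][k];
    }
  }
  for (int j = 5; j >= 0; --j) {  // back substitution
    float s = A[j][6];
    for (int k = j + 1; k < 6; ++k) s -= A[j][k] * delta[k];
    delta[j] = s / A[j][j];
  }
}

// The Gauss-Newton driver: iterate until the step is tiny or we give up.
void calibrate() {
  const float eps = 1e-9f;       // stop when the squared step is this small
  const int num_iterations = 30; // give up after this many passes
  float change = 1.0f;
  for (int it = 0; it < num_iterations && change > eps; ++it) {
    compute_matrices();
    solve();
    change = 0.0f;
    for (int j = 0; j < 6; ++j) {
      beta[j] -= delta[j];           // Gauss-Newton step: beta <- beta - delta
      change += delta[j] * delta[j]; // squared size of this step
    }
  }
}
```

With clean synthetic data generated from known offsets and gains, starting from the natural initial guess $\beta = (0,0,0,1,1,1)$, this loop converges to the generating parameters in a handful of iterations.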

By being careful about computing only the matrices we need, we were able to reduce the memory cost of the Gauss-Newton algorithm significantly and produce an algorithm that is viable on an inexpensive microcontroller. But we still need to keep the samples around and make a full pass through the sample set with each iterative pass — as $\beta$ changes so do the matrices. This costs space and time and limits the viability of the algorithm.

In the next post we will see that even here we are storing too much. Thanks to the nature of our residual function, we will see that there are 25 statistics that we can compute in one pass through the observation set that contain all of the information we need to compute $J^TJ$ and $J^Tr$ for any given $\beta$. Once we do this, we will have an algorithm with space costs that are independent of the number of observations. (Well, really logarithmic because with too many samples a float will not have enough precision to capture the information in the marginal sample, but I'm not worrying about that for now.)
