CPMGFit

The program cpmgfit is a Levenberg-Marquardt non-linear least squares fitting
program designed for fitting CPMG relaxation dispersion data to characterize chemical
exchange phenomena in NMR spectroscopy. It can be used in interactive or batch
mode. Output from the program is intended to be graphed using the GRACE
program. Output is directed to standard output and can be redirected to a file
or piped directly into XMGR.

-grid

By default, cpmgfit assumes that the user will provide initial guesses
for the parameter values to be optimized. If the -grid option is specified,
then the initial values will be chosen by a grid search. The user must
provide the lower bound, the upper bound and the number of grid steps to use
for each parameter.
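
The idea behind a grid search can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the code used by the program:
the names grid_search and model are hypothetical, and the sketch assumes that
each parameter axis holds nsteps evenly spaced points including both bounds.

```python
import itertools

def grid_search(model, xs, ys, dys, bounds):
    """Return the grid point with the lowest chi-square and that chi-square.

    bounds is a list of (lower, upper, nsteps) tuples, one per parameter.
    Assumption: each axis has nsteps evenly spaced points including both bounds.
    """
    axes = []
    for lower, upper, nsteps in bounds:
        if nsteps < 2:
            axes.append([lower])
        else:
            step = (upper - lower) / (nsteps - 1)
            axes.append([lower + i * step for i in range(nsteps)])

    best_params, best_chi2 = None, float("inf")
    # Visit every combination of grid values and keep the best one.
    for params in itertools.product(*axes):
        chi2 = sum(((y - model(x, params)) / dy) ** 2
                   for x, y, dy in zip(xs, ys, dys))
        if chi2 < best_chi2:
            best_params, best_chi2 = params, chi2
    return best_params, best_chi2
```

The best grid point would then serve as the initial guess for the
Levenberg-Marquardt optimization. The cost grows as the product of the step
counts, so coarse grids are usually sufficient.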

-jack

By default, cpmgfit provides estimates of the uncertainties in the fitted
parameters both from the covariance matrix and from a Monte Carlo simulation.
If -jack is specified, then a jackknife simulation is performed instead of
the Monte Carlo simulation.
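
A delete-one jackknife can be sketched as follows. This is a minimal Python
illustration of the general technique, not the routine used by the program; the
name jackknife and the callable fit are hypothetical, and the sketch handles a
single fitted value rather than a full parameter vector.

```python
import math

def jackknife(fit, data):
    """Delete-one jackknife mean and standard error of a scalar fit result.

    fit  : callable taking a list of data points and returning one fitted value
    data : list of data points, for example (x, y) pairs
    """
    n = len(data)
    # Refit n times, each time leaving out one data point.
    leave_one_out = [fit(data[:i] + data[i + 1:]) for i in range(n)]
    mean = sum(leave_one_out) / n
    # Jackknife variance of the leave-one-out estimates.
    variance = (n - 1) / n * sum((t - mean) ** 2 for t in leave_one_out)
    return mean, math.sqrt(variance)
```

Because the procedure only needs repeated fits to subsets of the data, it does
not require uncertainties in the individual data points.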

-debug

Echo the input back to the terminal for debugging purposes.

-noerror

By default, cpmgfit assumes that data will be input as (x, y, dy, field)
quadruples (where dy is the uncertainty in y). If -noerror is specified, then
only (x, y, field) triples will be read. -noerror implies -jack because the
Monte Carlo simulations cannot be done without uncertainties being provided.

The data will be read from a plain file without any header information.
The user will be prompted for other information.

-r seed

Specifies an initial seed for the random number generator. If USE_GETSEED
is defined in the Makefile for the program, then entering a seed is not necessary
because the program will use the system clock to generate the seed. The routine
getseed.f may need to be re-written for use on systems other than Silicon
Graphics in order to use the USE_GETSEED compiler option.

-m filename

Specifies a filename to be used for writing the values of the fitted parameters
determined for each Monte Carlo step in the Monte Carlo simulation of uncertainties.
This file is useful for performing error analysis during subsequent calculations
using fitted parameters.
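
As an illustration of such a subsequent calculation, the Python sketch below
propagates the Monte Carlo spread into a derived quantity. The layout of the
file is an assumption made for this example (one Monte Carlo step per line,
whitespace-separated parameter values); check the file actually written by the
program before using it. The name propagate is hypothetical.

```python
import statistics

def propagate(lines, derived):
    """Mean and standard deviation of a derived quantity over Monte Carlo steps.

    lines   : iterable of text lines, one Monte Carlo step per line
              (assumed layout: whitespace-separated parameter values)
    derived : callable mapping a list of parameter values to a number
    """
    values = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        values.append(derived([float(f) for f in fields]))
    return statistics.mean(values), statistics.stdev(values)
```

With an open file handle passed as lines, a call such as
propagate(handle, lambda p: 1.0 / p[0]) would give the mean and spread of the
reciprocal of the first column, whatever parameter that column holds.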

-f filename

Specifies a filename to be read for input, rather than interactive data
entry from the terminal. The format of the input file is given below.

Input File Format

The input file format used with the -f flag is given below. In general, the
input consists of a keyword (title, function, xmgr, or data) followed by
input values (that may extend over more than one line as described below).
Strings that contain spaces must be enclosed in double quotes. The program is case sensitive.
Lines starting with "#" are treated as comments.
Input is free format and blank lines, tabs, and blanks are ignored.

This keyword is followed by the number of static magnetic fields used for
data collection and then by the values of the static magnetic fields in units
of tesla.

function string

String is one of the defined function names. The functions are listed using
the command cpmgfit -help. The function keyword is followed by M input
lines, one for each parameter in the function. In the default mode, each line
consists of a string defining the name of the parameter and a real number
defining the initial guess for the parameter. If -grid is specified,
the parameter string is followed by three entries: a real number defining
the lower bound for the parameter, a real number defining the upper bound
for the parameter and an integer defining the number of search steps between
the lower and upper bounds.
The string defining the name of the parameter can be followed by an optional
string with the value 'fix' or 'opt'. If 'opt' is specified, then the
following values on the line are treated exactly as if the string 'opt' were
omitted. If the string value is 'fix', then the next two fields are read as
the fixed value of the parameter, which is not optimized, and the error in
the fixed parameter (which can be set to zero). The error is used to perform
a Monte Carlo simulation of the uncertainties in the fitted parameters due to
variations in the fixed parameters.

xmgr

This keyword is followed by options used for XMGR. Each of these inputs
consists of the keyword @ followed by a string setting some XMGR option. See
the sample input files provided or the XMGR help pages for more information.
If the -xmgr option is not specified, these entries are ignored.

data

This keyword is followed by N lines of data, one line for each data point.
The data is input as (x, y, dy, field) quadruples, unless the -noerror
option is specified, in which case the data is input as (x, y, field) triples.
x = 1/tcp, y = R2(1/tcp), in which tcp is the delay between 180 degree pulses
in the CPMG experiment and R2(1/tcp) is the relaxation rate constant; dy is
the experimental uncertainty in R2(1/tcp). If no uncertainties are specified
for the y values, then the data may need to be scaled to obtain good convergence
of the program.
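
The conversion from measured delays to data lines can be sketched in Python.
This is an illustration only: the name format_data_lines is hypothetical, the
units (tcp in seconds, rates in 1/s) are assumed, and whitespace is used as the
separator because the input is free format.

```python
def format_data_lines(points):
    """Format (tcp, r2, dr2, field) tuples as 'x y dy field' lines with x = 1/tcp.

    Assumed units: tcp in seconds, r2 and dr2 in 1/s, field in tesla.
    """
    lines = []
    for tcp, r2, dr2, field in points:
        x = 1.0 / tcp  # x is the reciprocal of the delay between 180 degree pulses
        lines.append(f"{x:.3f} {r2:.3f} {dr2:.3f} {field:.2f}")
    return lines
```

For a made-up point with tcp = 0.001, R2 = 12.5, dy = 0.4 and a field of 11.74,
the sketch produces the line "1000.000 12.500 0.400 11.74".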

In these expressions, pa and pb are the site populations, dw is the chemical
shift difference between sites (in units of 1/s), kex = 1/Tau is the exchange
rate constant, and Rex = pa pb dw^2 / kex. References for the above fitting
functions are given below. Instructions for adding additional functions are
given below.
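
The relation for Rex can be checked numerically with a short Python sketch. The
name rex is hypothetical, and the sketch assumes two-site exchange so that
pb = 1 - pa.

```python
def rex(pa, dw, kex):
    """Return Rex = pa * pb * dw**2 / kex, assuming two sites with pb = 1 - pa.

    dw and kex are in units of 1/s, so Rex is also in 1/s.
    """
    pb = 1.0 - pa
    return pa * pb * dw ** 2 / kex

def tau(kex):
    """Return the exchange time constant Tau = 1/kex."""
    return 1.0 / kex
```

For example, pa = 0.9, dw = 1000 and kex = 2000 give Rex of approximately 45
in units of 1/s.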

Additional output is appended to the above if the -xmgr option
is specified. The output is mostly self-explanatory. The reduced chi-square
variable [X2(red)] is given by X2/(N-M), in which X2 is the chi-square value
for the best-fit model, N is the number of data points, and M is the number of
parameters in the function. If Monte Carlo simulations are performed, the
distribution of X2 is estimated and the percentiles of the distribution (from
5% to 95%) are reported.
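
The definition of the reduced chi-square can be written out as a short Python
sketch. The name reduced_chi_square is hypothetical; the calculation follows
the definition X2/(N-M) given above.

```python
def reduced_chi_square(ys, dys, fitted, n_params):
    """Return (X2, X2/(N-M)) for data ys, uncertainties dys and model values fitted."""
    chi2 = sum(((y - f) / dy) ** 2 for y, dy, f in zip(ys, dys, fitted))
    dof = len(ys) - n_params  # N - M degrees of freedom
    if dof <= 0:
        raise ValueError("need more data points than parameters")
    return chi2, chi2 / dof
```

A value of X2(red) near one indicates that the model describes the data to
within the stated uncertainties.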
The XMGR output produced using the above input file appears below.

Batch processing

A simple script, batch_curve, is provided with the distribution. It processes
all data files in the current working directory whose file names carry a
designated extension. Output files are written to disk and the fitted data
are displayed using XMGR. After viewing a given data set, exit XMGR to let
the script proceed to the next data set.
For example, the command:

batch_curve DATA

will fit all files with names of the form string.DATA and produce output files string.DATA.out.

References

The linear least squares algorithm, non-linear Levenberg-Marquardt fitting algorithm
and Monte Carlo simulation procedure are adapted from the text Numerical Recipes
(Press et al., 1992). This source should be consulted for additional information
concerning the algorithms. The jackknife simulation procedure is described in
Data Analysis and Regression. A Second Course in Statistics (Mosteller
and Tukey, 1977). References for the functional forms for curve-fitting relaxation
dispersion data are found in the review article "Nuclear magnetic resonance
methods for quantifying microsecond-to-millisecond motions in biological macromolecules"
(Palmer, Kroenke, and Loria; Methods Enzymol. 339:204-238 (2001)).

The Function Library

The library of defined functions is maintained in the file funclib.f.
This file contains three FORTRAN subprograms that must be modified to add
functions to the library. The proper locations for modifying the funclib.f
file are indicated in the file itself.

initf(nfuncs,fnames,feqs,nparms,parnam)

This subroutine provides basic information concerning the defined functions.
To add a function, perform the following steps:

Increase the variable nfuncs by one. This is the total number of
defined functions.

Set fnames(i) = 'string', where string is the function name
without embedded spaces or tabs and i is the index of the function in the
library.

Set feqs(i) = 'string', where string is the functional form y=f(x) without
embedded spaces and i is the index of the function in the library.

Set nparms(i) = M, in which M is equal to the number of parameters in the function.

For k=1 to M, set parnam(k,i) = 'string', in which string is the name of the
parameter, without embedded spaces, and i is the index of the function in the
library.

func(fname,x,a,M)

This subroutine returns the value of the function fname at a point x. New
functions are added by adding an 'elseif' block in the subroutine:

elseif (fname.eq.'string') then
func='expression'

in which string is the function name and expression is the FORTRAN
representation of the function. The parameters of the function are a(1) to a(M).

fgrad(fname,x,a,dyda,M)

This routine returns an array dyda containing the derivatives of the function
fname with respect to the parameters a(1) to a(M) at the point x. New
functions are added by adding an 'elseif' block in the subroutine:

in which string is the function name and expression
is the FORTRAN representation
of the derivatives of the function with respect to a(1) to a(M).
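
Before adding a new function, it can be worthwhile to check the analytic
derivatives against finite differences, because an error in fgrad degrades
convergence without producing an obvious failure. The Python sketch below shows
one way to do this. It is not part of the distribution: the example function
y = a(1)*exp(-a(2)*x) and the helper names numeric_grad and max_gradient_error
are chosen only for illustration.

```python
import math

def func(x, a):
    """Example function y = a(1)*exp(-a(2)*x); Python indices start at 0."""
    return a[0] * math.exp(-a[1] * x)

def fgrad(x, a):
    """Analytic derivatives of func with respect to each parameter."""
    e = math.exp(-a[1] * x)
    return [e, -a[0] * x * e]

def numeric_grad(f, x, a, h=1e-6):
    """Central finite-difference derivatives of f with respect to each parameter."""
    grad = []
    for k in range(len(a)):
        up = list(a)
        up[k] += h
        dn = list(a)
        dn[k] -= h
        grad.append((f(x, up) - f(x, dn)) / (2 * h))
    return grad

def max_gradient_error(x, a):
    """Largest absolute difference between analytic and numeric derivatives."""
    return max(abs(g - n) for g, n in zip(fgrad(x, a), numeric_grad(func, x, a)))
```

If max_gradient_error returns a value that is not small compared with the
derivatives themselves at several test points, the analytic expressions should
be rechecked before they are written into funclib.f.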

Compilation

The program is written in FORTRAN 77 and is compiled simply by typing make on
the command line. The compiler type and flags are set as necessary in the
Makefile. As discussed below, the BLAS library must be installed on the
workstation.

License

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the

This program uses some software routines copyrighted by Numerical Recipes Software.
You should obtain a license from Numerical Recipes if you do not have one already.
You can obtain an academic workstation license by sending your name, address,
email address, workstation hostname, workstation internet address, workstation
brand and model number, and a check for $50.00 to

Numerical Recipes Software
P.O. Box 243
Cambridge, MA 02238

Be certain to state that you want the FORTRAN version of the software. You
will also need the BLAS library installed on your workstation. This library
is normally supplied by the workstation vendor, or can be obtained from www.netlib.org.

History

Version 1.0 - Initial release August 3, 2001

Version 1.1 - Updated for Linux (4/26/02)

Version 1.2 - December 15, 2006

Version 1.21 - fixed error with Monte Carlo output - August 2, 2008

Version 1.30 - added 3-site CPMG functions, but not released publicly