We derive an explicit formula for the inverse of a general, periodic, tridiagonal matrix. Our approach is
to derive its LU factorization using backward continued fractions (BCF), which are an essential tool in
number theory. We then use these formulae to construct an algorithm for inverting a general, periodic,
tridiagonal matrix, which we implement in Maple. Finally, we present the results of testing the efficiency
of our new algorithm against another published implementation and against the library procedures
available within Maple to invert a general matrix and to compute its determinant.
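For orientation, the non-periodic special case can be solved by the classic Thomas algorithm; the sketch below is that textbook method only, not the BCF-based factorization derived in the paper, and it ignores the periodic corner entries entirely:

```python
def thomas_solve(a, b, c, d):
    """Thomas algorithm for a plain (non-periodic) tridiagonal system:
    sub-diagonal a (a[0] unused), diagonal b, super-diagonal c
    (c[-1] unused) and right-hand side d.  Forward elimination is a
    single sweep, followed by back substitution."""
    n = len(b)
    cp = [0.0] * n                      # modified super-diagonal
    dp = [0.0] * n                      # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

The periodic variant has extra non-zero entries in the top-right and bottom-left corners, which is what the BCF-based LU factorization in the paper handles.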

We propose a set of new Fortran reference implementations, based on an algorithm proposed by Kahan,
for the Level 1 BLAS routines *NRM2 that compute the Euclidean norm of a real or complex input vector.
The principal advantage of these routines over the current offerings is that, rather than losing accuracy
as the length of the vector increases, they generate results that are accurate to almost machine precision
for vectors of length N < Nmax where Nmax depends upon the precision of the floating point arithmetic
being used. In addition we make use of intrinsic modules, introduced in the latest Fortran standards, to
detect occurrences of non-finite numbers in the input data and return suitable values as well as setting
IEEE floating point status flags as appropriate. A set of C interface routines is also provided to allow simple,
portable access to the new routines.
To improve execution speed, we advocate a hybrid algorithm; a simple loop is used first and, only if IEEE
floating point exception flags signal, do we fall back on Kahan's algorithm. Since most input vectors are
'easy', i.e., they do not require the sophistication of Kahan's algorithm, the simple loop improves performance
while the use of compensated summation ensures high accuracy.
We also report on a comprehensive suite of test problems that has been developed to test both our new
implementation and existing codes for both accuracy and the appropriate settings of the IEEE arithmetic
status flags.
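The hybrid strategy can be sketched in a few lines; this is an illustration of the fast-path/fallback structure only, with a simple scaled fallback standing in for Kahan's compensated algorithm and the IEEE exception-flag machinery of the Fortran reference codes:

```python
import math

def euclidean_norm(x):
    """Hybrid Euclidean norm (sketch of the strategy only): a plain
    sum of squares is tried first, and a scaled accumulation is used
    when the fast path overflows or underflows to zero."""
    s = 0.0
    for v in x:
        s += v * v                      # fast path: naive accumulation
    if math.isfinite(s) and not (s == 0.0 and any(v != 0.0 for v in x)):
        return math.sqrt(s)
    # slow path: the fast path overflowed (s is Inf/NaN) or underflowed
    # to zero, so rescale by the largest magnitude before accumulating
    m = max(abs(v) for v in x)
    if m == 0.0 or not math.isfinite(m):
        return m                        # all-zero vector, or Inf propagates
    return m * math.sqrt(sum((v / m) ** 2 for v in x))
```

Most inputs never leave the fast loop, which is where the speed advantage over an always-careful algorithm comes from.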

The Collected Algorithms of the ACM (CALGO) is now the longest running journal-published series of algorithms. After placing CALGO in the context of other journal algorithm series, we discuss the factors that we believe have made CALGO the well respected means of publishing mathematical software that it is today. We report on how moving with the times and technology has ensured the survival of CALGO, and we look briefly at how we may continue this in the near future.

Programmers have long made mixed-language procedure calls, particularly between the programming languages C and Fortran. Using the alternate language often results in efficient running time or the effective use of human or other resources. Prior to the Fortran 2003 standard the language standards were silent about how the two languages interoperated, and programmers relied on a set of differing ad hoc methods for making the inter-language calls; these typically depended on the Fortran and C compilers in use. The newer Fortran standard provides an intrinsic module, iso_c_binding, that permits the languages to interoperate, although restrictions remain regarding interoperable data types. This paper illustrates several programs that contain core exercises likely to be encountered by programmers. The source code is available from the first author's web site. Included is an illustration of a "trap" based on use of the ad hoc methods: a call from C to a Fortran 2003 routine that passes a character in C to a character variable in Fortran results in a run-time error.

We discuss the way in which the LAPACK version 3.0 suite of software is tested and look at how the application of software testing metrics affects our view of that testing. We analyse the ways in which some errors may be masked by the existing approach and consider how we might use the existing suite to generate an alternative test suite that is easily extensible as well as providing a high degree of confidence that the package has been well tested.

Hanson, R. and Hopkins, T. (2004). Algorithm 830: Another Visit With Standard and Modified Givens Transformations and A Remark on Algorithm 539. ACM Transactions on Mathematical Software [Online] 30:86-94. Available at: http://doi.acm.org/10.1145/974781.974786.

First we report on a correction and improvement to the Level 1 Blas routine srotmg for computing the Modified Givens Transformation (MG). We then, in the light of the performance of the code on modern compiler/hardware combinations, reconsider the strategy of supplying separate routines to compute and apply the transformation. Finally, we show that the apparent savings in multiplies obtained by using MG rather than the Standard Givens Transformation (SG) do not always translate into reductions in execution time.
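For reference, the Standard Givens Transformation the paper compares against can be sketched as follows; this is the textbook formulation, not the BLAS srotg/srotmg code:

```python
import math

def givens(a, b):
    """Standard Givens transformation (textbook form): return (c, s, r)
    so that the rotation [[c, s], [-s, c]] maps the vector (a, b)
    onto (r, 0)."""
    r = math.hypot(a, b)                # overflow-safe sqrt(a*a + b*b)
    if r == 0.0:
        return 1.0, 0.0, 0.0            # identity rotation for the zero vector
    return a / r, b / r, r
```

Applying such a rotation to each 2-vector costs four multiplies; the Modified Givens form factors out scale factors to reduce that count, which is precisely the saving the paper shows does not always translate into faster execution.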

Barnes, D. and Hopkins, T. (2003). The impact of programming paradigms on the efficiency of an individual-based simulation model. Simulation Modelling Practice and Theory [Online] 11:557-569. Available at: http://dx.doi.org/10.1016/j.simpat.2003.08.002.

We look in detail at an individual-based simulation of the spread of barley yellow dwarf virus. The need for a very large number of individual plants and aphids along with multiple runs using different model parameters means that it is important to keep memory and processor requirements within reasonable bounds. We present implementations of the model in both imperative and object-oriented programming languages, particularly noting aspects relating to ease of implementation and run-time performance. Finally, we attempt to quantify the cost of some of the decisions made in terms of their memory and processor time requirements.

Hopkins, T. (2002). A Comment on the Presentation and Testing of CALGO Codes and a Remark on Algorithm 639: To Integrate Some Infinite Oscillating Tails. ACM Transactions on Mathematical Software [Online] 28:285-300. Available at: http://dx.doi.org/10.1145/569147.569148.

We report on a number of coding problems that occur frequently in published CALGO software and are still appearing in new algorithm submissions. Using Algorithm 639 as an extended example, we describe how these types of faults may be almost entirely eliminated using available commercial compilers and software tools. We consider the levels of testing required to instil confidence that code performs reliably. Finally, we look at how the source code may be re-engineered, and thus made more maintainable, by taking account of advances in hardware and language development.
Note: This is being made available as UKC Technical Report No: 4-02 (March 2002), DOI: http://doi.acm.org/10.1145/569147.569148

Since 1960 the Association for Computing Machinery has published a series of refereed algorithm implementations known as the Collected Algorithms of the ACM (CALGO). Most of those published since 1975 are mathematical algorithms, and many of them remain useful today. In this paper we describe measures that have been taken to bring some 400 of these latter codes to an up-to-date and consistent state.

By analyzing the log files generated by the UK National Web Cache and by a number of origin FTP sites we provide evidence that an FTP proxy cache with knowledge of local (national) mirror sites could significantly reduce the amount of data that needs to be transferred across already overused networks. We then describe the design and implementation of CFTP, a caching FTP server, and report on its usage over the first 10 months of its deployment. Finally we discuss a number of ways in which the software could be further enhanced to improve both its efficiency and its usability.

A Fortran 90 Code for Unconstrained Nonlinear Minimization (ACM Trans. Math. Softw. 20, 3 (Sept. 1994), pages 354-372; CALGO Supplement 131) was ported to a number of compiler-platform combinations. The necessary changes to the code are given along with some comparative timings.

We look at how both logical restructuring and improvements available from successive versions of Fortran allow us to reduce the complexity (measured by a number of the commonly used software metrics) of the Level 1 BLAS code used to compute the modified Givens transformation. With these reductions in complexity we claim that we have improved both the maintainability and clarity of the code; in addition, we report a fix to a minor problem with the original code. The performance of two commercial Fortran restructuring tools is also reported.

The authors present data-flow solutions on a pipeline of transputers for banded and dense systems of linear equations using Gauss elimination and the Gauss-Jordan method, respectively. These implementations, written in occam, are especially effective when there is a continuous supply of right-hand sides to be solved with the same coefficient matrix. Attention is paid to both load balancing and resource handling within the processor elements of the pipeline. When solving multiple right-hand sides, floating-point efficiency levels on 32-processor implementations range from 110% (for dense systems) down to 90% (for banded systems), where 100% represents the peak performance attainable from a single transputer applied to the same problem (effectively back-to-back floating-point operations on data in external memory). Some conclusions are drawn on efficiency issues arising from state-of-the-art massively parallel supercomputers.

We use knot count and path count metrics to identify which routines in the Level 1 basic linear algebra subroutines (BLAS) might benefit from code restructuring. We then consider how logical restructuring and the improvements in the facilities available from successive versions of Fortran have allowed us to improve the complexity of the code as measured by knot count, path count and cyclomatic complexity, and the user interface of one of the identified routines which computes the Euclidean norm of a vector. With these reductions in complexity we hope that we have contributed to improvements in the maintainability and clarity of the code. Software complexity metrics and the control graph are used to quantify and provide a visual guide to the quality of the software, and the performance of two Fortran code restructuring tools is reported. Finally, we give some indication of the cost of the extra numerical robustness offered by the BLAS routine over the use of new Fortran 90 intrinsic functions.

We present a collection of public-domain Fortran 77 routines for the solution of systems of linear equations using a variety of iterative methods. The routines implement methods which have been modified for their efficient use on parallel architectures with either shared or distributed memory. PIM was designed to be portable across different machines. Results are presented for a variety of parallel computers.

We investigate the evolution of a medium sized software package, LAPACK, through its public releases over the last six years and establish a correlation, at a subprogram level, between a simply computable software metric value and the number of coding errors detected in the released routines. We also quantify the code changes made between issues of the package and attempt to categorize the reasons for these changes. We then consider the testing strategy used with LAPACK. Currently this consists of a large number of mainly self-checking driver programs along with sets of configuration files. These suites of test codes run a very large number of test cases and consume significant amounts of cpu time. We attempt to quantify how successful this testing strategy is from the viewpoint of the coverage of the executable statements within the routines being tested.

Software engineering is not an empirically based discipline. As a result, many of its practices are based on little more than a generally agreed feeling that something may be true. Part of the problem is that it is both relatively young and unusually rich in new and often competing methodologies. As a result, there is little time to infer important empirical patterns of behaviour before the technology moves on. Very occasionally an opportunity arises to study the defect growth and patterns in a well-specified software system which is also well-documented and heavily-used over a long period. Here we analyse the defect growth and structural patterns in just such a system, a numerical library written in Fortran evolving over a period of 30 years. This is important to the wider community for two reasons. First, the results cast significant doubt on widely-held long standing beliefs and second, some of these beliefs are perpetuated in more modern technologies. Since we obviously generalise from older languages to new, it makes good sense to use empirical long-term data when it becomes available to re-calibrate those generalisations. At the same time, the results contain intriguing glimpses into defect behaviour which may transcend whatever technology is in use.

Barnes, D. and Hopkins, T. (2001). The Impact of Programming Paradigms on the Efficiency of an Individual-based Simulation Model. University of Kent.

Individual-based models are a popular technique for simulating a wide range of ecological systems. However, to be successful, they must not only deliver an accurate representation of the system they are seeking to model, but must do so using viable amounts of computing resource. Models involving very large numbers of individuals will tend to have large memory requirements, while the need to vary parameter settings over multiple runs means that processor requirements must be kept within reasonable bounds. In order to address the issue of resource requirements, we assess the impact of using different programming paradigms for the implementation of an individual-based model. We do this by looking in detail at a number of implementations of a simulation of the spread of Barley Yellow Dwarf Virus. The model considers explicitly each individual plant and aphid, therefore it requires special care to reduce the amount of storage used whilst still producing a computationally efficient code. We present implementations of the model in both imperative and object-oriented programming languages, particularly noting aspects relating to ease of implementation and run-time performance. Finally, we attempt to quantify the cost of some of the decisions made in terms of their memory and processor time requirements.

Hopkins, T. (1997). Is the Quality of Numerical Subroutine Code Improving?. University of Kent.

We begin by using a software metric tool to generate a number of software complexity measures and we investigate how these values may be used to determine subroutines which are likely to be of substandard quality. Following this we look at how these metric values have changed over the years. First we consider a number of freely available Fortran libraries (Eispack, Linpack and Lapack) which have been constructed by teams. In order to ensure a fair comparison we use a restructuring tool to transform original Fortran 66 code into Fortran 77. We then consider the Fortran codes from the Collected Algorithms from the ACM (CALGO) to see whether we can detect the same trends in software written by the general numerical community. Our measurements show that although the standard of code in the freely available libraries does appear to have improved over time these libraries still contain routines which are effectively unmaintainable and untestable. Applied to the CALGO codes the metrics indicate a very conservative approach to software engineering and there is no evidence of improvement, during the last twenty years, in the qualities under discussion.

Hopkins, T. (1997). New Implementations of the Spectral Test. University of Kent.

We present three versions of the revised spectral test for the analysis of linear congruential random number generators. One is a Fortran 90 version of the code presented in [Hopkins83], which extends the range of integer arithmetic operations by performing the arithmetic using floating-point numbers. The range of modulus values which may be analyzed is determined by the length of the mantissa. The other two implementations use the multiple precision arithmetic facilities provided by the Fortran 90 package mpfun [Bailey] and the Unix program bc (a version of this program is freely available from GNU). Both these allow arbitrary values of the modulus to be analyzed notwithstanding the underlying integer and floating-point hardware.

Hopkins, T. and Morse, D. (1996). The Implementation and Visualisation of a Large Spatial Individual-Based Model using Fortran 90. UKC.

We look in detail at the implementation of a simulation of the spread of Barley Yellow Dwarf Virus in a barley field. The model considers explicitly each individual plant and aphid, therefore it requires special care to reduce the amount of storage used whilst still producing a computationally efficient code. We attempt to quantify the cost of some of the decisions made in terms of their memory and processor time requirements. Finally we briefly consider the visualisation of the results and how the amount of data produced by the model may be reduced to a manageable level.

This report is an updated version of 2-94. We describe PIM (Parallel Iterative Methods), a collection of Fortran 77 routines to solve systems of linear equations on parallel computers using iterative methods. A number of iterative methods for symmetric and nonsymmetric systems are available, including Conjugate-Gradients (CG), Bi-Conjugate-Gradients (Bi-CG), Conjugate-Gradients squared (CGS), the stabilised version of Bi-Conjugate-Gradients (Bi-CGSTAB), the restarted stabilised version of Bi-Conjugate-Gradients (RBi-CGSTAB), generalised minimal residual (GMRES), generalised conjugate residual (GCR), normal equation solvers (CGNR and CGNE), quasi-minimal residual (QMR) with coupled two-term recurrences, transpose-free quasi-minimal residual (TFQMR) and Chebyshev acceleration. The PIM routines can be used with user-supplied preconditioners, and left-, right- or symmetric-preconditioning are supported. Several stopping criteria can be chosen by the user. In this user's guide we present a brief overview of the iterative methods and algorithms available. The use of PIM is introduced via examples. We also present some results obtained with PIM concerning the selection of stopping criteria and parallel scalability. A reference manual can be found at the end of this report with specific details of the routines and parameters.

We provide a Fortran 77 version of the Applied Statistics Algorithm AS57 'Printing Multidimensional Tables' originally appearing in the book 'Applied Statistics Algorithms' by P. Griffiths and I.D. Hill. We believe that the new code offers improvements both in readability and maintainability. [The file /pub/misc/statlib/apstat/as057.sh contains the Fortran 77 source code, example driver code, data and sample results as a Unix shar file. It is available via anonymous ftp from unix.hensa.ac.uk]

da Cunha, R. and Hopkins, T. (1994). A Comparison of Acceleration Techniques Applied to the SOR Method. University of Kent, Computing Laboratory.

In this paper we investigate the performance of four different SOR acceleration techniques on a variety of linear systems. These are Dancis's accelerations, Wynn's epsilon algorithm and Graves-Morris's generalisation of Aitken's delta-squared algorithm. The experimental results show that these accelerations can reduce the amount of work required to obtain a solution and that their rates of convergence are generally less sensitive to the value of the relaxation parameter than the straightforward SOR method. Necessary conditions for the reduction in the computational work required for convergence are given for each of the accelerations, based on the number of floating-point operations. It is shown experimentally that the reduction in the number of iterations is related to the separation between the two largest eigenvalues of the SOR iteration matrix for a given omega. This separation influences the convergence of all the acceleration techniques above. Another important characteristic exhibited by these accelerations is that even if the number of iterations is not reduced significantly compared to the SOR method, they are competitive in terms of number of floating-point operations used and thus they reduce the overall computational workload.
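Aitken's delta-squared process, which underlies the Graves-Morris generalisation mentioned above, can be sketched for a scalar sequence as follows (a textbook illustration, not the paper's vector-valued algorithm):

```python
def aitken(seq):
    """Aitken's delta-squared acceleration of a scalar sequence.
    For each consecutive triple (x0, x1, x2) the accelerated value
    x2 - (x2 - x1)**2 / (x2 - 2*x1 + x0) is emitted; when the
    denominator vanishes the unaccelerated x2 is kept."""
    out = []
    for x0, x1, x2 in zip(seq, seq[1:], seq[2:]):
        d = x2 - 2.0 * x1 + x0
        out.append(x2 if d == 0.0 else x2 - (x2 - x1) ** 2 / d)
    return out
```

For a sequence whose error decays geometrically, as SOR iterates asymptotically do, a single Aitken step can remove the dominant error term; the partial sums of 1/2 + 1/4 + ... (0, 0.5, 0.75, ...) are accelerated straight to the limit 1.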

Hopkins, T. and Slater, J. (1994). A Comment on the Eispack Machine Epsilon Routine. University of Kent, Computing Laboratory.

We analyze the algorithm used to generate the value for the machine epsilon in the Eispack suite of routines and show that it can fail on a binary floating-point system. The comments in the code describing the conditions under which this method will work are not restrictive enough and we provide a replacement set of assumptions. We conclude by suggesting how the algorithm may be modified to overcome most of the shortcomings.
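The style of algorithm under discussion is the familiar halving loop; a minimal sketch (in Python for brevity, though the Eispack original is Fortran) is:

```python
def naive_epsilon():
    """The classic halving loop for machine epsilon: keep halving until
    adding the half to 1.0 no longer changes it.  As the report argues,
    loops of this style rest on assumptions about the arithmetic (e.g.
    rounding behaviour and extended-precision registers) that do not
    hold on every floating-point system."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps
```

On a standard IEEE 754 double-precision system this returns 2**-52, but the report's point is precisely that such loops can return misleading values when the underlying assumptions fail.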

Hopkins, T. and da Cunha, R. (1994). The Parallel Iterative Methods (PIM) package for the solution of systems of linear equations on parallel computers. University of Kent, Computing Laboratory.

We present a collection of public-domain Fortran 77 routines for the solution of systems of linear equations using a variety of iterative methods. The routines implement methods which have been modified for their efficient use on parallel architectures with either shared- or distributed-memory. PIM was designed to be portable across different machines. Results are presented for a variety of parallel computers.

We provide a Fortran 77 version of the Applied Statistics Algorithm AS30 `Half-Normal Plotting'' originally appearing in the book `Applied Statistics Algorithms'' by P. Griffiths and I.D. Hill. We believe that the new code offers improvements both in readability and maintainability. [The file /pub/misc/statlib/apstat/as030.sh contains the Fortran 77 source code, example driver codes, data and sample results as a Unix shar file. It is available via anonymous ftp from unix.hensa.ac.uk]

We describe the parallelisation of the GMRES(c) method and its implementation on distributed-memory architectures, using both networks of transputers and networks of workstations under the PVM message-passing system. The test systems of linear equations considered are those derived from five-point finite-difference discretisations of partial differential equations.

We show how highly efficient parallel implementations of basic linear algebra routines may be used as building blocks to implement efficient higher level algorithms. We discuss the solution of systems of linear equations using a preconditioned Conjugate-Gradients iterative method on a network of transputers. Results are presented for the solution of both dense and sparse systems; the latter being derived from the finite-difference approximation of partial differential equations.
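The Conjugate-Gradients iteration built from such linear algebra building blocks can be sketched serially as follows; this is the unpreconditioned method written against a matrix-vector product callback, with the preconditioner and the transputer data distribution omitted:

```python
def conjugate_gradient(matvec, b, x0, tol=1e-10, maxit=200):
    """Unpreconditioned Conjugate-Gradients for a symmetric
    positive-definite system.  Every step decomposes into the basic
    building blocks the text describes: matrix-vector products,
    dot products and vector updates."""
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(x))]   # initial residual
    p = list(r)                                       # first search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(maxit):
        if rs <= tol * tol:
            break
        ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

In a distributed setting the dot products become global reductions and the matrix-vector product exchanges boundary data, which is where the efficiency of the underlying building blocks dominates.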

We report on our experiences in porting a number of linear algebra subroutines, written in occam2, from a transputer environment to a cluster of workstations using Fortran 77 and the PVM message-passing system.

We report our experiences using the parallel programming environments PVM, HeNCE, p4 and TCGMSG and discuss some aspects concerning performance and software engineering issues. A brief overview of each environment is given and a number of case studies, written using several different programming paradigms, are presented. Some of the examples presented are simple 'building blocks' which may enhance the performance of parallel applications; others are complete applications.

We present an index of all the algorithms which have been published in Applied Statistics between 1968 and 1991 inclusive. The algorithms have been classified using a modified version of the GAMS (Guide to Available Mathematical Software) Problem Classification Scheme, given by Boisvert et al., which has been considerably expanded especially in the statistical area. GAMS is a variable depth classification scheme. The first character, which is always a capital letter, gives the major subject area, further subdivisions are recursively denoted by alternating numbers and lower case letters. Thus, for example, D3a4 is in the main classification area of Linear Algebra (D), subarea Determinants (3), sub-subarea Real Nonsymmetric Matrices (a), sub-sub-subarea Sparse (4). The full classification list is reproduced. Information on how to obtain sources of the algorithms is also given.
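Because the classification is variable-depth with alternating letters and numbers, a code can be unpacked mechanically; the helper below is a hypothetical illustration of that structure (the function name is ours, not part of GAMS), using the D3a4 example from the text:

```python
import re

def gams_prefixes(code):
    """Split a GAMS-style variable-depth classification code into its
    alternating capital-letter / number / lower-case-letter levels and
    return every prefix, broadest to narrowest.  For 'D3a4' these are
    D (Linear Algebra), D3 (Determinants), D3a (Real Nonsymmetric
    Matrices) and D3a4 (Sparse)."""
    parts = re.findall(r"[A-Z]|[a-z]+|[0-9]+", code)
    return ["".join(parts[: i + 1]) for i in range(len(parts))]
```

Returning the chain of prefixes makes it straightforward to index an algorithm under every level of the hierarchy it belongs to.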

This paper describes the performance of a multigrid method implemented on a transputer-based architecture. We show that the combination of fast floating-point hardware, local memory and fast communication links between processors provides an excellent environment for the parallel implementation of multigrid algorithms. The gain in efficiency obtained by increasing the number of processors is shown to be nearly linear and comparisons are made with published figures for a parallel multigrid Poisson solver on an Intel iPSC 32-node hypercube.

We present an implementation of a finite-difference approximation for the solution of partial differential equations on transputer networks. The grid structure associated with the finite-difference approximation is exploited by using geometric partitioning of the data among the processors. This provides a very low degree of communication between the processors. The resultant system of linear equations is then solved by a variety of Conjugate Gradient methods. Care has been taken to ensure that the basic linear algebra operations are implemented as efficiently as possible for the particular geometric partitioning used.

We present a study of the implementational aspects of iterative methods to solve systems of linear equations on a transputer network. Both dense and sparse systems are considered. First we discuss the implementation of a set of distributed linear algebra subroutines which are used as building blocks for implementing the iterative methods. We show that the use of loop-unrolling significantly increases the efficiency of these implementations. The effect of the sparsity of the matrices on the performance is analysed. Finally, serial and parallel implementations of a polynomial preconditioned Conjugate Gradient method are presented.

This report contains a cumulative index to the Collected Algorithms of the ACM. The algorithms are classified using the modified SHARE classification, several different views of which are provided. The source codes of these routines originally appeared in the Communications of the ACM and, from Algorithm 493, in the ACM Transactions on Mathematical Software. All algorithms up to and including those appearing in the December 1991 issue of TOMS are included in the index. Information on how to obtain sources of the algorithms is also given.

We look at how the application of software testing metrics affects the way in which we view the testing of the Lapack suite of software. We discuss how we may generate a test suite that is easily extensible and provides a high degree of confidence that the package has been well tested.

We report on our experiences of applying a number of software testing techniques and software quality metrics to a medium sized numerical package. This package includes its own testing routines and we report a number of areas where we believe both the testing process and the code may be improved. We also report a number of faults and discuss a testing regimen which we have developed that appears to be more effective, efficient and extensible than the one currently provided with the package.

We investigate the evolution of a medium sized software package, LAPACK, through its public releases over the last six years and establish a correlation, at a subprogram level, between a simply computable software metric value and the number of coding errors detected in the released routines. We also quantify the code changes made between issues of the package and attempt to categorize the reasons for these changes. We then consider the testing strategy used with LAPACK. Currently this consists of a large number of mainly self-checking driver programs along with sets of configuration files. These suites of test codes run a very large number of test cases and consume significant amounts of cpu time. We attempt to quantify how successful this testing strategy is from the viewpoint of the coverage of the executable statements within the routines being tested.

We look in detail at the implementation of a simulation of the spread of Barley Yellow Dwarf Virus in a barley field. The model considers explicitly each individual plant and aphid, therefore it requires special care to reduce the amount of storage used whilst still producing a computationally efficient code. We attempt to quantify the cost of some of the decisions made in terms of their memory and processor time requirements. Finally we briefly consider the visualisation of the results and how the amount of data produced by the model may be reduced to a manageable level.

We begin by using a software metric tool to generate a number of software complexity measures and we investigate how these values may be used to determine subroutines which are likely to be of substandard quality. Following this we look at how these metric values have changed over the years. First we consider a number of freely available Fortran libraries (Eispack, Linpack and Lapack) which have been constructed by teams. In order to ensure a fair comparison we use a restructuring tool to transform original Fortran 66 code into Fortran 77. We then consider the Fortran codes from the Collected Algorithms from the ACM (CALGO) to see whether we can detect the same trends in software written by the general numerical community. Our measurements show that although the standard of code in the freely available libraries does appear to have improved over time these libraries still contain routines which are effectively unmaintainable and untestable. Applied to the CALGO codes the metrics indicate a very conservative approach to software engineering and there is no evidence of improvement, during the last twenty years, in the qualities under discussion.