Abstract

Recently R. H. Thompson [1996]
has performed simulations to study the performance of μ+kσ type
analysis on the Bruceton test. He has derived a set of k-factors
for the common all-fire levels and confidences for various sample
sizes. This report performs similar analysis on the Neyer D-,
c-, and S- Optimal and the Bruceton and Langlie
tests. This effort shows that the μ+kσ analysis calculates reliable confidence
values; as expected the k-factors are strongly dependent
on the all-fire level, confidence level, and sample size. In
addition, the k-factor is also strongly dependent on the
test method and ratio of the population standard deviation to the
estimate used to perform the test for both the Bruceton and
Langlie tests, but is constant for the Neyer designs. This report
also compares the variation of the μ+kσ all-fire levels obtained with each test
method. The data show that there is much less variation for
all-fire levels obtained with any of the Neyer designs than for
the similar Bruceton and Langlie methods. The implications of
this effort for testing of components will be discussed.

Introduction

Sensitivity tests are often used to estimate the parameters
associated with latent continuous variables that cannot be
measured. For example, in testing the sensitivity of explosives
to shock, each specimen is assumed to have a critical stress
level or threshold. Shocks larger than this level will always
explode the specimen, but smaller shocks will not lead to
explosion. Repeated testing of any one sample is not possible
because the stress that is not sufficient to cause explosion
nevertheless will generally damage the specimen. To measure
probability of response, samples are tested at various stress
levels and the response or lack of response is noted.

Explosive component designers are often interested in
determining the all-fire level, usually defined as the
level of shock necessary to cause 99.9% of the specimens to fire.
(Some designers seek the 99.99% or 99.999% levels.) A
distribution-independent method of estimating a level would
require at least one response and one non-response at the
specified level. Since sample sizes are usually far smaller than
the many thousands required to estimate these extreme levels, the
experimenter generally relies on parametric methods. Parametric
methods also allow the experimenter to characterize the
population as a whole, and to evaluate process variation.
Parametric designs test specimens at several stress levels. The
parameters of the population are most often estimated by maximum
likelihood techniques.

Three different test methods are most commonly used in the
explosive test community. The Bruceton Test (Dixon and Mood 1948) was
created before the invention of digital computers to simplify the
calculation of the parameters of the population. The Langlie Test
[1965] was invented to make a test
that was less dependent on initial guess of the parameters of the
population. The Neyer D-Optimal Test (Neyer
1989, 1994) was designed to
extract the maximum amount of information from each test item. A
comparison of the various test methods has been performed by
several authors (Edelman and Prairie 1966, Neyer 1994, Young and
Easterling 1994).

Confidence Level Calculations

Unlike the estimation of the parameters discussed previously,
there are a number of very different methods of estimating the
confidence intervals for the parameters. Each of these analysis
methods generally gives very different estimates of the
confidence intervals for the parameters of the distribution.

Simulation (Langlie 1965, Edelman and Prairie 1966, Neyer 1994) has shown that the variance
of the estimates of the mean, M, and the
standard deviation, S, is approximately proportional to
the population variance, σ².
The variance function method assumes the variances of M
and S can be estimated by simple functions of the sample
size, N, and the standard deviation. Because σ² is not independently known,
all methods base their estimates of confidence intervals on the
estimate S. This function is generally dependent on both
the test design (the type of test, Langlie, Bruceton, etc.) and
the initial conditions. The functional dependence is most
generally determined by simulation. Langlie [1965] estimated that the variation of
the parameters for the Langlie test could be approximated by the
equations:

where z_α is the
100α percentile obtained from the
normal distribution. Several different software programs (Thompson 1987, Langlie 1988o, Neyer 1994la) use this method of
obtaining confidence levels.

The asymptotic or Cramer-Rao method (Kendall and Stuart 1967) is
used by programs such as ASENT (Mills
1980, Neyer 1994a). This method
computes the curvature at the peak of the likelihood function.
Variation of the parameters and confidence intervals are deduced
from this curvature estimate. The simple sum rule (Dixon and Mood 1948) for
analyzing Bruceton tests is based on the asymptotic method. If
the conditions for use of the sum rule are met, the sum rule
yields estimates of the parameters and confidence intervals for
these parameters that are almost identical to the asymptotic
values.

The Likelihood Ratio Test Method (Neyer
1992) has been shown to produce reliable confidence interval
estimates in all cases. This method calculates the ratio of the
likelihood function computed at various points on the contour
to the likelihood function evaluated at the peak. Simulation (Neyer 1991, 1992)
has shown that the Likelihood Ratio Test produces more reliable
confidence intervals than the asymptotic method for small to
moderate sample sizes. This method is also independent of the
test method, the initial guess of the population parameters, and
the sample size. The MuSig software (Neyer
1994m) uses this analysis method.
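
The ratio described above can be illustrated with a minimal sketch. It assumes normally distributed thresholds and uses synthetic data from a seeded random number generator; the function names, the fixed stimulus levels, and the grid search are illustrative assumptions and do not represent the MuSig implementation.

```python
import math
import random
from statistics import NormalDist

def log_likelihood(results, mu, sigma):
    """Log-likelihood of (stimulus, responded) pairs for normal thresholds."""
    dist = NormalDist(mu, sigma)
    total = 0.0
    for level, responded in results:
        # Probability that a specimen responds at this stimulus level,
        # clamped away from 0 and 1 so that log() stays finite.
        p = min(max(dist.cdf(level), 1e-12), 1.0 - 1e-12)
        total += math.log(p if responded else 1.0 - p)
    return total

# Synthetic data (assumption): 6 specimens at each of 5 fixed levels,
# thresholds drawn from a normal population with mean 1000 and sigma 100.
rng = random.Random(3)
results = [(level, level > rng.gauss(1000.0, 100.0))
           for level in (800.0, 900.0, 1000.0, 1100.0, 1200.0)
           for _ in range(6)]

# Crude grid search for the peak of the likelihood function.
grid = [(float(mu), float(sigma))
        for mu in range(900, 1101, 5)
        for sigma in range(20, 301, 5)]
peak_mu, peak_sigma = max(grid, key=lambda p: log_likelihood(results, *p))
peak = log_likelihood(results, peak_mu, peak_sigma)

def likelihood_ratio(mu, sigma):
    """Likelihood at (mu, sigma) divided by the likelihood at the grid peak."""
    return math.exp(log_likelihood(results, mu, sigma) - peak)

print(peak_mu, peak_sigma, likelihood_ratio(1000.0, 100.0))
```

Points whose ratio exceeds a chosen cutoff form a contour around the peak; the cutoff that corresponds to a given confidence level is not specified here and would have to be taken from the cited references.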

M±kS Analysis

Most recently Thompson [1996]
has proposed a method of analyzing tests to determine all-fire
levels by a M±kS type analysis on the Bruceton test. He
has derived a set of k-factors for the common all-fire
levels and confidences for various sample sizes. The procedure
for generating these k factors is straightforward. A test
is chosen with the design, sample size, and initial guess of
parameters fixed. A random number generator is used to supply a
set of thresholds distributed according to some set of population
parameters, μ and σ. A simulated test is performed and
estimates of the mean, M, and the
standard deviation, S, are computed. New sets of threshold
values are chosen and the simulation is repeated a large number
of times. A value of k is then chosen such that 95% of the
estimates computed from M+kS are greater than the 99.9% value of
the population. The preceding discussion describes how to compute
the 95% confidence 99.9% level. Of course other levels could be
calculated for other probability or confidence values.
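
The final step of this procedure, choosing k from a collection of simulated (M, S) pairs, can be sketched as follows. This is only an illustration under a simplifying assumption: the M and S values below come from directly measured normal samples rather than from a simulated sensitivity test, so the resulting k is not one of the k factors discussed in this paper. The function name `k_factor` is hypothetical.

```python
import math
import random
import statistics

def k_factor(m_values, s_values, target, confidence=0.95):
    """Smallest simulated k with M + k*S >= target in the requested fraction of runs."""
    # For S > 0, M + k*S >= target is equivalent to k >= (target - M) / S.
    needed = sorted((target - m) / s for m, s in zip(m_values, s_values))
    index = min(len(needed) - 1, math.ceil(confidence * len(needed)) - 1)
    return needed[index]

rng = random.Random(1)
mu, sigma, n, runs = 1000.0, 100.0, 50, 2000

# True 99.9% point of the population.
target = statistics.NormalDist(mu, sigma).inv_cdf(0.999)

m_values, s_values = [], []
for _ in range(runs):
    # Stand-in for a simulated test (assumption): thresholds measured directly.
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    m_values.append(statistics.fmean(sample))
    s_values.append(statistics.stdev(sample))

k = k_factor(m_values, s_values, target)

# Fraction of runs in which this k would have been large enough.
coverage = sum((target - m) / s <= k
               for m, s in zip(m_values, s_values)) / runs
print(k, coverage)
```

Runs with S = 0 would make the division fail, which mirrors the difficulty with degenerate tests: no finite k can rescue such a run.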

The various k factors can be made arbitrarily accurate
by performing a sufficient number of repetitions of the
simulation. Thus this method can give completely unbiased
confidence levels. None of the analysis methods mentioned in the
previous section is able to produce unbiased confidence levels.
While the Likelihood Ratio Test method gives confidence levels
that are close to the requested value for sample sizes of 25 or
greater, the asymptotic analysis methods have been shown to
require sample sizes larger than 100 (Neyer
1994, 1994h, 1996).

Because the M+kS analysis method is the only known
unbiased analysis method, and it is easy to apply, one might
think it is the best method. However, there are several
difficulties with this method. As the following sections will
show, the k factors not only are a function of probability,
confidence, and sample size, but are also a function of the test
method used and the relationship between the parameters of the
population and assumed parameters when conducting the test.
Because the user does not know the parameters of the population
(that is why the test is performed) simulation must be performed
with a wide range of assumed parameters. If the resulting k
factors are a strong function of the parameters of the
population, the utility of this test method is greatly
diminished. Furthermore, when performing sensitivity tests it is
quite possible to arrive at the result S=0, especially
when performing a Bruceton test (Neyer
1994). If these zero values are included, then many high
confidence values would require an infinite k factor.

This paper reports similar analysis on the Neyer D-Optimal
and the Langlie tests. It also repeats the analysis for the
Bruceton tests. This effort shows that the M+kS analysis
calculates reliable confidence values; as expected the k-factors
are strongly dependent on the all-fire level, confidence level,
and sample size. In addition, the k-factor is also
strongly dependent on the test method and ratio of the population
standard deviation to the estimate used to perform the test for
both the Bruceton and Langlie tests, but is constant for the
Neyer test. This report also compares the variation of the M+kS
all-fire levels obtained with each test method. The data show
that there is much less variation for all-fire levels obtained
with any of the Neyer designs than for the similar Bruceton and
Langlie methods.

Simulation Design

The simulation was performed by picking a set of threshold
values from a normal population, and feeding these values to each
of the three test methods. Commercial versions of software were
used for each of the three tests (Neyer
1994b, 1994l, 1994o).

Each test method requires initial guesses for the parameters
of the population so that the test will be conducted most
efficiently. The Bruceton test requires an initial test level and
a step size. Dixon and Mood [1948]
and simulation (Edelman and
Prairie 1966, Neyer 1989, 1994) show that picking the mean as the
starting point and the standard deviation as
the step size gives the best results. The Langlie test requires a
lower and an upper test level. Langlie [1965]
suggests the optimal results are obtained if these are chosen at μ±4σ. Simulation
(Edelman and Prairie 1966,
Neyer 1989, 1994)
has confirmed these results. The Neyer test requires three
values, lower and upper limits for the mean value and a guess for
the standard deviation. The optimal results are obtained when the
limits are set at μ±4σ, and the standard deviation guess is set
to the true standard deviation.

Previous simulation (Edelman
and Prairie 1966, Neyer 1989, 1991, 1994)
has shown that the behavior of the test method depends critically
on the relationship between the initial guesses of the required
input and the population parameters. The dependence on the ratio
of the guessed standard deviation to the true
value (σg/σ ratio) is especially strong. For this
simulation each test was optimized for a population with a mean
of 1000 and a standard deviation of 100. Thus the Bruceton tests
had test parameters of 1000 and 100, the Langlie test had test
parameters of 600 and 1400, and the Neyer test had test
parameters of 600, 1400, and 100.
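
As an illustration of how the Bruceton test parameters of 1000 and 100 are used, the following is a minimal sketch of an up-and-down sequence driven by a set of thresholds. It is not the commercial software used in this study; the function name and the seeded random number generator are assumptions made for the example.

```python
import random

def bruceton_test(thresholds, start, step):
    """Run an up-and-down sequence; return (stimulus, responded) pairs."""
    level = start
    results = []
    for threshold in thresholds:
        # A specimen responds when the stimulus exceeds its threshold.
        responded = level > threshold
        results.append((level, responded))
        # Step down after a response, up after a non-response.
        level += -step if responded else step
    return results

rng = random.Random(0)
# Thresholds from a normal population with mean 1000 and standard deviation 100.
thresholds = [rng.gauss(1000.0, 100.0) for _ in range(50)]
results = bruceton_test(thresholds, start=1000.0, step=100.0)

for level, responded in results[:5]:
    print(level, "response" if responded else "no response")
```

The Langlie and Neyer tests choose each new level from the history of earlier results and are not sketched here.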

To ensure that the simulation accurately reflects how
sensitivity tests are actually used, it is necessary to perform
simulations with the same σg/σ ratio as experimenters use in the field.
Many labs claim that they can guess σg
to within a factor of two of the true value. However, by
observing the results of tests performed around the world, I do
not believe this to be true. Over the past ten years I have
looked at sensitivity test data from scores of governmental and
industrial laboratories around the world. In the great majority
of cases I believe that the σg/σ ratio differs from 1 by more than a
factor of two!

In almost every case the test parameters are chosen to be
"nice numbers." The step size is 20 or 50, almost never
a number like 36.5. Since there is a factor of 2 to 2.5 between
nice numbers, the experimenter will be wrong by at least 50% on
average just due to rounding errors. Observing the ratio between
the σ estimate from the data and the σg implied by the test design
leads me to believe that the test parameters were not optimized
or that the initial guess of σg
was wrong by a large factor. Implied σg/σ ratios of more than 5 are not at all
uncommon in the data sets that I have observed. Thus, it is
imperative that simulation be carried out for a large range of σg/σ
ratios. For this study three test sets were used: the σg/σ
ratios were 0.5, 1.0, and 2.0. Although these ratios do not span
the ratios that are found in the typical laboratory, they are
wide enough to illustrate the performance of M+kS
analysis.

The simulation was driven by software developed for NSWC
Dahlgren. Each test method was optimized for a mean of 1000 and a
standard deviation of 100. A set of random thresholds was created
with the mean uniformly distributed over the range μ±σ. The σg/σ ratios were either 0.5, 1.0, or 2.0. Ten
thousand samples of size 50 were chosen. The simulation with
sample sizes smaller than 50 used the first numbers from the set.
Thus, the results for a sample size of 40 represent the effects
of adding 5 additional test samples to the original 35. The
simulation was repeated later with sample sizes of 500. Because
the random number sequence repeats only after 2^31 ≈ 2,000,000,000 iterations, the samples can
be considered independent.

The simulation software used commercially available software (Neyer 1994m) to analyze the test sets
and stored the individual M and S values and the
population parameters in a data file. All tests were analyzed by
the same analysis method, finding the maximum likelihood
estimates of the parameters. Many experimenters still use the
original Dixon and Mood [1948]
prescription of simple sums for analyzing Bruceton tests. Where
the simple sum rule is valid, both analysis methods yield the
same results. Because the simple sum rule is an approximation of
the maximum likelihood method, the more general maximum
likelihood method should give superior results.

The data file was subsequently analyzed to determine the k
factor necessary to produce a M+kS that was larger than
the required percentile at the required confidence. Various
values of percentile and confidence were analyzed.

To obtain confidence values for various percentiles and
confidence levels the user would follow the following
prescription:

Analyze the data and arrive at estimates for the mean, M,
and the standard deviation, S.

Find the S bias, B, by reading the value
from the bias graph for the appropriate test, test
conditions, and sample size.

Find the k factor by reading the value from the
k factor graph for the appropriate test, test conditions, and
sample size.

To find the 99.9% probability value at 95% confidence,
construct the value

The 3.09 value is the 99.9% percentile of the normal
distribution. If other probability levels or other confidence
levels are needed, the user would have to consult similar graphs
for those values.
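
The 3.09 value, and the corresponding normal percentiles for other probability levels, can be checked with the Python standard library. This short sketch covers only the normal percentile; the k factor and bias for other levels would still have to come from graphs of the kind described above.

```python
from statistics import NormalDist

# 99.9th percentile of the standard normal distribution.
z_999 = NormalDist().inv_cdf(0.999)
print(round(z_999, 2))  # 3.09

# Percentiles for the other all-fire levels mentioned in the Introduction.
for probability in (0.9999, 0.99999):
    print(probability, NormalDist().inv_cdf(probability))
```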

Results

One problem with the analysis is that the Bruceton test has a
large number of test results with a zero estimate of the standard
deviation. There are two ways to account for these cases. One is
to include the zero S results. Because no possible k
factor can make these M+kS values larger than the 99.9%
percentile, these cases limit the maximum confidence that can be
achieved. Alternatively, these cases can be ignored. This
approach has been adopted by Thompson [1996].
Throwing out such cases is unrealistic. Very few experimenters
throw out the results of a large test when they get an S
value of zero. Instead, after a number of shots with no overlap,
they typically change the step size and continue the test.
Because there is no well defined rule for when to change step
sizes, it is impossible to conduct a simulation that matches the
way a "typical" experimenter conducts the test. It also
introduces bias, especially when comparing the results of
different test methods that have a much different probability of
achieving a zero S. The Bruceton test has the highest
percentage of degenerate cases, especially when the σg/σ
ratio is larger than 1.

Thus, this paper analyzes the data two different ways. The
data reported in this section were first analyzed keeping all
cases where S > 0. Figure 1 through
Figure 12 show the k factor,
variation of the 99.9% level at 95% confidence, and bias as
computed by the M+kS method. Also shown are the fraction
of simulation cases where S = 0. As Figure
4 clearly shows, the Bruceton test has a much higher
probability of yielding degenerate tests. The degenerate results
are completely eliminated from the rest of the analysis. In the
real world, the experimenter can not just forget those cases
where S = 0. Instead the test must continue, possibly with
a different step size in the case of the Bruceton test, until
overlap occurs and the test yields a non-degenerate result.
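
The dependence of the degenerate fraction on the population standard deviation can be illustrated with a small sketch. It assumes that a test is degenerate when there is no overlap, that is, when every non-response occurred at a lower level than every response (or when only one kind of result was seen). The short sample size of 10, the number of runs, and the function names are assumptions chosen for the example, so the fractions it prints are not the values plotted in the figures.

```python
import random

def bruceton_test(thresholds, start, step):
    """Up-and-down sequence; returns (stimulus, responded) pairs."""
    level, results = start, []
    for threshold in thresholds:
        responded = level > threshold
        results.append((level, responded))
        level += -step if responded else step
    return results

def is_degenerate(results):
    """True when there is no overlap between responses and non-responses."""
    fired = [level for level, responded in results if responded]
    failed = [level for level, responded in results if not responded]
    if not fired or not failed:
        return True
    return max(failed) < min(fired)

def degenerate_fraction(sigma, n=10, runs=2000, seed=0):
    """Fraction of simulated Bruceton tests (start 1000, step 100) with no overlap."""
    rng = random.Random(seed)
    count = 0
    for _ in range(runs):
        # Population mean uniformly distributed within one sigma of 1000.
        mean = rng.uniform(1000.0 - sigma, 1000.0 + sigma)
        thresholds = [rng.gauss(mean, sigma) for _ in range(n)]
        count += is_degenerate(bruceton_test(thresholds, 1000.0, 100.0))
    return count / runs

fractions = {sigma: degenerate_fraction(sigma) for sigma in (50.0, 100.0, 200.0)}
print(fractions)
```

With the step size fixed at 100, a population Sigma of 50 corresponds to a σg/σ ratio of 2.0 and a Sigma of 200 to a ratio of 0.5.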

There are three different graphs on each chart. One graph
shows the results when the thresholds were drawn from a
population with a Sigma of 50, one where Sigma was 100, and one
where Sigma was 200. From the figures it is readily apparent that
the σg/σ
ratio has a major effect on the k factor, percentile variation,
Sigma bias, and percentage of degenerate tests. The only
exception is for the Neyer D-Optimal tests. The Neyer D-Optimal
test results are essentially independent of the σg/σ
ratio. As mentioned previously, analysis of the results of many
different threshold tests conducted at many laboratories across
the country shows that errors in the σg/σ ratio of more than a factor of two are extremely
common. Thus, great care must be used if this method is to be
used to compute accurate all-fire levels for either the Bruceton
or Langlie tests.

All the graphs should approach their limiting values of 0 for
the k factor, variation of the 99.9% level, and degenerate
cases and 1 for the bias as 1/N, where N is the sample size. The
graphs all have a bottom scale of 1/(N-8). The "8" is
used so that the graphs are closer to straight lines. (See the
figures for the Neyer tests.) It represents the loss of
information that is present when conducting any type of
sensitivity test.

In addition to the individual data points on each graph, there
is also a fit to a straight line. Inspection of the graphs
indicates that only the Neyer D-Optimal test has a reasonable fit
to the straight line functions for smaller sample sizes. The
Langlie test curves could all be fit to straight lines, but none
of them have the correct asymptotic properties. The Langlie test
results do not have the "right" asymptotic properties
because the test concentrates the test points too close to the
mean as the sample size increases.

Truncated Results

The previous section showed the results when all
non-degenerate tests were included. Because there was such a
difference in the percentage of degenerate tests for the
different test methods and σg/σ ratios, it is essentially impossible to
compare the results of different tests.

Both the Langlie and Neyer D-Optimal tests have almost no
degenerate tests for reasonable sample sizes because the test
levels can get arbitrarily close to each other. The Bruceton
test, however, has test levels that are always a fixed distance
apart. Thus, if the spread of the population is much smaller than
expected, the Bruceton test will yield a degenerate result, while
the Langlie and Neyer D-Optimal tests will yield small values for
S. The Bruceton test thus has zero probability of
producing a standard deviation substantially smaller than the
initial guess; instead it yields S = 0. Keeping the small
values of S for the Langlie and Neyer D-Optimal tests in
these cases forces the k factors to be larger. It also
increases the variation in probability levels and the Sigma bias.
Thus, all test comparisons are invalid.

However, if instead of just removing the cases where S = 0,
all tests where S ≤
C, where C is a cutoff value, were removed, then it
would be possible to compare the results of the different tests.
The analysis in the previous section was repeated, but all cases
where S ≤ C were
eliminated. Various values of C were studied; the case
where C = 40 yielded somewhat similar numbers of ignored
cases for all three test methods.

Ignoring all tests where S ≤ C does not change the
results appreciably for the Bruceton tests, has a minor change on
the Langlie tests, and has a major effect on the Neyer D-Optimal
tests. The curves for all three tests look somewhat the same,
especially the Bruceton and Neyer D-Optimal. Thus, it appears
that the Bruceton test results from the previous section were so
much different precisely because so many of the tests were
ignored in the analysis.

Because the k factors are so strongly dependent on the σg/σ
ratio, it would be impossible to use these tables to determine
confidence levels in a practical sense. Rarely does an
experimenter know σ before beginning
the test. Moreover, even after completion of the test, knowledge
of σ is limited. For example, even for
sample sizes as large as 50, 5% of the tests yield an estimate S
that is at least 40% larger than the correct σ
and another 5% yield an estimate that is at least 40% smaller. Figure 1 shows that the k factors differ
by a factor of two for factors of two error in the σg/σ
ratio. Thus, it is impossible, even for sample sizes as large as
50, to determine a reliable k factor to use in the
analysis. Because it is impossible to determine which k
factor to use, it is impossible to use the M+kS analysis
method to provide reliable analysis. The only exception to this
statement is for analysis of the Neyer D-Optimal or other similar
tests. Because the adaptive tests are relatively independent of the
initial parameter guess (see Figure 9), there
is almost no error in picking the correct k factor. Thus,
the curve in Figure 9 and similar curves for
other probability levels and confidence values could be used to
provide reasonable analysis for the Neyer D-Optimal and similar
tests.

Summary

Using μ+kσ
Analysis to arrive at reliable confidence levels for threshold
tests requires accurate knowledge of the ratio between the
standard deviation of the population, σ, and the guess used when conducting the
test, σg. Because the
experimenter rarely knows σ before
testing has begun, and cannot determine σ
with much accuracy even after completing a test with a reasonably large sample
size, the μ+kσ Analysis cannot be used to provide
reliable analysis. The Likelihood Ratio Test (Neyer 1991, 1992,
1994a) does not suffer from the
inherent limitations in the μ+kσ Analysis method. It is able to analyze the
results of all sensitivity tests, even those where S = 0,
and is relatively independent of the test method used and of the σg/σ ratio.