Reliability growth plots go by a variety of names:
Duane plots, Crow plots, Crow AMSAA plots, Crow-AMSAA plots, Crow/AMSAA plots,
C/A plots, and C-A plots. They are
log-log plots showing reliability trends of improvement, deterioration, or
no change. The most common plot is cumulative
failures versus cumulative time. Often
the Y-axis is transformed to plot cumulative mean time between failures (MTBF)
versus cumulative time, which makes the plot easy to interpret: when the line
slopes upward and to the right, reliability is improving; when it trends
downward and to the right, reliability is deteriorating.

The plots “show me, don’t tell me” how failures are occurring with time. You
can use your maintenance data records to forecast future failures. You can
also see the results of improvement programs and easily calculate the changes
from the straight lines and the cusps produced by improvement programs.

Reliability growth plots show how reliability changes over time with simple
graphics plotted in a log-log format. Fortunately, the trend lines often have
straight line segments, and this makes predictions of future failures a simple
matter. See Figure 1 for an example of a simple plot of cumulative failures
versus cumulative time made using WinSMITH Visual software.

In Figure 1, the literal value of beta > 1 may mean
failures are increasing, or it may mean that, for practical purposes,
the system shows neither improvement nor deterioration. Crow/AMSAA plots, with
their key indicator slopes, function like yardsticks/meter sticks rather than
micrometers.

Your explanations are never going to be simpler than
cumulative failures versus cumulative time as shown in Figure 1. The straight
trend line offers a methodology for making fearless forecasts of future
failures even when your data contain mixed failure modes.

In “real life” things change because of improvements or
deteriorations made to the system. We need artifacts and analogs to show us
what’s happening in some simple manner. For example, we need
thermometers to indicate rise/fall in temperature. We need scales/balances to
show changes in physical mass. We need relationships
that provide an analog of physical and human experiences. Thus we need a
reliability growth plot to give us clues about changes in failure rates.
Consequently we can use the reliability growth tool to make “fearless
forecasts” about when future failures will occur on the cumulative failure
versus cumulative time plot by simply extrapolating the trend line into the
future. As reliability engineers, our
task is to put a cusp on the trend line by making cost effective improvements
so the cumulative failure versus cumulative time trend line has a flatter slope
(i.e., beta is less than 1). Thus
reliability growth plots are helpful for reflecting the changes in failure
modes by “digesting” the data from mixed failure modes.

Reliability changes usually occur in steps. Reliability improvements involve
elapsed time and failures. Longer elapsed times
between failures result in reliability improvements, and components and/or the
system will then display more reliability. When cumulative time (on the X-axis)
and cumulative failures (on the Y-axis) are plotted on uniformly divided
graphs, they provide us an analogy of physical experiences as a curved plot.

Experience shows that curved analog plots of
improvement efforts can often be transformed into straight line analog plots by
use of simple two-axis logarithmic plots. With the log-log plot, reliability
improvements can often be observed as a straight line. When engineers have an
X-Y plot with a straight line, they have a fundamental grasp of what’s
happening in the real world and can explain the phenomena.

The task of most reliability engineers is to force cusps (breaks
in the straight trend line of cumulative failures on the Y-axis
versus cumulative time on the X-axis) on the reliability growth lines so that
longer intervals of time occur between failures. Reliability improvement
efforts should continue until the cost of making improvements is no longer
justified or until the objectives of the client have been reached.

Why do Crow/AMSAA plots produce straight lines on log-log
plots when cumulative failures are plotted versus cumulative time? The
forerunner of the concept has parallel roots in manufacturing and has been
exhaustively demonstrated as a true log-log phenomenon. It’s a natural
occurrence of learning/improving. Consider the following parallel.

T. P. Wright (1936)
pioneered the idea that improvements in the time to manufacture an airplane
could be described mathematically, a very helpful concept for management
production planning. Wright’s findings
showed that, as airplanes were produced in sequence, the direct
labor input per airplane decreased in a mathematical pattern that forms a
straight line when plotted on log-log paper. If the rate of improvement is 20%
(the learning percentage is 80%), then each time the production quantity of
large processes and complicated operations is doubled, the time required per
unit is 20% less. Thus the time per unit
of production decreases by a constant percentage each time the production
quantity is doubled.

Wright’s method in the 1940’s was a helpful concept for the USA War
Production Board in estimating the number of airplanes that could be produced
for a given complement of men and machines. After the end of World War II, the
US Government employed the Stanford Research Institute (SRI) to validate
improvement curve concepts. SRI studied all USA airframe WWII production
data (see table at bottom of this page) to validate the
concept, and SRI developed a slightly different version than the simple case
offered by Wright (DOD 2003); it also plotted as a straight line on a log-log
plot.

Today Wright’s
log-log concept is known as learning curves, cost improvement curves, progress
functions, Crawford curves (J. R. Crawford was on the SRI validation
team; Crawford’s model is considered less technical than Wright’s model), Boeing
curves, Northrop curves, and so forth, reflecting the findings of the
airframe manufacturers, each of whom developed a variation on T. P. Wright’s
simple equation.

The simple
improvement curve was \(Y = AX^{B}\), which produces a straight line on
log-log paper, where Y is the unit cost (hours/unit or $’s/unit), X is the unit
number, A is a theoretical cost of the first unit (hours or $’s), and B is a line
slope constant related to the rate of improvement [B is literally equal
to ln(learning percent)/ln(2), where the learning percent, expressed as a
fraction, is 1 - (rate of improvement)]. For example, if the first
unit took 100 hours to complete (A = 100) and the improvement rate is
20%, the learning percentage is 80%, so that B = ln(1.00 - 0.20)/ln(2) =
-0.32193. Thus we would expect
production of the 2nd item to require 80 hours and the 4th
item to require 64 hours, and so forth: each time the production quantity
doubles, we shave 20% from the production time. Some typical learning curve
slopes are described at the NASA Cost
Estimating Website (NASA 2003); the learning % varies from 96% for
raw materials (the least improvement) to 75% for repetitive electrical
operations (the most improvement), with most
values around 80-90%. The plots have
three different formats: 1) hours/unit or $/unit versus cumulative production,
2) cumulative (hours or $’s) versus cumulative production, or 3) cumulative
average (hours or $’s) versus cumulative production.
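
The unit-cost form of the improvement curve can be checked with a few lines of Python. This is a minimal sketch of Wright's simple equation only; the function names are illustrative and not taken from any cited source, and the learning percentage is passed as a fraction (0.80 for 80%).

```python
import math


def learning_exponent(learning_fraction: float) -> float:
    """Slope constant B = ln(learning percent) / ln(2), learning percent as a fraction."""
    return math.log(learning_fraction) / math.log(2.0)


def unit_cost(first_unit_cost: float, unit_number: int, learning_fraction: float) -> float:
    """Wright's simple unit curve Y = A * X**B (hours or dollars for unit X)."""
    return first_unit_cost * unit_number ** learning_exponent(learning_fraction)


if __name__ == "__main__":
    # Worked example from the text: A = 100 hours, 80% learning percentage.
    for unit in (1, 2, 4, 8):
        print(unit, round(unit_cost(100.0, unit, 0.80), 1))
```

Each doubling of the unit number multiplies the unit cost by the learning fraction, which is why the 2nd and 4th units come out at 80 and 64 hours in the worked example.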

Learning curves
were used extensively by General Electric, and a GE reliability engineer made
log-log plots of cumulative MTBF versus cumulative time, which gave a straight
line for reliability issues (Duane 1964). Duane argued that all failure data
should be used on complex electromechanical systems. He
recommended the Y-axis should be (cumulative failures)/(cumulative time) =
\(K T^{-\alpha}\), where K is a constant dependent upon
equipment complexity, design margins, and design objectives for reliability;
\(\alpha \approx 0.5\), with the expectation that some designs would be better
(meaning \(\alpha > 0.5\)) and some would be worse (meaning \(\alpha < 0.5\));
and T is cumulative time. Duane drew his conclusions from studying 5 different
data sets and found remarkable similarity in the patterns of the curves
(meaning the line slopes were about the same). Duane also rearranged
his equations and showed cumulative failures \(F = K T^{1-\alpha}\), which
allowed forecasting of future failures based on past
results. James Duane had a deterministic
postulate for monitoring failures and failure rates of a complex system over
time using a log-log plot with straight lines.

At the US Army
Materiel Systems Analysis Activity (AMSAA) during the mid-1970’s,
Larry Crow converted Duane’s postulate into a mathematical and statistical
proof via Weibull statistics in MIL-HDBK-189 (DOD 1981). The military
handbook addressed:

reliability growth: the positive improvement in a reliability parameter over a
period of time due to changes in product design or the manufacturing process,
and

reliability growth management: the systematic planning for reliability
achievement as a function of time and other resources, and controlling the
ongoing rate of achievement by reallocation of resources based on comparisons
between planned and assessed reliability values.

The ultimate goal
of the improvement program was to make reliability grow so as to meet the
system reliability and performance requirements by managing the development
program. The management effort required
making reliability 1) visible and
2) a manageable characteristic. A reliability growth program required goals
and forecasts of progress. The failure data
usually produced straight line segments on log-log plots with
\(N(t) = \lambda t^{\beta}\), where N is the expected number
of failures, \(\lambda\) is the scale parameter (the y-intercept at time t = 1),
t is cumulative time,
and \(\beta\) is the line slope for cumulative failures versus
cumulative time (and \(\beta = 1 - \alpha\) from Duane’s equation). Scientific
principles determine that failure
data fit \(N(t) = \lambda t^{\beta}\), and thus failure
data trends can produce a straight line on log-log paper.
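
The straight line and the link to Duane’s form follow directly from taking logarithms of the power-law relation. The short derivation below is a sketch using the symbols defined above (λ for the intercept, β for the slope, and Duane’s K and α); the choice of logarithm base does not matter.

```latex
\begin{align}
N(t) &= \lambda t^{\beta} \\
\log N(t) &= \log \lambda + \beta \log t
  && \text{(straight line: slope } \beta \text{, intercept } \log\lambda \text{ at } t = 1\text{)} \\
\frac{N(t)}{t} &= \lambda t^{\beta - 1} = K\,T^{-\alpha}
  && \text{(Duane's form, with } K = \lambda,\ \alpha = 1 - \beta\text{)}
\end{align}
```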

Data from
maintenance failure databases, plotted on a log-log plot, will build a
Crow/AMSAA relationship: the Y-axis intercept at t = 1 gives \(\lambda\), and
the slope of the line gives \(\beta\), which reflects changes in
the programs. Thus future failures can
be forecasted, and cusps on the data trends will tell if the system is improving
(failures are coming more slowly, \(\beta < 1\)),
deteriorating (failures are coming more quickly, \(\beta > 1\)), or
without improvement/deterioration (failure rates are
unchanged, \(\beta \approx 1\)).
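
One simple way to get the intercept and slope from maintenance data is an ordinary least-squares line through the logarithms of the points. The sketch below assumes that approach; it is not the MLE procedure of MIL-HDBK-189 and is not claimed to reproduce WinSMITH Visual's results. The function names and the 0.05 tolerance band around a slope of 1 are illustrative assumptions.

```python
import math


def fit_crow_amsaa(cum_times, cum_failures):
    """Least-squares line through (ln t, ln N); returns (lam, beta) for N = lam * t**beta."""
    xs = [math.log(t) for t in cum_times]
    ys = [math.log(n) for n in cum_failures]
    count = len(xs)
    x_bar = sum(xs) / count
    y_bar = sum(ys) / count
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                       # slope of the log-log line
    lam = math.exp(y_bar - beta * x_bar)   # intercept at t = 1
    return lam, beta


def describe_trend(beta, tolerance=0.05):
    """Classify the slope; the tolerance band is an assumed, illustrative value."""
    if beta < 1.0 - tolerance:
        return "improving"
    if beta > 1.0 + tolerance:
        return "deteriorating"
    return "no change"


if __name__ == "__main__":
    # Synthetic points lying exactly on N = 0.5 * t**0.7 (made-up values).
    failures = list(range(1, 11))
    times = [(n / 0.5) ** (1.0 / 0.7) for n in failures]
    lam, beta = fit_crow_amsaa(times, failures)
    print(round(lam, 3), round(beta, 3), describe_trend(beta))
```

Because the slope works like a yardstick rather than a micrometer, a tolerance band around 1 is more honest than testing for exact equality; the width of that band is a judgment call.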

Recently AMSAA has
updated the information from Military Handbook MIL-HDBK-189 and produced the
AMSAA Reliability Growth Guide TR-652 (DOD 2000).

Two excellent documents on the subject of reliability growth
are:

MIL-HDBK-189, Reliability Growth Management, 13 February 1981. Download it
from ASSIST Quick Search as a PDF file using the title for the search. This
PDF is 8.1 Meg in size.

AMSAA Technical Report No. TR-652, called the AMSAA Reliability Growth
Guide. This September 2000 document is available for download as a PDF.
This publication is not listed at http://www.ntis.gov although space exists in
the reference to HDBK-A-1 documentation for inclusion of TR-652. Here is the
abstract for TR-652:

Reliability
growth is the improvement in a reliability parameter over a period of time due
to changes in product design or the manufacturing process. It occurs by
surfacing failure modes and implementing effective corrective actions.
Reliability growth management is the systematic planning for reliability
achievement as a function of time and other resources, and controlling the
ongoing rate of achievement by reallocation of these resources based on
comparisons between planned and assessed reliability values. To help manage
these reliability activities
throughout the development life cycle, AMSAA has developed reliability growth
methodology for all phases of the process, from planning to tracking to
projection. The report presents this
methodology and associated reliability growth concepts.

Both MIL-HDBK-189 and TR-652 present methodologies and concepts
to assist in reliability growth planning. They provide a structured approach
for reliability growth assessments. In general, they are
written from the standpoint that you begin with some new components and
grow the reliability of a system with a development program.

Another source of reliability growth information is IEC 61164 (this document
was previously numbered as IEC 1164), Reliability
Growth-Statistical Test and Estimation Methods. The IEC 61164 document is a
product of the TC-56
work group, which has provided about 50 documents pertaining to reliability and
dependability.

For plant equipment and operation, the reliability details described
below are a little different:

·The primary purpose of our business activities
is to run our production facilities to make money and not to make an
improvement program

·We have old equipment that can only be improved
at specific time intervals IF the improvement is truly cost effective

·Reliability growth occurs on the device as we
use the equipment for its primary purpose as a link in the money making
machine, without time or resources for validating claims for improvements.

·We lack staff and we lack verified knowledge
that our planned improvements will function as forecasted

·We need to forecast when the next failure is
expected so we can plan for replacement/enhancements during scheduled
turnarounds

·The reliability improvement process competes for
limited funding with every other program within the production/maintenance
organization

In real production plants the reliability improvement
program is clearly a question of which comes first, the chicken or the egg.
This requires reliability engineers to have
numbers and then sell the numbers to management in 60-second sound bites based
on 1) describing the issue and 2) telling how we will resolve the issue in time
and money. The 60-second sound bites
require that we have good sales tools, and the graphics of Crow/AMSAA plots
help us sell the program.

Several simple
examples of Crow/AMSAA plots-

Consider the following simple discrete examples to illustrate the plotting and calculation
concept.

Example 1:
Suppose we had a system that failed every 60 days for a total of 5
failures. Each corrective maintenance
action was a repair (replacement
components have the same length of life). Following the fifth failure, we
added a fix (replacement with a longer life component) with a life of 300
days/failure. Subsequent failed components will
also be replaced with longer life components. The data and calculations are
shown in Table 1.
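
The discrete data described in Example 1 can be rebuilt in a few lines. This is a sketch, not a copy of Table 1: it assumes five failures after the fix, which is inferred from the text's later use of failure number 11 and 1800 cumulative days, and the column layout is an assumption.

```python
from itertools import accumulate

# Five failures 60 days apart, then (assumed) five more at 300 days apart.
intervals_days = [60] * 5 + [300] * 5
cum_times = list(accumulate(intervals_days))

# One row per failure: (cumulative failures, cumulative time, cumulative MTBF).
rows = [(n, t, t / n) for n, t in enumerate(cum_times, start=1)]

if __name__ == "__main__":
    print("failures  cum_days  cum_MTBF")
    for n, t, mtbf in rows:
        print(f"{n:8d}  {t:8d}  {mtbf:8.1f}")
```

The cumulative MTBF column stays flat at 60 days through the fifth failure and climbs afterward, which is the cusp the improvement is meant to produce.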

Of course in real life, the failures would not occur at the
same time interval. Thus real life
results lack the clarity of Table 1. You
should expect to see much variability in ages to failure as they will occur
with randomness from their family of failure characteristics.

Figure 2 shows a plot of cumulative failures versus
cumulative time. The altitude of this
curve always rises according to the equation \(N(t) = \lambda t^{\beta}\),
where N is cumulative failures, \(\lambda\) is the
y-intercept at time = 1, and \(\beta\) is the indicator
of reliability improvement (\(\beta < 1\)),
reliability deterioration (\(\beta > 1\)), or
no reliability change (\(\beta = 1\)).

The simple equation \(N(t) = \lambda t^{\beta}\) can be used to make a “fearless
forecast” of when the next failure will occur (that is, failure number 11 for
this case): \(t = (11/0.1645)^{1/0.548} = 66.869^{1.8248} = 2141.28\)
cumulative time. The “fearless forecast” of the next failure
is \(\Delta t = 2141.28 - 1800 = 341\) days,
compared to the 300 days expected from the discrete data in Table 1 (remember,
in real life you would not have discrete data!). Crow/AMSAA plots are very
useful for predicting future failures based on your data. The technique
provides a methodology, the equations are simple, the
failure forecast is based on your data, and you can make reasonable forecasts of
future events. Remember, our task as
reliability engineers is to make improvements so that we do not incur the
predicted future failures!
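
The forecast arithmetic above amounts to solving the power-law relation for time. A minimal sketch follows, using the intercept 0.1645, the slope 0.548, and the 1800 cumulative days quoted in the text; the function name is illustrative.

```python
def time_of_failure(failure_number: int, lam: float, beta: float) -> float:
    """Cumulative time at which N(t) = lam * t**beta reaches failure_number."""
    return (failure_number / lam) ** (1.0 / beta)


if __name__ == "__main__":
    lam, beta = 0.1645, 0.548          # values quoted in the text
    cum_time_now = 1800.0              # cumulative days at the 10th failure
    t_next = time_of_failure(11, lam, beta)
    print(f"failure 11 expected near {t_next:.0f} cumulative days")
    print(f"about {t_next - cum_time_now:.0f} days after the 10th failure")
```

Carrying full floating-point precision gives a result a fraction of a day away from the hand calculation, which rounds the intermediate values; the difference is far smaller than the scatter in real failure data.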

The object of our reliability improvement is to find ways to prevent failures.
When you know the approximate time for the
next failure (based on the fearless forecast) you need to find ways to prevent
the failure.

The first human reaction to Figure 2 is that you cannot forecast
failures. The second human reaction,
based on actual experience, is: wow, this technique really works. The third
human reaction is to search for
which item will fail next. Finally, the
human reaction is to “get with the improvement program” to prevent failures.

Unfortunately, it takes considerable time for humans to “buy
into” the improvement program because they fail to acknowledge that such a
simple equation can be a reasonably good predictor for single or mixed failure
modes. [As a side note, please recognize
that many physical phenomena are described by simple equations: \(F = ma\),
\(E = mc^{2}\), \(S = F/A\), etc. Since most of you cannot derive or explain
the theory behind these well known equations, why would you doubt that
\(N(t) = \lambda t^{\beta}\)
also describes important physical relationships in the field of reliability?]

Figure 3 shows the data in Figure 2 transformed by dividing
the cumulative time by the cumulative failures. In Figure 3, notice the
clarity of the change in Cum-MTBF from the
earlier plateau.

The altitude of Figure 3 can go up (reliability improves), down
(reliability deteriorates), or sideways (reliability is not changing). Note
that Figure 3 still carries the
statistics from Figure 2. The actual
line slope of Figure 3 is usually represented by
\(\alpha = 1 - \beta\). The y-intercept of Figure 3 is \(1/\lambda\). The
trend line in Figure 3 can also be
used for making “fearless forecasts” into the future to establish goals for the
cumulative MTBF.
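
The slope and intercept of the cumulative MTBF plot follow from dividing cumulative time by the power-law expression for cumulative failures. A short sketch of that step, using the same symbols as the cumulative failure equation:

```latex
\begin{align}
\text{Cum-MTBF}(t) &= \frac{t}{N(t)} = \frac{t}{\lambda t^{\beta}}
  = \frac{1}{\lambda}\, t^{\,1-\beta} = \frac{1}{\lambda}\, t^{\alpha} \\
\log \text{Cum-MTBF}(t) &= -\log \lambda + (1 - \beta) \log t
\end{align}
```

So a slope below 1 on the cumulative failure plot appears as a rising line on the cumulative MTBF plot, and a slope above 1 appears as a falling line.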

Notice each equation is described by means of the specific
option in WinSMITH Visual software. Which equation you use depends upon your
specific interest and need. I find the
cumulative failure events plot most useful for my interests, followed by the
cumulative MTBF plot. Of course you should
remember that my clients’ primary interest is producing a product
for sale; use of these techniques is a secondary interest, for predicting the
expected failure rate and deciding how to interpret the
statistics.

Other practitioners will have different needs, different
interests, and thus will use different equations.

Consider the three precise trend lines in Figure 4. All three trend lines have
the first failure occurring at the same time.

·The line of no improvement/deterioration
of course carries a beta = 1 for the line slope.

·The second line with beta < 1 shows an
improvement as cumulative time data is stretched to longer time intervals.

·The third line with beta > 1 shows
deterioration and the cumulative time data is compressed to shorter time
intervals.
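
The three precise trend lines can be generated from the power-law relation by pinning the first failure to a common time. The sketch below assumes exact (noise-free) lines; the 100-day first failure and the beta values are made-up illustration values, not read from Figure 4.

```python
def trend_line_failure_times(first_failure_time, beta, count):
    """Cumulative times of failures 1..count on an exact N(t) = lam * t**beta line.

    Pinning the first failure at first_failure_time fixes
    lam = first_failure_time**(-beta), so failure n falls at
    first_failure_time * n**(1 / beta).
    """
    return [first_failure_time * n ** (1.0 / beta) for n in range(1, count + 1)]


if __name__ == "__main__":
    for label, beta in (("no change", 1.0), ("improving", 0.5), ("deteriorating", 2.0)):
        times = trend_line_failure_times(100.0, beta, 5)
        print(label, [round(t, 1) for t in times])
```

With beta below 1 the later failures are pushed out to longer cumulative times, and with beta above 1 they are pulled in, relative to the evenly spaced beta = 1 line.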

These
thoughts will be useful for a Monte Carlo model which will produce random times
to failure, as it is generally considered easy to create a model for beta = 1
but not so easy to create models for betas different from 1.

Table 3 quantifies the multipliers for the
stretch/compression in cumulative times. The
key to this method is taking the beta = 1 case (which is easy to produce
with random numbers) and transforming that simple case into other
beta values by stretching or compressing the
results. The method shown in Table 3 will avoid hooked cumulative curves.
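
A common way to turn the easy beta = 1 case into other beta values is the standard time transformation of a Poisson process: generate cumulative times with exponential gaps, then pass each cumulative time through the inverse of the power-law relation. The sketch below assumes that textbook technique; it is not claimed to reproduce the Table 3 multipliers or the NIST method, and the function name and parameter values are illustrative.

```python
import random


def simulate_power_law(count, lam, beta, rng):
    """Random cumulative failure times whose expected count follows lam * t**beta."""
    times = []
    unit_rate_time = 0.0
    for _ in range(count):
        # beta = 1 building block: exponential gaps at a rate of one failure per unit time.
        unit_rate_time += rng.expovariate(1.0)
        # Stretch or compress: invert N = lam * t**beta at the unit-rate cumulative time.
        times.append((unit_rate_time / lam) ** (1.0 / beta))
    return times


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run can be repeated
    for t in simulate_power_law(10, lam=0.1645, beta=0.548, rng=rng):
        print(round(t, 1))
```

Because the transformation is monotonic, the simulated cumulative times always stay in increasing order, whatever the value of beta.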

You can see the detailed simulation of the stretch-compress
method and the NIST method by downloading an Excel
spreadsheet Crow/AMSAA simulation, which illustrates both methods. The
spreadsheet will allow you to examine a
data set of 10 data points and a data set of 100 data points. Clearly the NIST
Crow/AMSAA simulation method is easier to use than the
stretch-compress method. As you watch the
Crow/AMSAA simulations you will see:

·The rank regression method is highly susceptible to the early failures, which
cause the slope to vary substantially from the true (precise) value. This
is one reason some cases in MIL-HDBK-189 drop the first few data points
from the regression analysis; more to follow on this subject at a later
date.

·“Unbiased” MLE methods (using fewer than, say, 500 data points) are biased;
again, more to follow on this subject at a later date.

·Often (say 1 time in ~10) the simulation develops cusps on the cumulative
trend line, which says that in real life cusps need to be evaluated carefully
to distinguish between random events and significant events. This should
remind you of the TV advertisement “Is it real or is it Memorex?”.

·Usually the data points are clustered tightly around the trend line so that
the confidence limits will be fairly tight; however, the confidence limits on
the trend line will be very wide, as the lines bounce around the true (precise)
value which drives the simulation. Again, some of this bias can be removed
by eliminating a few of the early data points from the regression so the
line is allowed to gain “mass” and the inertia of the data points provides
stabilization. Stay tuned for more details to follow from a “kabillion” Monte
Carlo Crow/AMSAA simulations.

·This type of Crow/AMSAA simulation, when automated, will generate the details
needed for confidence intervals and critical correlation coefficients as
published in The New Weibull Handbook. More information to
follow in subsequent Problems Of The Month.

·WinSMITH Visual analysis of the data sets in the Excel spreadsheet Crow/AMSAA
simulation matches the results obtained by Excel and provides an excellent
Monte Carlo simulation (Version 4.0T and above), which can be observed with
fidelity even on the demonstration software.

Refer to the caveats on the Problem
Of The Month Page about the limitations
of the solution above. Maybe you have a better idea on how to solve the
problem. Maybe you find where I've screwed-up the solution and you can point
out my errors as you check my calculations. E-mail your comments, criticism,
and corrections to Paul Barringer.

Technical tools are only interesting toys for engineers until results are
converted into a business solution involving money and time. Complete your
analysis with a bottom line which converts $'s and time so you have answers
that will interest your management team!
