Consider a dual redundant system consisting of two components, each of
which has a failure rate of $\lambda$. One of the components is checked and
(if necessary) repaired periodically once every $\tau$ hours, and the other
component is checked and repaired once every $m\tau$ hours, where m is some
positive integer.
The overall system is considered failed if both components are failed
simultaneously, at which point the system is immediately repaired or replaced
with a completely healthy system.

One simple way of evaluating the long term system failure
rate is to approximate the repair transition for each state as continuous with
constant rate equal to the reciprocal of the mean time to repair for that
state. If the rate of entering the state is much less than the repair rate,
then the system will enter the state at times that are almost uniformly
distributed over the inspection interval, so the mean time to repair will be
about half of the inspection interval. However, if the rate of entering the
state is large compared with the repair rate, the mean time to repair will
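approach the full inspection interval. For a failure state j that is entered
directly from the full-up state with a rate $\lambda_j$, the initial
probability density for entering the failure state is roughly

$$\rho_j(t) \approx \lambda_j \, e^{-(\Sigma\lambda)\,t}$$

where $\Sigma\lambda$ denotes the sum of the transition rates exiting the
full-up state. Hence we can approximate the mean time of entry for systems
that enter state j over an interval $\tau$ as

$$\bar{t}_j \approx \frac{\int_0^{\tau} t\,e^{-(\Sigma\lambda)t}\,dt}{\int_0^{\tau} e^{-(\Sigma\lambda)t}\,dt} = \frac{1}{\Sigma\lambda} - \frac{\tau}{e^{(\Sigma\lambda)\tau} - 1}$$

and so the mean time to repair for these systems is approximately

$$\mathrm{MTTR}_j \approx \tau - \bar{t}_j = \frac{\tau}{1 - e^{-(\Sigma\lambda)\tau}} - \frac{1}{\Sigma\lambda}$$

where $\tau$ is the inspection (and repair) interval. Naturally this is
approximately $\tau/2$ for sufficiently small values of $\tau$, i.e., when
$(\Sigma\lambda)\tau \ll 1$. Using this formula, we can determine suitable
values for the repair rates $\mu_1$ and $\mu_2$ of our dual redundant system,
and the system can then be represented by the simple Markov model shown below.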
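
As a rough numerical illustration of the mean time to repair estimate, the
following Python sketch evaluates it for a given inspection interval. The
names mean_time_to_repair and sum_rates are ours, chosen only for this
example, and the switch to a two-term series for very small arguments is an
added assumption to avoid round-off, not part of the derivation.

```python
import math

def mean_time_to_repair(tau, sum_rates):
    """Approximate mean time to repair for a state inspected every tau hours.

    sum_rates is the total transition rate out of the upstream (full-up)
    state. Both argument names are illustrative.
    """
    x = sum_rates * tau
    if x < 1e-6:
        # Assumption: use the leading series terms tau/2 + sum_rates*tau^2/12
        # here, because the two terms below nearly cancel for tiny x.
        return tau * (0.5 + x / 12.0)
    # tau / (1 - exp(-x)) - 1 / sum_rates, with expm1 for accuracy
    return tau / -math.expm1(-x) - 1.0 / sum_rates

if __name__ == "__main__":
    # A low entry rate gives about tau/2; a high one approaches tau.
    print(mean_time_to_repair(100.0, 2e-6))
    print(mean_time_to_repair(100.0, 1.0))
```

The two printed values bracket the behaviour described above: close to half
the interval when entries are rare, and close to the full interval when
entries are frequent.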

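In this model State 0 is the full-up state, and State i (i = 1, 2) is the
state in which only component i has failed. Each partial failure state is
entered from State 0 at the rate $\lambda$, and is exited either by repair at
the rate $\mu_i$ or by failure of the remaining component at the rate
$\lambda$, both of which return the system to State 0. The steady-state
system equations are

$$\lambda P_0 - (\lambda + \mu_1) P_1 = 0 \qquad \lambda P_0 - (\lambda + \mu_2) P_2 = 0$$

Solving these for $P_1$ and $P_2$ in terms of $P_0$, and inserting into the
conservation equation $P_0 + P_1 + P_2 = 1$, we can then solve for $P_0$.
From this we can express the system failure rate as

$$\lambda_{sys} = \lambda (P_1 + P_2) = \frac{\lambda^2 (2\lambda + \mu_1 + \mu_2)}{(\lambda + \mu_1)(\lambda + \mu_2) + \lambda (2\lambda + \mu_1 + \mu_2)}$$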
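
The steady-state calculation can be sketched in a few lines of Python. This
is only an illustration under the assumption that each partial failure state
is entered from the full-up state at rate lam and left at rate lam + mu_i
(repair, or failure of the second component followed by immediate
replacement). The function name markov_failure_rate is hypothetical.

```python
def markov_failure_rate(lam, mu1, mu2):
    """Long-term failure rate of the continuous-repair Markov model.

    Assumes the balance equations lam*P0 = (lam + mu_i)*P_i for i = 1, 2.
    """
    r1 = lam / (lam + mu1)        # P1 / P0
    r2 = lam / (lam + mu2)        # P2 / P0
    p0 = 1.0 / (1.0 + r1 + r2)    # from P0 + P1 + P2 = 1
    # The system fails from either partial failure state at rate lam.
    return lam * (r1 + r2) * p0

if __name__ == "__main__":
    print(markov_failure_rate(1e-3, 0.02, 0.002))
```

With equal repair rates mu1 = mu2 = mu this reduces to 2*lam**2/(3*lam + mu).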

Does this simple approach accurately reflect the system
response? We have substituted exponentially-distributed continuous repairs
for periodic discrete repairs, so it would be of interest to evaluate the
original system precisely, with no approximations, to allow us to compare the
exact result with this approximate formula. The exact system is as shown in
the figure below.

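To solve for the exact long-term system failure rate, we first consider the
time-dependent response of the original system, leaving out the discrete
repairs. The system equations are

$$\frac{dP_0}{dt} = -2\lambda P_0 + \lambda P_1 + \lambda P_2 \qquad \frac{dP_1}{dt} = \lambda P_0 - \lambda P_1 \qquad \frac{dP_2}{dt} = \lambda P_0 - \lambda P_2$$

In terms of the sum $S(t) = P_1(t) + P_2(t)$ and the difference
$D(t) = P_1(t) - P_2(t)$, and using $P_0 = 1 - S$, these equations can be
separated and written in the equivalent form

$$\frac{dS}{dt} = 2\lambda - 3\lambda S \qquad \frac{dD}{dt} = -\lambda D$$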

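These are the governing equations in between discrete repairs. Thus during
the period following the jth repair of State 1 we have

$$S_j(t) = \frac{2}{3} + \left( S_j(0) - \frac{2}{3} \right) e^{-3\lambda t} \qquad D_j(t) = D_j(0)\, e^{-\lambda t}$$

Our strategy is to begin with $S_0(0) = D_0(0) = 0$, and evaluate the results
for m consecutive intervals of duration $\tau$, numbered j = 0, 1, ..., m-1.
Between consecutive intervals we move all the accumulated probability from
State 1 to State 0. At the end of the mth interval both components are
repaired, so the cycle begins again. The system failure rate is $\lambda S$,
and the average of the average rates for all m intervals represents our
overall system failure rate.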
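
As a sanity check on the response between repairs, the sketch below
integrates the state equations numerically and compares the result with the
exponential solutions for S and D. It assumes the dynamics
dP1/dt = lam*P0 - lam*P1 and dP2/dt = lam*P0 - lam*P2 with P0 = 1 - P1 - P2.
The function names and the choice of a classical fourth-order Runge-Kutta
step are ours, for illustration only.

```python
import math

def propagate(p1, p2, lam, t, steps=2000):
    """Integrate P1 and P2 over a time t with no repairs (classical RK4)."""
    def deriv(a, b):
        p0 = 1.0 - a - b
        return lam * p0 - lam * a, lam * p0 - lam * b
    h = t / steps
    for _ in range(steps):
        k1 = deriv(p1, p2)
        k2 = deriv(p1 + 0.5 * h * k1[0], p2 + 0.5 * h * k1[1])
        k3 = deriv(p1 + 0.5 * h * k2[0], p2 + 0.5 * h * k2[1])
        k4 = deriv(p1 + h * k3[0], p2 + h * k3[1])
        p1 += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        p2 += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return p1, p2

def closed_form(p1, p2, lam, t):
    """Same propagation using S = 2/3 + (S0 - 2/3)e^(-3 lam t), D = D0 e^(-lam t)."""
    s0, d0 = p1 + p2, p1 - p2
    s = 2.0 / 3.0 + (s0 - 2.0 / 3.0) * math.exp(-3.0 * lam * t)
    d = d0 * math.exp(-lam * t)
    return (s + d) / 2.0, (s - d) / 2.0

if __name__ == "__main__":
    print(propagate(0.0, 0.1, 1e-3, 500.0))
    print(closed_form(0.0, 0.1, 1e-3, 500.0))
```

The two printed pairs should agree to within the integration error.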

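Noting that $P_1 = (S+D)/2$ and $P_2 = (S-D)/2$, we can say that the values
of S and D at the end of one interval are related to the values at the start
of the next (after $P_1$ has been set to zero while $P_2$ is held constant)
according to the formulas

$$S_{j+1}(0) + D_{j+1}(0) = 0 \qquad S_{j+1}(0) - D_{j+1}(0) = S_j(\tau) - D_j(\tau)$$

The left hand relation signifies that $S_j(0) = -D_j(0)$ for all j. Making
this substitution in the right hand relation, and also replacing $D_j(\tau)$
with $D_j(0)\,e^{-\lambda\tau}$, we get

$$S_{j+1}(0) = \frac{1}{2} \left[ S_j(\tau) + S_j(0)\, e^{-\lambda\tau} \right]$$

Now we can substitute for $S_j(\tau)$ from the previous expression for
$S_j(t)$ at $t = \tau$, namely
$S_j(\tau) = 2/3 + (S_j(0) - 2/3)\,e^{-3\lambda\tau}$, to give the recurrence
relation for the values of S at the start of each interval

$$S_{j+1}(0) = \frac{e^{-\lambda\tau} + e^{-3\lambda\tau}}{2}\, S_j(0) + \frac{1 - e^{-3\lambda\tau}}{3}$$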

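In general, given a linear recurrence of the form $s_{j+1} = A s_j + B$ with
the initial condition $s_0 = 0$, it's easy to show that
$s_j = B(1 - A^j)/(1 - A)$. Therefore, putting

$$A = \frac{e^{-\lambda\tau} + e^{-3\lambda\tau}}{2} \qquad B = \frac{1 - e^{-3\lambda\tau}}{3}$$

we can express the value of S at the start of the jth interval in closed
form as

$$S_j(0) = \frac{B \left( 1 - A^j \right)}{1 - A}$$

Furthermore, we know the average rate over the jth interval is

$$\lambda_j = \frac{1}{\tau} \int_0^{\tau} \lambda\, S_j(t)\, dt$$

Carrying out the integration and inserting the previous expression for
$S_j(0)$, we get an explicit expression for the average system failure rate
during the jth interval

$$\lambda_j = \frac{2\lambda}{3} - \frac{2B}{3\tau} + \frac{B^2}{\tau (1 - A)} \left( 1 - A^j \right)$$

The average system failure rate over all m intervals is 1/m times the sum of
the individual average rates. Only one of the terms in the expression for
$\lambda_j$ involves j, so the other terms all appear unchanged in the
overall failure rate. For the remaining term, we can sum the geometric series
and divide by m to give the final result

$$\lambda_{sys} = \frac{2\lambda}{3} - \frac{2B}{3\tau} + \frac{B^2}{\tau (1 - A)} \left[ 1 - \frac{1 - A^m}{m (1 - A)} \right]$$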
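
The closed-form average can be checked against a direct interval-by-interval
calculation. The sketch below is a minimal illustration, assuming the
coefficients A = (e^(-lam tau) + e^(-3 lam tau))/2 and
B = (1 - e^(-3 lam tau))/3 and a cycle of m intervals starting from the
fully repaired condition. Both function names are hypothetical.

```python
import math

def exact_failure_rate(lam, tau, m):
    """Closed-form average failure rate over one cycle of m intervals."""
    a = 0.5 * (math.exp(-lam * tau) + math.exp(-3.0 * lam * tau))
    b = -math.expm1(-3.0 * lam * tau) / 3.0
    # (1/m) * sum of A^j for j = 0 .. m-1
    mean_power = (1.0 - a**m) / (m * (1.0 - a))
    return (2.0 * lam / 3.0 - 2.0 * b / (3.0 * tau)
            + b * b / (tau * (1.0 - a)) * (1.0 - mean_power))

def stepped_failure_rate(lam, tau, m):
    """Same quantity, obtained by stepping through the m intervals."""
    e1 = math.exp(-lam * tau)
    e3 = math.exp(-3.0 * lam * tau)
    p2 = 0.0       # P1 is zero at the start of every interval
    total = 0.0
    for _ in range(m):
        s0, d0 = p2, -p2
        # Average of lam*S(t) over the interval, from the solution for S
        total += 2.0 * lam / 3.0 + (s0 - 2.0 / 3.0) * (1.0 - e3) / (3.0 * tau)
        s_end = 2.0 / 3.0 + (s0 - 2.0 / 3.0) * e3
        d_end = d0 * e1
        p2 = 0.5 * (s_end - d_end)   # State 1 is reset, State 2 carries over
    return total / m

if __name__ == "__main__":
    for m in (1, 2, 10):
        print(m, exact_failure_rate(1e-3, 100.0, m),
              stepped_failure_rate(1e-3, 100.0, m))
```

The two columns should agree to within rounding for each m.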

We can compare this with the ordinary Markov model
approximation with continuous repair transitions discussed previously, making
use of the estimated mean times to repair for determining the repair
transition rates. The figure below shows the results for the case m = 10,
i.e., the inspection interval for state 2 is ten times the inspection
interval $\tau$ for state 1. The left hand
figure shows the system failure rates for the two methods, and the right hand
figure shows the difference between them.

Clearly the agreement is very close for this case, but
interestingly the agreement is not nearly as good if we consider the
symmetrical case, i.e., with m = 1, signifying that both components have the
same inspection interval. A comparison for the two methods in this case is
shown below.

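This shows that the overall system failure rate is appreciably greater with
periodic repairs than with continuous repairs with (approximately) the same
mean times to repair. Recall that the continuous repair model uses, for the
jth state, a repair transition rate equal to the reciprocal of the estimated
mean time to repair for a system entering that state, i.e.,

$$\mu_j = \left[ \frac{\tau}{1 - e^{-(\Sigma\lambda)\tau}} - \frac{1}{\Sigma\lambda} \right]^{-1}$$

where $\tau$ is the duration of the inspection interval for the jth state and
$\Sigma\lambda$ is the sum of the transition rates exiting the upstream
state. In our case we have $\Sigma\lambda = 2\lambda$, since there are two
transitions exiting the full-up state, each with rate $\lambda$. Also, in the
symmetrical case m = 1, the inspection intervals are $\tau$ for both states.
Therefore, inserting the rate

$$\mu = \left[ \frac{\tau}{1 - e^{-2\lambda\tau}} - \frac{1}{2\lambda} \right]^{-1} = \frac{2\lambda \left( 1 - e^{-2\lambda\tau} \right)}{2\lambda\tau - 1 + e^{-2\lambda\tau}}$$

into the steady-state solution for the symmetrical Markov model with
continuous repairs, $\lambda_{sys} = 2\lambda^2 / (3\lambda + \mu)$, we get
the approximation

$$\lambda_{sys} \approx 2\lambda\, \frac{2\lambda\tau - 1 + e^{-2\lambda\tau}}{6\lambda\tau - 1 + e^{-2\lambda\tau}}$$

We can compare this with the exact periodic solution, which in the
symmetrical case m = 1 reduces to

$$\lambda_{sys} = \frac{2\lambda}{3} \left[ 1 - \frac{1 - e^{-3\lambda\tau}}{3\lambda\tau} \right]$$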

This result can also be derived by direct integration (as shown in another
note), and we see that it does indeed differ from the prediction for
continuous repairs. However, if we took $\Sigma\lambda$ equal to $3\lambda$
instead of $2\lambda$, the Markov model with continuous repairs would give
exactly the same system failure rate as the system with periodic repairs.
This corresponds to increasing slightly the mean time spent by the system in
each of the partial failure states. Nevertheless, setting $\Sigma\lambda$
equal to $2\lambda$ definitely gives a better fit in the asymmetric cases, so
no single value is optimum for all cases. Presumably the system failure rate
with periodic repairs is greater than the rate with continuous repairs (with
the same mean repair times) due to the lack of statistical independence
between the two component failures with synchronized inspections. If the
periodic inspections were staggered, we would expect the results to more
closely match the continuous repair model with $\Sigma\lambda = 2\lambda$.
Still, it's remarkable that, in the symmetrical case, this non-independence
effect in the periodic model is exactly reproduced by setting
$\Sigma\lambda = 3\lambda$.
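
The symmetrical comparison is easy to reproduce numerically. The following
sketch assumes the symmetric steady-state rate 2*lam**2/(3*lam + mu), with mu
taken as the reciprocal of the estimated mean time to repair, and the exact
m = 1 rate (2*lam/3)*(1 - (1 - e^(-3 lam tau))/(3 lam tau)). The function
names and the sample values of lam and tau are illustrative only.

```python
import math

def mttr(tau, sum_rates):
    """Estimated mean time to repair for inspection interval tau."""
    return tau / -math.expm1(-sum_rates * tau) - 1.0 / sum_rates

def markov_symmetric(lam, tau, sum_rates):
    """Continuous-repair approximation for m = 1 with a chosen sum of rates."""
    mu = 1.0 / mttr(tau, sum_rates)
    return 2.0 * lam**2 / (3.0 * lam + mu)

def exact_symmetric(lam, tau):
    """Exact long-term failure rate with periodic repairs for m = 1."""
    x = 3.0 * lam * tau
    return (2.0 * lam / 3.0) * (1.0 + math.expm1(-x) / x)

if __name__ == "__main__":
    lam, tau = 1e-3, 200.0   # sample values, not taken from the figures
    print("sum = 2*lam:", markov_symmetric(lam, tau, 2.0 * lam))
    print("sum = 3*lam:", markov_symmetric(lam, tau, 3.0 * lam))
    print("exact      :", exact_symmetric(lam, tau))
```

For these sample values the sum = 3*lam line matches the exact line to within
rounding, while the sum = 2*lam line comes out somewhat lower.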