This article describes a decision analysis and cost-effectiveness study of screening for mild hypothyroidism. In the current climate of managing costs and managed care, cost-effectiveness studies are increasingly important. Rather than summarizing the paper in great detail, I will try to lay out the basic concepts that underlie it and thus, hopefully, make it easier to read the original study.

Screening for mild thyroid failure at the periodic health examination. A decision and cost-effectiveness analysis

Summary

Background

Thyroid failure is a relatively common problem. In its mildest form, it consists of an elevation of TSH without a significant decrease in circulating thyroid hormone. There are three main arguments in favor of screening for and treating this disorder: averting the progression to overt hypothyroidism, alleviating the subtle symptoms that sometimes accompany subclinical hypothyroidism, and improving serum lipid levels. Since measuring serum TSH in otherwise apparently healthy adults on a regular basis involves a substantial cost, this analysis was performed in order to get a better handle on the actual costs and benefits involved.

Basic approach

In order to perform the analysis, the first step is to construct a simplified model of the situation we are analyzing. In the case described here, two models need to be constructed: a model of health outcomes and costs if we don't screen for subclinical hypothyroidism and a model if we do screen. These models are then implemented and "run" on a computer. For each scenario, the simulation should yield an estimated cost as well as some measure of health outcomes (deaths, years of life, quality-adjusted years of life). Then, when we compare the strategies, we should be able to make a statement of the sort: for $X spent on screening, we can save Y lives or gain Z quality-adjusted years of life.

Since any model of a complex situation requires multiple assumptions which may or may not be correct, we need to see how sensitive the model is to our various assumptions. Thus, once the original model (or base-case analysis) is "run", we need to re-run it, varying the original assumptions to see how they would affect our results (a sensitivity analysis).

In the analysis presented here, the modelling is accomplished using a Markov decision tree. Markov analysis is a way of analyzing complex systems. It assumes that the system can be described by a number of variables that can each be in one of several "states", and that these variables make transitions from one state to another at discrete time intervals and with probabilities that depend on the values assumed by other variables. When analyzing clinical situations, the appropriate selection of variables and the determination of probabilities governing the transitions from one state to another (such as from the "euthyroid" state to the "mild hypothyroidism" state) are key issues.
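To make the idea of states and transition probabilities concrete, here is a minimal cohort sketch in Python. It is not the authors' model: the four states, the annual transition probabilities and the quality weights are all hypothetical placeholders chosen for illustration, and the sketch tracks a single variable (thyroid status) rather than several interacting ones.

```python
# Toy Markov cohort simulation with one-year cycles.
# Every number below is an illustrative placeholder, NOT a value from the paper.
STATES = ["euthyroid", "mild", "overt", "dead"]

# TRANSITIONS[s][t] = hypothetical annual probability of moving from state s to state t.
TRANSITIONS = {
    "euthyroid": {"euthyroid": 0.97, "mild": 0.02, "overt": 0.00, "dead": 0.01},
    "mild":      {"euthyroid": 0.00, "mild": 0.94, "overt": 0.05, "dead": 0.01},
    "overt":     {"euthyroid": 0.00, "mild": 0.00, "overt": 0.98, "dead": 0.02},
    "dead":      {"euthyroid": 0.00, "mild": 0.00, "overt": 0.00, "dead": 1.00},
}

# Hypothetical quality-of-life weight attached to one year spent in each state.
QUALITY = {"euthyroid": 1.0, "mild": 0.95, "overt": 0.8, "dead": 0.0}


def step(cohort):
    """Advance the cohort by one cycle; cohort maps state -> fraction of patients."""
    new = {s: 0.0 for s in STATES}
    for s, fraction in cohort.items():
        for t, p in TRANSITIONS[s].items():
            new[t] += fraction * p
    return new


def run(cohort, years):
    """Return the final cohort distribution and the QALYs accumulated per patient."""
    total_qalys = 0.0
    for _ in range(years):
        # Credit one year of quality-adjusted life for the state occupied this cycle.
        total_qalys += sum(cohort[s] * QUALITY[s] for s in STATES)
        cohort = step(cohort)
    return cohort, total_qalys


start = {"euthyroid": 1.0, "mild": 0.0, "overt": 0.0, "dead": 0.0}
final, total = run(start, 40)  # e.g. follow-up from age 35 to age 75
print(final, total)
```

A screening strategy would be represented by a second set of transition probabilities and costs, and the two runs would then be compared.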

Another aspect of the modelling is the use of QALYs or Quality Adjusted Life Years. A QALY is a year of life experienced by a patient that is "adjusted" for quality of life by a factor that ranges from 1.0 (no decrease in quality of life) to 0.0 (death). For example, if a patient is followed from age 35 to age 75 without any intervening illnesses, that patient will have generated 75-35 = 40 QALYs. If a patient is followed from age 35 to age 75 but has a stroke at age 65 that reduces the quality of life to 0.6, that patient will generate (65-35) + (75-65)*0.6 = 30 + 10*0.6 = 30 + 6 = 36 QALYs. Obviously, describing an individual's quality of life by a number between 0 and 1 is absurd. It is an absurd but necessary fiction for this sort of analysis.
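The QALY arithmetic above can be written as a small Python function. The function name and the (start age, end age, quality weight) representation are assumptions made for this illustration only.

```python
def qalys(periods):
    """Sum QALYs over a list of (start_age, end_age, quality_weight) periods."""
    return sum((end - start) * weight for start, end, weight in periods)


# Patient followed from 35 to 75 with no intervening illness.
healthy = qalys([(35, 75, 1.0)])

# Patient followed from 35 to 75 with a stroke at 65 (quality weight 0.6 afterwards).
stroke_at_65 = qalys([(35, 65, 1.0), (65, 75, 0.6)])

print(healthy)       # 40.0
print(stroke_at_65)  # 36.0
```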

The model

A number of properties and assumptions underlie the model developed by the authors of this paper. Some of the principal ones include:

Screening vs. no screening strategies:

The screening strategy involves measuring serum TSH along with serum cholesterol every 5 years, starting at age 35. Patients with elevated TSH then have anti-thyroid antibodies and free T4 measured, along with repeat TSH and cholesterol determinations. They are only treated with thyroid hormone replacement if they have abnormal thyroid hormone levels, elevated antibody titers, elevated cholesterol or symptoms possibly related to mild hypothyroidism. Those with isolated TSH elevation, without any of the other lab abnormalities or symptoms, are followed yearly with the same labwork.

According to the no-screening strategy, patients have cholesterol determinations every five years, and are treated with diet/exercise/drugs and more frequent labwork if they are hypercholesterolemic. If they present with symptoms suggestive of hypothyroidism, they are screened and treated as necessary.

The model assumes that screening patients every five years will avert all cases of overt hypothyroidism. Of the patients who are not screened, a certain percentage will develop overt hypothyroidism, and of these, a very small number will require hospitalization or even die from complicated myxedema.

The heart of the "numbers" that underlie the model is given in Table 1 of the original article, which details the main numerical assumptions that were used. For each assumption, the base-case estimate is given, along with estimates of the probable range or boundaries for use in the sensitivity analysis. The literature or other data source for the estimates is also given. For example, the probability that overt hypothyroidism will present as complicated myxedema is given as 0.003 in the base-case estimate, 0.01 for "biased toward screening" and 0.0 for "biased against screening". In Table 1 of the article, numbers are given for the initial probabilities of disease states (hypothyroidism and hypercholesterolemia), the probabilities of annual transition from one state to another, the cost of diagnostic tests, the efficacy and cost of treatments, the annual cost of various disease states and the adjustments to be used for determining the QALYs.

Results

Base-case analysis: Under the base-case assumptions, screening for TSH yields an average increase in QALYs of 6 days for women; for men the increase is only 2 days. The average cost was $147 per woman and $120 per man. This yields a cost-effectiveness of $9 223 per QALY for women and $22 595 per QALY for men. For women, approximately 52% of the gain in QALYs came from the prevention of overt hypothyroidism and about 30% from the relief of symptoms of mild hypothyroidism.
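As a rough check on how these figures fit together, the sketch below divides the reported cost by the reported QALY gain. It assumes that the quoted costs are the extra cost of screening per person and that a year is 365.25 days; both are assumptions of this illustration, not statements from the paper. Because the gains are quoted in whole days, the ratios come out near, but not equal to, the published $9 223 and $22 595.

```python
DAYS_PER_YEAR = 365.25  # assumption for converting days to years


def cost_per_qaly(extra_cost_dollars, qaly_gain_days):
    """Cost-effectiveness ratio: dollars spent per quality-adjusted life year gained."""
    return extra_cost_dollars / (qaly_gain_days / DAYS_PER_YEAR)


def implied_gain_days(extra_cost_dollars, dollars_per_qaly):
    """QALY gain (in days) implied by a cost and a published cost per QALY."""
    return extra_cost_dollars / dollars_per_qaly * DAYS_PER_YEAR


women = cost_per_qaly(147, 6)  # roughly $8,900 per QALY from the rounded inputs
men = cost_per_qaly(120, 2)    # roughly $21,900 per QALY from the rounded inputs
print(round(women), round(men))

# Working backwards from the published ratios gives unrounded gains of about
# 5.8 days (women) and 1.9 days (men), consistent with the reported 6 and 2.
print(implied_gain_days(147, 9223), implied_gain_days(120, 22595))
```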

Sensitivity analysis: Cost-effectiveness was better for women than for men at all ages, and was more favorable when screening was initiated later in life than at 35 years of age. When the analysis was "re-run", using the upper and lower range values for the various numerical estimates, a number of them were found to substantially alter the results. The two most influential were the cost of the TSH assay and the quality of life adjustment factor for symptomatic mild hypothyroidism. The base-case estimate for TSH assay cost was $25; when that cost was reduced to $10, the price per QALY gained dropped from $9 223 to $3 947; when the assay cost was raised to $50, the price per QALY gained rose to $17 998. Similarly, if patients attached no decrease in quality of life to mild hypothyroidism with symptoms, the price per QALY gained rose to $16 885. The effect of varying other assumptions is given in the article, in Figure 5.
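The three reported points for TSH assay cost lend themselves to a simple one-way sensitivity check. The sketch below computes the change in cost per QALY for each extra dollar of assay cost between neighboring points. That the relationship is close to linear is an observation drawn here from these three numbers only, not a claim made in the paper.

```python
# (TSH assay cost in dollars, reported cost per QALY gained for women in dollars)
reported = [(10, 3947), (25, 9223), (50, 17998)]


def slopes(points):
    """Change in cost per QALY per extra dollar of assay cost, between neighbors."""
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


segment_slopes = slopes(reported)
print(segment_slopes)  # both values are close to 351
```

On these figures, each additional dollar of assay cost adds about $350 to the cost per QALY gained across the whole range tested.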

Authors' Discussion

The authors feel that, based on these results, periodic screening for mild hypothyroidism with TSH measurement every 5 years after age 35 is cost-effective. They note that the cost per QALY compares reasonably to that for mammography and estrogen replacement therapy in women and for hypertension screening in men. The increase in QALYs afforded by screening is dependent on the prevalence and importance of mild symptoms as well as on the rate of progression to overt hypothyroidism. The costs of screening are offset significantly by the lower cost of cholesterol reduction with thyroid hormone compared to other lipid-lowering drugs. Screening at intervals shorter than 5 years increases the cost of this strategy significantly and is not advised by the authors.

Comment

I have two minor criticisms of the model as presented here. First of all, the authors assumed that screening with TSH will avert all cases of overt hypothyroidism. They did not state explicitly where this assumption comes from, and they did not include this in their sensitivity analysis (what would the results have been if, despite screening, 10% of patients developed overt hypothyroidism?). Second, it is not clear from the model whether or not patients who were treated with thyroid replacement would also have been treated with conventional lipid-lowering drugs if the results were not adequate.

These caveats aside, this study has the potential to significantly influence day-to-day medical practice, since a recommendation to screen all adults over the age of 35 with a serum TSH determination obviously involves a lot of patients. Since the authors performed a quantitative analysis and then compared the results to well-established and accepted procedures (mammography, hypertension screening), their conclusions are more likely to be accepted than if they were merely promulgated ad hoc.

Despite the strengths of this study, it is important to note that these numbers were quite sensitive to several variables. Also, the model is a simplification and is obviously imperfect. For these reasons, the actual dollar numbers obtained here for cost-effectiveness cannot be taken as gospel truth. Nevertheless, I believe that the numbers generated should be considered valid as ballpark figures and should be useful as such.

Apart from generating cost-effectiveness numbers, the effort at undertaking such a study (and the effort involved in reading it!) produces other benefits. The sensitivity analysis yields much information about the process being analyzed. What factors are critical to improving cost-effectiveness and what factors are marginal can only be determined by this sort of sensitivity analysis. Furthermore, developing and understanding the model that was used leads to a better understanding of the concepts involved in analyzing cost-effectiveness in general. Anyone who is called upon to help decide whether or not a given screening strategy is appropriate will be better equipped to ask the right questions after having worked through a paper such as this one.

August 3, 1996

Reader comments

I apologize for the delay in responding to your comments about our
article. As this is a journal club, I will try to respond informally and
not "defensively." Please realize that this is my response and that the
other authors may not agree completely.

Your first point is true. Not all screened patients will be prevented
from developing overt hypothyroidism. However, the ones missed at
screening would probably present at the same time as they would have if
they were never screened. Hence, the arithmetic difference in quality of
life for these patients would be zero. But this also brings up another
point: we did not consider compliance in our model since we assumed that it
was 100% for all therapies. Non-compliance could lead to the development
of symptoms of thyroid hormone deficiency. While the majority of patients,
from my understanding, comply with thyroid medication, patients still
suffer from lapses in adherence to the regimen. Hopefully, the annual
visit and TSH test for all patients with mild thyroid failure would uncover
some of the unseen progression and some of the treatment non-compliance,
reducing the influence of this variable. Since it is not in the model
explicitly, I cannot say precisely how important this is in the context of
screening populations. Clearly it is relevant to individual clinicians.

Your second point about whether patients on levothyroxine sodium would be
treated with lipid lowering medications was another wrinkle that we did not
address. Since our estimates of the effectiveness for levothyroxine sodium
and lipid-lowering medications in cholesterol reduction were comparable (as
defined in more detail in the paper), we addressed the choice between the
two and not the combination. Both therapies together may be clinically
appropriate, but to my knowledge there is no data discussing the results
compared to single therapy.

As always, decision models are good for synthesizing the literature and
identifying important aspects of medical decisions. And they suggest a
rational basis for making medical decisions. As you point out, there are
limitations to the level of complexity that can reasonably be accommodated
in the model. It is our hope to continue to expand the model and to
conduct clinical research to better define some of the influential factors
in screening.

As a general practitioner I was quite surprised reading this
paper. I would like to comment on two issues.

First: Talking of screening, we actually mean screening the whole population, and therefore we also must assume false positive and false negative lab results, with their costs included.

Second: In my experience, hypophyseal hormones seem to vary a lot with the number of hours slept and the time at which the blood sample is taken. Like prolactin, TSH may have a "fragile" blood level that depends on catecholamines and other related hormones. In daily practice I see patients with high TSH levels and mild or no clinical hypothyroidism in whom a repeat TSH determination comes back normal. I cannot explain why.

Dr. C. Sitges (General Practitioner)

November 26, 1997. Barcelona, Spain.

When screening a general population, false negatives and especially false positives become a major problem. The way the testing protocol was set up here, any elevation in TSH was followed by repeat labwork with antithyroid antibodies, free T4 and a repeat TSH. Presumably, most false positive TSH elevations would not remain positive on retesting. Nevertheless, a significant expense is incurred by the measurement of antithyroid antibodies, free T4 and repeat TSH measurement in these patients. I believe this expense was not taken into account in the study, and would indeed affect the results. Similarly, false negatives will also occur with screening tests. In relation to the model presented here, they would cause us to miss some cases of hypothyroidism and thus reduce the efficacy of the intervention, although probably not by very much.

Your second point about the variability in TSH measurements with various conditions implies problems with false positives and false negatives, which then leads to your first point. -- mj

December 1, 1997
Dr. Mark Danese, corresponding author, replies:

The problem with any screening situation is the sensitivity and specificity
of the screening tool. In this particular case, the serum TSH is the gold
standard; hence, sensitivity and specificity calculations are not possible.
This does not mean that there are no false positives or negatives, only that
there is no perfect method of assessment of mild degrees of thyroid gland
failure. This lack of a perfect tool is one of the reasons for the rigorous
follow up testing including repeat TSH, anti-thyroid antibodies, lipids, and
free T4.

We felt that in the absence of a gold standard, repeat TSH testing was
necessary because it made TSH screening less favorable by increasing the
cost of screening. We also assumed that once treated, patients would
continue to be monitored and to remain on thyroxine replacement, also
increasing costs. In the real world, if therapy was not helpful, it would
be discontinued. Also, in a real clinical setting, it might be possible to
reflex test, meaning that the T4 and antibodies would be run only if the
repeat TSH was also high. Both of these aspects of screening, if
implemented, would reduce the cost of testing. However, for publication
purposes, we elected to bias our results against screening if there was any
doubt about an approach.

In short, there are many ways of testing for mild thyroid failure, different
from the algorithm we presented. We have heard many suggestions in the time
since our article was published. None that we have tried was superior to
the algorithm we proposed, but most were not dramatically different either.
It is important to remember that the objective of our paper was not to
optimize the screening procedure; rather, it was to determine if screening
made sense.

The next step in the evolution of TSH screening is to see if we can improve
the identification of patients with mild thyroid failure. Based on the data
from our model, at least 25% of patients with elevated TSH levels may not
benefit from treatment. If we can understand why, or if we can identify
those who don't benefit, this could make screening even more favorable than
it already is.