1. Introduction

Over the past forty years, improvements in the sensitivity and specificity of thyroid testing methodologies have dramatically impacted clinical strategies for detecting and treating thyroid disorders. In the 1950s, only one thyroid test was available – an indirect estimate of the serum total (free + protein-bound) thyroxine (T4) concentration, using the protein bound iodine (PBI) technique (1,2). Since 1970, technological advances in radioimmunoassay (RIA) (3-6), immunometric assay (IMA) (7,8) and more recently liquid chromatography-tandem mass spectrometry (LC-MS/MS) methodologies (9-12) have progressively improved the specificity, reproducibility and sensitivity of thyroid testing methods (13,14). Currently, serum-based immunoassays and LC-MS/MS techniques are available for measuring the total and free thyroid hormones, Thyroxine (T4) and Triiodothyronine (T3) concentrations, as well as the pituitary thyroid stimulator, Thyrotropin (Thyroid Stimulating Hormone, TSH) and the thyroid hormone precursor protein, Thyroglobulin (Tg) (12,15-18). In addition, measurements can be made of the thyroid hormone binding proteins, Thyroxine Binding Globulin (TBG), Transthyretin (TTR)/Prealbumin (TBPA) and Albumin (18). The recognition that autoimmunity represents a major cause of thyroid dysfunction has led to the development of tests for the detection of thyroid autoantibodies such as thyroid peroxidase antibodies (TPOAb), thyroglobulin antibodies (TgAb) and TSH receptor antibodies (TRAb) (13,19-24). Currently, thyroid tests are performed primarily on serum specimens using manual or automated immunoassay methods employing specific antibody reagents directed at these ligands (13,25) or LC-MS/MS techniques used to measure free hormone moieties (FT4 and FT3) in dialysates and ultrafiltrates (9,10,26-29) or Thyroglobulin after alkylation and trypsinization (12,17).
However, sensitivity, specificity and standardization issues still result in substantial between-method variability for many of these tests (8,12,23,28,30-33). To address this issue, new performance standards are being established by professional organizations, and technological advances are being pursued by instrument manufacturers (10,13,33-36). This chapter is designed to give an overview of the current status and limitations of the thyroid testing methods most commonly used in clinical practice and as recommended by current guidelines (13,34,37-40).

2. Total Thyroid Hormone Measurements (TT4 and TT3)

Thyroxine (T4) circulates approximately 99.97% bound to the plasma proteins: TBG (60-75%), TTR/TBPA (15-30%) and Albumin (~10%). In contrast, approximately 99.7% of Triiodothyronine (T3) is protein-bound, primarily to TBG (18,27,41). Total (free + protein-bound) concentrations of the thyroid hormones (TT4 and TT3) circulate at nanomolar concentrations and are considerably easier to measure than the free hormone moieties (FT4 and FT3) that circulate in the picomolar range. The serum TT4 measurement has evolved through a variety of technologies over the past four decades. However, despite changes in methodology from PBI in the 1950s, to competitive protein binding assays in the 1960s, to RIA in the 1970s and now LC-MS/MS methods, it has remained in all its forms a remarkably robust determination (Figure 1) (42-48). It is primarily for this reason that recent studies suggest employing total rather than free T4 as the preferred method for assessing thyroid status in pregnancy and critical illness [Section 3C(ii)] (40,42,49). Recently, primary T4 and T3 calibrator standards have become available, and measurements based on LC-MS/MS have further improved the standardization of these tests (10,28). Most commonly, TT4 and TT3 concentrations are measured by competitive non-isotopic immunoassay methods performed on automated platforms that use enzymes, fluorescence or chemiluminescent molecules as signals (14,25,27). Total hormone methods require the inclusion of inhibitors, such as 8-anilino-1-naphthalene-sulphonic acid, to block hormone binding to serum proteins in order to facilitate hormone binding to the antibody reagent (50-53). Serum TT3 method development has paralleled that of TT4. However, the ten-fold lower TT3 concentration presents a significant sensitivity and precision challenge despite the use of higher specimen volumes (3,54-61).
Between-method variability for measuring total hormones (TT4 and TT3) has now been lowered by the use of highly purified preparations of crystalline L-thyroxine and L-triiodothyronine and by establishing reference LC-MS/MS techniques (28,30,61). Currently, observed assay variability most likely relates to matrix differences between calibrators and patient sera and the efficiency of the blocking agent employed by different manufacturers (30,58,62,63).

A. Clinical Utility of TT4 and TT3 Measurements

The diagnostic accuracy of total hormone measurements would be proportional to that of free hormone tests if all patients had similar binding protein concentrations (18,27,64). For example, a recent study has reported that a screening cord blood TT4 < 7.6 µg/dL (< 98 nmol/L) may serve as a valid screening test for congenital hypothyroidism (65). Unfortunately, many conditions commonly encountered in clinical practice are associated with TBG abnormalities that distort the proportionality of the total to free thyroid hormone relationship (Table 1). Additionally, some patients have thyroid hormone binding albumins (dysalbuminemias), thyroid hormone autoantibodies, or are taking drugs that render total hormone measurements diagnostically unreliable [Table 1 & Section 3E] (14,66-70). Consequently, TT4 and TT3 measurements are rarely used as stand-alone tests, but are typically employed in conjunction with a direct TBG measurement or an estimate of binding proteins [i.e. a thyroid hormone binding ratio test, THBR, Section 3Ba(ii)] that is used to calculate a free hormone index (FT4I or FT3I). This index approach effectively corrects for the most commonly encountered abnormalities in thyroid hormone binding proteins that distort total hormone measurements [Section 3Ba] (42,71,72).

(a) TT4 and TT3 Reference Ranges

Although total T4 reference ranges vary to some extent depending on the methods employed, they have approximated 58 to 160 nmol/L (4.5-12.5 µg/dL) for more than four decades (Figure 1) (42). Currently, there is renewed interest in using TT4 measurements in preference to FT4 estimate tests to monitor pregnancy, when the non-pregnant reference range is adjusted by a factor of 1.5 to compensate for the predictable TBG elevation (40,42,73-77). As with TT4, serum TT3 values are method dependent to some extent and have reference ranges approximating to 1.2 – 2.7 nmol/L (80 –180 ng/dL) (59). In a recent study, simultaneous measurement of TT4 and TT3 in serum using LC/MS/MS reported comparable values to TT4 and TT3 immunoassay (48). LC/MS/MS measurement of TT4 and TT3 has also established values for the three trimesters of pregnancy (78).
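The unit pairs and the pregnancy adjustment quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not a laboratory procedure: the function names are hypothetical, the molar mass of thyroxine (776.87 g/mol) is a standard chemical constant rather than a value taken from this chapter, and the factor of 1.5 is the pregnancy adjustment described in the text.

```python
# Molar mass of thyroxine (T4) in g/mol; a standard constant, not from this chapter.
T4_MOLAR_MASS = 776.87


def tt4_ug_dl_to_nmol_l(value_ug_dl):
    """Convert a total T4 concentration from ug/dL to nmol/L."""
    ug_per_l = value_ug_dl * 10.0            # 1 dL = 0.1 L
    umol_per_l = ug_per_l / T4_MOLAR_MASS    # ug/L divided by g/mol gives umol/L
    return umol_per_l * 1000.0               # umol/L to nmol/L


def pregnancy_adjusted_range(low, high, factor=1.5):
    """Scale a non-pregnant TT4 reference range by the TBG adjustment factor."""
    return (low * factor, high * factor)


if __name__ == "__main__":
    # 4.5 and 12.5 ug/dL come out close to the 58 and 160 nmol/L limits in the text.
    print(round(tt4_ug_dl_to_nmol_l(4.5), 1), round(tt4_ug_dl_to_nmol_l(12.5), 1))
    # Non-pregnant 58-160 nmol/L scaled by 1.5.
    print(pregnancy_adjusted_range(58, 160))  # (87.0, 240.0)
```

Under these assumptions the non-pregnant range of 58 to 160 nmol/L corresponds to roughly 87 to 240 nmol/L in pregnancy; method-specific ranges should still take precedence.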

3. Free Thyroid Hormone Tests (FT4 and FT3)

The impetus to develop free hormone tests has been the high frequency of binding-protein abnormalities encountered in clinical practice, especially the high TBG state of pregnancy (Table 1). In accord with the free hormone hypothesis, it is believed that the minute free fraction of hormone (0.02% versus 0.2% for FT4 versus FT3, respectively) is responsible for biologic activity at the cellular level (18,64,79). It follows that free hormone measurements will reflect the physiological effects of thyroid hormones better than total hormone concentrations when binding proteins are abnormal (18). Free hormone methods fall into two categories – direct methods that employ a physical separation of the free from protein-bound hormone, and estimate tests that either (a) calculate a free hormone “index” from a TBG estimate and measurement of total hormone or, (b) use an immunoassay that employs an antibody to sequester a small amount of the total hormone purportedly proportional to the free hormone concentration. It is important to recognize that no free hormone test is universally valid in all clinical situations. Index tests (FT4I and FT3I) as well as FT4 and FT3 immunoassays are all protein-dependent to some extent and most are prone to under- or overestimate FT4 in patients with significant abnormalities in thyroid hormone binding proteins (11,26,32,67,80-84). Even direct methods that employ equilibrium dialysis or ultrafiltration [Section 3A] are not immune from technical problems relating to dilution, adsorption, temperature or the influence of endogenous binding protein inhibitors (85-88). Although the diagnostic accuracy of free hormone methods cannot be predicted either from a method’s classification or by an in-vitro test of technical validity such as dilution (27,83,89-91), the superior log/linear TSH/FT4 relationship seen with direct FT4 methods (ultrafiltration followed by LC-MS/MS) highlights the inferiority of current FT4 immunoassays (11,92,93).

A. Direct FT4 and FT3 Methods

Direct free hormone methods employ equilibrium dialysis (10,95-97), ultrafiltration (9,11,86,98-101) or gel filtration (102) to separate the exceedingly small amount of free hormone from the dominant protein-bound moiety. Because these techniques are technically demanding, inconvenient and expensive, they are typically only readily available in reference laboratories (27,103). Despite being considered reference methods, equilibrium dialysis and ultrafiltration techniques can be prone to inaccuracies that cause under- or overestimation of FT4 due to binding inhibitors in the specimen, dilution, pH, adsorption errors or temperature sensitivity (32,86-88,104-107). Direct free hormone methods are primarily used to evaluate the thyroid status of unusual patients displaying FT4 immunoassay values that appear discordant with TSH. Direct methods are also used as a reference for assigning calibrator values for FT4 immunoassay tests [discussed in Section 3Bb] (10,108).

(a) Equilibrium Dialysis

Early equilibrium dialysis methods used I-131 and later I-125 labeled T4 tracers to measure the free T4 fraction which, when multiplied by a total hormone measurement, gave an estimate of the free hormone concentration (95). Subsequently, symmetric dialysis, in which serum is dialyzed without dilution or with a near-physiologic medium, was used to overcome dilution effects (85,87,109). By the early 1970s higher affinity T4 antibodies (>1×10^11 L/mol) and high specific activity T4-I125 tracers were used to sensitize RIA methods to directly measure FT4 and FT3 in dialysates and ultrafiltrates (6,32,47,97,98,103,110-112). Subsequent improvements employed more physiologic buffer diluents and an improved dialysis cell design (87,112). Recently, LC/MS/MS has been adopted as the international reference method for measuring the free T4 concentrations of dialysates and ultrafiltrates (10,11,28,107,108).

(b) Ultrafiltration Methods

A number of recent studies have used ultrafiltration to remove protein-bound T4 prior to LC/MS/MS measurement of FT4 in the ultrafiltrate (92,107,113,114). Direct FT4 measurements that employ ultrafiltration are sometimes higher than seen with equilibrium dialysis because ultrafiltration avoids dilution effects (100). Furthermore, ultrafiltration is not influenced by dialyzable inhibitors of T4-protein binding that can be present in conditions such as non-thyroidal illness (NTI) (100). However, ultrafiltration can be prone to error if there is failure to completely exclude protein-bound hormone and/or there is adsorption of hormone onto the filters, glassware and tubing (32,86,101). In addition, ultrafiltration is temperature sensitive such that ultrafiltration performed at ambient temperature (25°C) will report FT4 results that are 67 percent lower than ultrafiltration performed at 37°C (88,113). A good correlation between direct FT4 measurements using equilibrium dialysis versus ultrafiltration prior to LC/MS/MS has been shown (113).

(c) Gel Absorption Methods

Some early direct FT4 methods used Sephadex LH-20 columns to separate free from bound hormone before eluting the Free T4 from the column for measurement by a sensitive RIA. However, because of a variety of technical issues, assays based on this methodologic approach are now rarely used (27,102).

B. Indirect FT4 and FT3 Estimate Tests

The majority of FT4 and FT3 methods only estimate free hormone. FT4 and FT3 estimate tests use either a two-test “index” approach or an immunoassay “sequestration” method (9,27,83,103,115,116). Despite manufacturers’ claims, all current FT4 and FT3 estimate tests are binding-protein dependent to some extent. In fact, a recent study has shown that FT4 immunoassays correlate better with total than free T4 concentrations, although this conclusion has been disputed (81,115-117).

(a) Two-Test Index Methods (FT4I and FT3I)

Free hormone index (FT4I and FT3I) methods are typically based on a simple calculation that mathematically corrects the total hormone concentration for the influence of abnormal binding proteins, primarily TBG. These indexes have been used to estimate free hormone concentrations for more than 40 years and require two separate measurements (118,119). The first measurement is the total hormone concentration (TT4 or TT3) [Section 2], the second measurement an assessment of the binding protein concentration using either (i) a direct TBG immunoassay, (ii) a Thyroid Hormone Binding Ratio (THBR) or “Uptake” test or (iii) an isotopic determination of the free hormone fraction (27,80,120,121).

(i) TBG Immunoassays

Despite reports that indexes employing THBR in preference to direct TBG appear to be diagnostically superior (122), free hormone indexes calculated using direct TBG measurement (TT4/TBG) may offer improved diagnostic accuracy over the use of THBR when the total hormone concentration is abnormally high (i.e. hyperthyroidism) or when drug therapies interfere with THBR tests (70). However, the TT4/TBG index is not independent of the TBG concentration, nor does it correct for Albumin or Transthyretin binding protein abnormalities (Table 1) (71,72,80,123,124).

(ii) Thyroid Hormone Binding Ratio (THBR) / “Uptake” Tests

The first “T3 uptake” tests developed in the 1950s employed the partitioning of T3-I131 tracer between the plasma proteins in the specimen and an inert scavenger (red cell membranes, talc, charcoal, ion-exchange resin or antibody) (27,125-127). The “uptake” of T3 tracer onto the scavenger provided an indirect, reciprocal estimate of the TBG concentration of the specimen. Initially, T3 uptake tests were reported as percent uptakes (free/total tracer). Typically, sera with normal TBG concentrations had approximately 30 percent of the T3 tracer taken up by the scavenger. During the 1970s methods were refined by replacing I131-T3 tracers with I125-T3, calculating uptakes based on the ratio between absorbent and total minus absorbent counts, and expressing results as a ratio with normal sera having an assigned value of 1.00 (120). Historically, T3 rather than T4 tracer was used for practical reasons. The ten-fold lower affinity of TBG for T3 versus T4 allowed a higher percentage of T3 tracer to be taken up by the scavenger and so shortened isotopic counting times. Because current methods use non-isotopic T4 or T3 analogs, counting time is no longer an issue. THBR methods based on T4 binding are probably more appropriate for correcting for T4-binding protein effects, but this issue has not been extensively studied (128). Although THBR tests are all to some degree TBG dependent, they usually produce normal FT4I and FT3I values when TBG abnormalities are mild (i.e. pregnancy and estrogen therapy) (42,80,118,119,129). However, these tests often fail to normalize FT4I and FT3I values when euthyroid patients have grossly abnormal binding proteins such as congenital TBG extremes, Familial Dysalbuminemic Hyperthyroxinemia (FDH), thyroid hormone autoantibodies, non-thyroidal illness (NTI) or medications that directly or indirectly influence thyroid hormone binding to plasma proteins (27,67-70,80,83,129-139).
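The uptake calculation described above, and its use in a free T4 index, can be sketched as follows. This is an illustration under stated assumptions rather than any manufacturer's algorithm: the function names and the counts are hypothetical, and the index is taken in its conventional product form (FT4I = TT4 × THBR), which the text describes only as a "simple calculation".

```python
def uptake_ratio(absorbent_counts, total_counts):
    """Ratio of absorbent counts to (total minus absorbent) counts."""
    protein_bound = total_counts - absorbent_counts
    if protein_bound <= 0:
        raise ValueError("absorbent counts must be lower than total counts")
    return absorbent_counts / protein_bound


def thbr(patient_absorbent, patient_total, reference_absorbent, reference_total):
    """Patient uptake normalized so that the normal reference serum equals 1.00."""
    patient = uptake_ratio(patient_absorbent, patient_total)
    reference = uptake_ratio(reference_absorbent, reference_total)
    return patient / reference


def free_t4_index(tt4, thbr_value):
    """Conventional product form of the index (an assumption of this sketch)."""
    return tt4 * thbr_value


if __name__ == "__main__":
    # Hypothetical counts: reference serum with 30 percent uptake, and a
    # high-TBG specimen in which only 20 percent of tracer reaches the scavenger.
    ratio = thbr(2000, 10000, 3000, 10000)
    print(round(ratio, 2))                        # 0.58
    # A raised TT4 of 150 nmol/L is pulled back toward the non-pregnant range.
    print(round(free_t4_index(150.0, ratio), 1))  # 87.5
```

Because a high TBG lowers the uptake while raising TT4, the product moves toward the value expected for a specimen with normal binding proteins, which is the correction the index is meant to provide.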

(iii) Isotopic Index Methods

The first free hormone tests developed in the 1960s were indexes calculated from the product of the free hormone fraction, measured isotopically by dialysis, and TT4 measured by PBI and later RIA (95,121,140). These early isotopic detection systems were technically demanding and included paper chromatography, electrophoresis, magnesium chloride precipitation and column chromatography (95,111,141-143). The free fraction index approach was later extended to ultrafiltration and symmetric dialysis, the latter measuring the rate of transfer of isotopically-labeled hormone across a membrane separating two chambers containing the same undiluted specimen (67,98,100,109,140,144,145). Ultrafiltration and symmetric dialysis had the advantage of eliminating dilution effects that influenced tracer dialysis values (85,87). However, free hormone indexes calculated using an isotopic free fraction are not completely independent of the TBG concentration and furthermore are influenced by tracer purity and the buffer matrix employed (81,112,141,142,146,147).
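The product described at the start of this section (isotopic free fraction multiplied by TT4) is simple enough to write out. The sketch below uses hypothetical function names and example counts; the free fraction of 0.02% is the approximate figure quoted for T4 earlier in this chapter, and tracer purity and buffer effects are deliberately ignored.

```python
def free_fraction(dialysate_counts, total_counts):
    """Fraction of labeled hormone recovered in the protein-free compartment."""
    if total_counts <= 0:
        raise ValueError("total counts must be positive")
    return dialysate_counts / total_counts


def free_hormone_pmol_l(total_hormone_nmol_l, fraction):
    """Free hormone estimate: total hormone multiplied by the free fraction."""
    return total_hormone_nmol_l * fraction * 1000.0  # nmol/L to pmol/L


if __name__ == "__main__":
    # Hypothetical counts giving a free fraction of 0.0002 (0.02 percent).
    fraction = free_fraction(200, 1_000_000)
    # A TT4 of 100 nmol/L then corresponds to about 20 pmol/L of free T4.
    print(round(free_hormone_pmol_l(100.0, fraction), 1))
```

The result illustrates why total hormone is reported in nanomolar units and free hormone in picomolar units: the two differ by the free fraction, roughly four orders of magnitude for T4.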

(iv) The Clinical Utility of Two-Test Index Methods

The two-test FT4I index approach has been preferred to free hormone immunoassay methods for monitoring pregnant patients and hospitalized patients with NTI (13,42,148). This is because the binding protein abnormalities present in pregnancy and NTI can cause method-specific abnormalities in FT4 immunoassay values such that the reference ranges for non-pregnant ambulatory subjects do not apply (42,148-150). In contrast, the non-pregnant ambulatory FT4I reference range applies both to pregnancy and hospitalized patients with NTI (42,76,150,151).

(b) Free Thyroid Hormone Immunoassay Methods (FT4 and FT3)

Currently, most clinical laboratories use automated immunoassays to estimate serum FT4 and FT3 concentrations (25). These methods are calibrated using gravimetric standards or calibrators with values assigned by a direct reference method [Section 3A] (6,10,89). Free hormone immunoassays are based on the principle of using a specific antibody to sequester a small amount of the total hormone such that the fractional occupancy of antibody binding sites is determined by the free hormone concentration (27,152,153). The key to the validity of these methods is to use conditions that maintain the free to protein-bound hormone equilibrium and minimize dilution effects that weaken the influence of any endogenous inhibitors that may be present in the specimen (79,85,87,109,154). The actual proportion of total hormone sequestered (1-3%) varies with the methodologic design but greatly exceeds the actual free hormone concentration (6,155). Unfortunately, current free hormone immunoassays, especially FT3, appear to be more variable and less reliable than total hormone measurements (63,90,156). Considerable confusion still surrounds the nomenclature of FT4 tests and controversy continues regarding the technical validity of the measurements themselves, and their clinical utility in pathophysiologic conditions associated with abnormal binding proteins (Table 1) (6,14,27,83,115,116). Since the original one-step “analog” tests became available in the 1970s, the term “analog” has become mired in confusion (27,83,115,116,155). Despite improvements, current free hormone immunoassays still appear sensitive to alterations in serum albumin and abnormal binding proteins as well as certain drugs, high free fatty acid (FFA) levels or hormone binding inhibitors believed to be present in sera from patients with non-thyroidal illnesses (NTI) (14,67,84,91,115-117,146,157-160). 
In fact, a recent study has shown that FT4 immunoassay values correlate with both TBG and albumin concentrations and not inversely with TSH resulting in discordances between FT4 measured by immunoassays and direct methods (11,93). In contrast, a strong inverse log/linear TSH/FT4 relationship and no TBG or albumin correlations were seen when using a direct LC/MS/MS method (11). Three general approaches have been used to develop FT4 and FT3 immunoassay methods: (i) two-step labeled hormone; (ii) one-step labeled analog; and (iii) labeled antibody tests (27,83).

(i) Two-Step, Labeled-Hormone/Back-Titration FT4 and FT3 Methods

The two-step approach was first developed by Ekins and colleagues in the late 1970s and is still used for some procedures (27,83,89,91,161-164). Two-step methods typically employ immobilized T4 or T3 antibody (for FT4 and FT3 immunoassays, respectively) to sequester a small proportion of total hormone from a diluted serum specimen without disturbing the original free to protein-bound equilibrium. After removing unbound serum constituents by washing, a labeled probe (125-I T4 or macromolecular T4 conjugate) is added to quantify unoccupied antibody-binding sites that are inversely related to the free hormone concentration (26,83). After washing, the amount of label bound to the solid-phase antibody is quantified (27,83,137,157,165,166).
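The inverse relationship between unoccupied antibody sites and free hormone follows from simple mass action. The sketch below is an idealized single-site model, not a description of any commercial assay: it assumes the antibody sequesters so little hormone that the free concentration is undisturbed, and the affinity constant of 1×10^11 L/mol is a hypothetical value chosen to match the order of magnitude cited for high-affinity T4 antibodies in Section 3A(a).

```python
def fractional_occupancy(free_hormone_pmol_l, affinity_l_per_mol=1e11):
    """Fraction of antibody sites occupied at equilibrium (single-site mass action)."""
    free_mol_l = free_hormone_pmol_l * 1e-12   # pmol/L to mol/L
    k_f = affinity_l_per_mol * free_mol_l
    return k_f / (1.0 + k_f)


def unoccupied_fraction(free_hormone_pmol_l, affinity_l_per_mol=1e11):
    """Sites left for the labeled probe; falls as free hormone rises."""
    return 1.0 - fractional_occupancy(free_hormone_pmol_l, affinity_l_per_mol)


if __name__ == "__main__":
    for ft4 in (5.0, 10.0, 20.0, 40.0):
        print(ft4, round(unoccupied_fraction(ft4), 3))
```

With these assumed values half of the sites are occupied at 10 pmol/L, and the labeled probe signal falls steadily as the free hormone concentration rises.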

(ii) One-Step, Labeled Hormone-Analog FT4 and FT3 Methods

Classic “one-step” methods use hormone analogs having molecular structures that are claimed to be non-reactive with serum proteins but able to compete with free hormone for unoccupied antibody-binding sites (83,89,152). The principle underpinning these methods is that a hormone analog coupled to an isotopic or non-isotopic signal-generating molecule competes with free hormone for a limited number of antibody binding sites in a classic competitive immunoassay format. Although conceptually attractive, and despite early claims of success using this approach to measure free hormone independent of binding protein concentrations, it has been difficult to achieve satisfactory results in practice (115-117,167,168). In fact, recent studies have shown these analogs clearly have protein-binding capability and are at best free hormone “estimate” tests that offer minimal advantages over total hormone assays (11,169). Initially these methods were engineered to give normal FT4 values in high TBG states (i.e. pregnancy) but later were found to have poor diagnostic accuracy in the presence of abnormal albumin concentrations secondary to Familial Dysalbuminemic Hyperthyroxinemias (FDH), non-thyroidal illnesses (NTI), high free fatty acid concentrations or thyroid hormone autoantibodies (53,67,83,131,137,148,158,163,165,166,168,170-180).

(iii) Labeled Antibody FT4 and FT3 Methods

Labeled antibody methods are competitive immunoassays that measure free hormone as a function of the fractional occupancy of hormone-antibody binding sites, using specific immunoabsorbants to quantify the unoccupied antibody binding sites in the reaction mixture (6,27,89,91,156). The physicochemical theory of analog-based labeled antibody methods would suggest that they are susceptible to the same errors as the labeled hormone-analog methods described above. However, the physicochemical differences arising from the binding of labeled antibodies to the solid support confer kinetic differences that result in decreased analog affinity for endogenous binding proteins and thus produce more reliable free hormone measurements (156,181). For this reason, the labeled antibody approach has become favored for a number of current automated immunoassay platforms (25,182,183).

C. Clinical Utility of FT4 and FT3 Measurements

Unfortunately, the diagnostic accuracy of a free hormone determination cannot be predicted from its methodologic classification (27,83,90). However, despite these shortcomings, these tests have come into common use because of their diagnostic usefulness.

(i) Ambulatory Patients

Free hormone tests (FT4 or FT3) are most commonly used in preference to total hormone measurements (TT4 or TT3) to improve the diagnostic accuracy for detecting hypo- and hyperthyroidism in patient populations that may include individuals with the thyroid hormone binding abnormalities as shown in Table 1. However, it is current practice to employ TSH as the first-line test in most clinical settings [Section 4e] while relegating FT4 or FT3 measurements to second- or third-line tests (FT4 versus FT3, respectively) to investigate situations in which TSH abnormalities are found (13,37). The exception to this general rule occurs in those select conditions where TSH may be diagnostically unreliable in which case FT4 becomes the first-line test of choice. Such conditions include periods of unstable thyroid status (the early phase of treating hypo- or hyperthyroidism), when hypothalamic-pituitary dysfunction is suspected to be present, or when patients are taking drugs such as glucocorticoids that are known to affect TSH secretion (13,70,184-188).

Discordances between TSH and FT4 measured on the same specimen can on occasion cause a diagnostic dilemma. However, it should be recognized that the intrinsic log/linear TSH/FT4 relationship dictates that modest reductions in TSH (0.05 – 0.3 mIU/L), or modest elevations (3 – 10 mIU/L), would not be expected to be associated with changes in FT4 values outside the normal reference range (7,13,189,190). When a severe TSH/FT4 discordance is observed, the first step should be to re-measure both TSH and FT4 using a different manufacturer’s platform to rule out an assay interference problem with the FT4 and/or TSH measurements. Such FT4 problems commonly result from alterations in a thyroid hormone binding protein (often albumin) or a drug interference [Sections 3E and 7] (27,66,67). On the other hand, the most common cause of a falsely high TSH result is assay interference from a human anti-mouse antibody (HAMA) or an endogenous TSH antibody [Section 7c(i)] (191-193).

(ii) Hospitalized Patients

Current FT4 testing methods have not yet received adequate validation in the hospital setting in which a wide variety of non-thyroidal illnesses and drug therapies are frequently encountered that are known to impair the diagnostic accuracy of both thyroid hormone and TSH testing (13,27,85,103,148,172). In particular there are three major categories of hospitalized patients that deserve special attention. They are: a) patients without known thyroid gland dysfunction who exhibit either an elevated or diminished TT4 concurrently with NTI; b) patients with primary hypothyroidism and concurrent severe NTI; and c) patients with hyperthyroidism and concurrent NTI (157,172,194). Because the diagnostic reliability of FT4 testing is questionable in sick hospitalized patients, it has been recommended that a combination of both a T4 (preferably TT4) and a TSH test be employed for assessing the thyroid status in this setting (13,148,195). In most clinical situations involving discordant FT4 and TSH results, the TSH test usually yields the most diagnostically reliable results, provided that the patient is not receiving medications such as glucocorticoids and dopamine that directly inhibit TSH secretion, and does not have a condition that involves pituitary failure (70,187). Repetitive TSH testing may be particularly helpful in sorting out the cause of abnormal FT4 results as repeat TSH values will trend back towards normal over time if the TSH abnormality results from the acute effects of NTI alone, while the abnormality will persist if it is due to underlying thyroid dysfunction (151).

D. FT4 and FT3 reference ranges

The FT4 reference range for male and female infants 1 – 12 months old is 0.9 – 1.9 ng/dL (11.6 – 24.5 pmol/L). The reference range for children 1 to 18 years old is 0.9 – 1.6 ng/dL (11.6 – 20.6 pmol/L), which is similar to that observed for adults (6,62,93,108,112,114,182,196). Likewise, the FT3 reference range for males and females 1 month to 18 years old is 1.0 – 4.0 pg/mL (2.1 – 6.1 pmol/L) (6,62,59,93,108,112,114). These determinations were made at 25°C; when testing was performed at 37°C, both FT4 and FT3 values were reported to be 1.5-fold higher (114). It is now recommended that FT4 and FT3 testing be performed at 37°C (89,104,114,197). Recent studies have reported an inverse relationship between pediatric age and FT4 immunoassay values for both males and females (198).

E. Interferences with FT4 and FT3 Tests

Common conditions that decrease the diagnostic accuracy of current free hormone estimate tests in ambulatory patients include:

(a) TBG abnormalities

States of Thyroxine Binding Globulin (TBG) excess or deficiency are commonly encountered in clinical practice in association with a variety of pathophysiologic conditions or drug therapies as listed in Table 1.

(i) Congenital TBG excess or deficiency

Neither free hormone immunoassays nor some free T4 index tests are able to reliably estimate free hormone concentrations when TBG concentrations are grossly altered, as is the case with congenital TBG excess or deficiency states (27,81).

(ii) Pregnancy

As with non-pregnant patients, TSH is the most reliable marker for thyroid status during pregnancy. However, pregnant patients with undetectable TSH during early treatment for hyperthyroidism need a reliable estimate of FT4 status and unfortunately non-pregnant FT4 reference ranges do not apply to pregnancy (39,40,42,149). Specifically, low FT4 immunoassay values are observed in a significant proportion of women by the third trimester of pregnancy to a method-dependent degree. Albumin levels tend to fall during pregnancy (199) and the frequency of low FT4 values during pregnancy appears to be method-related, likely reflecting the albumin-dependence of the method (11) (Figure 2).

The albumin-dependence of current FT4 immunoassay methods is only one of a myriad of factors that compromise establishing reference ranges for each trimester of pregnancy that could be applied to all populations, even were the same method used. Factors that influence FT4 reference ranges established using different studies include:

The binding protein (TBG and albumin) dependence of the method (11,94,183,200-202).

The week(s) of gestation studied which is critically important for the first trimester when the high levels of hCG stimulate the thyroid by virtue of hCG homology with TSH. Also, whether multiple pregnancies were excluded (203-210).

The size of the study and whether TPOAb-positive women were excluded (211-215).

The artifactual low FT4 immunoassay values, coupled with an inability to establish trimester-specific reference ranges that can be applied to all patient populations have led to the use of free T4 index (FT4I) methods for evaluating FT4 status of pregnant patients because non-pregnant reference ranges have been shown to apply throughout pregnancy (42,118,203,220,221). Even TT4 measurements can be used if the reference limits are adjusted by a factor of 1.5 to compensate for the increased TBG (39,40,42,76,222).

Autosomal dominant mutations in the Albumin or Transthyretin (prealbumin) gene can result in altered protein structures with enhanced affinity for thyroxine and/or triiodothyronine. These abnormal proteins can interfere with FT4 and/or FT3 measurements and result in inappropriately high values being reported (67,223). Familial Dysalbuminemic Hyperthyroxinemia (FDH) is a rare condition with a prevalence of ~1.8% in the Hispanic population. There are three genetic variants, with R218H being the most common. This variant typically results in a two-fold elevation in TT4 but only minimal changes in TT3. The R218P mutation is associated with extreme TT4 elevations as well as TT3 elevations, whereas the L66P mutation affects mainly T3 (224). Affected individuals are euthyroid and have normal TSH and FT4 when measured by direct techniques such as equilibrium dialysis (67,175). Unfortunately, most FT4 estimate tests (immunoassays and indexes) report falsely high values that may prompt inappropriate treatment for presumed hyperthyroidism if the condition is not recognized.

(c) T4 and T3 autoantibodies

T4 and T3 autoantibodies can falsely elevate total hormone, free hormone or THBR measurements depending on the method employed (68,69,132,133,225). The prevalence of thyroid hormone autoantibodies ranges from 2 percent in the general population to as much as 30 percent in patients with autoimmune thyroid disease (226). However, despite this high prevalence of detectable thyroid autoantibodies, significant interference caused by these antibodies is considerably less common and depends on the qualitative characteristics of the autoantibody present (i.e. its affinity for the test reagents). Further, different methods exhibit interference to a greater or lesser degree (68,69). Because autoantibody interference is difficult for the laboratory to detect proactively, it is most commonly first suspected by the physician who sees a gross discordance between the clinical presentation of the patient and the laboratory test results (132,225). Importantly, these autoantibodies will cross the placenta and may cause a false-negative screening test for congenital hypothyroidism in the newborn (227).

4. Serum TSH (Thyroid Stimulating Hormone/Thyrotropin) Assays

Over the last four decades serum TSH assay methodology has undergone dramatic improvements that have revolutionized strategies for thyroid testing and firmly established TSH as the first-line thyroid function test to assess thyroid hormone status for most clinical conditions (13,37,228). In fact, serum TSH has become the single most reliable test for diagnosing abnormalities in thyroid status, provided that patients are ambulatory and not receiving drug therapies that alter TSH secretion (187,228). The diagnostic superiority of TSH measurement arises principally from the physiologic inverse log/linear relationship between circulating TSH and free T4 concentrations (7,13,189,190).

(a) TSH Assay Methodology

TSH assay “quality” has historically been defined by its clinical sensitivity – the ability to discriminate between hyperthyroid and euthyroid values (13,229-231). The first generation of TSH assays, used between 1965 and 1985, was based on RIA methodology that had limited functional sensitivity (~ 1.0 mIU/L) (232-234). Because these RIA-era TSH methods were too insensitive to detect TSH in all euthyroid subjects, their clinical utility was limited to the diagnosis of primary hypothyroidism (235-237). Between 1970 and 1980, considerable efforts were made to develop TSH methods that could discriminate euthyroid from hyperthyroid TSH values (238-242). Modified RIA procedures employing a pre-treatment lectin affinity extraction or a long pre-incubation period before tracer addition could achieve this discrimination, but these were principally limited to research applications (241-243). In 1970, the hypothalamic tripeptide, Thyrotropin Releasing Hormone (TRH) (Thyroliberin) was synthesized and shown to be capable of stimulating serum TSH into the measurable range in all euthyroid subjects, but not patients with hyperthyroidism or hypopituitarism (244,245). These observations led to the practice of measuring serum TSH 15-30 minutes following a 200-500µg IV TRH dose as a way to overcome the insensitivity of the TSH RIA methodology (246-249). However, the practice of TRH testing fell into decline after the more sensitive immunometric assay (IMA) methodology (also called “sandwich” or “noncompetitive” methodology) became available in the mid-1980s (7,248-250). These IMA techniques are based on the excess antibody approach of Miles and Hales, which was originally reported in the 1960s but did not become widely adopted until advances in monoclonal antibody technology allowed the large-scale production of specific antibodies in the 1980s (251-253).
Mechanistically, these IMA methods employed an excess of TSH monoclonal antibody, bound to a solid support (bead, tube, magnetic microparticle or adsorption gel), that captured TSH from the serum specimen during a 20 to 120 minute incubation period (254-256). A different poly- or monoclonal TSH antibody, targeted to a different TSH epitope(s) and labeled with an isotopic (I-125) or non-isotopic signal, was then added, followed by a further incubation and removal of unbound constituents by washing. The signal bound to the solid support was quantified as being directly proportional to the serum TSH concentration in the test sample. Later modifications to this basic concept included the use of chimeric monoclonal antibodies to reduce interference by heterophilic antibodies [Section 7c(i)] and the use of Avidin-Biotin and magnetic particle separation techniques (257-259). In the last two decades, TSH assay sensitivity has been further enhanced by the adoption of non-isotopic (chemiluminescent and fluorescent) signals that are inherently more sensitive than I-125 and offer the additional advantage of being easier to automate (15,260,261). By 1990, non-isotopic IMA methods had replaced most TSH RIA methods. Their inherently greater sensitivity and specificity narrowed the TSH reference range by reducing glycoprotein hormone cross-reactivity and improving precision (5,262). Currently, most TSH testing is performed on automated immunoassay platforms employing advanced IMA technology [Section 8] (15,33,263-266).

(b) TSH Nomenclature

The first generation RIA methods had a detection limit approximating 1.0 mIU/L. A ten-fold improvement in sensitivity (~ 0.1 mIU/L) was seen with the early IMA methods that used an isotopic (I-125) signal (250). With this level of sensitivity, the lower euthyroid reference limit was determined as being 0.3-0.4 mIU/L and overt hyperthyroidism could be diagnosed with a high degree of confidence without the need for TRH stimulation (247,250,267-272). However, a sensitivity of 0.10 mIU/L was insufficient for detecting different degrees of hyperthyroidism (i.e. subclinical versus overt) and efforts to further sensitize IMA methodology continued (15,247,263,273,274). It became clear that IMA sensitivity was largely determined by the “signal” used. The first IMA methods that used a radioisotopic signal (I-125) were designated “immunoradiometric assays”, or IRMAs (250). Subsequent IMA methods adopted a variety of non-isotopic signals that gave rise to a lexicon of terminology to distinguish between assays using different signals. For example, immunoenzymometric assays (IEMA) used enzyme signals, immunofluorometric assays (IFMA) used fluorophores as signals, immunochemiluminometric assays (ICMA) used chemiluminescent molecules as signals and immunobioluminometric assays (IBMA) used bioluminescent signal molecules (15,275). This explosion of methodology led to a range of IMAs with competing claims for sensitivity. Initially, the IMA methods were designated as “sensitive”, “highly sensitive”, “ultrasensitive” or “supersensitive” assays – terms used to distinguish the new IMA methodology from the older insensitive RIA methods then still in use (5,276-280). This descriptive nomenclature was confusing and led to a debate concerning the meaning of “sensitivity” (15,281). After it became evident that the between-run precision of the method was the best determinant of assay sensitivity, a new parameter, “functional sensitivity”, was adopted (13,15).
Functional sensitivity has been defined as the TSH value associated with a 20 percent between-run coefficient of variation (CV) established from assays run over a 6 to 8 week period (a typical clinical interval used to assess TSH changes in an out-patient setting) (13). Both manufacturers and clinical laboratories have now adopted this functional sensitivity definition as the lowest reporting limit for TSH assays. Importantly, this concept of functional sensitivity has also been extended to the measurement of other analytes (5,13,15,282). The new nomenclature also defines each generation as having a ten-fold difference in functional sensitivity. For example, RIA methods with functional sensitivities between 1 and 2 mIU/L are designated as “first generation”. IMA methods with functional sensitivities between 0.1 and 0.2 mIU/L are designated as “second generation”. Third generation TSH methods with functional sensitivities between 0.01 and 0.02 mIU/L are typically automated and non-isotopic and have become recognized as necessary to meet the current standard of care (15,31,282-284).
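As an illustration of how the 20 percent CV criterion can be applied, the sketch below estimates functional sensitivity from a between-run precision profile and assigns the assay generation. It is a minimal example under stated assumptions: the function names, the log-scale interpolation between measured points, the use of the upper end of each quoted range as the generation cut-off, and the precision profile itself are all hypothetical choices made for illustration, not a prescribed laboratory procedure.

```python
import math


def functional_sensitivity(profile, target_cv=20.0):
    """Estimate the TSH concentration at which between-run CV equals target_cv.

    profile: (concentration in mIU/L, between-run CV in percent) pairs.
    Interpolates linearly on log10(concentration) between the two points
    that bracket the target CV (an assumption of this sketch).
    """
    points = sorted(profile)
    for (c0, cv0), (c1, cv1) in zip(points, points[1:]):
        if cv0 > target_cv >= cv1:
            fraction = (cv0 - target_cv) / (cv0 - cv1)
            log_c = math.log10(c0) + fraction * (math.log10(c1) - math.log10(c0))
            return 10 ** log_c
    raise ValueError("precision profile does not bracket the target CV")


def assay_generation(sensitivity):
    """Map functional sensitivity (mIU/L) to the generation naming in the text."""
    if sensitivity <= 0.02:
        return 3
    if sensitivity <= 0.2:
        return 2
    if sensitivity <= 2.0:
        return 1
    raise ValueError("sensitivity is poorer than a first generation assay")


# Hypothetical precision profile collected over a 6 to 8 week period.
example_profile = [(0.005, 35.0), (0.01, 25.0), (0.02, 15.0), (0.05, 8.0)]
fs = functional_sensitivity(example_profile)
print(f"Functional sensitivity ~ {fs:.3f} mIU/L, generation {assay_generation(fs)}")
```

Interpolating on the logarithm of concentration reflects the ten-fold spacing between generations; a laboratory might equally fit a smooth curve through the whole profile.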

(c) TSH Population Reference Ranges

Serum TSH is now considered to be the most important thyroid test for assessing the early development of either hypo- or hyperthyroidism, because the log/linear TSH/FT4 relationship dictates that an altered TSH will be the first abnormality to appear – as soon as the pituitary registers that FT4 has changed from its genetically-determined setpoint for that particular individual (7,189,190,228,285). It follows that the setting of the TSH reference range is critical for detecting mild (subclinical) hypo- or hyperthyroidism.

Current guidelines recommend that “TSH reference intervals should be established from the 95 percent confidence limits of the log-transformed values of at least 120 rigorously screened normal euthyroid volunteers who have: (a) No detectable thyroid autoantibodies, TPOAb or TgAb (measured by sensitive immunoassay); (b) No personal or family history of thyroid dysfunction; (c) No visible or palpable goiter and, (d) Who are taking no medications except estrogen” (13).
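The calculation implied by this recommendation can be sketched as follows. This is a minimal parametric example that assumes TSH is approximately log-normally distributed, so that the 95 percent limits are the mean plus or minus 1.96 standard deviations of the log-transformed values, back-transformed to mIU/L. The function name is hypothetical and the cohort is synthetic data generated purely for illustration.

```python
import math
import random
import statistics


def tsh_reference_interval(tsh_values, min_subjects=120, z=1.96):
    """Return (lower, upper) 95 percent limits from log-transformed TSH values.

    Assumes the screening criteria have already been applied to the cohort
    and that log(TSH) is approximately Gaussian.
    """
    if len(tsh_values) < min_subjects:
        raise ValueError(f"at least {min_subjects} screened subjects are required")
    if any(value <= 0 for value in tsh_values):
        raise ValueError("TSH values must be positive to be log-transformed")
    logs = [math.log(value) for value in tsh_values]
    mean = statistics.fmean(logs)
    sd = statistics.stdev(logs)
    return math.exp(mean - z * sd), math.exp(mean + z * sd)


# Synthetic, purely illustrative cohort (not real patient data).
rng = random.Random(0)
cohort = [rng.lognormvariate(math.log(1.4), 0.5) for _ in range(120)]
lower, upper = tsh_reference_interval(cohort)
print(f"TSH reference interval: {lower:.2f} to {upper:.2f} mIU/L")
```

A non-parametric alternative would take the 2.5th and 97.5th percentiles directly; the parametric form is shown here only because it follows the wording of the guideline most closely.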

The sensitivity limitations of the first generation RIA methods precluded detecting the lower euthyroid reference limit (2.5 percentile), whereas upper reference limits (97.5 percentile) were reported to be as high as 10 mIU/L. This elevated upper limit primarily resulted from gonadotropin cross-reactivity and the failure to exclude individuals with subclinical autoimmune thyroid disease who have higher TSH values (237,286-288). Using current third generation IMA methodology, the lower TSH reference limit has now been shown to approximate 0.3 to 0.4 mIU/L (289-291). This estimate appears consistent irrespective of the population studied or the method used (289,290,292-297). In contrast, the setting of the TSH upper reference limit (97.5 percentile) has become controversial (291,298-301) with estimates ranging from 2.1 mIU/L (295,297) to 7.5 mIU/L (289,299,302). Multiple factors influence the calculation of the TSH upper reference limit for a population. These include population demographics like sex (289), ethnicity (289,303-305), iodine intake (306), BMI (307-312), smoking status (303,313,314) and age (302,304,315,316) as well as the failure to exclude the presence of subclinical autoimmune thyroid disease using the presence of TPO antibodies (287,289,317,318).

Recent studies have suggested that TSH increases with age and that a mild TSH elevation in elderly individuals may even convey a survival benefit, although other reports dispute this (299,302,319-324). These reports have led to the suggestion that age-specific TSH reference limits should be considered (315,325). However, it appears that these mild TSH elevations may be transient (326), or in part relate to polymorphisms of the TSH receptor, and cannot be interpreted to imply that subclinical hypothyroidism per se is necessarily advantageous for elderly individuals (321,327,328). Whereas there appears to be a positive correlation between age and TSH concentrations in iodine sufficient populations (289,299,329), the opposite is the case for iodine deficient populations, in which there appears to be no TSH increase with age, or even a decline (286,295,296,330,331). In areas of iodine sufficiency the correlation between increasing TSH and age could represent a failure to exclude subjects with autoimmunity who may, or may not, be detected by a positive TPOAb test (317,332,333). Complicating these questions is the fact that current TSH IMAs differ in specificity for recognizing circulating TSH isoforms and that this can give rise to a full 1.0 mIU/L difference in TSH values reported by different assays – a difference that in some cases is greater than the influence of many of the other variables listed above (31,296,334,335). Because hypothalamic TRH modulates TSH molecular glycosylation and biologic activity, a rise in TSH with age could represent an increase in the secretion of biologically inactive, yet immunologically detected, TSH isoforms (336,337). The blunting of the TSH response to TRH and decreased amplitude of the TSH nocturnal peak would be consistent with this premise (338,339). In contrast, in areas of iodine deficiency the inverse relationship between TSH and age could represent a failure to exclude individuals with autonomously functioning nodules (332).

The TSH upper reference limit for non-pregnant subjects remains a contentious issue, such that it is difficult for manufacturers to cite a TSH reference range appropriate for universal adoption across different populations in different geographic areas (Figure 3). This has led to guidelines proposing the adoption of an empiric TSH upper limit of 2.5-3.0 mIU/L, which is in accord with the TSH interval associated with the lowest prevalence of thyroid antibodies (13,37,228,317). Furthermore, a TSH upper limit between 2.0 and 3.0 mIU/L would also be appropriate for women of reproductive age and pregnant women, for whom current guidelines now recommend using 2.5 mIU/L for preconception planning and the first trimester, and 3.0 mIU/L as the upper limit for the second and third trimesters (13,39,40,203,228,288,340).

The adult TSH population reference range does not apply to neonates or children. Serum TSH values are generally higher in neonates and then gradually decline until the adult range is reached after puberty (198,296,341-346). This necessitates using age-specific TSH reference ranges for diagnosing thyroid dysfunction in these pediatric age categories.

(d) TSH Biologic Variability

The within-person variability of basal TSH concentrations is relatively narrow compared with between-person variability both for non-pregnant and pregnant subjects (190,347,348). In fact, the serum TSH concentrations of euthyroid volunteers were found to vary only 0.5 mIU/L when tested every month over a span of one year (347). Twin studies further suggest that there are genetic factors that determine hypothalamic-pituitary-thyroid setpoints (285,349,350). These studies report that the inheritable contribution to the serum TSH level approximates 65 percent (349,351). This genetic influence appears, in part, to involve single nucleotide polymorphisms in thyroid hormone pathway genes such as the phosphodiesterase gene (PDE8B) (352-354) and the TSH receptor, where polymorphisms may be associated with gain or loss of function (352,355,356), and the type II deiodinase enzyme (357). Such polymorphisms likely account for some of the euthyroid outliers that skew TSH reference range calculations (329,358).

Serum TSH, as with other thyroid tests, has narrow within-person variability and thus a low (< 0.6) index of individuality (IoI) (190,347,348,359-361). This limits the usefulness of using the population-based reference range to detect thyroid dysfunction in an individual patient (190,348,361,362). It further suggests that when evaluating patients with marginal (yet confirmed) TSH abnormalities, either low (0.1–0.4 mIU/L) or high (3–10 mIU/L), it may be more important to consider the degree of TSH abnormality relative to patient-specific risk factors for cardiovascular disease rather than the degree of the abnormality relative to the TSH reference range (Figure 4) (363,364).
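The index of individuality is conventionally calculated as the ratio of within-person to between-person variation, optionally with analytical imprecision added to the numerator. The sketch below follows that convention; the function names and the CV values in the example are hypothetical and are not data from the studies cited above.

```python
import math


def index_of_individuality(cv_within, cv_between, cv_analytical=0.0):
    """Ratio of within-person (plus analytical) CV to between-person CV.

    All CVs are in percent. Analytical CV defaults to zero, which gives the
    simplified form cv_within / cv_between.
    """
    if cv_between <= 0:
        raise ValueError("between-person CV must be positive")
    return math.sqrt(cv_within ** 2 + cv_analytical ** 2) / cv_between


def has_low_index(ioi, threshold=0.6):
    """True when the IoI falls below the 0.6 threshold quoted in the text."""
    return ioi < threshold


# Hypothetical CVs chosen only to illustrate the calculation.
ioi = index_of_individuality(cv_within=20.0, cv_between=45.0)
print(f"IoI = {ioi:.2f}; low index: {has_low_index(ioi)}")
```

When the index is low, a result can shift substantially for that individual while still lying inside the population range, which is why serial comparison against the patient's own baseline is more informative.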

(e) Clinical Utility of TSH Measurement

(i) Ambulatory Patients

Serum TSH normally exhibits a diurnal variation with a peak between midnight and 0400 (365-368). However, because TSH testing is most commonly performed in the outpatient setting during normal daytime working hours, it is not usually influenced by the time of day of the blood draw (367). Furthermore, there is no need to withhold the levothyroxine (L-T4) dose on the day of the blood draw, because TSH secretion is slow to respond to changes in thyroxine status (13). The TSH concentration is used as the target for adjusting L-T4 medication within a very narrow therapeutic index (228,369,370). It is well known that L-T4 absorption is highly variable and influenced by the simultaneous ingestion of food. To address this issue, a recent study reported that TSH remained within the narrowest target range when the daily L-T4 dose was ingested in a fasting state, preferably before breakfast after an overnight fast (371).

Current guidelines recommend that serum TSH be used as the first-line test for detecting both overt and subclinical hypo- or hyperthyroidism in ambulatory patients with stable thyroid status and intact hypothalamic/pituitary function (13,37,195,228,372). The current standards of care necessitate that laboratories use third generation TSH assays for this purpose (functional sensitivity 0.01-0.02 mIU/L) (31,283,373). This level of sensitivity is necessary for detecting differing degrees of TSH suppression. For example, TSH measurement in the 0.01 – 0.10 mIU/L range represents a significant risk factor for the production of atrial fibrillation in older patient populations and often is an iatrogenic consequence of L-T4 suppression or an unintended result of replacement therapy (374-376). In addition, targeting the degree of TSH suppression plays a critical role in the management of thyroid cancer (34,377,378).

One disadvantage of employing TSH as a diagnostic screening test for thyroid function is that it can often fail to detect the presence of pituitary and/or hypothalamic disease [central hypothyroidism or TSH secreting pituitary tumors (TSHomas)] (185,186,188,379). In these conditions serum TSH can be paradoxically within the normal reference limits because current assays cannot distinguish between normal and biologically altered TSH isoforms that may be present in these states. For example, TSH isoforms with impaired biologic activity are typically secreted in central hypothyroidism whereas TSH isoforms with enhanced biologic activity are often secreted by TSHomas (185,379). These abnormal TSH isoforms can result in paradoxically normal or high TSH being reported in the face of clinical hypo- or hyperthyroidism, respectively (185,186,379,380).

(ii) Hospitalized Patients

Non-thyroidal illnesses can frequently alter thyroid hormone peripheral metabolism and hypothalamic/pituitary function and result in a variety of thyroid test abnormalities including both decreased and increased serum TSH levels (13,151,381-383). However, it is important to distinguish the generally mild, transient TSH alterations typical of non-thyroidal illnesses from the more profound and persistent TSH changes associated with hyper- or hypothyroidism (13,151,172).

5. Thyroid Specific Autoantibodies (TPOAb, TgAb and TRAb)

Tests for antibodies against thyroid-specific antigens, thyroid peroxidase (TPO), thyroglobulin (Tg) and TSH receptors are used in the diagnosis of autoimmune thyroid disorders (19,384). Over the last four decades, thyroid antibody test methodologies have evolved from semi-quantitative agglutination and complement fixation techniques and whole animal bioassays, to specific ligand assays using recombinant antigens or cell culture systems transfected with the human TSH receptor (19,385-387). Unfortunately, the diagnostic and prognostic value of these tests is hampered by differences in sensitivity and specificity (388). Although thyroid autoantibody measurements have clinical utility for a number of clinical conditions, these tests should be selectively employed primarily as an adjunct to other diagnostic testing procedures.

(a) TSH Receptor Autoantibodies (TRAb)

The TSH receptor (TSHR) serves as a major autoantigen (389,390). Thyroid gland stimulation occurs when TSH binds to TSHR on thyrocyte plasma membranes and activates the cAMP and phospholipase C signaling pathways (390). The TSH receptor belongs to the G protein-coupled class of transmembrane receptors. It undergoes complex posttranslational processing in which the ectodomain of the receptor is cleaved to release a subunit into the circulation (389). A TSH-like thyroid stimulator found uniquely in the serum of Graves’ disease patients was first described using a guinea pig bioassay system in 1956 (391). Later, using a mouse thyroid bioassay system, this serum factor was noted to have a prolonged stimulatory effect as compared to TSH and hence was termed a “long-acting thyroid stimulator” or LATS (392,393). Much later, the LATS factor was recognized not to be a TSH-like protein but an antibody that was capable of stimulating the TSH receptor, causing Graves’ hyperthyroidism (394). In addition, TSH receptor antibodies have become implicated in the pathogenesis of Graves’ ophthalmopathy (394-396). TRAbs are heterogeneous (polyclonal) and fall into two general classes, both of which can be associated with autoimmune thyroid disorders – (a) thyroid stimulating autoantibodies (TSAb) that mimic the actions of TSH and cause Graves’ hyperthyroidism and (b) blocking antibodies (TBAb) that block TSH binding to its receptor and can cause hypothyroidism (19,390,394,397). Although TSH, TSAb and TBAb appear to bind to different sites on the TSH receptor ectodomain, TSAb and TBAb have similar affinities and often overlapping epitope specificities (398). In some cases of Graves’ hyperthyroidism, TBAb has been detected in association with TSAb (399,400), and the dominance of one over the other can change over time in response to treatment (401).
Because both TSAb and TBAb can be present in the same patient, the relative concentrations and receptor binding characteristics of these two classes of TRAb may influence the severity of Graves’ hyperthyroidism and the response to antithyroid drug therapy or pregnancy (389,399,401-406). For completeness, it should also be mentioned that a third class of “neutral” TRAb has also been described, of which the functional significance has yet to be determined (404,407).

Two different methodologic approaches have been used to quantify TSH receptor antibodies (20,21,397): (i) TSH receptor tests (TRAb assays) also called TBII or TSH Binding Inhibition Immunoglobulin assays, and (ii) Bioassays that use whole cells transfected with human or chimeric TSH receptors that produce a biologic response (cAMP or bioreporter gene) when TSHR stimulating or blocking antibodies are present in a serum specimen.

(i) Bioassay methods (TSAb/TBAb)
Early assays were either homogeneous and used surgical human thyroid specimens, or heterogeneous, using mouse or guinea pig thyroid cells and later rat FRTL-5 cell lines to detect TSH receptor stimulating antibodies. These methods typically required pre-extraction of immunoglobulins from the serum specimen (391,408-412). Later TRAb bioassays that used cells with endogenously expressed or stably transfected human TSH receptors were able to use unextracted serum specimens (413-415). Current TSH receptor antibody bioassays are functional assays that use intact (typically CHO) cells transfected with human or chimeric TSH receptors; when these cells are exposed to serum containing TSH receptor antibodies, cAMP or a reporter gene (luciferase) serves as a biological marker for stimulating or blocking activity in the patient’s serum (21,22,24,387,411-413,416). Bioassays are more technically demanding than the more commonly used TBII/TRAb assays because they use viable cells, but they can be modified to detect blocking antibodies (TBAb), which may coexist with stimulating antibodies (TSAb) in the same sera and make interpretation difficult (21,417). The most recent development is for 2nd generation assays to use a chimeric human/rat LH TSHR to effectively eliminate the influence of blocking antibodies. This new approach has shown excellent sensitivity and specificity for diagnosing Graves’ hyperthyroidism and clinical utility for monitoring the effects of anti-thyroid drug therapy (24,418).

(ii) Receptor methods (TRAb/TBII)
TSH receptor antibody tests (TRAb) detect serum TSH receptor immunoglobulins that interact with the TSH receptor without the functional discrimination of stimulating from blocking antibodies. They do however offer the advantage of being rapid and being able to be automated. The methods are based on competitive binding principles, whereby immunoglobulins in serum compete with radiolabeled bovine or porcine TSH or a TSH receptor binding monoclonal antibody (coded M22) for binding to an immobilized TSHR preparation (recombinant human or porcine TSHR) – these methods thus detect TSH binding inhibition immunoglobulins (TBII) (21,22,387,416,419,420). TRAb tests have progressively improved over recent years. The 1st generation of TBII methods used I125-labeled TSH and animal tissues (guinea pig fat cells or porcine thyroid membranes) as sources of the TSH receptor preparation (21). Subsequently, a 2nd generation of non-isotopic TBII assays was developed based on isotopic or chemiluminescent-labeled TSH binding to human TSH receptors expressed in CHO cells or directly to a recombinant TSH receptor protein preparation (21,421,422). These 2nd generation assays were reported to have superior sensitivity for diagnosing Graves’ disease (423). Current non-isotopic 3rd generation assays use the TSHR binding monoclonal antibody (M22), have improved sensitivity and are commercially available using automated systems (19,21,22,24,387,419-421,424,425). 3rd generation assays have also shown a good correlation and comparable overall diagnostic sensitivity with bioassay methods (387,399,416,426-428). However, between-method variability remains high and interassay precision often suboptimal (CVs > 10 %) despite the use of the same international reference preparation for calibration (388,429). This fact makes it difficult to compare values using different methods and indicates that further efforts focused on additional assay improvements are needed (19,388,430).

(iii) Clinical Use of TRAb Tests
Currently, most TRAb testing is performed using automated TBII methodology [Section 5a(ii)]. Although TBII tests do not distinguish between stimulating and blocking antibodies, this methodologic distinction is often unnecessary, because it is evident from the clinical presentation of hyper- or hypothyroid features. However, both TSHR stimulating and blocking antibodies can be detected simultaneously in the same patient and cause diagnostic confusion (418,431). TRAb testing can be used in the differential diagnosis of the etiology of hyperthyroidism, as a useful and independent risk factor for Graves’ ophthalmopathy and to monitor responses to therapy for this condition (432,433). TRAb measured prior to radioiodine therapy for Graves’ hyperthyroidism can also help predict the risk for exacerbating ophthalmopathy (395,434-438). Although early studies reported that TRAb measurement is not useful for predicting the response to antithyroid drug treatment (400,422,426,439-442), current bioassays report clinical utility in distinguishing between remission and relapse (24). However, one of the most important applications of TRAb testing is the evaluation of pregnant patients with a history of autoimmune thyroid disease, or active or previously treated Graves’ hyperthyroidism, in whom there is a risk of transplacental passage of TRAb (TSAb or TBAb) to the infant (20,21,40,150,387,443-445). For this application, TBII tests may be preferable to bioassay methods, because they detect both stimulating and blocking antibodies and because the expression of thyroid dysfunction may be different in the mother and infant (446). It is currently recommended that TRAb be measured in the third trimester in all pregnant patients with active Graves’ hyperthyroidism or who have received prior ablative (radioiodine or surgery) therapy for Graves’ disease (13,19,40,150).
This is because TRAb can remain high even after patients have been rendered hypothyroid and are maintained on L-T4 replacement therapy (399). A very rare exception to this analysis is patients presenting with very high circulating concentrations of hCG due to choriocarcinoma or hydatidiform mole, who may have misleading positive results using some TSAb assays (441).

(b) Thyroid Peroxidase Autoantibodies (TPOAb)

TPO is a large, dimeric, membrane-associated, globular glycoprotein that is expressed on the apical surface of thyrocytes. TPO autoantibodies (TPOAb) found in sera typically have high affinities for an immunodominant region of the intact TPO molecule. When present, these autoantibodies vary in titre and IgG subclass and display complement-fixing properties (447). Studies have shown that epitope fingerprints are genetically conserved suggesting a possible functional importance (448). However, it is still unclear whether the epitope profile correlates with the presence of, or potential for, the development of thyroid dysfunction with which TPOAb presence is most commonly associated clinically (447,449-452).

TPOAb were initially detected as antibodies against thyroid microsomes (antimicrosomal antibody, AMA) using semi-quantitative complement fixation and tanned erythrocyte hemagglutination techniques (453-455). Subsequent studies identified the principal antigen in the AMA tests as the thyroid peroxidase (TPO) enzyme, a 100 kD glycosylated protein present in thyroid microsomes (456,457). Manual agglutination tests have now been replaced by automated, more specific TPOAb immunoassay or immunometric assay methods that use purified or recombinant TPO (13,19,386,458-465). Despite calibration against the same International Reference Preparation (MRC 66/387), there is considerable inter-method variability of current TPOAb assays (correlation coefficients between 0.65 and 0.87) that precludes the numeric comparison of serum TPOAb values reported by different tests (19,385,386,461,464,465). It appears that both the methodologic principles of the test and the purity of the TPO reagent used may influence the sensitivity, specificity and reference range of the method (19,386). The variability in sensitivity limits (range <0.3 to >20 kIU/L) and the reference ranges of different methods has led to different interpretations regarding the normalcy of having a detectable TPOAb (19,465). Specifically, assays characterized by a low detection limit (<10 kIU/L) typically report unmeasurable TPOAb values for “normal” euthyroid subjects, suggesting that the detection of this autoantibody is a pathologic finding (466). In contrast, assays reporting higher detection limits (>10 kIU/L) typically cite a TPOAb “normal range”, suggesting that low levels of this autoantibody are compatible with normal physiology (467). Whether detectable “normal” values reflect physiology or lack of assay specificity remains to be determined.

(i) Clinical Significance of Detecting TPOAb
Estimates of TPOAb prevalence depend on the sensitivity and specificity of the method employed (465,468). In addition, ethnic and/or geographic factors (such as iodine intake) influence the TPOAb prevalence in population studies (469). For example, TPOAb prevalence is significantly higher (~11 percent) in dietary iodine-sufficient countries like the United States and Japan as compared with iodine deficient areas in Europe (~ 6 percent) (289,290,470). The prevalence of TPOAb is higher in women of all age groups and ethnicities, presumably reflecting the higher propensity for autoimmunity as compared with men (289,470). Approximately 70-80 % of patients with Graves’ disease and virtually all patients with Hashimoto’s or post-partum thyroiditis have TPOAb detected (386,461,464,466,468). TPOAb has, in fact, been implicated as a cytotoxic agent in the destructive thyroiditic process (452,471-474). However, TPOAb prevalence is also significantly higher in various non-thyroidal autoimmune disorders in which no apparent thyroid dysfunction is evident (475-477). Aging is associated with an increasing prevalence of TPOAb that parallels the increasing prevalence of both subclinical (mild) and clinical hypothyroidism (289). In fact, the NHANES III survey reported that TPOAb prevalence increases with age and approaches 15-20 percent in elderly females in the iodine-sufficient United States (289). This same study found that the odds ratio for hypothyroidism was strongly associated with the presence of TPOAb but not TgAb, suggesting that only TPOAb has an autoimmune etiology (289). Although the presence of TgAb alone did not appear to be associated with hypothyroidism or TSH elevations, the combination of TPOAb and TgAb versus TPOAb alone may be more pathologically significant, although further studies would be needed to confirm this (287,289,317,452). 
The presence of TPOAb in the serum of apparently euthyroid individuals (TSH within reference range) is now recognized as a risk factor for the future development of overt hypothyroidism, which becomes evident at a rate of approximately 2 percent per year in such populations (447,448,478,479).

In this context, it is reasonable to assume that TPOAb measurement may serve as a useful prognostic indicator for future thyroid dysfunction (479,480). However, it is noteworthy that the detection of TPOAb does not always precede the development of thyroid dysfunction. A recent study suggests that a hypoechoic ultrasound pattern can be seen before a biochemical TPOAb abnormality is detected (333,469). Further, some individuals with unequivocal TSH elevations, presumably resulting from autoimmune destructive disease of the thyroid, do not have TPOAb detected (317). This paradoxical absence of TPOAb in some patients with elevated TSH presumably reflects the suboptimal sensitivity and/or specificity of current TPOAb tests or a non-autoimmune cause of thyroid failure (i.e. atrophic thyroiditis) (289,317,465,481).

Although changes in autoantibody concentrations often occur with treatment or reflect a change in disease activity, serial TPOAb measurements are not recommended for monitoring treatment for autoimmune thyroid diseases (228,386,482). This is not surprising since treatment of these disorders addresses the consequence (thyroid dysfunction) and not the cause (autoimmunity) of the disease. However, the presence of serum TPOAb may have an important clinical application as a risk factor for developing thyroid dysfunction in patients receiving Amiodarone, Interferon-alpha, Interleukin-2 or Lithium therapies, all of which appear to act as triggers for initiating autoimmune thyroid dysfunction in susceptible (especially TPOAb-positive) individuals (13,70,483-489).

During pregnancy the presence of TPOAb has been linked to reproductive complications such as miscarriage, infertility, IVF failure, fetal death, pre-eclampsia, pre-term delivery and post-partum thyroiditis and depression (39,40,211,490-501). However, whether this association represents cause or effect has yet to be resolved.

(c) Thyroglobulin Autoantibodies (TgAb)

Thyroglobulin autoantibodies predominantly belong to the immunoglobulin G (IgG) class, are not complement fixing and are generally conformational (502). Serum TgAb was the first thyroid antibody to be detected in patients with autoimmune thyroid disorders using tanned red cell hemagglutination techniques (454). Subsequently, methodologies for detecting TgAb have evolved in parallel with those for TPOAb, from semi-quantitative techniques to more sensitive ELISA and RIA methods and most recently non-isotopic competitive or non-competitive immunoassays (8,19,23,461,465-467,503-505). Unfortunately, the inter-method variability of these TgAb assays is even greater than that of comparable TPOAb tests discussed above (8,19,23,503-505). Additionally, high levels of endogenous thyroglobulin in the serum specimen have the potential to influence TgAb measurements (505-508). The between-method variability reflects variability in both the purity and the epitope specificity of the Tg protein reagent, as well as the inherent heterogeneity of the antibodies present in different patients’ sera (509,510). As with TPOAb methods, TgAb tests report wide variability in both sensitivity limits (<0.3 to >100 kIU/L) and cut-off values used to define “TgAb positivity” despite the use of the same International Reference Preparation (MRC 65/93) (8,23,503-505,511). Although there are reports that low levels of TgAb may be present in normal euthyroid individuals, it is unclear whether this represents assay noise due to matrix effects or “natural” antibodies (467,512). Further complicating this question are studies suggesting that there may be qualitative differences in TgAb epitope specificities expressed by normal individuals versus patients with either differentiated thyroid cancers (DTC) or autoimmune thyroid disorders (467,510,513). These differences in test specificity impact the reliability of a TgAb method for screening sera prior to serum Tg measurement [Section 6a(v)].

(i) Clinical Utility of TgAb Tests
Autoantibodies against Tg are encountered in autoimmune thyroid conditions, usually in association with TPOAb (289,479,504,514). However, the NHANES III survey found that only three percent of subjects with no risk factors for thyroid disease had serum TgAb present without detectable TPOAb (289,317). Further, in these subjects there was no association observed between the isolated presence of TgAb and TSH abnormalities (289,317). This suggests that it may be unnecessary to measure both TPOAb and TgAb for a routine evaluation of thyroid autoimmunity (19,317,479). However, when autoimmune thyroid disease is present, there is some evidence that assessing the combination of TPOAb and TgAb has greater diagnostic utility than the TPOAb measurement alone (287,317,479,515).

TgAb measurement is primarily used as an adjunctive test to serum Tg measurement when monitoring patients with differentiated thyroid cancers (DTC) (34). The role of TgAb testing is two-fold: 1) to authenticate that a Tg measurement is not compromised by TgAb interference [Section 6A(d)(ii)]; 2) to serve as an independent surrogate tumor-marker in the ~20 percent of patients with circulating TgAb [Section 6A(d)(ii)]. Current guidelines recommend that all sera be prescreened for TgAb by a sensitive immunoassay method prior to serum Tg testing, because there appears to be no threshold TgAb concentration that precludes TgAb interference with Tg measurements (8,13,16,23,34,466,504,516). Immunoassay methods detect TgAb in approximately twenty percent of patients presenting with DTC (23,466,517-519). The prevalence of TgAb is typically higher in patients with papillary versus follicular tumors and is frequently associated with the presence of lymph node metastases (504,517,519,520). Perhaps of even greater importance is the observation that serially determined TgAb concentrations may also serve as an independent parameter for detecting changes in tumor mass in patients with an established diagnosis of DTC (Figure 5) (519-524). For example, after TgAb-positive patients are rendered athyreotic by surgery, TgAb concentrations typically decline progressively during the first few post-operative years and become undetectable after a median of three years of follow-up (519,520,524). In contrast, a rise in, or de novo appearance of, TgAb is often the first indication of tumor recurrence (466,520,525). However, when using serial TgAb measurements as a surrogate marker for changes in tumor burden it is essential to use the same TgAb method, because of the large between-method differences observed with this assay [Section 6A(a)] (8,16,23,466,503-505,511).

Typically, serial serum changes in Tg RIA versus TgAb are concordant (466,521). However, TgAb is the more responsive parameter and a disparity between these two parameters (rising TgAb/declining Tg RIA) may still indicate recurrence, because as tumor mass increases and more Tg is secreted there is an increase in Tg-TgAb immune complexes which may be more rapidly cleared than free Tg (526). In some cases discordant serial serum Tg RIA versus TgAb responses are seen. For example, declining serum TgAb associated with rising Tg RIA can result following thyroidectomy in patients with autoimmune thyroid disease, whereby circulating TgAb declines in response to the decrease in normally iodinated Tg secreted from thyroid remnant tissue at the time that poorly iodinated (less immunogenic) Tg is being secreted by recurrent tumor. Transient rises in TgAb may be seen in response to the acute release of Tg following thyroid surgery (527), fine needle aspiration biopsy (528,529) or more chronically during the months following radioiodine treatment that releases Tg by radioautolytic damage (517,525,530-533).

6. Thyroglobulin (Tg)

Thyroglobulin appears to play a central role in a wide variety of pathophysiologic conditions affecting the thyroid gland. For example, Tg has been implicated as a possible autoantigen involved in the production of thyroid autoimmune diseases (384,502,534). Genetic defects in Tg biosynthesis have been clearly shown to result in congenital hypothyroidism due to dyshormonogenesis (13,535-538). The tissue-specific origin of Tg has also led to its use as a test for determining the etiology of congenital hypothyroidism (539,540) and for diagnosing factitious hyperthyroidism, in which the elevated serum Tg values characteristic of true hyperthyroidism are absent (541-543). Serum Tg measurement is primarily used as a tumor marker for differentiated thyroid cancers (DTC) (34,38,544-551). The serum Tg concentration has also been employed to assess iodine nutrition status for population studies (552,553).

Over the last ten years laboratories have adopted the more rapid, automated Tg immunometric assays (IMA) to replace the older isotopic radioimmunoassays (RIA) (8,554,555). However, some laboratories still retain a Tg RIA because this methodology appears less prone to interferences from Tg autoantibodies (TgAb) and heterophilic antibodies (HAMA) [Section 7c] (8,12,23,519,547,548,556). It is particularly important to note that assay sensitivity and specificity critically influence the clinical utility of Tg testing. In fact, current Tg IMAs are highly variable depending on the method employed, with values varying as much as three-fold and assay functional sensitivities that vary as much as ten-fold (8,12,546,554,557).

Unfortunately, not all current Tg methods are standardized directly against the Certified Reference Preparation CRM-457 (555,562,563). Even methods that use CRM-457 standardization can display two- to three-fold differences in serum Tg values (8,13,548,554,563). This between-method variability is higher than the biologic variability (~14 percent) seen in euthyroid subjects and therefore reflects methodological differences (360,564). Variable assay reference ranges reflect inherent specificity differences of methods compounded by differences in the rigor used to select normal euthyroid subjects without thyroid pathology and the influence of occult TgAb interference. IMA methods typically display greater variability than RIA methods, even in the absence of TgAb (Figure 6a) (8). This variability reflects technical problems relating to standardization and/or assay matrix differences (192,555) as well as interference by TgAb that may not be detected by current assays (12,23). In addition, IMA methodology uses monoclonal antibody (MAb) reagents that typically have narrower epitope specificities for the heterogeneous Tg isoforms present in the circulation, as compared with the broad epitope specificity inherent in the use of polyclonal antibodies (PAb) for RIA methods (8,16,565-567). For these and other reasons, current guidelines stress the critical importance of maintaining the same Tg method for serial monitoring of patients with DTC, because a change in Tg method is very likely to compromise the clinical value of serial Tg monitoring (8,13,34,516,554,568).

(b) Insensitivity/ Functional Sensitivity

As stressed in current guidelines it is critical that assay sensitivity be assessed in terms of functional sensitivity and not analytical sensitivity (13,282,568,569). Functional sensitivity represents the between-run precision of measuring low concentrations. Generally, the functional sensitivity of an immunoassay is defined as the lowest analyte concentration that can be measured with less than 20 percent between-run coefficient of variation (CV). Protocols used to determine assay functional sensitivity are analyte and matrix dependent:

For immunometric assays (IMA) that are used to measure Tg in TgAb-negative sera, the following stipulations apply to the protocol used for determining functional sensitivity (13,568):
• Precision assessment should be determined in human serum free from TgAb – to eliminate interferences and matrix effects (569).
• The precision assessment should be performed over a 6 to 12 month period – to represent the typical interval used for monitoring patients with DTC.
• At least two different lots of reagents and two instrument calibrators should be used when evaluating between-run precision – because over time there is erosion of precision, especially at the extremes of the reportable range, relative to a myriad of factors relating to variability in reagents, instruments and the technical operation of the test (570).

For radioimmunoassays (RIA) and tandem mass spectrometry assays (LC-MS/MS) that are used to measure Tg in TgAb-positive sera, the above protocol used for determining functional sensitivity should be modified as follows to ensure precision is representative of measuring Tg in the specimens tested in clinical practice (13):
• Precision assessment should be determined in human TgAb-positive sera – to ensure performance is relevant to the matrix of the clinical specimens.
• If more than one instrument (e.g. for LC-MS/MS) is used, the precision assessment should include an equal amount of data from each instrument, measured throughout the evaluation period.
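
The definition of functional sensitivity given above (the lowest concentration measured with less than 20 percent between-run CV) can be illustrated with a minimal Python sketch. The pool names and Tg values below are hypothetical and are not taken from any cited study; the rule that every higher-concentration pool must also meet the CV limit is an assumption added here for robustness, not a stipulation of the guidelines.

```python
from statistics import mean, stdev


def between_run_cv(values):
    """Percent coefficient of variation: sample SD / mean * 100."""
    return stdev(values) / mean(values) * 100.0


def functional_sensitivity(pools, cv_limit=20.0):
    """Lowest pool mean whose CV is below cv_limit.

    Assumption: every pool of higher concentration must also pass.
    Returns None when no pool qualifies.
    """
    summary = sorted((mean(v), between_run_cv(v)) for v in pools.values())
    result = None
    # Walk from the highest concentration downward and stop at the first failure.
    for pool_mean, cv in reversed(summary):
        if cv >= cv_limit:
            break
        result = pool_mean
    return result


# Hypothetical between-run Tg results (µg/L) for three TgAb-negative serum pools,
# as might be collected across reagent lots and calibrations over 6 to 12 months.
pools = {
    "pool_A": [0.03, 0.05, 0.02, 0.06],
    "pool_B": [0.09, 0.11, 0.10, 0.10],
    "pool_C": [0.48, 0.52, 0.50, 0.50],
}

for name, values in pools.items():
    print(name, round(mean(values), 3), round(between_run_cv(values), 1))
print("functional sensitivity (µg/L):", functional_sensitivity(pools))
```

In this invented data set the lowest pool fails the 20 percent criterion, so the estimate falls at the mean of the middle pool. A real evaluation would use many more runs per pool than the four shown here.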

Because current Tg assays display as much as a ten-fold difference in functional sensitivity, it is useful to consider a generational approach to Tg assay nomenclature, analogous to that established for TSH methods (5,282). In this context, most current Tg assays still only have 1st generation sensitivity (functional sensitivity = 0.5 to 1.0 µg/L when standardized directly against CRM-457) – comparable to that of early Tg radioimmunoassay (RIA) methods (4,8,554,571). Second generation IMAs with an order of magnitude greater sensitivity (functional sensitivity = 0.05–0.10 µg/L) are increasingly becoming commercially available (546,554,571-574). Because the 1st generation assay functional sensitivity limit of ~ 1.0 µg/L is very close to the lower limit of the euthyroid reference range, 1st generation assays have suboptimal sensitivity for detecting early tumor recurrence in thyroidectomized DTC patients, especially when TSH is suppressed by oral levothyroxine therapy (8,13,575). Current American Thyroid Association and European Thyroid Association guidelines for managing patients with DTC state that an “undetectable” basal and recombinant human TSH (rhTSH) stimulated Tg should be used as a parameter for the absence of disease (34,38,516). However, Tg “detectability” merely relates to assay functional sensitivity and in this context should not be used as a clinical benchmark. Clearly, patients with an undetectable serum Tg by 1st generation assay often have a detectable Tg in the 0.10 – 0.99 µg/L range when measured by a 2nd generation assay (546,547,554,573). Furthermore, a growing number of studies have now reported that the use of the more sensitive 2nd generation assays overcomes the need for expensive rhTSH-stimulated Tg testing [Section 6b(iv)] (8,547,554,557,572,573,576-581). Specifically, a basal (non TSH-stimulated) Tg measurement below 0.10 µg/L predicts a negative rhTSH test (rhTSH-stimulated Tg below 2.0 µg/L) with a high degree of confidence (557,581). 
However, it is important to recognize that the use of a 2nd generation Tg assay does not eliminate the need for periodic ultrasound examinations, because many lymph nodes containing DTC do not secrete enough Tg into the general circulation to be detected even when using a 2nd generation assay (34,516,573,582-584).

(c) “Hook” Problems

Tumor marker tests employing IMA methods are especially prone to the so-called “high-dose hook effects” (585-588). This phenomenon is characterized by the finding of inappropriately low values in sera when high analyte concentrations are present. The problem is caused by the high concentrations of analyte in the test specimen overwhelming the binding capacity of the solid-phase capture antibody. This phenomenon in turn reduces the ability of endogenous analyte to form a bridge between the two antibodies of the sandwich assay method thereby resulting in an inappropriately low signal (16,561,585,589,590). Manufacturers are generally aware of this problem and have attempted to overcome it by designing assays using a two-step process (586). However, when using any particular IMA method, it is primarily the laboratory’s responsibility to determine whether such a hook effect is likely to generate a falsely normal or low value (false-negative) for any particular serum specimen. Alternative approaches for both detecting and overcoming such hook effects occurring with IMA methods are:
• To routinely run each specimen at two dilutions. For example, the value obtained with a 1/5 or 1/10 dilution of the test serum would, if a hook effect were present, be higher than that obtained with an undiluted sample.
• To carry out appropriate dilution studies to rule out a possible hook effect when an unexpectedly low serum Tg value is encountered for a patient with known metastatic disease. In such cases, consultation with the physician may provide valuable information regarding this issue.
• To perform a Tg recovery test. If there is a hook effect present, the recovery of added antigen (Tg) will produce an inappropriate result.
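
The two-dilution check described in the first bullet above can be sketched in Python as follows. This is only an illustration: the function names, the example Tg values and the 1.5-fold flagging threshold (`ratio_limit`) are hypothetical choices, not values taken from the cited literature or from any manufacturer's protocol.

```python
def dilution_corrected(measured, dilution_factor):
    """Multiply the result read on the diluted specimen by its dilution factor."""
    return measured * dilution_factor


def suggests_hook_effect(undiluted, diluted_measured, dilution_factor,
                         ratio_limit=1.5):
    """Flag a possible high-dose hook effect.

    A hook effect is suspected when the dilution-corrected value is clearly
    higher than the value from the undiluted specimen. ratio_limit is a
    hypothetical tolerance that allows for ordinary dilution imprecision.
    """
    corrected = dilution_corrected(diluted_measured, dilution_factor)
    return corrected > undiluted * ratio_limit


# Hypothetical specimen 1: undiluted reads 8.0 µg/L, the 1/10 dilution reads 450 µg/L.
print(suggests_hook_effect(8.0, 450.0, 10))
# Hypothetical specimen 2: undiluted reads 8.0 µg/L, the 1/10 dilution reads 0.82 µg/L.
print(suggests_hook_effect(8.0, 0.82, 10))
```

The first specimen is flagged because its dilution-corrected value far exceeds the undiluted result, while the second dilutes linearly and is not flagged.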

(d) Interferences

(i) Human Anti-Mouse Antibody Interference (HAMA)
As with other immunometric assays, HAMA interferes with Tg IMA but not Tg measurements made by RIA (525,547). HAMA interference is thought to reflect inappropriate binding of the murine-derived monoclonal antibodies used as reagents in IMA methods, as opposed to the rabbit polyclonal antibodies typically used for RIA. In most cases interference is characterized by a false positive result (548,556,558,560); however, false negative interferences caused by HAMA have also been reported (559).

(ii) Tg Autoantibody (TgAb) Interferences
Tg antibodies are found in the sera of approximately 20 % of DTC patients (466,518,520,521,591-594), twice the prevalence of the general population (289). The presence of TgAb represents a major technical hurdle because these autoantibodies are capable of producing varying degrees of Tg assay interference (8,13,466,518,591,595). This fact necessitates that all sera sent for Tg measurement should be prescreened for the presence of TgAb (13,34,516). Such screening is essential because even very low serum TgAb concentrations can produce significant interference with Tg measurement. Moreover, because a patient’s TgAb status may change over time, such prescreening must be performed on all test samples (8,13,466,518,520,591,595-597). Current guidelines recommend screening for interfering TgAb using a TgAb immunoassay rather than an exogenous Tg recovery test because Tg recoveries have been shown to be unreliable for detecting interfering TgAb (8,13,16,34,516,598).

TgAb in the circulation interferes with serum Tg measurements in a qualitative, quantitative and method-dependent manner (466,599,600). Interference with IMA methodology is always unidirectional (underestimation), whereas RIA methods have the potential to either under- or overestimate Tg depending on the affinity and specificity of the antibody reagents that affect the partitioning of the Tg tracer between Tg antibody reagent and the endogenous TgAb (8,16,466,591,599-604). Some Tg IMA methods have claimed to overcome TgAb interference by using monoclonal antibodies directed against specific epitopes not involved in thyroid autoimmunity (605). Although conceptually attractive, this approach does not appear to have overcome interferences in practice, possibly because less restricted TgAb epitopes are more often associated with thyroid carcinoma than with the autoimmune thyroid conditions (513,606).

Figure 7 shows the comparison of serum Tg measurements determined in TgAb-negative and TgAb-positive sera when measured by both IMA and RIA methods (8,547). Although all the methods were standardized against the same certified reference preparation CRM-457, significant biases were observed reflecting assay specificity differences. As seen in Figure 6b, serum samples containing TgAb typically displayed a detectable Tg RIA value but lower or even undetectable serum Tg values when measured using IMA methods. In fact, this RIA/IMA discordance appears to serve as the most reliable and sensitive indicator of TgAb interference (563). The mechanism responsible for the inappropriately low IMA values most likely represents the inability of Tg complexed with TgAb to participate in the two-site reaction of IMA methods (600,607). In contrast, RIA methods report fewer inappropriately low or undetectable Tg values for TgAb-positive sera because they likely measure both free Tg and Tg complexed with TgAb (8,466,608). In accord with current guidelines, a serum Tg value should not be reported for a TgAb-positive specimen if measured by an IMA method (13). Unfortunately, many laboratories that are solely using IMA methods still report falsely undetectable serum Tg values for TgAb-positive patients, albeit with a cautionary comment. On the other hand, other laboratories are adopting what appears to be a much better approach to this problem by restricting the use of the IMA methods to TgAb-negative sera while employing RIA methodology for TgAb-positive sera. Clinical studies are needed to determine whether the new Tg LC-MS/MS methods are useful for monitoring serum Tg in TgAb-positive patients (12,17).

(iii) The Use of Serum TgAb as a Surrogate Tumor Marker for DTC
Approximately 20 percent of DTC patients present with detectable TgAb as compared with ~11 percent of the general population (289,466,518,521,591). Whether the presence of serum TgAb can serve as a marker for DTC preoperatively remains a controversial subject (609-612). However, there has been growing recognition that postoperatively, serial TgAb measurements can serve as a surrogate tumor marker (Figure 5) (13,466,502,520-522,524,591,613,614). Supporting this view is the finding that patients who have serum TgAb detected at the time of initial surgery and who are rendered disease-free show a progressive decline in serum TgAb levels that become undetectable over a median period of three years (466,520,522,524,614). In contrast, patients with persistent/recurrent disease typically maintain detectable, and often rising, TgAb concentrations in association with tumor recurrences (466,518,520,522,591). Because serum TgAb tests significantly differ in both sensitivity and specificity, it is essential that serial TgAb measurements be performed employing the same method (8,23,465,466,503,505,511,615).

(b) The Use of Serial Serum Tg for Monitoring Patients with DTC

Over the past decade, the reported incidence of DTC has substantially risen as small thyroid nodules and micropapillary cancers have increasingly been detected, presumably due to the widespread use of ultrasound and other imaging techniques (616,617). Although most DTC patients appear to be rendered disease-free by their initial surgery, approximately 15 percent of patients may experience recurrences and approximately 5 percent die from disease-related complications (618,619). The majority of persistent/recurrent disease is detected within the first five years following surgery; however, recurrences may occur decades after initial surgery, thereby necessitating long-term monitoring (618). Because most patients have a low pre-test probability for disease, it is critical that protocols for follow-up have a high negative predictive value (NPV) to eliminate unnecessary testing and a high positive predictive value (PPV) to identify the minority of patients with persistent/recurrent disease. Because diagnostic whole body radioiodine scans have now become recognized to be a relatively insensitive method for the detection of early tumor recurrence, serial serum Tg measurements performed in conjunction with ultrasound examination have evolved as the central strategy for monitoring patients for this purpose (34,38,516,583,620). The current technical issues concerning serum Tg assays, such as assay bias, functional sensitivity and interference, require close physician-laboratory cooperation in order to optimize the use of serial serum Tg testing for this purpose [Section 6a].
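
The dependence of NPV and PPV on a low pre-test probability can be made concrete with a short Python sketch based on Bayes' rule. The sensitivity and specificity used below are hypothetical illustrative values, not figures from any cited study; the 15 percent prevalence simply borrows the approximate recurrence rate mentioned above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics for a fixed Tg cut-off.
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.80, prevalence=0.15)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

With these assumed inputs the NPV is high while the PPV is modest, which illustrates why a negative result is reassuring in a low-prevalence population whereas a positive result usually needs confirmation.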

Four principal factors appear to influence the interpretation of serum Tg concentrations: (1) The mass of differentiated thyroid tissue present (normal tissue + tumor); (2) The intrinsic characteristics of the tumor tissue that influence the ability to secrete Tg; (3) The presence of any inflammation of, or injury to, thyroid tissue, such as observed following fine needle aspiration biopsy, surgery, radioiodine therapy or thyroiditis; and (4) The degree of thyroid tissue stimulation of TSH receptors by TSH, hCG or TSAb (13).

Serum Tg measurement may be useful in the context of the following four phases of managing patients with DTC:

(a) Pre-operative Serum Tg

Current guidelines do not routinely recommend pre-operative serum Tg measurement (34,38,516). However, some investigators believe that such a pre-operative serum Tg (drawn before or more than two weeks after a FNAB) provides useful information as to the tumor’s intrinsic ability to secrete Tg when cancer is present, because some tumors, especially those containing the BRAF mutation, display reduced expression of Tg protein (621). This obviously impacts the utility of using serial serum Tg measurements to serve as a tumor marker to detect cancer recurrence post-operatively (622,623). The principal rationale for such a pre-operative serum Tg determination is as follows: the average serum Tg concentration for a normal euthyroid adult approximates 12 µg/L such that one gram of normal thyroid tissue would likely account for about 1-2 µg/L of serum Tg from a normal thyroid gland size of 10-20 grams. Such a relationship would presumably hold for patients presenting with DTC with serum TSH within the euthyroid reference range and no other intrinsic thyroid pathology that might independently influence Tg secretion (13,575). Approximately 50 percent of DTC patients have an elevated preoperative serum Tg. Preoperative serum Tg concentrations are highest in follicular, followed by Hurthle cell and then papillary cancers (624). Some studies have shown that an elevated serum Tg, even decades prior to diagnosis, is a risk factor for thyroid malignancy (625-628). This suggests that most DTCs secrete significant amounts of Tg protein to an equal or greater degree than normal thyroid tissue and underscores its importance as a useful tumor marker (624). However, it also indicates that as much as one-third of DTCs may be so-called “poor Tg secretors” when considering the mass of tumor present. In this latter case, the use of serial postoperative serum Tg as a tumor marker may prove to be a relatively insensitive method for the detection of early tumor recurrence (621,629). 
It also would underscore the importance of employing total thyroidectomy to reduce the background noise from Tg being secreted by surgical remnant tissue and the use of a more sensitive second generation Tg assay [Section 6a(ii)] combined with ultrasound examination in the management of such cases. Thus, a preoperative Tg may play a useful role in establishing the postoperative management plan for the individual thyroid cancer patient.

(b) Early post-operative period – First year

Because the TSH status of the patient exerts such a strong influence on Tg secretion and circulating Tg concentrations, it is important that the patient be promptly placed on adequate oral L-T4 or L-T3 suppression therapy following surgery for DTC to facilitate establishing a stable serum Tg baseline to serve as a reference for long-term monitoring for tumor recurrence. Serum Tg measurements performed even as early as 6 to 8 weeks after thyroidectomy have been shown to have prognostic value, in that the higher the postoperative Tg the greater the likelihood that persistent disease is present (562,577,630-637). Since the half-life of Tg in the circulation approximates 3 days, the acute injury-related Tg release resulting from the surgical procedure should largely resolve within the first two months. If thyroid hormone therapy (either L-T3 or L-T4) is initiated immediately post-operatively to prevent the rise in TSH, by 8 weeks the serum Tg should be approaching a nadir value reflecting secretion from the post-surgical normal thyroid remnant as well as any residual or metastatic thyroid tumor present. As most surgeons performing near-total thyroidectomy leave an approximately 1-gram normal remnant tissue (638), secretion from this remnant tissue should result in a serum Tg concentration ≤ 2 µg/L if serum TSH is not elevated (13). A recent study using a receiver operating characteristic (ROC) analysis indicates that a 6-week serum Tg of ~1.0 µg/L, measured during TSH suppression, had a 98 percent negative predictive value (NPV) (although the positive predictive value (PPV) was only 43 percent) (562). However, both NPV and PPV could be maximized by following trends in serum Tg measured over time in preference to using a single fixed Tg cut-off value as a risk factor for residual/recurrent disease (549,562,639,640). 
The finding that using a fixed Tg cut-off has a low PPV underscores the advisability of restaging using combined serum Tg and ultrasound evaluations during the initial 6 to 12 months following initial surgery (562).
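
The half-life reasoning used earlier in this subsection (a circulating Tg half-life of approximately 3 days, so that surgery-related Tg release largely resolves within two months) can be checked with a brief Python calculation. The sketch assumes simple single-exponential clearance with no ongoing secretion, and the post-operative peak of 500 µg/L is a purely hypothetical value.

```python
def fraction_remaining(days, half_life_days=3.0):
    """Fraction of an initial Tg load still circulating after `days`,
    assuming first-order clearance and no further Tg release."""
    return 0.5 ** (days / half_life_days)


hypothetical_peak = 500.0  # µg/L, an invented surgery-related Tg peak
for weeks in (1, 2, 4, 6, 8):
    residual = hypothetical_peak * fraction_remaining(weeks * 7)
    print(f"week {weeks}: {residual:.4f} µg/L remaining from the surgical release")
```

Six weeks corresponds to 14 half-lives, so under these assumptions the surgical contribution has fallen by more than four orders of magnitude and any Tg still measurable reflects ongoing secretion from remnant or tumor tissue.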

(c) Long-term monitoring during L-T4 Therapy

Current evidence indicates that the higher the basal serum Tg seen post-operatively (while on L-T4 suppression therapy) the greater the likelihood that the patient will have persistent or recurrent disease present (562,577,630-636). If serum TSH is maintained with L-T4 therapy at modestly suppressed levels (0.1 – 0.4 mIU/L), changes observed in the serum Tg concentration over time will reflect changes in tumor mass. In this context, when a stable TSH status is maintained (using the same L-T4 dose) a rising serum Tg would be considered suspicious for tumor recurrence, whereas declining Tg levels suggest the absence or regression of disease (34,516,549,562,639). In fact, it is the trend in serum Tg, measured over time and under constant TSH conditions, that has proven to have a higher diagnostic sensitivity than using a fixed serum Tg cut-off value, especially when measured using a sensitive 2nd generation assay (549,562,572,639,641-645). Assay imprecision can result from variability in reagents supplied by the manufacturer, differences between instrument calibrations and a myriad of less well-defined factors. One approach directed at mitigating between-run imprecision has been to concurrently measure (in the same assay) an archived serum specimen from the patient alongside the current specimen (570). By eliminating run-to-run imprecision, even small increases in serum Tg may be validated.
The archiving of unused sera facilitates concurrent re-measurement of a past with the current specimen – an approach that can be used to eliminate between-run errors and more reliably detect a small but clinically significant serum Tg trend (570).

(d) Serum Tg responses to TSH Stimulation

The degree of tumor differentiation determines the presence and density of TSH receptors that in large part is responsible for the magnitude of the serum Tg rise with TSH stimulation (621,629,646). Although the rise in serum Tg in response to increases in endogenous TSH following thyroid hormone withdrawal is twice that seen following rhTSH administration (34,516,583,647,648), rhTSH has increasingly been used to stimulate serum Tg because it lends itself to a standardized procedure while avoiding the side-effects of producing hypothyroidism (649). When using insensitive Tg methodology, rhTSH-stimulated serum Tg measurement was used when patients had an undetectable basal Tg below the functional sensitivity of 1st generation Tg tests (functional sensitivity ~ 1.0 µg/L) (34,38,516). Studies established a consensus rhTSH-stimulated serum Tg cut-off of 2.0 µg/L, measured 72 hours after the second dose of rhTSH, as a risk factor for disease having a higher NPV (> 95 percent) than a basal (unstimulated) Tg measurement (562,577,582,583,620,635,639,641,643,644,650). However, an rhTSH-stimulated Tg below 2.0 µg/L does not guarantee the absence of tumor (583,647,650). Furthermore, the reliability of any fixed rhTSH-Tg cut-off is negatively impacted by the variability in numeric values reported by different methods (8,554), differences in the dose of rhTSH delivered related to the absorption from the injection site, surface area and age of the patient (651-654) and most importantly, the TSH sensitivity of the tumor tissue (621,646,655). In fact, more recent studies using more sensitive 2nd generation assays (functional sensitivity ≤ 0.10 µg/L) have reported that a basal Tg value below 0.10 µg/L has a comparable NPV to a rhTSH-stimulated Tg of < 2.0 µg/L (547, 554, 557, 572, 573, 578, 579, 581). As seen in Figure 7a, a strong relationship exists between basal Tg and rhTSH-stimulated Tg values (547,557).
This relationship dictates that the likelihood of the serum Tg rising above the customary cut-off of 2.0 ng/mL (µg/L) in response to rhTSH stimulation is directly proportional to basal Tg (Figure 7b). This has prompted a controversy regarding whether an rhTSH-stimulated serum Tg value provides any additional information over and above that of basal Tg measured by a 2nd generation assay (547,554,557,573,578,579,581). Another important caveat to rhTSH-stimulated Tg testing is the typical blunting or absence of an rhTSH-stimulated Tg response that is characteristic of TgAb-positive patients (547). One explanation offered for this phenomenon is that the failure of the typical ten-fold rhTSH response results from enhanced metabolic clearance of the Tg-TgAb complexes (526,604,656).
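The dependence of the stimulated value on the basal value can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that the typical ten-fold rhTSH response mentioned above applies uniformly to a TgAb-negative patient. The function names and the fixed fold factor are hypothetical simplifications and not a validated model, since the true response depends on the TSH sensitivity of the tumor tissue, the dose of rhTSH delivered and the Tg method used.

```python
# Illustrative sketch only: projects an rhTSH-stimulated Tg from a basal Tg,
# assuming a uniform ten-fold response. The fold factor is an assumption made
# for illustration; real responses vary with tumor TSH sensitivity and assay.

RHTSH_CUTOFF_UG_L = 2.0        # customary rhTSH-stimulated Tg cut-off (µg/L)
ASSUMED_FOLD_RESPONSE = 10.0   # hypothetical uniform fold rise after rhTSH


def projected_stimulated_tg(basal_tg, fold=ASSUMED_FOLD_RESPONSE):
    """Return the projected rhTSH-stimulated Tg (µg/L) for a basal Tg (µg/L)."""
    if basal_tg < 0:
        raise ValueError("basal Tg cannot be negative")
    return basal_tg * fold


def projected_above_cutoff(basal_tg, fold=ASSUMED_FOLD_RESPONSE,
                           cutoff=RHTSH_CUTOFF_UG_L):
    """Return True if the projected stimulated Tg would exceed the cut-off."""
    return projected_stimulated_tg(basal_tg, fold) > cutoff


if __name__ == "__main__":
    for basal in (0.05, 0.10, 0.50, 1.00):
        stimulated = projected_stimulated_tg(basal)
        flag = projected_above_cutoff(basal)
        print(f"basal {basal:.2f} µg/L -> projected {stimulated:.2f} µg/L, "
              f"above cut-off: {flag}")
```

Under this assumed fold factor, a basal Tg below 0.10 µg/L projects to a stimulated value below 1.0 µg/L, which lies under the 2.0 µg/L cut-off. This is in keeping with the reports that a basal Tg below 0.10 µg/L carries a comparable NPV.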

(e) Serum Tg Reference Ranges

Because most serum Tg testing is performed for thyroidectomized DTC patients in whom the target TSH is determined based on patient-specific risk for recurrence, the euthyroid reference range of the assay is only relevant for assessing whether the preoperative serum Tg is elevated [Section 6b(i)]. As discussed in Section 6a(i), Tg methods can report 2- to 3-fold differences in numeric values due to differences in standardization and specificity, and this leads to different reference ranges for different methods (8,547,554). When evaluating a thyroidectomized DTC patient, the reference range of the assay should be adjusted relative to the thyroid mass and TSH status of the patient (13).

Serum Tg should be detectable in all normal euthyroid subjects in whom there is no circulating TgAb that might interfere with its detection. Although normal Tg secretion by the thyroid gland appears, in part, to be dependent on the stimulatory action of TSH, the quantity of Tg normally secreted appears to vary widely, as reflected by the wide normal serum Tg reference range of approximately 3 – 40 ng/mL (µg/L) when assays are standardized directly against the International Reference Preparation CRM-457. Despite this variability, individual serum Tg values of euthyroid normal subjects characteristically remain remarkably constant over time, with a within-person CV approximating 15 percent (360,564). However, the suppression of TSH by oral thyroid hormone therapy will predictably produce an approximately 50 percent reduction in serum Tg (575). Patients who have undergone a lobectomy and subsequently have been maintained on TSH suppression should be evaluated using a range of 0.75 – 10 ng/mL (µg/L), while the typical 1-2 gram thyroid remnant remaining after near-total thyroidectomy would be expected to produce a serum Tg below 2 ng/mL with a suppressed TSH. By this same line of reasoning, truly athyreotic patients would be expected to have no Tg detected irrespective of their TSH status (13).
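The ranges quoted above follow from successive halving of the euthyroid range. As a minimal sketch, assuming that serum Tg scales in direct proportion to the remaining thyroid mass and that TSH suppression halves Tg output, the adjustment can be written as follows. The function name and the simple multiplicative model are illustrative assumptions, not a validated clinical algorithm.

```python
# Illustrative sketch only: scales the euthyroid Tg reference range by the
# fraction of thyroid mass remaining and by TSH status. The proportional
# model is an assumption made to reproduce the arithmetic in the text.

EUTHYROID_RANGE_NG_ML = (3.0, 40.0)   # approximate range, CRM-457 standardized
TSH_SUPPRESSION_FACTOR = 0.5          # approximately 50 percent reduction


def adjusted_tg_range(mass_fraction, tsh_suppressed,
                      euthyroid_range=EUTHYROID_RANGE_NG_ML):
    """Return (low, high) in ng/mL for a given remaining thyroid mass fraction.

    mass_fraction: 1.0 for an intact gland, 0.5 after lobectomy,
                   0.0 for a truly athyreotic patient.
    """
    if not 0.0 <= mass_fraction <= 1.0:
        raise ValueError("mass_fraction must lie between 0 and 1")
    factor = mass_fraction
    if tsh_suppressed:
        factor *= TSH_SUPPRESSION_FACTOR
    low, high = euthyroid_range
    return (low * factor, high * factor)


if __name__ == "__main__":
    print(adjusted_tg_range(1.0, tsh_suppressed=False))  # intact gland
    print(adjusted_tg_range(1.0, tsh_suppressed=True))   # intact, TSH suppressed
    print(adjusted_tg_range(0.5, tsh_suppressed=True))   # lobectomy, TSH suppressed
    print(adjusted_tg_range(0.0, tsh_suppressed=False))  # athyreotic
```

With a mass fraction of 0.5 and a suppressed TSH, the sketch returns 0.75 to 10 ng/mL, matching the lobectomy range given above. A mass fraction of zero returns a range of zero regardless of TSH status.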

(f) Tg Measurements made in FNA Needle Saline Washouts

Because Tg protein is tissue-specific, the detection of Tg in non-thyroidal tissues or fluids (such as pleural fluid) indicates the presence of metastatic thyroid cancer (542). Struma ovarii is the only (rare) condition in which the source of Tg in the circulation does not originate from the thyroid (657). Cystic thyroid nodules are commonly encountered in clinical practice, the large majority arising from follicular epithelium and the minority from parathyroid epithelium. A high concentration of Tg or parathyroid hormone (PTH) measured in the cyst fluid provides a reliable indicator of the tissue origin of the cyst that can often aid the decision for possible surgical management (542). Lymph node metastases are found in up to 50 percent of patients with papillary cancers and 20 percent of follicular cancers (658-660). High-resolution ultrasound has now become an important component of the protocols used for postoperative surveillance for recurrence (34,516). Although ultrasound characteristics are helpful for distinguishing benign reactive lymph nodes from those suspicious for malignancy, the finding of Tg in the needle washout of a lymph node biopsy has higher diagnostic accuracy than the ultrasound appearance (661-669). The current protocol for obtaining such samples recommends rinsing the FNA needle in 1.0 mL of saline and sending this specimen to the laboratory for Tg analysis. This procedure is now widely accepted as a useful adjunctive test for improving the diagnostic sensitivity of the cytological evaluation of a suspicious lymph node or thyroid mass (661-667). The FNA needle washout procedure has recently been extended to measure Calcitonin in neck masses of patients with primary and metastatic medullary thyroid cancer (670-672). In addition, when using ultrasound to evaluate the thyroidectomized patient, the measurement of PTH in an FNA needle washout can be useful in distinguishing lymph nodes from parathyroid tissue.

(c) Thyroid Specific mRNA used as a Tumor Marker

Reverse transcription-polymerase chain reaction (RT-PCR) has been used to detect thyroid specific mRNAs (Tg, TSHR, TPO and NIS) in the peripheral blood of patients with DTC (673-675). Initial studies suggested that circulating Tg mRNA might be employed as a useful tumor marker for thyroid cancer, especially in TgAb-positive patients in whom Tg measurements were subject to assay interference (676). More recently, this approach has been applied to the detection of NIS, TPO and TSH receptor (TSHR) mRNA (677-679). Although some studies have suggested that these thyroid specific mRNA measurements could be useful for cancer diagnosis and detecting recurrent disease, most studies have concluded that they offer no advantages over sensitive serum Tg measurements (680,681). Further, the recent report of false positive Tg mRNA results in patients with congenital athyreosis (682) suggests that Tg mRNA can arise as an assay artifact originating from non-thyroid tissues, or illegitimate transcription (683,684). Conversely, false negative Tg mRNA results have also been observed in patients with documented metastatic disease (685-687). Although Tg, TSHR, NIS and TPO are generally considered “thyroid specific” proteins, mRNAs for these antigens have been detected in a number of non-thyroidal tissues such as lymphocytes, leukocytes, kidney, hepatocytes, brown fat and skin (390,688-693). Additional sources of variability in mRNA analyses relate to the use of primers that detect splice variants, sample-handling techniques that introduce variability, and difficulties in quantifying the mRNA detected (680,685). There is now a general consensus that mRNA measurements presently lack the specificity and practicality needed to be useful tumor markers (680).
Finally, the growing number of reports of functional TSH receptors and Tg mRNA present in non-thyroidal tissues further suggests that these mRNA measurements will have limited clinical utility in the management of DTC in the future (390,691-693).

7. Interferences with Thyroid Test Methodologies

It is difficult for the laboratory to proactively detect interference from a single measurement such as an isolated TSH test. It is more common for the physician to suspect assay interference when a reported value is inconsistent with the clinical status of the patient (694). In this context it is important to note that classic laboratory checks of analyte identity, such as dilution, may not always detect an interference problem (695). The most practical way to investigate a possible interference is to test the specimen by a different manufacturer’s method and check for discordance between the test results. This approach is effective because methods vary in their susceptibility to interfering substances (696). Occasionally, a biological check can be made using TRH-stimulation or thyroid hormone suppression to check a suspect TSH, or rhTSH stimulation to validate a suspect Tg (247,557). Interferences producing falsely elevated TSH or Tg values will usually be associated with a blunted (<2-fold increase) response to stimulation or, in the case of TSH, less than the expected 90 percent suppression 48 hours following the oral administration of 1 mg of L-T4 or 200 µg of L-T3 (247).
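The two biological checks described above reduce to simple ratio tests. The following is a minimal sketch, assuming paired basal and post-test values reported by the same method. The function names are hypothetical, and the thresholds are taken directly from the text (a less than 2-fold rise on stimulation, or less than 90 percent TSH suppression). A flagged result is only a prompt for further investigation, such as re-testing by a different manufacturer's method, and does not prove interference.

```python
# Illustrative sketch only: flags responses that are consistent with assay
# interference, using the thresholds quoted in the text. Function names and
# structure are hypothetical; this is not a validated laboratory procedure.

MIN_FOLD_RISE = 2.0          # a rise below 2-fold is considered blunted
EXPECTED_SUPPRESSION = 0.90  # expected fractional TSH fall after L-T4 or L-T3


def stimulation_is_blunted(basal, stimulated, min_fold=MIN_FOLD_RISE):
    """Return True if the stimulated/basal ratio falls below min_fold."""
    if basal <= 0:
        raise ValueError("basal value must be positive")
    return (stimulated / basal) < min_fold


def suppression_is_inadequate(basal_tsh, suppressed_tsh,
                              expected=EXPECTED_SUPPRESSION):
    """Return True if TSH fell by less than the expected fraction."""
    if basal_tsh <= 0:
        raise ValueError("basal TSH must be positive")
    fractional_fall = (basal_tsh - suppressed_tsh) / basal_tsh
    return fractional_fall < expected


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the checks.
    print(stimulation_is_blunted(basal=8.0, stimulated=10.0))           # blunted
    print(stimulation_is_blunted(basal=1.0, stimulated=10.0))           # not blunted
    print(suppression_is_inadequate(basal_tsh=10.0, suppressed_tsh=6.0))
    print(suppression_is_inadequate(basal_tsh=10.0, suppressed_tsh=0.5))
```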

(a) Cross-reactivity Problems

The specificity of an immunoassay depends on the ability of the antibody reagent to discriminate flawlessly between the analyte and structurally related ligands (699). TSH assays are more likely to be affected by such cross-reactivity problems than thyroid hormone tests where chemically pure iodothyronine preparations are available for selecting antibody specificity. However, the use of monoclonal antibodies for developing TSH IMA methods has virtually eliminated the cross-reactivity problems with other glycoprotein hormones (LH or hCG) that plagued the early TSH RIA methods (237). Nevertheless, because monoclonal antibody reagents differ in their specificity for recognizing various circulating TSH isoforms, these antibody differences can result in reported TSH values that differ by as much as 1.0 mIU/L for the same serum specimen (334,335).

(b) Endogenous Antibodies

In 1956 Robbins and colleagues first reported an unusual thyroxine binding protein globulin in serum that proved to be the result of an autoantibody (18). Subsequently, both T4 and T3 as well as TSH autoantibodies have been identified in sera from patients with autoimmune thyroid and nonthyroid disorders [Section 3Dc] (700-705). Although there are numerous reports of anomalous total and free thyroid hormone as well as TSH test values resulting from T3, T4 or TSH autoantibodies, these autoantibodies now only rarely cause interference with current methods (191,406,704,707). When they do occur, such endogenous antibody interferences are characterized by either falsely low or falsely high values, depending on the type and composition of the assay method employed (703,708,709). In contrast, the common occurrence of autoantibodies directed against thyroglobulin (TgAb) still causes major problems with serum Tg measurement, as discussed in Section 7(e).

C. Non-Analyte related Antibodies (Heterophile antibodies)

(a) Heterophile Antibodies (HAb)/ Human Anti-Mouse Antibodies (HAMA)

HAb represents a group of relatively weak multispecific, polyreactive antibodies with specificity for poorly defined antigens that react with immunoglobulins derived from two or more species (170,191,706,710-712). Most frequently such HAb interferences result from IgM Rheumatoid Factor or Human Anti-Mouse Antibodies (HAMA) (711). Immunometric assay (IMA) methods that use monoclonal antibodies of murine origin are more prone to HAMA interference than competitive immunoassays because HAMA in the specimen is able to form a bridge between the capture and signal antibody reagents, creating a false signal that is reported as a falsely high value (556,713). Even the use of chimeric antibodies does not appear to overcome this form of antibody interference. The inappropriate value may not necessarily be abnormal but may be inappropriately normal relative to clinical status (714-716). Although in most cases interference is detected as a false positive result, false negative interferences have been reported for Troponin I and Thyroglobulin (556,558,717). Failure to recognize such interferences can result in unnecessary additional diagnostic testing or even unnecessary surgeries (718). Since antibodies cross the placenta, these HAbs have the potential to interfere with the diagnostic accuracy of neonatal screening tests (719). Although the prevalence of interfering antibodies can be as high as 30-50 percent, manufacturers typically develop their assays to include reagents that block or neutralize most interferences (561,720-722). The prevalence of such interference is difficult to estimate employing present methodologies but it is currently estimated to range between 0.03 and 3.0 percent (556,721).
Despite the measures used by manufacturers to neutralize interferences due to HAMA, this type of interference continues to be encountered in clinical practice, albeit infrequently, especially when patients have been exposed to animal proteins through vocational, diagnostic or therapeutic routes. Because the potential for interference is unique to a particular patient’s specimen, such interference will not be identified by the laboratory’s routine quality assurance checks. Thus, both the clinician and the laboratory must be aware of this possibility when an apparently inappropriate test result is encountered. Although assays for HAMA detection have been developed, inter-method differences are so large that these tests are not considered to be reliable enough for interference screening (723,724).

(ii) Drugs affecting TT4 and FT4 immunoassays
A number of drugs cause hypothyroxinemia in euthyroid patients by decreasing TBG concentrations (Androgens, Niacin); decreasing T4 binding to TBG (high dose Salicylates and Phenytoin); and/or increasing T4 metabolism (Carbamazepine, Phenobarbital and Phenytoin) (70). Other drugs may cause hyperthyroxinemia in euthyroid patients by increasing TBG (Clofibrate, Estrogen, 5FU, Heroin/Methadone) (27,725,726). Still other drugs raise circulating T4 levels by inhibiting T4 to T3 conversion (Amiodarone, Iopanoic acid and Propranolol). In patients receiving heparin treatment there can be a transient increase in FT4 as a result of the liberation of lipoprotein lipase from vascular endothelium, which increases the concentrations of non-esterified fatty acids that displace thyroid hormones from TBG. This process can also occur in vitro during sample storage at ambient temperature and is accentuated at low albumin concentrations (27,138,726,733). Therefore, total T4 and T3 measurements are more reliable tests than FT4 and FT3 for assessing thyroid hormone status in patients receiving heparin therapy (726). In certain pathologic conditions such as uremia, abnormal serum constituents such as indole acetic acid may accumulate and interfere with thyroid hormone binding (734). Thyroid test methods employing fluorescent signals may be sensitive to the presence of fluorophore-related therapeutic or diagnostic agents in the specimen (735-737).

(iii) Drugs affecting thyroid function
Many commonly prescribed medications interfere with thyroid function or thyroid test results (70). Drugs that can cause hypothyroidism include Aminoglutethimide, Amiodarone, Cytokines (Interferon, IL-2, TNF, TgF), Lithium, Iodine-containing agents and Retinoids/Vitamin A (70,738,739). Under other circumstances Amiodarone, Iodine-containing agents and Cytokines induce thyroiditis and hyperthyroidism in some patients (70,489,740,741). Patients with any pre-treatment history of a thyroid condition or positive thyroid antibodies are also more prone to develop thyroid dysfunction some time after initiating such drug therapies (742).

8. Automation of Thyroid Tests

Radioimmunoassay methods are difficult to automate since they require a physical separation of antibody-bound from free tracer. However, once homogeneous methods based on monoclonal antibodies were developed, significant progress was made in automating thyroid test immunoassays. The current trend in automation is geared towards high-throughput, modular, robotic systems that incorporate both immunoassay and clinical chemistry analyzers into one instrument (266,743,744). Most recently, liquid chromatography-tandem mass spectrometry (LC-MS/MS) methods have been developed to measure total and free thyroid hormones as well as Thyroglobulin. These techniques, however, are technically complex and cannot be automated because they involve specimen pretreatments (11,12,17,33,745). Tests for TT4, TT3, THBR, TSH, Tg, TPOAb and TgAb using non-isotopic (primarily chemiluminescent) signals are currently available on a variety of immunoassay analyzer platforms that employ bar-coding, multiple-analyte random-access, primary tube sampling, autodilution, STAT testing and computerized data output (25,744,746,747). Laboratories primarily select an analyzer to perform thyroid testing on the basis of instrument menu and operating costs, and only secondarily recognize that there are differences in the functional performance of different methods. Although the move to automation is seen as cost-effective, the consolidation of a diversity of immunoassay tests onto one platform has led to a transfer of thyroid testing from small, specialized laboratories to the general chemistry laboratory setting. This centralization has resulted in a loss of laboratory expertise in the clinical interpretation of thyroid tests, which has negatively impacted the ability of laboratory staff to discuss reasons for discordant test results with physicians.
Current trends towards point-of-care testing using miniaturized and biosensor technology would appear to bring laboratory personnel closer to the patient (748,749). However, near-patient testing is only more cost-effective when immediate diagnosis and therapy reduce hospitalization costs (748). Since most thyroid disease is treated on an outpatient basis, point-of-care thyroid tests are unlikely to replace centralized automated thyroid testing.

125. Hamolsky MW, Golodetz A, Freedberg AS 1959 The plasma protein-thyroid hormone complex in man. III Further studies on the use of the in vitro red blood cell uptake of I131-triiodothyronine as a diagnostic test of thyroid function. J Clin Endocrinol 19:103-?