A number of qualitative and quantitative ultrasound (US) risk stratification systems for thyroid nodules are in use around the world, and new ones continue to be devised, but so far no consensus on a single system has emerged. Efforts by the worldwide medical community involved in the management of thyroid nodules are converging toward US risk stratification systems that could provide high sensitivity and a high negative predictive value (NPV) for the diagnosis of clinically significant thyroid carcinomas. In this article, we review the fascinating journey of the US-based thyroid imaging reporting and data system (TIRADS), the changing trends in TIRADS, and emerging stratification systems to assess the risk of malignancy. Our recommendation is to develop a comprehensive system of risk stratification that incorporates clinically relevant as well as radiological risk factors and aims to accurately predict the risk of malignancy and the oncologic outcome for each patient.

Thyroid cancer is the most common endocrine malignancy, accounting for ~2.1% of all cancer diagnoses worldwide, with ~77% of these diagnoses occurring in women.[1] Approximately 90% of all thyroid cancers are differentiated thyroid cancers (DTC), of which papillary thyroid carcinoma (PTC) is the most common histological type, followed by follicular thyroid carcinoma.[2] Since the 1970s, rapidly rising incidence rates and comparatively stable mortality for thyroid cancer have been reported throughout the world, including the USA, Canada, Europe, Australia, and Asia.[3] Worldwide trends in thyroid cancer incidence have been largely driven by an increase in PTC as opposed to other major histological types.[3] The explanation for the increased incidence of DTC could be multifactorial; however, some researchers have concluded that the incidence patterns for thyroid cancer are largely attributable to overdiagnosis.[3],[4]

The introduction of ultrasound (US) in the 1980s and its subsequent use in conjunction with fine-needle aspiration (FNA) biopsy has led to a dramatic upsurge in the detection of small thyroid nodules and the diagnosis of thyroid cancer at an early stage.[5],[6] These technological changes undoubtedly account for at least some of the increase in the incidence of thyroid cancer since the 1990s.[5],[6],[7] The widespread availability and outstanding efficacy of ultrasonography for the assessment of the thyroid gland make it invaluable not only for thyroid nodule assessment but also for guiding diagnostic interventions such as FNA and biopsies. These qualities make US the cornerstone of risk stratification of thyroid nodules and of optimal decision-making in their management.[6],[7]

During the last few decades, several risk stratification systems have been introduced with the aims of identifying the nodules at highest risk of thyroid malignancy, asserting benignancy when relevant, helping to select which patients should undergo FNA and/or surgery, and reducing the number of unnecessary invasive procedures for benign nodules [7][Table 1] and [Table 2]. In this article, we review the fascinating journey of thyroid US, the changing trends in thyroid nodule imaging, and emerging stratification systems to assess the risk of malignancy.

In 2005, the Society of Radiologists in Ultrasound (SRU) made the first notable attempt to outline recommendations for the management of thyroid nodules identified at thyroid ultrasonography. They sought to lay down clear guidelines to determine which thyroid nodules needed to undergo US-guided FNA and which would not require any intervention, with the aim of facilitating the early diagnosis of malignancy while avoiding unnecessary investigations for benign nodules.[8] They identified certain US characteristics of thyroid nodules that could aid nodule characterization: chief among them were microcalcifications, associated with high specificity and positive predictive value (PPV) for malignancy, and features such as hypoechogenicity and solid composition, which have a high NPV, so that their absence decreases the possibility of malignancy [9],[10][Table 2]. They also laid down guidelines for performing thyroid nodule FNA based on nodule size and composition; solid nodules measuring >1 cm were considered high-risk nodules, for which FNA was deemed necessary.[8] Although some clarity with regard to thyroid nodule management was certainly achieved, certain issues remained to be addressed. The management of multiple nodules was not adequately dealt with. Moreover, the US features that could preclude FNA with certainty were not outlined.

Conceptualization of Risk Stratification for Thyroid Nodules (2007)

The SRU guidelines led to the genesis of risk stratification systems, and the first risk stratification grading systems came into being in 2007 in Japan and Korea.[11],[12] Ito et al. were the first to introduce US-based categorization of thyroid nodules, and the US classification system they reported had been in use for mass screening of thyroid nodules over a period of 10 years at Kuma Hospital, Japan. This classification had five grades ranging from 1 to 5, with Class 5 comprising the most suspicious nodules, which have solid composition, irregular shape, and extrathyroid extension [11][Table 3]. Furthermore, subcategories between Class 2 and Class 5 were also assigned. The designated subclasses were Class 2.5 (a nodule with cystic change but a partially irregular shape and/or strong echoes internally or at the capsule), Class 3.5 (a solid nodule with a partly irregular shape), and Class 4.5 (a solid and irregular-shaped nodule with minor extrathyroid extension). A score <2.5 was considered benign, 3–3.5 intermediate risk, and >3.5 highly suspicious.[11] The limitations of the Ito et al. risk stratification system were that it did not incorporate all the high-risk features (it relied heavily on composition and margins) and that it was subjective and complicated.

Tae et al. were the second group to introduce US-based categorization of thyroid nodules. They included the following US features: microcalcifications, an irregular or microlobulated margin, marked hypoechogenicity, and taller than wide shape [Table 3]. A nodule with one or more of these features was categorized as Category 3 (malignant). A nodule with none of these features was labeled as Category 2 (benign), and an anechoic cystic nodule was classified as Category 1 (benign).[12] Even though the Tae et al. risk stratification system was simple, it had the drawback that a large percentage of nodules were categorized as high risk (Category 3) and were subjected to FNA.
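The three-category rule described above can be written as a short decision function. The following Python sketch is illustrative only: the function name, the feature labels, and the choice to let a suspicious feature take precedence over an anechoic cystic appearance are assumptions made for this example and are not taken from Tae et al.

```python
# Illustrative sketch of the three-category rule attributed to Tae et al. in the text.
# Feature labels and the function name are hypothetical, chosen for this example.
SUSPICIOUS_FEATURES = {
    "microcalcifications",
    "irregular_or_microlobulated_margin",
    "marked_hypoechogenicity",
    "taller_than_wide",
}


def tae_category(features, anechoic_cystic=False):
    """Return the category (1, 2, or 3) for a single nodule.

    features: iterable of feature labels observed on US.
    anechoic_cystic: True if the nodule is an anechoic cystic nodule.
    """
    present = set(features) & SUSPICIOUS_FEATURES
    if present:
        # One or more suspicious features: Category 3 (malignant).
        # Assumption: this takes precedence over a cystic appearance.
        return 3
    if anechoic_cystic:
        # Anechoic cystic nodule: Category 1 (benign).
        return 1
    # No suspicious feature: Category 2 (benign).
    return 2
```

Because a single suspicious feature is enough to reach Category 3, the rule is easy to apply but assigns many nodules to the high-risk category, which is the drawback noted above.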

Thyroid Imaging Reporting and Data System (2009)

Taking into account the high prevalence of thyroid nodules and their diverse US patterns, which posed a challenge for accurate characterization, Horvath et al. in 2009 proposed an evaluation system for thyroid nodules called the thyroid imaging reporting and data system (TIRADS), modeled on the Breast Imaging RADS (BI-RADS).[13] Based on results obtained from an 8-year prospective study of about 1959 thyroid nodules, they described ten specific US patterns that encompassed all types of thyroid lesions and assigned them to six TIRADS categories ranging from TIRADS 1 (normal thyroid) to TIRADS 6 (biopsy-proven malignancy). The TIRADS classification essentially serves as a tool to select high-risk nodules for FNA [Table 3]. Nodules classified under TIRADS 2 and TIRADS 3 are deemed to have a low risk of malignancy and do not warrant FNA, thereby reducing the number of unnecessary interventions [Table 4].[13],[14] The introduction of TIRADS was certainly a breakthrough in risk stratification of thyroid nodules. It significantly improved the diagnostic efficacy of US imaging, and the lexicon soon became the standard of practice across the world; however, a major limitation of the classification system was that not all the US features of nodules proposed by Horvath et al. could be applied with certainty in daily clinical practice, as the stereotypic US patterns were tedious to apply, especially in nodules <10 mm.[15]

Later in the same year, Park et al., in a study of 1,694 patients, redefined TIRADS and added two more US features to the existing classification system. These were solid composition with mildly hypoechoic echotexture and the presence of suspicious lymph nodes. They formulated a mathematical equation comprising twelve US parameters, and nodules were stratified on a 5-point scale. FNA was recommended for nodules with a TIRADS score of 3 or 4 and surgery for a TIRADS score of 5. However, because the study was retrospective, its diagnostic efficacy remained to be validated.[16]

Thyroid Imaging Reporting and Data System, Kwak et al. (2011)

In 2011, Kwak et al. sought to develop a more simplified, practical, and convenient TIRADS that was suitable for application in routine clinical practice, by demonstrating that risk stratification of thyroid malignancy could be accomplished according to the number of suspicious features on US.[17] They introduced a new quantitative model in which each individual US feature was assigned a risk score according to its odds ratio for predicting the likelihood of malignancy. The risk of malignancy in thyroid nodules increased in parallel with the calculated total score (the sum of the scores for the individual features). Based on these findings, they created TIRADS Categories 1, 2, and 3 (with no suspicious US features), 4a (one suspicious US feature), 4b (two suspicious US features), 4c (three or four suspicious US features), and 5 (five suspicious US features) using the risk of malignancy from the BI-RADS categorization [Table 3] and [Table 4].[18] Certain limitations did remain, such as the omission of the thyroid nodule vascularity pattern on color Doppler US and of the cervical nodal status. In addition, they did not assess the risk of cancer in patients with solitary nodules versus those with multiple nodules. Moreover, they did not lay down clear guidelines for the diagnostic workup (FNA or follow-up) of TIRADS 4a nodules (low risk of malignancy).[17],[18]
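The count-based categories listed above amount to a simple lookup. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, and because the text places Categories 1, 2, and 3 together as having no suspicious features, a count of zero is reported here as the combined label "1-3" without trying to separate them.

```python
# Illustrative mapping from the number of suspicious US features to the
# TIRADS category described in the text for Kwak et al.
# The function name and the combined "1-3" label are assumptions for this sketch.
def kwak_category(n_suspicious):
    """Return the category label for a count of suspicious US features (0 to 5)."""
    if not 0 <= n_suspicious <= 5:
        raise ValueError("expected between 0 and 5 suspicious features")
    if n_suspicious == 0:
        return "1-3"  # Categories 1, 2, and 3: no suspicious features
    if n_suspicious == 1:
        return "4a"
    if n_suspicious == 2:
        return "4b"
    if n_suspicious <= 4:
        return "4c"   # three or four suspicious features
    return "5"        # five suspicious features
```

For example, `kwak_category(3)` returns `"4c"`.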

French Thyroid Imaging Reporting and Data System (2011–2013)

Between 2011 and 2013, a research group from France introduced the French version of TIRADS, essentially a five-tier system, which included a standardized vocabulary and reporting format with a system for quantified risk assessment. They created an e-atlas, which described the various US features of thyroid nodules, with definitions and illustrations for reference. They also introduced a structured and standardized reporting format and finally tested the diagnostic accuracy of their system on 4550 nodules in a two-phase study (an early paper covering 500 nodules was published in French in 2011).[19],[20],[21] They simplified the risk stratification system by reducing the number of subcategories to enhance interobserver agreement and ease of use.

Some noteworthy additions to the risk stratification system were the inclusion of stiffness of the nodule on elastography (ES) and the presence of suspicious lymph nodes to classify nodules into Categories 4b and 5 (high risk of malignancy) [Table 3] and [Table 4]. Doppler imaging was found to have poor interobserver agreement in thyroid nodule evaluation and was thus not included in the stratification system.[22] They went a step further to ease communication between radiologists and clinicians by laying down clear guidelines for needle aspiration based on the size of the nodule and the TIRADS score.[20],[21] However, the French TIRADS is thought to have limited routine clinical efficacy when used by less experienced operators.[22]

British Thyroid Association Guidelines (U System, 2014)

In 2014, the British Thyroid Association (BTA) published new guidelines for the management of thyroid cancer with specific reference to the US features and emphasis on the role of US in guiding decision-making for nodules that require FNA.[23] They introduced a US scoring system termed the “U classification system,” in which thyroid nodules are classified into diagnostic categories based on specific US features, such as echogenicity, type of calcifications, shape, vascularity, and presence of suspicious lymphadenopathy. Accordingly, nodules were classified as benign (U2), indeterminate (U3), suspicious (U4), and malignant (U5) [Table 3] and [Table 4]. FNA is recommended only for categories U3 and above. They also laid down recommendations for the management of nodules detected incidentally during CT or PET scans. Implementation of the BTA guidelines was shown to bring about significant improvement in radiology reporting and was additionally found to be easy to apply in routine clinical practice.

Application of the U classification system in the US assessment of thyroid nodules was expected to provide a substantial financial benefit to hospital trusts by reducing the number of unnecessary invasive tests.[24] However, some aspects of the guidelines were found to be controversial, which limited their applicability in day-to-day practice. The U system has a few drawbacks. One is the use of the nodule vascularity pattern (internal/mixed) as a high-risk feature for malignancy, although recently published literature regards it as a nonspecific feature. The other is that elastography (ES) has not been incorporated into the U classification, despite its role as a supplementary tool for nodule characterization, especially in indeterminate cases (U3 or U2/U3), where it can influence the decision for FNA.

ATA Guidelines (2015)

The American Thyroid Association (ATA) guidelines were updated in 2015, with the aim of apprising clinicians and policymakers about optimum clinical decision-making in the management of thyroid nodules. The ATA guidelines aim to minimize potential harm from overtreatment in patients at low risk for disease-specific mortality and morbidity while appropriately treating those at higher risk.[2] The ATA guidelines advocate that thyroid/neck US should be performed in all patients with a suspected thyroid nodule. Thyroid US must first confirm whether there is truly a nodule; if one is present, the guidelines emphasize that a clear description of the nodule size, its location, and its sonographic features (composition, echogenicity, margins, presence and type of calcifications, and shape) is imperative to classify it into one of the five sonographic patterns outlined, namely, high suspicion, intermediate suspicion, low suspicion, very low suspicion, and benign. The pattern of sonographic features confers the risk of malignancy and, in conjunction with nodule size, guides FNA and decision-making [Table 3] and [Table 4]. The ATA also addresses certain issues that were not adequately dealt with by risk stratification systems in the past, such as:

Which nodule(s) should be subjected to FNA?

How should a multinodular thyroid be approached?

What should be the follow-up for nodules that do not meet FNA criteria?

What should be the follow-up of nodules with benign FNA cytology?

On thyroid US, a nodule is classified into one of five categories: benign pattern (0% risk): no biopsy; very low suspicion pattern (<3% risk): biopsy if ≥2 cm (or US observation); low suspicion pattern (5%–10% risk): biopsy if ≥1.5 cm; intermediate suspicion pattern (10%–20% risk): biopsy if ≥1 cm; high suspicion pattern (>70%–90% risk): biopsy if ≥1 cm.[2] Thus, for nodules classified under the categories of high and intermediate suspicion, FNA is deemed necessary if the size is 1 cm or more. On the other hand, nodules with a very low index of suspicion require FNA only when they are 2 cm or larger. The high cutoff is justified because follicular cancers as well as follicular variants of papillary carcinoma tend to present with low-risk US features but run an indolent course, with a low probability of distant metastases.[25] The guidelines also recommend that patients with multiple thyroid nodules ≥1 cm should be evaluated in the same fashion as patients with a solitary nodule ≥1 cm. Current guidelines recommend FNA of sonographically suspicious cervical nodes if detected during a US examination. For nodules with discordant radiological and pathological findings (high-risk US features with benign FNA cytology), US surveillance is no longer recommended once a repeat US-guided FNA has yielded a second benign cytology result.[2]
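The size cutoffs listed above can be summarized as a lookup from sonographic pattern to a minimum nodule size for FNA. The Python sketch below uses only the thresholds quoted in the preceding paragraph; the pattern keys and the function name are hypothetical, and the sketch leaves out the clinical factors and the US observation option that the guidelines allow for very low suspicion nodules.

```python
# Illustrative lookup of the size cutoffs for FNA quoted in the text
# for the five ATA sonographic patterns. Keys and names are hypothetical.
ATA_FNA_THRESHOLD_CM = {
    "benign": None,        # no biopsy
    "very_low": 2.0,       # biopsy if >= 2 cm (US observation is an alternative)
    "low": 1.5,            # biopsy if >= 1.5 cm
    "intermediate": 1.0,   # biopsy if >= 1 cm
    "high": 1.0,           # biopsy if >= 1 cm
}


def ata_fna_indicated(pattern, size_cm):
    """Return True if the nodule size meets the quoted FNA cutoff for its pattern."""
    threshold = ATA_FNA_THRESHOLD_CM[pattern]
    return threshold is not None and size_cm >= threshold
```

For example, a 1.2 cm nodule meets the cutoff under the intermediate suspicion pattern but not under the low suspicion pattern, which shows how the pattern and the size jointly drive the decision.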

In conclusion, the ATA guidelines concede that a less aggressive approach should be adopted in select populations, such as patients with nodules <1 cm in size and those with very low-risk nodules, in whom the benefits of intervention may be unrealized. However, the risk stratification system still has the drawback of being qualitative in nature, which could lead to subjectivity in reporting the findings and in categorizing the nodule.

The first recommendations for the US-based diagnosis and management of thyroid nodules by the Korean Society of Thyroid Radiology (KSThR) were published in 2011.[26] Later, in 2015/2016, the KSThR researchers implemented a four-tier quantitative stratification system, the Korean TIRADS (K-TIRADS), based on the overall estimated risk of malignancy in thyroid nodules using a combination of primarily nodule echogenicity, solidity, and certain other suspicious US features.[27],[28],[29],[30] The K-TIRADS, coined by the Korean Thyroid Association (KTA), differed from the existing lexicons by stratifying nodules primarily by echogenicity for the estimation of malignancy risk and for management decision-making.[28],[29] Thyroid nodules were categorized into four tiers based on the risk of malignancy in each: Category 1, benign (0%); Category 2, probably benign (≤5%); Category 3, indeterminate (>5%, ≤50%); and Category 4, suspicion of malignancy (>50%). In Category 4 nodules, the risk of malignancy increased in parallel with the number of suspicious US features. The value of the composition (mixed, solid, and liquid) in risk assessment was also elucidated [28],[29],[30][Table 3] and [Table 4].

The KTA K-TIRADS 2016 revision has “flexibly” and “selectively” adopted the 2015 ATA guidelines.[31] The significant differences between the K-TIRADS and the 2015 ATA guidelines are that (1) in K-TIRADS, partially cystic or iso/hyperechoic nodules with any suspicious features are classified as intermediate risk (Category 4) and (2) nodules in the very low-risk and benign groups of the ATA guidelines are categorized as benign (Category 2) in the present K-TIRADS. The four-tier risk categorization system has also modified the decision to perform aspiration based on the size and risk of malignancy. The K-TIRADS recommends FNA for nodules >0.5 cm if associated high-risk findings are present, such as extrathyroidal extension, pathological cervical lymph nodes, distant metastasis, trachea or recurrent laryngeal nerve invasion, and tumor progression. Overall, the Korean guidelines have adopted a more conservative approach to the diagnosis and treatment of DTCs, favoring active surveillance for papillary thyroid microcarcinomas rather than unnecessary interventions; however, it remains a qualitative system.[31]

American College of Radiology-Thyroid Imaging Reporting and Data System (2015–2017)

In 2015, committees convened by the American College of Radiology (ACR) presented an approach to incidental thyroid nodules and proposed standard terminology for US reporting.[32],[33],[34] The risk stratification system was designed to identify most clinically significant malignancies while reducing the number of biopsies performed on benign nodules. In 2017, this work paved the way for a quantitative approach to risk stratification that differed greatly from the existing classification systems, which took a qualitative approach. Under the current ACR TIRADS, points are assigned for all the US features in a nodule, in increasing order of suspicion. The US features are grouped under five lexicon categories.[24] When assessing a nodule, a single feature is selected from each lexicon category, and the sum of the points for the selected features determines the nodule's ACR TIRADS level, ranging from TR1 (benign) to TR5 (high suspicion of malignancy) [Table 3] and [Table 4]. To simplify matters, subcategories are not included in the ACR TIRADS system. The assessment of cervical adenopathy is regarded as an essential part of the sonographic examination but is not assigned a score. The recommendations for FNA or US follow-up are based on the nodule's ACR TIRADS level and its maximum diameter.[34]

The current ACR TIRADS is specifically designed to balance the benefit of identifying clinically important cancers against the risk and cost of subjecting patients with benign nodules or indolent cancers to biopsy and treatment, thus fostering the need for active surveillance for low-risk thyroid cancer rather than aggressive interventions. Based on this approach, the committee advocated higher thresholds for mildly and moderately suspicious nodules as compared to the ones laid down by other associations such as the ATA and the KSThR.[22]

In 2017, the group of Mahajan et al. from India introduced a quantitative algorithm for characterizing thyroid nodules and proposed the “Thyroid Multimodal-imaging Comprehensive Risk Stratification Scoring (TMC-RSS),” a comprehensive risk stratification system based on US features in combination with Color Doppler (CD), TIRADS, ES, and cervical nodal status [35][Table 3] and [Table 4]. It implemented a unique scoring system, generated from a retrospective cohort of 650 thyroid nodules (development dataset: data on 318 nodules published in 2017 and data on 650 nodules presented at the Australian and New Zealand Head and Neck Cancer Society 19th Annual Scientific Meeting), with positive scores assigned to suspicious features and negative scores to benign US features.[35],[36] The final TMC-RSS score was calculated by summing all these points [Table 5]. Accordingly, the cumulative risk of malignancy was calculated based on TMC-RSS scores, which were categorized into a three-tier system. Group 1: scores <3 were associated with a low risk of malignancy (<2.4%); Group 2: scores ≥3 and <6 had an intermediate risk (<18%); and Group 3: scores ≥6 were associated with the highest risk (>80%).[35],[36] The main strength of the TMC-RSS system is the inclusion of the conventional US criteria in conjunction with supplementary features such as ES and cervical nodal status, which greatly influence thyroid nodule characterization, especially in equivocal clinical scenarios. Hence, the TMC-RSS system takes the best of both the 2015 ATA and the K-TIRADS 2016 revised guidelines. Furthermore, the incorporation of negative scoring for benign features enables the interpretation of TIRADS with greater certainty.
TMC-RSS also carries the inherent advantages of a purely quantitative scoring system, which reduces interoperator/observer reporting variability and enhances ease of use, especially for less experienced operators, both of which have been limitations of the K-TIRADS and ATA guidelines. The TMC-RSS system also gives the operator a degree of flexibility with regard to ES, which may or may not be included as part of the examination, depending on its availability (scoring systems excluding ES are also provided).
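The three-tier grouping described above reduces to two score cutoffs. The Python sketch below encodes only the thresholds and risk figures quoted in the text; the function name and the returned tuple format are assumptions for illustration, and the per-feature points that produce the score (Table 5) are not reproduced here.

```python
# Illustrative mapping from a cumulative TMC-RSS score to the three-tier
# grouping quoted in the text. The function name and return format are
# hypothetical; per-feature points (Table 5) are not modeled.
def tmc_rss_group(score):
    """Return (group, reported malignancy risk) for a cumulative TMC-RSS score."""
    if score < 3:
        return 1, "<2.4%"   # Group 1: low risk
    if score < 6:
        return 2, "<18%"    # Group 2: intermediate risk (score >= 3 and < 6)
    return 3, ">80%"        # Group 3: highest risk (score >= 6)
```

Because benign features carry negative points, the cumulative score can fall below zero; such scores fall into Group 1 under the first cutoff.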

The TMC-RSS scoring system is intended to serve as an easy, reproducible, and robust method of risk stratification, thereby facilitating effective interdisciplinary communication and efficient decision-making.[35] However, the results from the prospective validation dataset of this study are awaited; these are needed to establish the diagnostic efficacy of the scoring system.

Validation of Three Scoring Systems (2017)

In a study published in 2017, Ha et al. sought to validate three existing scoring-based risk stratification models for thyroid nodules that use ultrasonography features, namely, a web-based malignancy risk stratification system developed by Choi et al., the model developed by the KSThR (Kwak et al.), and the ACR TIRADS 2017[2],[17],[37],[38] (http://www.gap.kr/xe/Estimation). They evaluated US features of thyroid nodules such as the internal content, echogenicity of the solid portion, shape, margin, and calcification in a cohort of 954 patients (2013–2014). They categorized the US features in each patient according to the respective scoring model definitions and used an online, automatically calculated scoring system for malignancy risk stratification under each system. The models were validated separately by measuring the discrimination and calibration abilities of each. The validation study revealed that the web-based, Korean, and ACR scoring risk stratification models all showed acceptable predictive accuracy for the identification of malignancy; however, the web-based scoring system, in particular, showed the highest agreement in calibration ability. Furthermore, the web-based scoring system could yield rapid results, and its online automatic calculation could overcome the complexities of the previous scoring risk stratification models. In addition, the web-based system was found to have superior discrimination ability, believed to be due to the involvement of multiple diagnostic centers in the design of this predictive model, thereby generating results that are more reproducible across a wider population.[37],[38] Thus, with the validation of the superior efficacy of the web-based system, it is likely that future implementation of this risk stratification model will simplify clinical decision-making, guide personalized management, and reduce analysis time.[38]

Conclusion

A number of qualitative and quantitative US risk stratification systems for thyroid nodules are in use around the world, and new ones continue to be devised, but so far no consensus on a single system has emerged. Efforts by the worldwide medical community involved in the management of thyroid nodules are converging toward US risk stratification systems that could provide high sensitivity and a high negative predictive value for the diagnosis of clinically significant thyroid carcinomas. In addition, it is essential to formulate a reporting lexicon that is standardized across the globe so as to permit effective interdisciplinary communication for appropriate management of thyroid cancers, and for this, we propose a synoptic US reporting format [Figure 1]. As we enter the era of personalized medicine, the need of the hour is to develop a comprehensive system of risk stratification that incorporates clinically relevant as well as radiological risk factors to provide a holistic view of the true risk of malignancy and to predict the oncologic outcome for each patient.