Combined model integrating clinical, radiomics, BRAFV600E and ultrasound for differentiating between benign and malignant indeterminate cytology (Bethesda III) thyroid nodules: a bi-center retrospective study
Original Article


Lichang Zhong1, Lin Shi1, Jinyu Lai1, Yuhong Hu2, Liping Gu1

1Department of Ultrasound in Medicine, Shanghai Sixth People’s Hospital Affiliated to Medical College of Shanghai Jiao Tong University, Shanghai Institute of Ultrasound in Medicine, Shanghai, China; 2Department of Ultrasound in Medicine, Tinglin Hospital, Tongji Medical Group, Shanghai, China

Contributions: (I) Conception and design: L Zhong, L Gu; (II) Administrative support: L Gu; (III) Provision of study materials or patients: All authors; (IV) Collection and assembly of data: All authors; (V) Data analysis and interpretation: All authors; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Liping Gu, MM. Department of Ultrasound in Medicine, Shanghai Sixth People’s Hospital Affiliated to Medical College of Shanghai Jiao Tong University, Shanghai Institute of Ultrasound in Medicine, No. 600, Yishan Road, Xuhui District, Shanghai 200233, China. Email: guliping666@126.com; Yuhong Hu, MB. Department of Ultrasound in Medicine, Tinglin Hospital, Tongji Medical Group, No. 80, Siping North Road, Tinglin Town, Jinshan District, Shanghai 201505, China. Email: 936116004@qq.com.

Background: The management of thyroid nodules classified as Bethesda III on fine-needle aspiration remains challenging, and a non-invasive, accurate method for early discrimination of benign from malignant Bethesda III nodules is urgently needed. Our objective was to develop and validate a clinical-radiomics nomogram based on preoperative ultrasound (US) images and clinical features for predicting malignancy in thyroid nodules with indeterminate cytology (Bethesda III).

Methods: We retrospectively studied 274 patients with Bethesda III (indeterminate cytology) thyroid nodules whose final diagnosis was confirmed by surgery, treated at two medical centers in Shanghai between June 2017 and June 2022. The training and internal validation sets comprised 136 and 58 patients, respectively, all from Shanghai Sixth People's Hospital; a further 80 patients from Tinglin Hospital formed the external test set. Radiomic features were extracted from preoperative US images. After feature selection, we developed a combined diagnostic model to distinguish benign from malignant Bethesda III nodules and systematically assessed its diagnostic accuracy, calibration, and clinical applicability.

Results: The prediction model integrating US radiomics and clinical risk features showed superior stability in distinguishing benign from malignant indeterminate (Bethesda III) thyroid nodules. In the external test set, the area under the curve (AUC) was 0.824 [95% confidence interval (CI): 0.718–0.929], and the accuracy, sensitivity, specificity, precision, and recall were 0.775, 0.731, 0.796, 0.633, and 0.731, respectively.

Conclusions: A model integrating US radiomics and clinical risk features effectively discriminates between benign and malignant indeterminate (Bethesda III) thyroid nodules and may reduce unnecessary diagnostic surgery and its complications.

Keywords: Indeterminate thyroid nodule; radiomics; machine learning; deep transfer learning; convolutional neural network


Submitted Jul 20, 2024. Accepted for publication Nov 06, 2024. Published online Nov 24, 2024.

doi: 10.21037/gs-24-310


Highlight box

Key findings

• The management of Bethesda III thyroid nodules is difficult, and a non-invasive, more accurate method for determining whether Bethesda III nodules are benign or malignant is urgently needed. A combined model incorporating clinical, ultrasound (US) radiomics, BRAFV600E and US features can be an effective preoperative aid in diagnosing Bethesda III thyroid nodules as benign or malignant.

What is known and what is new?

• Bethesda III is a common result of thyroid fine-needle aspiration. It is usually managed with repeat fine-needle aspiration, molecular testing, or diagnostic thyroid surgery.

• There is evidence that repeat fine-needle aspiration, molecular testing, and diagnostic thyroid surgery each have drawbacks that complicate the management of Bethesda III nodules. A combined model that incorporates clinical, US radiomics, BRAFV600E, and US features may be more effective than a clinical or radiomics model alone.

What is the implication, and what should change now?

• We recommend reassessing the need for repeat fine-needle aspiration, molecular testing, or diagnostic thyroid surgery in the management of Bethesda III nodules.

• Physicians should consider the benefit of combining clinical, US radiomics, BRAFV600E, and US features when managing Bethesda III nodules.


Introduction

The prevalence of thyroid nodules is high, but only 7% to 15% are malignant (1), so accurate characterization is essential for an effective treatment strategy. Apart from surgery, ultrasound-guided fine-needle aspiration biopsy (US-FNAB) is the gold standard for determining whether a thyroid nodule is malignant (2). The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) stratifies fine-needle aspiration (FNA) results into six categories, each associated with a different risk of malignancy (3). Indeterminate results account for approximately 20% to 30% of cases (4) and carry a malignancy risk of 10% to 40% (5). Within this subset, atypia of undetermined significance or follicular lesion of undetermined significance (AUS/FLUS, Bethesda III) accounts for about 10%. Clinical guidelines recommend that patients with indeterminate FNA results undergo further evaluation, which may include repeat biopsy (painful, and liable to yield another indeterminate result), genomic analysis (potentially costly and not universally available), or diagnostic thyroid surgery (highly invasive, expensive, and associated with numerous life-altering complications) (5,6). Repeat FNA requires a 3-month waiting period, and 10–40% of indeterminate nodules are again reported as indeterminate on retesting (7,8), while patients who opt for diagnostic surgery may undergo an operation for a nodule that proves benign on pathology. Previous research has sought to improve nodule management by incorporating suspicious ultrasound (US) features, with preliminary evidence of improved diagnostic accuracy (1,9). However, this approach is operator-dependent and subject to interobserver variability (10,11).

Radiomics is widely applied in medicine to improve diagnostic accuracy through high-throughput extraction of quantitative features from medical images. US-based radiomics can integrate US, pathology, genetic, and clinical data for artificial intelligence-driven analysis, revealing subtle tissue and cellular characteristics that conventional methods cannot capture (12,13). Several recent studies indicate that radiomic features derived from US can predict malignancy in thyroid nodules (14,15). To our knowledge, however, few studies have used US radiomics to predict malignancy in thyroid nodules with indeterminate cytology (16,17).

The primary aim of this study was to explore whether US radiomics and clinical data can improve the diagnostic accuracy for thyroid nodules with indeterminate cytology (Bethesda III). We present this article in accordance with the TRIPOD reporting checklist (available at https://gs.amegroups.com/article/view/10.21037/gs-24-310/rc).


Methods

This study was conducted in accordance with the provisions of the Declaration of Helsinki (as revised in 2013). The study received approval from the Ethics Committees of Shanghai Sixth People’s Hospital (No. 2020-031) and Tinglin Hospital (No. 2024-KY-184), and individual informed consent for this retrospective analysis was waived.

This bi-center retrospective study collected US images and clinical data for thyroid nodules classified as Bethesda III on US-guided fine-needle aspiration cytology at two hospitals. Inclusion criteria were: (I) total or partial thyroidectomy; and (II) preoperative thyroid US and BRAFV600E mutation testing. Patients were excluded for any of the following reasons: (I) incomplete surgical or pathological reports; (II) surgery for recurrent thyroid cancer; or (III) poor-quality US images with significant artifacts or low resolution.

At center 1 (Shanghai Sixth People's Hospital), we retrospectively collected data from patients diagnosed with Bethesda III thyroid nodules by US-guided fine-needle aspiration cytology between June 2017 and June 2022; these patients formed the training and internal validation sets. For the external test set, we retrospectively collected data from Bethesda III patients at center 2 (Tinglin Hospital) between June 2019 and June 2022. Of the 428 patients initially identified, 119 were excluded because of incomplete surgical or pathological reports. A radiologist with 10 years of US experience (L.Z.) then screened the US images and excluded 35 nodules from 35 patients. Ultimately, 274 nodules from 274 patients were included. Within the internal dataset (194 nodules), 136 nodules were randomly assigned to the training set at a ratio of 7:3, and the remaining 58 formed the internal validation set. The external test set comprised 80 nodules. The screening process for each cohort is illustrated in Figure 1.

Figure 1 The process of patient enrollment.
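The cohort arithmetic above (428 - 119 - 35 = 274 included; 274 - 80 = 194 center 1 nodules; 136/194 ≈ 0.70) can be checked with a short Python sketch. It assumes a simple, unstratified random split with a fixed seed; the helper `split_internal_cohort` and the seed value are hypothetical, because the randomization procedure is not reported.

```python
import random

def split_internal_cohort(patient_ids, n_train, seed=2024):
    """Hypothetical helper: simple random split of center 1 patients.

    A fixed seed makes the split reproducible; no stratification by outcome.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Cohort flow as reported in the text
screened = 428
included = screened - 119 - 35   # minus incomplete reports, minus excluded images
internal = included - 80         # remove the 80 center 2 (external test) nodules
train_ids, val_ids = split_internal_cohort(range(internal), n_train=136)

print(included, internal, len(train_ids), len(val_ids))  # 274 194 136 58
print(round(len(train_ids) / internal, 2))               # 0.7
```

A stratified split (keeping the benign/malignant ratio equal in both subsets) would be a common alternative; the slightly different malignancy proportions in Table 1 (58/136 vs. 22/58) are compatible with a simple random split.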

Clinical data collection

Patients' clinical and pathological data, including sex, age, and surgical pathology findings, were retrieved from the medical records system. US images were retrieved from the databases of three medical institutions and had been acquired on seven different devices. Two experienced radiologists, L.G. (20 years of experience) and L.Z. (10 years of experience), assessed the US features and Chinese Thyroid Imaging Reporting and Data System (C-TIRADS) score of all nodules (9). Both radiologists were blinded to the patients' clinical histories, preoperative US reports, surgical notes, and pathological outcomes, and disagreements were resolved by consensus through discussion. The clinical and US characteristics of benign and malignant Bethesda III nodules were then compared.
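For readers unfamiliar with C-TIRADS, its count-based scoring can be sketched as below. This follows our reading of the 2020 guideline (9): one point each for vertical orientation, solid composition, marked hypoechogenicity, microcalcification, and ill-defined/irregular margin or extrathyroidal extension, and minus one point for comet-tail artifacts. The dictionary keys and function names are hypothetical, and the category cut-offs should be checked against the guideline before any use.

```python
# Suspicious features that add one point each (key names are our own labels)
POSITIVE_FEATURES = (
    "vertical_orientation",
    "solid",
    "markedly_hypoechoic",
    "microcalcification",
    "ill_defined_or_irregular_margin_or_ete",
)

def c_tirads_score(findings):
    """+1 per suspicious feature present, -1 if comet-tail artifacts are present."""
    score = sum(1 for feature in POSITIVE_FEATURES if findings.get(feature, False))
    if findings.get("comet_tail_artifact", False):
        score -= 1
    return score

def c_tirads_category(score):
    """Map a score to a C-TIRADS category, as we read the 2020 guideline."""
    if score <= -1:
        return "2"
    if score == 0:
        return "3"
    if score == 1:
        return "4A"
    if score == 2:
        return "4B"
    if score in (3, 4):
        return "4C"
    return "5"

nodule = {"solid": True, "markedly_hypoechoic": True, "microcalcification": True}
s = c_tirads_score(nodule)
print(s, c_tirads_category(s))  # 3 4C
```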

BRAFV600E mutation analysis

Under US guidance, interventional radiologists aspirated each nodule three times with a 22-gauge needle attached to a 2 mL syringe. After aspiration, the needle was withdrawn and thoroughly rinsed in a tube containing BRAFV600E mutation detection solution to obtain an adequate tumor-cell sample for real-time polymerase chain reaction (PCR) analysis.

US image pre-processing

Before formally delineating the regions of interest (ROIs), we used 150 images (75 benign, 75 malignant) to evaluate the consistency of ROIs drawn by two radiologists. These images came from an independent dataset collected at center 1 and were not part of the main patient cohort. All nodules were then contoured with the open-source annotation tool ITK-SNAP (version 3.8.0). ROIs were delineated by a radiologist with 9 years of US experience (L.S.) and subsequently reviewed and revised by another radiologist with 10 years of experience (L.Z.). In total, 274 standardized images and nodules were obtained: of the 194 center 1 images, 136 were randomly assigned to the training set and the remaining 58 to the internal validation set, and the 80 standardized center 2 images formed the external test set.
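Inter-reader reproducibility of each radiomic feature is screened with an ICC threshold of 0.75. The study does not state which ICC form was used; the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater) computed from ANOVA mean squares, with simulated data standing in for features extracted from the two radiologists' ROIs. Function names are hypothetical.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array of shape (n_subjects, k_raters), e.g. one radiomic feature
    extracted from ROIs drawn by two radiologists.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def reproducible_features(reader1, reader2, threshold=0.75):
    """Indices of feature columns whose inter-reader ICC is >= threshold."""
    return [
        j for j in range(reader1.shape[1])
        if icc_2_1(np.column_stack([reader1[:, j], reader2[:, j]])) >= threshold
    ]

# Simulated example: 150 images, 3 features; feature 2 has no inter-reader agreement
rng = np.random.default_rng(0)
truth = rng.normal(size=(150, 3))
reader1 = truth + rng.normal(scale=0.05, size=truth.shape)
reader2 = truth + rng.normal(scale=0.05, size=truth.shape)
reader2[:, 2] = rng.normal(size=150)
print(reproducible_features(reader1, reader2))  # [0, 1]
```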

Feature extraction, selection and model construction

The main radiomics workflow comprises lesion segmentation, feature extraction, feature selection, and model construction (Figure 2). Radiomic features were extracted from the delineated ROIs on US images using PyRadiomics (version 3.1.0) (18).

Figure 2 Radiomics analysis and model building. mRMR, max-relevance and min-redundancy; LASSO, least absolute shrinkage and selection operator; LR, logistic regression.
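To make "radiomic feature" concrete, the sketch below computes four first-order features inside an ROI mask, loosely following PyRadiomics' published definitions (fixed bin width of 25 for discretization, log2 entropy). It is a simplified NumPy illustration, not PyRadiomics itself, and covers only a tiny fraction of the 1,561 features extracted in this study; the function name is hypothetical.

```python
import numpy as np

def first_order_features(image, mask, bin_width=25.0):
    """A few first-order radiomic features inside a binary ROI mask.

    Simplified illustration loosely following PyRadiomics' definitions;
    not a replacement for PyRadiomics.
    """
    roi = image[mask.astype(bool)].astype(float)    # grey levels inside the ROI
    # Fixed-bin-width discretization, as in PyRadiomics (default binWidth = 25)
    bins = np.floor(roi / bin_width) - np.floor(roi.min() / bin_width) + 1
    _, counts = np.unique(bins, return_counts=True)
    p = counts / counts.sum()                       # discretized grey-level probabilities
    return {
        "Mean": roi.mean(),
        "Energy": np.sum(roi ** 2),                 # no intensity shift applied
        "Entropy": -np.sum(p * np.log2(p)),
        "Range": roi.max() - roi.min(),
    }

# Toy 4x4 "image": left half intensity 0, right half 100, whole image as ROI
image = np.zeros((4, 4))
image[:, 2:] = 100
mask = np.ones((4, 4), dtype=bool)
print(first_order_features(image, mask))
```

In practice the features in this study came from PyRadiomics, which also computes shape, texture (GLCM, GLRLM, GLSZM and others), and filtered-image features.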

Feature selection followed a four-step process: (I) the reproducibility and stability of radiomic features were assessed with the intraclass correlation coefficient (ICC), and features with an ICC ≥0.75 were retained; (II) features were standardized by Z-score normalization in the training and validation datasets, and those with P<0.05 in univariate analysis (independent t-test or Mann-Whitney U test) were retained; (III) the least absolute shrinkage and selection operator (LASSO) algorithm was applied to select the most informative features; and (IV) the selected features were entered into multivariate logistic regression (LR) analysis to construct the final model.
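Steps (II) and (III) can be sketched in NumPy as Z-score standardization followed by a LASSO fitted by cyclic coordinate descent. This is an assumption-laden illustration: it fits the normalization parameters on the training data only (to avoid information leakage into validation data), uses the squared-error LASSO objective suggested by the MSE axis in Figure 3, and fixes λ by hand instead of choosing it by cross-validation. The univariate screening of step (II) is omitted, the data are simulated, and the function names are hypothetical.

```python
import numpy as np

def zscore_fit(X_train):
    """Per-feature mean and SD estimated on the training set only."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0                      # avoid dividing by zero for constant features
    return mu, sd

def soft_threshold(rho, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent on standardized (centered) X.

    Minimizes (1 / (2n)) * ||(y - mean(y)) - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    resid = y - y.mean()                   # intercept absorbed by centering y
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]        # partial residual without feature j
            rho = X[:, j] @ resid / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * b[j]        # restore residual with updated b[j]
    return b

# Simulated example: only features 0 and 1 carry signal
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=5.0, scale=2.0, size=(200, 10))
mu, sd = zscore_fit(X_raw)
X = (X_raw - mu) / sd
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)
coef = lasso_cd(X, y, lam=0.5)
print(np.flatnonzero(coef))  # indices of the retained features
```

Validation data would be transformed with the training-set `mu` and `sd`, never re-estimated.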

Machine learning models were first constructed separately from radiomic features and from clinical variables to evaluate their accuracy in predicting malignancy in Bethesda III thyroid nodules. To improve predictive accuracy, the two feature sets were then combined.

Performance evaluation

Model performance was assessed with receiver operating characteristic (ROC) analysis; the area under the curve (AUC) was calculated, and the AUCs of different models within each cohort were compared using the DeLong test. Accuracy, sensitivity, and specificity were also calculated. To appraise clinical utility, decision curve analysis (DCA) was used to quantify net benefit across a range of threshold probabilities.
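The DeLong comparison of two correlated AUCs (two models scored on the same patients) can be sketched with NumPy using the midrank formulation. This is an independent illustration, not the MedCalc implementation the authors used; function names are hypothetical and the data are simulated.

```python
import math
import numpy as np

def _midrank(x):
    """1-based midranks of x (tied values share their average rank)."""
    order = np.argsort(x, kind="mergesort")
    xs = x[order]
    n = len(x)
    ranks = np.empty(n)
    i = 0
    while i < n:
        j = i
        while j < n and xs[j] == xs[i]:
            j += 1
        ranks[i:j] = 0.5 * (i + j + 1)         # mean of ranks i+1 .. j
        i = j
    out = np.empty(n)
    out[order] = ranks
    return out

def delong_test(y_true, score_a, score_b):
    """Two-sided DeLong test for two correlated AUCs on the same patients."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.vstack([score_a, score_b]).astype(float)
    pos, neg = scores[:, y_true], scores[:, ~y_true]
    m, n = pos.shape[1], neg.shape[1]
    aucs, v_pos, v_neg = [], [], []
    for k in range(2):
        tz = _midrank(np.concatenate([pos[k], neg[k]]))
        tx, ty = _midrank(pos[k]), _midrank(neg[k])
        aucs.append((tz[:m].sum() - m * (m + 1) / 2) / (m * n))
        v_pos.append((tz[:m] - tx) / n)        # structural components, positives
        v_neg.append(1.0 - (tz[m:] - ty) / m)  # structural components, negatives
    cov = np.cov(np.array(v_pos)) / m + np.cov(np.array(v_neg)) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return aucs[0], aucs[1], p_value

# Simulated example: model A separates the classes better than model B
rng = np.random.default_rng(1)
y = np.r_[np.ones(30), np.zeros(50)]
score_a = y + rng.normal(size=80)
score_b = 0.5 * y + rng.normal(size=80)
auc_a, auc_b, p_value = delong_test(y, score_a, score_b)
print(round(auc_a, 3), round(auc_b, 3), round(p_value, 4))
```

The test is only meaningful when both models are scored on the same patients, which is why comparisons are made within, not across, the training, internal validation, and external test sets.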

Statistical analysis

Statistical analyses were performed with R software and SPSS 26.0, with pathological outcomes as the gold standard. Continuous data were tested for normality and homogeneity of variance: normally distributed variables were compared with the independent two-sample t-test, and non-normally distributed variables with the Mann-Whitney U test. Categorical variables were compared using the χ2 test. Differences between models were assessed with the DeLong test in MedCalc (version 20.01). Clinical variables were screened in SPSS (version 26.0, IBM, USA). Python (version 3.10) was used for ICC evaluation, Spearman rank correlation tests, Z-score normalization, and LASSO regression. A P value <0.05 was considered statistically significant.
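As an example of the χ2 test for categorical variables, the sketch below applies Pearson's χ2 test with Yates' continuity correction to the training-set sex counts in Table 1 (female: 53 benign, 15 malignant; male: 25 benign, 43 malignant). Whether the continuity-corrected or uncorrected statistic was reported is not stated, so the correction here is our assumption. For one degree of freedom the P value equals erfc(sqrt(χ2/2)), so no statistics library is needed; the function name is hypothetical.

```python
import math

def chi2_2x2_yates(table):
    """Pearson chi-square test with Yates' continuity correction for a 2x2 table.

    For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += max(abs(observed - expected) - 0.5, 0.0) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Training-set sex counts from Table 1: rows = female/male, columns = benign/malignant
chi2, p = chi2_2x2_yates([[53, 15], [25, 43]])
print(round(chi2, 2), p < 0.001)  # 21.92 True
```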


Results

Baseline information

Table 1 presents the baseline clinical characteristics and sonographic features of the nodules in the training, internal validation, and external test sets. Age (P=0.54) and color Doppler flow imaging (CDFI) (P=0.44) did not differ significantly between the benign and malignant groups, whereas sex (P<0.001), BRAFV600E status (P=0.02), diameter (P=0.02), and capsule contact (P<0.001) did.

Table 1

Clinical information for the training, internal validation and external test set

Feature name; Training set: Total (n=136), Benign (n=78), Malignant (n=58); Internal validation set: Total (n=58), Benign (n=36), Malignant (n=22); External test set: Total (n=80), Benign (n=54), Malignant (n=26)
Age (years) 47.02±13.50 48.96±14.52 44.41±11.60 49.28±13.66 49.31±14.09 49.23±13.24 45.29±12.58 45.07±11.92 45.73±14.07
Diameter (mm) 11.96±6.85 13.00±7.19 10.55±6.14 12.68±8.57 14.06±9.51 10.43±6.32 11.82±6.51 12.11±6.81 11.23±5.93
Sex
   Female 68 (50.00) 53 (67.95) 15 (25.86) 37 (63.79) 22 (61.11) 15 (68.18) 36 (45.00) 24 (44.44) 12 (46.15)
   Male 68 (50.00) 25 (32.05) 43 (74.14) 21 (36.21) 14 (38.89) 7 (31.82) 44 (55.00) 30 (55.56) 14 (53.85)
BRAFV600E
   Negative 78 (57.35) 52 (66.67) 26 (44.83) 39 (67.24) 24 (66.67) 15 (68.18) 58 (72.50) 46 (85.19) 12 (46.15)
   Positive 58 (42.65) 26 (33.33) 32 (55.17) 19 (32.76) 12 (33.33) 7 (31.82) 22 (27.50) 8 (14.81) 14 (53.85)
CDFI
   None 72 (52.94) 44 (56.41) 28 (48.28) 45 (77.59) 26 (72.22) 19 (86.36) 53 (66.25) 36 (66.67) 17 (65.38)
   Yes 64 (47.06) 34 (43.59) 30 (51.72) 13 (22.41) 10 (27.78) 3 (13.64) 27 (33.75) 18 (33.33) 9 (34.62)
C-TIRADS score
   1 44 (32.35) 33 (42.31) 11 (18.97) 21 (36.21) 13 (36.11) 8 (36.36) 24 (30.00) 18 (33.33) 6 (23.08)
   2 39 (28.68) 23 (29.49) 16 (27.59) 19 (32.76) 13 (36.11) 6 (27.27) 20 (25.00) 11 (20.37) 9 (34.62)
   3 38 (27.94) 17 (21.79) 21 (36.21) 13 (22.41) 7 (19.44) 6 (27.27) 27 (33.75) 19 (35.19) 8 (30.77)
   4 15 (11.03) 5 (6.41) 10 (17.24) 5 (8.62) 3 (8.33) 2 (9.09) 9 (11.25) 6 (11.11) 3 (11.54)
Capsule contact
   None 64 (47.06) 53 (67.95) 11 (18.97) 32 (55.17) 24 (66.67) 8 (36.36) 58 (72.50) 46 (85.19) 12 (46.15)
   Yes 72 (52.94) 25 (32.05) 47 (81.03) 26 (44.83) 12 (33.33) 14 (63.64) 22 (27.50) 8 (14.81) 14 (53.85)

Data are presented as number (percentage) or mean ± standard deviation. CDFI, color Doppler flow imaging; C-TIRADS, Chinese Thyroid Imaging Reporting and Data System.

Feature selection and radiomics model construction

From each image, 1,561 features were initially extracted, and 230 remained after preprocessing. These features then underwent further selection with the LASSO and LR algorithms, yielding a subset of 11 features for the radiomics model. Figure 3A,3B show the feature selection process using the LASSO algorithm, and Figure 3C shows the corresponding correlation coefficients for each feature. Details of feature preprocessing, the LASSO and LR algorithms, and the formula for calculating the radiomics score are provided in Appendix 1.

Figure 3 The feature selection process. Radiomic features were selected with the least absolute shrinkage and selection operator logistic regression model, with the tuning parameter (lambda) chosen to minimize the mean squared error. (A,B) The feature selection process; (C) the corresponding correlation coefficients for each feature. MSE, mean squared error.
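The exact radiomics-score formula and coefficients are given only in Appendix 1. As a hedged illustration, a LASSO-derived radiomics score usually takes the form of a linear combination of the selected features, where β0 is the intercept, βi the coefficient of the i-th of the 11 selected features, and zi that feature's Z-score-normalized value (notation ours, not necessarily the authors'):

```latex
\text{Rad-score} = \beta_0 + \sum_{i=1}^{11} \beta_i \, z_i
```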

Establishment of the clinical model and clinical-radiomics model

Table 1 presents the results of the univariate and multivariate regression analyses in the training set. In univariate analysis, patient age and nodule CDFI did not differ significantly between benign and malignant Bethesda III nodules (P>0.05), so the remaining factors were entered into the multivariate regression analysis. The clinical model was built from capsule contact (physical proximity to or attachment to the thyroid capsule), C-TIRADS score, and BRAFV600E mutation status, whereas the clinical-radiomics model was built from capsule contact, BRAFV600E mutation status, and the radiomics score.
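The clinical-radiomics model is a multivariable logistic regression on capsule contact, BRAFV600E status, and the radiomics score. The sketch below fits such a model by Newton-Raphson (iteratively reweighted least squares) in NumPy. Because the fitted coefficients are not reported in this section, the data are simulated from invented "true" coefficients, purely to show that the fit recovers them; variable and function names are hypothetical.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25):
    """Unpenalized multivariable logistic regression fitted by Newton-Raphson (IRLS).

    Returns [intercept, coef_1, ..., coef_p].
    """
    Xd = np.column_stack([np.ones(len(X)), X])       # prepend intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))          # predicted malignancy risk
        w = p * (1.0 - p)                            # IRLS weights
        hessian = Xd.T @ (Xd * w[:, None])
        gradient = Xd.T @ (y - p)
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def predict_risk(beta, X):
    """Predicted probability of malignancy for each row of X."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))

# Simulated data only: the "true" coefficients are invented for illustration
rng = np.random.default_rng(7)
n = 5000
capsule_contact = rng.integers(0, 2, n)
braf_v600e = rng.integers(0, 2, n)
rad_score = rng.normal(size=n)
X = np.column_stack([capsule_contact, braf_v600e, rad_score]).astype(float)
true_beta = np.array([-1.0, 1.5, 0.8, 1.2])
y = (rng.random(n) < predict_risk(true_beta, X)).astype(float)

beta_hat = fit_logistic_irls(X, y)
print(np.round(beta_hat, 2))  # close to the invented true_beta
```

A nomogram such as Figure 4A is a graphical rendering of exactly this kind of fitted linear predictor: each predictor's contribution is converted to points, and total points map to predicted risk.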

The radiomics model (Rad-signature), clinical model (clinic-signature), and clinical-radiomics model (nomogram) were evaluated in the training, internal validation, and external test sets using AUC, accuracy, sensitivity, and specificity (Table 2).

Table 2

Diagnostic performance of model

Group Model name AUC (95% CI) Accuracy Sensitivity Specificity PPV NPV Precision Recall
Training Clinical 0.745 (0.6721–0.8178) 0.735 0.81 0.679 0.653 0.828 0.653 0.81
Training Radiomics 0.770 (0.6859–0.8539) 0.765 0.621 0.872 0.783 0.756 0.783 0.621
Training Nomogram 0.843 (0.7795–0.9061) 0.772 0.931 0.654 0.667 0.927 0.667 0.931
Internal validation Clinical 0.670 (0.5178–0.8219) 0.707 0.591 0.778 0.619 0.757 0.619 0.591
Internal validation Radiomics 0.788 (0.6629–0.9128) 0.759 0.727 0.778 0.667 0.824 0.667 0.727
Internal validation Nomogram 0.823 (0.7085–0.9379) 0.81 0.909 0.75 0.69 0.931 0.69 0.909
External test Clinical 0.695 (0.5864–0.8039) 0.75 0.538 0.852 0.636 0.793 0.636 0.538
External test Radiomics 0.754 (0.6211–0.8860) 0.787 0.731 0.815 0.655 0.863 0.655 0.731
External test Nomogram 0.824 (0.7184–0.9298) 0.775 0.731 0.796 0.633 0.86 0.633 0.731
Yoon et al. (16) 0.839
Keutgen et al. (17) 0.880

AUC, area under the curve; CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value.
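The threshold-based metrics in Table 2 follow from a 2×2 confusion matrix. For the nomogram in the external test set (26 malignant, 54 benign nodules), the reported sensitivity (0.731) and specificity (0.796) correspond to 19 true positives and 43 true negatives; the sketch below uses these back-calculated counts to reproduce the reported accuracy, PPV, and NPV, and shows why the Precision and Recall columns duplicate PPV and Sensitivity. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from a 2x2 confusion matrix (malignant = positive)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # identical to recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),           # identical to precision
        "npv": tn / (tn + fn),
    }

# Counts back-calculated from the external test nomogram row of Table 2
m = diagnostic_metrics(tp=19, fp=11, tn=43, fn=7)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.775, 'sensitivity': 0.731, 'specificity': 0.796, 'ppv': 0.633, 'npv': 0.86}
```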

In distinguishing benign from malignant Bethesda III thyroid nodules, the radiomics model yielded higher AUCs (0.770, 0.788, and 0.754 in the training, internal validation, and external test sets, respectively) than the clinical model (0.745, 0.670, and 0.695, respectively), although the difference between these two models was not significant in the internal validation set (P=0.51). The clinical-radiomics model achieved the highest AUC in all three sets (0.843, 0.823, and 0.824, respectively) and the highest accuracy in the training and internal validation sets; in the external test set, its accuracy (0.775) was slightly lower than that of the radiomics model (0.787). The clinical-radiomics model significantly outperformed the clinical model in all three sets (P<0.001, P=0.03, and P<0.001, respectively), but its advantage over the radiomics model was significant only in the training set (P=0.03; P=0.34 and P=0.22 in the internal validation and external test sets, respectively). Based on the clinical-radiomics model, we constructed the nomogram shown in Figure 4A. Figure 4B-4D displays the ROC and DCA curves for the three models, indicating that the radiomics and clinical-radiomics models provided greater overall net benefit than the clinical model across most reasonable threshold probabilities.

Figure 4 Comparison of three diagnostic models. (A) Clinical-radiomics nomogram combining ultrasound radiomics and clinical features. Receiver operating characteristic curves of the clinical-radiomics nomogram, radiomics model, and clinical model for the differential diagnosis of indeterminate cytology (Bethesda III) thyroid nodules in the internal validation set (B) and the external test set (C). (D) Calibration curves of the radiomics nomogram for the external test set. CI, confidence interval; Rad, Radiomics; Sig, Significance; DCA, decision curve analysis; AUC, area under the curve.
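Decision curve analysis plots the net benefit of acting on a model's predictions at each threshold probability pt, where net benefit = TP/n - FP/n × pt/(1 - pt), and compares it with treating all patients (and with treating none, whose net benefit is 0). A minimal NumPy sketch with toy data (not study data) follows; function names are hypothetical.

```python
import numpy as np

def net_benefit(y_true, prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    y_true = np.asarray(y_true).astype(bool)
    treat = np.asarray(prob) >= threshold
    n = len(y_true)
    tp = np.sum(treat & y_true)
    fp = np.sum(treat & ~y_true)
    return tp / n - fp / n * threshold / (1 - threshold)

def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: treat (e.g. operate on) every patient."""
    prevalence = np.mean(np.asarray(y_true).astype(bool))
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy data (not study data): 1 = malignant, 0 = benign
y = [1, 1, 0, 0]
prob = [0.9, 0.4, 0.6, 0.1]
for pt in (0.1, 0.3, 0.5):
    print(pt, round(net_benefit(y, prob, pt), 3), round(treat_all_net_benefit(y, pt), 3))
```

A model is useful at the thresholds where its curve lies above both the treat-all curve and the treat-none line.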

Discussion

Thyroid nodules with indeterminate cytology (Bethesda III) pose particular challenges for management (19), underscoring the need for more accurate preoperative diagnostic methods for these nodules.

Our predictive model based on conventional US features and BRAFV600E showed that combining capsule contact, C-TIRADS, and BRAFV600E mutation status can help distinguish benign from malignant thyroid nodules with indeterminate cytology; however, its AUC and accuracy were unsatisfactory. Most prior studies of thyroid nodules with indeterminate cytology have relied on conventional ultrasonography. Some researchers have applied risk stratification systems based on conventional ultrasonography, such as the American College of Radiology Thyroid Imaging Reporting and Data System (ACR-TIRADS), C-TIRADS, and the Korean Thyroid Imaging Reporting and Data System (K-TIRADS), to differentiate benign from malignant indeterminate nodules (20-25). Consistent with our findings, these studies reported that such classification systems have low specificity and substantial interobserver variability. A more stable and accurate prediction model is therefore needed.

Many previous radiomics studies have used US to distinguish benign from malignant thyroid nodules and to predict their invasiveness and genetic mutations, and these studies have clearly shown that radiomic features extracted from grayscale thyroid US images can help predict the biological attributes of thyroid nodules (12,14,26-30). However, these studies did not specifically report model performance in nodules with indeterminate cytology. Our study shows that quantitative radiomic features are useful for predicting malignancy in thyroid nodules with Bethesda III indeterminate cytology. Adding a radiomics score to the prediction model improved its diagnostic ability for these nodules: in the external test set, the AUC increased significantly to 0.824, compared with 0.695 for the model using clinical risk factors alone, and the specificity reached 0.796. Yoon et al. (16) combined a curated set of 15 radiomic features with clinical risk factors to differentiate benign from malignant Bethesda III and IV nodules; their model achieved an AUC of 0.839, substantially higher than the AUC of 0.583 obtained with clinical risk factors alone (16). However, their model was not validated on external data. Our model achieved AUCs of 0.823 and 0.824 in the internal validation and external test sets, respectively, suggesting stable performance. Keutgen et al. used a machine learning model based on US radiomics to differentiate benign from malignant Bethesda III, IV, and V nodules, achieving an AUC of 0.88 in the internal training set but 0.68 in the external validation set (17).

Gild et al. developed a deep learning model based on US images of 88 Bethesda III thyroid nodules, with an AUC of 0.74 (31). This AUC is lower than those of our clinical-radiomics model, possibly because of their smaller sample and because deep learning models are prone to overfitting when training data are insufficient (32). Because it relies on US imaging, which is available in most clinical settings, the radiomics-based nomogram offers a non-invasive, widely accessible, and cost-effective way to assess malignancy risk in thyroid nodules with indeterminate cytology. Unlike histologic biopsy, which is invasive, or molecular biomarker testing and fluorodeoxyglucose positron emission tomography/computed tomography (FDG-PET/CT), which are expensive and less accessible, radiomics provides a practical alternative. However, its accuracy may be influenced by operator skill and image quality, and its specificity may not match that of molecular testing. Although the nomogram shows strong predictive performance, combining it with molecular testing or histologic biopsy may be necessary when higher diagnostic precision is required.

This study has several limitations. First, although two centers were involved, the number of images available for training and testing was relatively small; we plan to collect more data to further validate the model. Second, the retrospective design may have introduced selection bias, and larger prospective trials are needed to validate the model's performance. Third, the analysis was based on two-dimensional US images, and some radiomic features may differ between two-dimensional and three-dimensional segmentation. Fourth, radiomic features may be influenced by the type of US machine used, which may have affected our results. Fifth, we analyzed US images in JPG format, and lossy compression may have caused some loss of image information. Finally, all cases came from a single city in China; because the incidence of thyroid cancer subtypes varies by country and region, applying the same research protocol to populations elsewhere might yield different results.


Conclusions

Despite the limited sample size, our preliminary findings suggest that the proposed clinical-radiomics model has relatively good diagnostic performance in distinguishing malignant from benign indeterminate (Bethesda III) thyroid nodules. It may help guide treatment decisions for patients with Bethesda III thyroid nodules and reduce unnecessary diagnostic surgery.


Acknowledgments

Funding: The study was funded by the Science and Technology Development of Pudong New Area, Shanghai, China (No. 2023-Y52).


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://gs.amegroups.com/article/view/10.21037/gs-24-310/rc

Data Sharing Statement: Available at https://gs.amegroups.com/article/view/10.21037/gs-24-310/dss

Peer Review File: Available at https://gs.amegroups.com/article/view/10.21037/gs-24-310/prf

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://gs.amegroups.com/article/view/10.21037/gs-24-310/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013) and was approved by the Ethics Committees of Shanghai Sixth People’s Hospital (No. 2020-031) and Tinglin Hospital (No. 2024-KY-184), and individual informed consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Haugen BR, Alexander EK, Bible KC, et al. 2015 American Thyroid Association Management Guidelines for Adult Patients with Thyroid Nodules and Differentiated Thyroid Cancer: The American Thyroid Association Guidelines Task Force on Thyroid Nodules and Differentiated Thyroid Cancer. Thyroid 2016;26:1-133.
  2. American Thyroid Association (ATA) Guidelines Taskforce on Thyroid Nodules and Differentiated Thyroid Cancer. Revised American Thyroid Association management guidelines for patients with thyroid nodules and differentiated thyroid cancer. Thyroid 2009;19:1167-214.
  3. Crippa S, Mazzucchelli L, Cibas ES, et al. The Bethesda System for reporting thyroid fine-needle aspiration specimens. Am J Clin Pathol 2010;134:343-4; author reply 345.
  4. Durante C, Grani G, Lamartina L, et al. The Diagnosis and Management of Thyroid Nodules: A Review. JAMA 2018;319:914-24.
  5. Cibas ES, Ali SZ. The 2017 Bethesda System for Reporting Thyroid Cytopathology. Thyroid 2017;27:1341-6.
  6. Onken AM, VanderLaan PA, Hennessey JV, et al. Combined molecular and histologic end points inform cancer risk estimates for thyroid nodules classified as atypia of undetermined significance. Cancer Cytopathol 2021;129:947-55.
  7. Qiu Y, Xing Z, Liu J, et al. Diagnostic reliability of elastography in thyroid nodules reported as indeterminate at prior fine-needle aspiration cytology (FNAC): a systematic review and Bayesian meta-analysis. Eur Radiol 2020;30:6624-34.
  8. Yoo WS, Ahn HY, Ahn HS, et al. Malignancy rate of Bethesda category III thyroid nodules according to ultrasound risk stratification system and cytological subtype. Medicine (Baltimore) 2020;99:e18780.
  9. Zhou J, Yin L, Wei X, et al. 2020 Chinese guidelines for ultrasound malignancy risk stratification of thyroid nodules: the C-TIRADS. Endocrine 2020;70:256-79.
  10. Park SJ, Park SH, Choi YJ, et al. Interobserver variability and diagnostic performance in US assessment of thyroid nodule according to size. Ultraschall Med 2012;33:E186-90.
  11. Hu Y, Xu S, Zhan W. Diagnostic performance of C-TIRADS in malignancy risk stratification of thyroid nodules: A systematic review and meta-analysis. Front Endocrinol (Lausanne) 2022;13:938961.
  12. Kwon MR, Shin JH, Park H, et al. Radiomics Study of Thyroid Ultrasound for Predicting BRAF Mutation in Papillary Thyroid Carcinoma: Preliminary Results. AJNR Am J Neuroradiol 2020;41:700-5.
  13. Liu T, Ge X, Yu J, et al. Comparison of the application of B-mode and strain elastography ultrasound in the estimation of lymph node metastasis of papillary thyroid carcinoma based on a radiomics approach. Int J Comput Assist Radiol Surg 2018;13:1617-27.
  14. Gao X, Ran X, Ding W. The progress of radiomics in thyroid nodules. Front Oncol 2023;13:1109319.
  15. Toro-Tobon D, Loor-Torres R, Duran M, et al. Artificial Intelligence in Thyroidology: A Narrative Review of the Current Applications, Associated Challenges, and Future Directions. Thyroid 2023;33:903-17.
  16. Yoon J, Lee E, Kang SW, et al. Implications of US radiomics signature for predicting malignancy in thyroid nodules with indeterminate cytology. Eur Radiol 2021;31:5059-67.
  17. Keutgen XM, Li H, Memeh K, et al. A machine-learning algorithm for distinguishing malignant from benign indeterminate thyroid nodules using ultrasound radiomic features. J Med Imaging (Bellingham) 2022;9:034501.
  18. van Griethuysen JJM, Fedorov A, Parmar C, et al. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res 2017;77:e104-7.
  19. Alqahtani SM. Current controversies in the management of patients with indeterminate thyroid nodules. Saudi Med J 2023;44:633-9.
  20. Dickey MV, Nguyen A, Wiseman SM. Cancer risk estimation using American College of Radiology Thyroid Imaging Reporting and Data System for cytologically indeterminate thyroid nodules. Am J Surg 2022;224:653-6.
  21. Singaporewalla RM, Hwee J, Lang TU, et al. Clinico-pathological Correlation of Thyroid Nodule Ultrasound and Cytology Using the TIRADS and Bethesda Classifications. World J Surg 2017;41:1807-11.
  22. Tan H, Li Z, Li N, et al. Thyroid imaging reporting and data system combined with Bethesda classification in qualitative thyroid nodule diagnosis. Medicine (Baltimore) 2019;98:e18320.
  23. Maia FF, Matos PS, Pavin EJ, et al. Thyroid imaging reporting and data system score combined with Bethesda system for malignancy risk stratification in thyroid nodules with indeterminate results on cytology. Clin Endocrinol (Oxf) 2015;82:439-44.
  24. Chen L, Chen M, Li Q, et al. Machine Learning-Assisted Diagnostic System for Indeterminate Thyroid Nodules. Ultrasound Med Biol 2022;48:1547-54.
  25. Xing Z, Qiu Y, Zhu J, et al. Diagnostic performance of ultrasound risk stratification systems on thyroid nodules cytologically classified as indeterminate: a systematic review and meta-analysis. Ultrasonography 2023;42:518-31.
  26. Zhao CK, Ren TT, Yin YF, et al. A Comparative Analysis of Two Machine Learning-Based Diagnostic Patterns with Thyroid Imaging Reporting and Data System for Thyroid Nodules: Diagnostic Performance and Unnecessary Biopsy Rate. Thyroid 2021;31:470-81.
  27. Li F, Pan D, He Y, et al. Using ultrasound features and radiomics analysis to predict lymph node metastasis in patients with thyroid cancer. BMC Surg 2020;20:315.
  28. Zhang C, Cheng L, Zhu W, et al. Construction of a Diagnostic Model for Lymph Node Metastasis of the Papillary Thyroid Carcinoma Using Preoperative Ultrasound Features and Imaging Omics. J Healthc Eng 2022;2022:1872412.
  29. Zhu H, Yu B, Li Y, et al. Models of ultrasonic radiomics and clinical characters for lymph node metastasis assessment in thyroid cancer: a retrospective study. PeerJ 2023;11:e14546.
  30. Zhang Z, Zhang X, Yin Y, et al. Integrating BRAF(V600E) mutation, ultrasonic and clinicopathologic characteristics for predicting the risk of cervical central lymph node metastasis in papillary thyroid carcinoma. BMC Cancer 2022;22:461.
  31. Gild ML, Chan M, Gajera J, et al. Risk stratification of indeterminate thyroid nodules using ultrasound and machine learning algorithms. Clin Endocrinol (Oxf) 2022;96:646-52.
  32. Peng H, Dong D, Fang MJ, et al. Prognostic Value of Deep Learning PET/CT-Based Radiomics: Potential Role for Future Individual Induction Chemotherapy in Advanced Nasopharyngeal Carcinoma. Clin Cancer Res 2019;25:4271-9.

Cite this article as: Zhong L, Shi L, Lai J, Hu Y, Gu L. Combined model integrating clinical, radiomics, BRAFV600E and ultrasound for differentiating between benign and malignant indeterminate cytology (Bethesda III) thyroid nodules: a bi-center retrospective study. Gland Surg 2024;13(11):1954-1964. doi: 10.21037/gs-24-310
