Development and validation of an automated computed tomography segmentation model for thyroid nodules using a channel attention high-resolution network
Highlight box
Key findings
• The proposed Channel Attention High-Resolution Network (CA-HRNet) achieved the best performance in thyroid nodule segmentation on computed tomography (CT) images, based on a single-center retrospective dataset of 500 patients with pathologically confirmed thyroid nodules. It attained an overall Dice coefficient of 78.6%, while reducing computational complexity by 73% compared with the Channel Feature Selection Module-only configuration.
What is known and what is new?
• Reliable CT nodule segmentation is essential for radiomics and computer-aided diagnosis, but manual region of interest delineation is subjective and time-consuming, and existing high-resolution segmentation models may be computationally demanding.
• This study introduced a channel attention mechanism and dynamic feature selection module for automated CT segmentation. The model improved boundary delineation, particularly in difficult malignant nodules, while reducing redundant computation.
What is the implication, and what should change now?
• CA-HRNet may provide reproducible CT masks for downstream quantitative analysis and radiomics workflows. Future work should focus on multicenter validation, prospective testing, and evaluation of whether automated masks improve clinically interpretable benign-malignant differentiation or outcome prediction.
Introduction
Thyroid nodules are common findings in clinical practice, and the incidence of thyroid cancer has increased substantially in many regions, partly because of the wider use of neck imaging and surveillance (1,2). Contemporary guideline frameworks, including the 2025 American Thyroid Association (ATA) guidance for differentiated thyroid cancer and the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS), emphasize risk-adapted evaluation, avoidance of unnecessary procedures, and consistent imaging-based communication (3,4). In this context, accurate and reproducible image analysis is important because overestimation or underestimation of lesion extent can affect biopsy decisions, surgical planning, radiomics feature extraction, and follow-up assessment (5).
Currently, thyroid and cervical lymph node ultrasound serves as the primary imaging method for evaluating thyroid nodules, often guiding the decision to perform fine-needle aspiration biopsy (FNAB) (6,7). However, ultrasound diagnosis is heavily reliant on operator experience, with high subjectivity and significant inter-observer variability, which cannot meet the requirements of standardized quantitative analysis (8). Data show that while the positive predictive value of ultrasound screening can reach 20–80%, its diagnostic rate for thyroid cancer is only 5–15% (9), potentially leading to unnecessary biopsies, increased patient discomfort and healthcare costs. As a complementary modality, computed tomography (CT) provides superior spatial resolution and clearer depiction of anatomical relationships, making it a valuable adjunct tool, especially for complex cases such as those with retrosternal extension, large multinodular goiters, or suspected invasion of surrounding structures (4). Therefore, CT-based quantitative tools may be useful when CT has already been obtained for clinical indications, particularly if they can generate objective and reproducible lesion masks. Nonetheless, the interpretation of both ultrasound and CT images still relies heavily on qualitative assessment by radiologists, which lacks standardized, objective, and quantifiable biomarkers for precise nodule characterization (10).
To address this clinical imperative, radiomics has emerged as a pivotal methodological framework in quantitative imaging analysis (11-13). It converts medical images into high-dimensional features that may capture lesion heterogeneity and phenotypic characteristics beyond visual assessment (14). In this context, radiomics research based on both ultrasound and CT imaging has emerged and achieved notable progress (15). Zhao et al. (16) developed a multimodal ultrasound radiomics nomogram for differentiating follicular thyroid adenoma from carcinoma, and Lin et al. (17) reported CT-based radiomics models for benign-malignant thyroid nodule differentiation in a multicenter setting. Du et al. (18) further integrated clinical features and CT radiomics to predict lateral cervical lymph node metastasis, a clinically important determinant of prognosis and surgical strategy (19). These studies highlight the potential value of quantitative thyroid imaging, while also underscoring the dependence of radiomics on reliable lesion segmentation.
Image segmentation is essential for a wide range of clinical applications beyond radiomics, including tumor detection, disease monitoring, and surgical planning (20). However, the lack of efficient automated segmentation hinders the clinical translation of CT-based thyroid radiomics, as manual region of interest (ROI) delineation is time-consuming, subjective, and poorly reproducible (21-23). This bottleneck underscores the need for efficient and accurate automated segmentation techniques. Medical image segmentation has undergone a profound transformation from traditional image processing techniques to data-driven deep learning architectures (24-29).
Convolutional Neural Networks (CNNs), particularly U-Net and its numerous variants, have become the mainstream paradigm for medical image segmentation. Peng et al. (30) developed DC-Contrast U-Net specifically for pediatric thyroid ultrasound images, achieving a mean intersection over union (mIoU) of 0.866 with an extremely low parameter count. Xiang et al. (31) proposed a federated learning-based multi-attention guided UNet (MAUNet), attaining high generalization with Dice coefficients (DCs) of 0.887–0.912 across private multi-center datasets. On the public TN3K dataset, Dong et al. (32) embedded a dual-path attention mechanism into UNet++, obtaining an IoU of 0.745 and a Recall of 0.870; Wang et al. (33) introduced a multi-path vision Mamba module, raising the Dice to 79.28%; Ye et al. (34) leveraged background-aware and multi-scale feature aggregation modules, pushing the Dice to 0.8616; and Sun et al. (35) proposed RTS-Net, which integrated cascaded graph convolution and dual-path attention, achieving a leading IoU of 71.87% with a lightweight architecture.
Compared with ultrasound, automated thyroid nodule segmentation in CT faces different challenges, including variable enhancement, complex surrounding anatomy, partial-volume effects, and artifacts related to contrast injection or motion. Traditional machine learning studies have used handcrafted CT texture features and support vector machine (SVM) classifiers to distinguish nodules from normal tissue, with reported accuracy values of 0.880 and 0.8673 in prior studies (36,37). With the development of deep learning, Zhao et al. (38) designed a fully automatic detection algorithm using an improved Dense-U-Net for ROI segmentation and a multidimensional input-fusion CNN for classification, while Li et al. (39) proposed an EfficientNet-based U-Net for automatic recognition and classification of thyroid nodules in CT images. However, available CT studies remain limited by smaller datasets, feature-engineering dependence, or insufficient optimization for computational efficiency.
Thus, there remains a need for an automated CT segmentation model that is robust to the complex morphology of malignant nodules, developed on a pathologically confirmed cohort, and efficient enough for future clinical translation. The present study focused on segmentation rather than direct benign-malignant classification. Its clinical role is to generate reliable CT masks that can support downstream radiomics, computer-aided diagnosis, and quantitative follow-up, thereby bridging pixel-level computer vision performance with clinically interpretable workflows.
The aims of this study were:
- To construct a large-scale, pathologically confirmed CT dataset of 500 patients with thyroid nodules and describe its clinical characteristics.
- To develop and internally validate Channel Attention High-Resolution Network (CA-HRNet), a high-resolution segmentation network that incorporates channel attention convolution to improve feature representation.
- To propose and evaluate a Channel Feature Selection Module (CFSM) for multi-scale feature fusion, and to compare segmentation accuracy and computational complexity with mainstream models.
We present this article in accordance with the TRIPOD reporting checklist (available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0162/rc).
Methods
Dataset description
This single-center retrospective model-development and internal-validation study was approved by the Ethics Committee of Quzhou People’s Hospital (approval No. 2025-101), and informed consent was obtained from all individual participants. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. Data were obtained from patients with thyroid nodules who underwent CT imaging at Quzhou People’s Hospital between 2017 and 2024. The final dataset included 500 patients with pathologically confirmed thyroid nodules, comprising 250 benign and 250 malignant cases. The dataset was divided at the patient level into training, validation, and internal test sets containing 8,721, 1,121, and 2,517 images from 350, 50, and 100 patients, respectively; no patient contributed images to more than one set.
All included patients met the following inclusion criteria: (I) underwent non-contrast and contrast-enhanced thyroid CT scans; (II) underwent thyroid surgery with pathologically confirmed diagnosis; (III) presented with a single thyroid nodule with the longest diameter ≥5 mm. Exclusion criteria included: (I) time gap between postoperative pathological report and imaging examination exceeding 30 days; (II) severe artifacts or damage in CT images; (III) prior surgery, radiotherapy, chemotherapy, or FNAB before CT examination; (IV) history of malignant tumors.
Histopathological diagnosis from surgical specimens served as the reference standard for benign or malignant status. Baseline clinical variables, including age, sex, nodule diameter, and nodule location, were extracted from medical records and used for cohort description and subgroup interpretation; these variables were not incorporated as predictors in the segmentation network.
Imaging was performed using a Toshiba Aquilion ONE TSX-301C, 320-detector-row wide-body high-speed CT scanner. Scanning parameters were as follows: tube voltage 100 kV, tube current automatically controlled by CareDose4D, slice thickness 0.5 mm, slice spacing 0.5 mm, rotation time 0.275 s, reconstruction matrix 512×512, using the FC04 soft tissue algorithm. All CT images were standardized by professional physicians, with appropriate window width and window level adjustments for clear visualization of thyroid nodules.
Manual thyroid nodule contours were generated by two physicians from the same institution on the CT images. The annotation process was performed using the CT images, and pathological labels were not used as input during segmentation model training. In the internal test set, the manual annotations served as the ground truth for evaluating segmentation performance.
No formal sample size calculation was performed because this was an exploratory retrospective segmentation model-development study. Instead, all eligible patients available during the study period who met the inclusion and exclusion criteria were included. The balanced benign and malignant composition was used to permit subgroup assessment of segmentation difficulty, rather than to train a diagnostic classifier.
After application of the inclusion and exclusion criteria, the final analysis cohort comprised 500 eligible patients. These patients were allocated at the patient level to the training, validation, and internal test sets as described above.
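The patient-level allocation described above can be illustrated with a minimal sketch. The function names, the seeded random shuffle, and the absence of stratification are assumptions made for illustration; the study does not state how patients were assigned to the three sets, only that no patient appears in more than one.

```python
import random


def split_patients(patient_ids, n_train=350, n_val=50, n_test=100, seed=0):
    """Shuffle unique patient IDs and cut them into three disjoint sets."""
    ids = sorted(set(patient_ids))
    if len(ids) != n_train + n_val + n_test:
        raise ValueError("split sizes must sum to the number of patients")
    # Assumption: a seeded random shuffle; the paper does not describe this step.
    random.Random(seed).shuffle(ids)
    train = set(ids[:n_train])
    val = set(ids[n_train:n_train + n_val])
    test = set(ids[n_train + n_val:])
    return train, val, test


def assign_images(image_to_patient, train, val, test):
    """Send every image to the split that owns its patient."""
    splits = {"train": [], "val": [], "test": []}
    for image, patient in image_to_patient.items():
        if patient in train:
            splits["train"].append(image)
        elif patient in val:
            splits["val"].append(image)
        elif patient in test:
            splits["test"].append(image)
        else:
            raise KeyError(f"patient {patient!r} is not in any split")
    return splits
```

Splitting on patient IDs first and routing images afterwards is what prevents slices from the same patient leaking between the training and test sets.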
Model architecture
We proposed a Channel Attention High-Resolution Network (CA-HRNet), based on the High-Resolution Network (HRNet) (40), to perform thyroid nodule segmentation. Traditional segmentation networks follow a “downsample-then-upsample” pipeline, which inevitably loses important spatial details. HRNet, by contrast, starts with a high-resolution convolution stream and gradually adds parallel low-resolution streams. In addition, HRNet introduces repeated multi-scale fusion between parallel streams of different resolutions. At each stage, features from high-to-low and low-to-high resolution streams are exchanged and fused. This design enables the network to preserve spatial detail while incorporating semantic context.
Compared with HRNet, the proposed CA-HRNet replaces the convolution module with a Channel Attention Convolution Module (CACM) in stages 2, 3, and 4 (Figure 1). CACM computes its features as follows. Let $X \in \mathbb{R}^{C \times H \times W}$ be the input feature map. First, $X$ is fed into a convolution module composed of a 3×3 convolutional layer, a batch normalization (BN) layer, and a ReLU activation function, which reduces the number of channels from $C$ to $C/r$ and outputs the feature map $F \in \mathbb{R}^{(C/r) \times H \times W}$, where $r$ is the reduction factor. Next, $F$ is passed through the channel attention module: the spatial dimensions are flattened to obtain the query, key, and value matrices $Q, K, V \in \mathbb{R}^{(C/r) \times HW}$. The channel correlation matrix is then computed by matrix multiplication, $A = QK^{\top} \in \mathbb{R}^{(C/r) \times (C/r)}$. Next, the correlation matrix is inverted by subtracting each element from the maximum value of its row, $\tilde{A}_{ij} = \max_{j'} A_{ij'} - A_{ij}$, which ensures $\tilde{A}_{ij} \geq 0$ and sets the position of the original maximum value to 0. As a result, the larger values in $\tilde{A}$ (corresponding to weaker correlations in the original channel correlation matrix) yield higher attention weights. This design aims to explore potential complementary information between channels and avoid feature redundancy caused by over-reliance on strong correlations. The attention output $E$ is then calculated from $\tilde{A}$ and $V$ together with a learnable parameter $\gamma$. Subsequently, $E$ is input to a convolution module consisting of a 3×3 convolutional layer and a BN layer to restore the number of channels from $C/r$ to $C$ and obtain $F'$. Finally, a residual path and ReLU are applied to produce the output $Y$. The diagram of this module is given in Figure 2.
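The inverted channel attention step at the core of CACM can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the softmax normalization and the multiplication of the attended features by the learnable scalar (here `gamma`) are assumed, since the exact output formula is not recoverable from the text, and the surrounding 3×3 convolutions, BN layers, and residual path are omitted. The function name is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def inverted_channel_attention(feat, gamma=1.0):
    """Apply inverted channel attention to a (C, H, W) feature map."""
    c, h, w = feat.shape
    # Query, key, and value are all the spatially flattened features.
    q = k = v = feat.reshape(c, h * w)
    corr = q @ k.T  # (C, C) channel correlation matrix
    # Row maximum minus each element: non-negative, and 0 at the old maximum.
    inv = corr.max(axis=1, keepdims=True) - corr
    # Assumption: weights come from a row-wise softmax of the inverted matrix.
    attn = softmax(inv, axis=1)
    # Assumption: the learnable parameter scales the attended features.
    out = gamma * (attn @ v)
    return out.reshape(c, h, w), inv, attn
```

Under these assumptions, the channel that is least correlated with a given channel receives the largest weight in that channel's row, which matches the stated aim of emphasizing complementary rather than redundant channels.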
The number of branches progressively increases from 1 to 4 across the stages, and every branch is composed of 4 basic blocks. CA-HRNet produces 4 outputs ($O_1$, $O_2$, $O_3$, $O_4$) of different scales. We proposed a CFSM to generate the final segmentation results. The calculation is as follows. First, $O_2$, $O_3$, and $O_4$ are up-sampled to the size of $O_1$. Then, $O_1$ and the up-sampled $O_2$, $O_3$, and $O_4$ are concatenated along the channel dimension. The concatenated feature is used to calculate channel attention with the squeeze-and-excitation (SE) block (41). The SE block is designed to enhance the representational capacity of CNNs: it explicitly models the interdependencies between feature channels and amplifies informative features while suppressing less useful ones, without drastically increasing computational cost. After the channel attention is calculated, it is sorted from large to small and the top $k$ attention values are selected. The corresponding channels are retained to produce the final segmentation results. The calculation is shown as follows.

$$O = \mathrm{Concat}\big(O_1, \mathrm{Up}(O_2), \mathrm{Up}(O_3), \mathrm{Up}(O_4)\big)$$

$$s = \sigma\big(\mathrm{MLP}(\mathrm{GAP}(O))\big)$$

$$O_{\mathrm{sel}} = O\big[\mathrm{TopK}(s, k)\big]$$

where $O \in \mathbb{R}^{C \times H \times W}$; GAP refers to global average pooling, which computes the average value over the entire spatial dimension of each feature channel; and $\sigma$ is the sigmoid function. MLP is the multi-layer perceptron, which contains two linear layers. The calculation flow of MLP is as follows.

$$\mathrm{MLP}(z) = W_2\,\delta(W_1 z)$$

where $z \in \mathbb{R}^{C}$, $W_1 \in \mathbb{R}^{(C/r) \times C}$, $W_2 \in \mathbb{R}^{C \times (C/r)}$; $r$ is the reduction factor; $\delta$ is the rectified linear unit. $\mathrm{TopK}$ collects the indices of the top $k$ values. Finally, the selected channel feature is fed into convolution (Conv), BN (42), and ReLU to produce the final segmentation result. The diagram of CFSM is shown in Figure 3.
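The channel selection performed by CFSM can be sketched as below. The sketch assumes that the four branch outputs have already been up-sampled and concatenated into one `(C, H, W)` array, that the two linear layers have no bias terms, and that the retained channels are passed on without being rescaled by their attention scores; none of these details are stated explicitly in the text, and the function and argument names are hypothetical.

```python
import numpy as np


def cfsm_select(features, w1, w2, k):
    """Keep the k channels with the highest SE-style attention scores.

    features: (C, H, W) concatenated multi-scale feature map
    w1: (C // r, C) first linear layer, w2: (C, C // r) second linear layer
    """
    z = features.mean(axis=(1, 2))                 # global average pooling -> (C,)
    hidden = np.maximum(w1 @ z, 0.0)               # first linear layer + ReLU
    scores = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # second linear layer + sigmoid
    top = np.argsort(-scores, kind="stable")[:k]   # indices of the k largest scores
    # Assumption: selected channels are kept as-is (selection, not reweighting).
    return features[top], top, scores
```

Compared with plain SE reweighting, which keeps all `C` channels, this selection step hands only `k` channels to the final Conv-BN-ReLU head, which is one plausible source of the reduced computation reported for the proposed design.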
The model was trained with Dice loss, a widely used metric-based loss function for optimizing segmentation models. It is derived from the DC, which measures the overlap between the predicted segmentation mask ($P$) and the ground truth mask ($G$).

$$\mathcal{L}_{\mathrm{Dice}} = 1 - \frac{2\,|P \cap G|}{|P| + |G|}$$

where $P$ and $G$ are the predicted and ground truth segmentation masks, respectively. The following data augmentation techniques were used to mitigate overfitting: random rotation; random horizontal translation between [−0.1 W, 0.1 W]; random vertical translation between [−0.1 H, 0.1 H]; random zoom with a factor between [0.8, 1.2]; and random vertical and horizontal flips. Here $H \times W$ is the image size.
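A minimal NumPy sketch of a soft Dice loss is given below for illustration. The smoothing constant `eps` and the use of per-pixel probabilities in place of hard masks are common conventions assumed here; the study does not report these details.

```python
import numpy as np


def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩G| / (|P| + |G|), computed on probabilities."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = (pred * target).sum()
    # eps (an assumption) avoids division by zero when both masks are empty.
    dice = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return 1.0 - dice
```

The loss approaches 0 for a perfect prediction and 1 when the prediction and the ground truth do not overlap, so minimizing it directly maximizes the overlap metric used for evaluation.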
During the inference stage, we employed test-time augmentation (TTA) to improve the segmentation performance. TTA is a post-training inference technique widely used in computer vision tasks to improve model robustness and prediction accuracy without retraining the model. It involves applying a series of data augmentations to the test image, generating multiple augmented versions of the same image, and aggregating the model’s predictions on these versions to produce a final result. The calculation is as follows.

$$P_0 = f(x), \qquad P_i = T_i^{-1}\big(f(T_i(x))\big), \qquad P = \mathrm{Vote}(P_0, P_1, \dots, P_n)$$

where $f$ is the trained model and $x$ is the test image; $T_i$ refers to the $i$-th data augmentation technique; $T_i^{-1}$ is the reversed operation of $T_i$. $\mathrm{Vote}$ refers to the pixel-wise voting operation. When any two of $P_0$ and $P_i$ are 1 at a pixel, $P = 1$ at that pixel. In particular, we used the following data augmentation techniques: vertical flip, horizontal flip, rotation by 90°, and rotation by −90°. The diagram of TTA is shown in Figure 4.
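The TTA procedure can be sketched as follows. The sketch assumes that `model` is any callable mapping a 2D image array to a binary mask of the same shape, that the prediction on the unaugmented image also receives a vote, and that a pixel is labeled as lesion when at least two predictions agree; the function name and the `min_votes` argument are hypothetical, and the sign convention of the 90° rotations follows `np.rot90`.

```python
import numpy as np

# Pairs of (augmentation, inverse operation) for the four transforms in the text.
AUGMENTATIONS = [
    (np.flipud, np.flipud),                                 # vertical flip
    (np.fliplr, np.fliplr),                                 # horizontal flip
    (lambda a: np.rot90(a, 1), lambda a: np.rot90(a, -1)),  # rotation by 90 degrees
    (lambda a: np.rot90(a, -1), lambda a: np.rot90(a, 1)),  # rotation by -90 degrees
]


def predict_with_tta(model, image, min_votes=2):
    """Aggregate predictions over augmented copies by pixel-wise voting."""
    # Assumption: the prediction on the original image counts as one vote.
    votes = model(image).astype(int)
    for forward, inverse in AUGMENTATIONS:
        # Predict on the augmented image, then map the mask back.
        votes += inverse(model(forward(image))).astype(int)
    return (votes >= min_votes).astype(np.uint8)
```

Because every augmentation here is a flip or a right-angle rotation, each one is exactly invertible on the pixel grid, so the restored masks align with the original image without interpolation.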
To comprehensively evaluate the performance of the proposed CA-HRNet model, we compared it with mainstream segmentation models such as U-Net (43), SegFormer (44), Transformer U-Net (TransUNet) (45), DAC-Net (46) and HRNet on the internal test set. The DC and IoU were utilized to evaluate the segmentation performance. Because this study developed a segmentation model rather than a benign-malignant diagnostic classifier, discrimination and calibration indices such as area under the curve, sensitivity, specificity, and calibration plots were not calculated.
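For reference, the two evaluation metrics can be computed on a pair of binary masks as in the sketch below. Returning 1.0 when both masks are empty is a convention assumed here, and the study does not state whether scores were averaged per image or per patient.

```python
import numpy as np


def dice_and_iou(pred, truth):
    """Return (Dice coefficient, IoU) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    # Assumption: two empty masks count as a perfect match.
    dice = 2.0 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return float(dice), float(iou)
```

For a single pair of masks the two metrics are linked by IoU = DC / (2 − DC), but this identity need not hold for values averaged over many cases, so both are reported.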
Statistical analysis
All statistical analyses were performed using R statistical software and SPSS. Continuous variables are presented as mean ± standard deviation (SD) or median [interquartile range] and were compared using independent t-tests or Mann-Whitney U tests, depending on the distribution of the variables. Categorical variables are described as proportions and compared between groups using the Chi-square test or Fisher’s exact test. Statistical significance was defined as P<0.05.
Results
The model was implemented in PyTorch and trained on an NVIDIA RTX A5000 GPU. We employed the RAdam optimizer (47) with a learning rate of $10^{-4}$ and a batch size of 8 for 100 epochs. The baseline characteristics of the enrolled cohort are summarized in Table 1. The median age was 48 years, 373 patients (74.6%) were female, and the median nodule diameter was 1.60 cm. Malignant nodules were smaller than benign nodules (median diameter, 0.90 versus 2.40 cm), which is clinically relevant because smaller lesions are more susceptible to partial-volume effects and boundary ambiguity.
Table 1
| Characteristics | All (n=500) | Benign (n=250) | Malignant (n=250) | P value |
|---|---|---|---|---|
| Age (years) | 48.00 [38.00–54.00] | 51.00 [44.00–56.00] | 43.00 [34.00–50.00] | <0.001 |
| Gender | | | | 0.22 |
| Male | 127 (25.40) | 57 (22.80) | 70 (28.00) | |
| Female | 373 (74.60) | 193 (77.20) | 180 (72.00) | |
| Position | | | | 0.66 |
| Right | 286 (57.20) | 148 (59.20) | 138 (55.20) | |
| Left | 203 (40.60) | 97 (38.80) | 106 (42.40) | |
| Isthmus | 11 (2.20) | 5 (2.00) | 6 (2.40) | |
| Diameter (cm) | 1.60 [0.80–2.60] | 2.40 [1.70–3.10] | 0.90 [0.60–1.48] | <0.001 |
Baseline characteristics of the enrolled patients. Data are presented as median [interquartile range] or n (%). P values are for comparisons between the benign and malignant nodule groups.
Quantitative comparison with mainstream models
As shown in Table 2, CA-HRNet + TTA achieved the best segmentation performance, with an overall DC of 78.6% and an IoU of 70.0% on the internal test set. TTA improved the overall DC of every tested model, by 0.8, 0.8, 2.9, 1.3, 1.1, and 1.5 percentage points for U-Net, SegFormer, TransUNet, DAC-Net, HRNet, and CA-HRNet, respectively. CA-HRNet + TTA outperformed HRNet + TTA, indicating the added value of the CFSM and channel attention design. The following visual comparisons further demonstrate the performance differences between CA-HRNet and the competing models.
Table 2
| Model | Dice, benign (%) | Dice, malignant (%) | Dice, all (%) | IoU, benign (%) | IoU, malignant (%) | IoU, all (%) |
|---|---|---|---|---|---|---|
| U-Net | 83.1 | 62.3 | 75.7 | 74.8 | 52.3 | 66.7 |
| U-Net + TTA | 83.8 | 63.5 | 76.5 | 75.6 | 53.0 | 67.5 |
| SegFormer | 81.9 | 61.0 | 74.4 | 73.5 | 50.7 | 65.3 |
| SegFormer + TTA | 82.4 | 62.3 | 75.2 | 73.9 | 51.7 | 66.0 |
| TransUNet | 78.9 | 56.5 | 70.9 | 69.9 | 46.3 | 61.5 |
| TransUNet + TTA | 81.3 | 60.1 | 73.8 | 72.6 | 49.3 | 64.3 |
| DAC-Net | 82.3 | 59.3 | 74.0 | 73.5 | 49.4 | 64.9 |
| DAC-Net + TTA | 82.9 | 61.7 | 75.3 | 74.3 | 51.5 | 66.2 |
| HRNet | 84.2 | 62.8 | 76.6 | 76.4 | 52.9 | 68.0 |
| HRNet + TTA | 84.5 | 65.5 | 77.7 | 76.7 | 54.8 | 68.9 |
| CA-HRNet | 84.4 | 64.0 | 77.1 | 76.6 | 54.5 | 68.7 |
| CA-HRNet + TTA | 85.5† | 66.2† | 78.6† | 77.9† | 56.0† | 70.0† |
Performance comparison of different models on the thyroid nodule segmentation task. TTA indicates the application of test-time augmentation. † indicates that the corresponding model achieves the best performance. CA-HRNet, Channel Attention High-Resolution Network; HRNet, High-Resolution Network; IoU, intersection over union; TransUNet, Transformer U-Net.
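The TTA gains quoted in the text can be rechecked directly from the overall Dice values in Table 2. The short script below is only a convenience check; the dictionary name is hypothetical and the numbers are copied from the table.

```python
# Overall Dice coefficient (%) without and with TTA, copied from Table 2.
dice_all = {
    "U-Net": (75.7, 76.5),
    "SegFormer": (74.4, 75.2),
    "TransUNet": (70.9, 73.8),
    "DAC-Net": (74.0, 75.3),
    "HRNet": (76.6, 77.7),
    "CA-HRNet": (77.1, 78.6),
}

# Gain in percentage points, rounded to one decimal to absorb float noise.
gains = {
    model: round(with_tta - without_tta, 1)
    for model, (without_tta, with_tta) in dice_all.items()
}

for model, gain in gains.items():
    print(f"{model}: +{gain} percentage points")
```

The differences are absolute changes in percentage points, not relative improvements.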
Visual comparison and analysis
To intuitively demonstrate the segmentation performance of different models, representative cases from the test set were selected for visual comparison. The following figures present side-by-side comparisons of the segmentation results generated by U-Net, SegFormer, TransUNet, DAC-Net, HRNet, and CA-HRNet, alongside the corresponding ground truth annotations. Cases 1–4 are malignant nodules, characterized by irregular shapes and blurred boundaries, while Cases 5–8 are benign nodules, typically exhibiting clearer margins and more regular morphology.
U-Net performs adequately on nodules with well-defined boundaries but tends to under-segment or lose fine details in regions with low contrast or complex morphology (Figure 5). U-Net + TTA enhances the model’s robustness and delineates more complete lesion contours with improved boundary adherence. It also corrects some instances where U-Net misclassified normal tissue as a lesion. However, U-Net + TTA can introduce new errors, such as mis-segmenting contralateral normal thyroid tissue, intravascular contrast heterogeneity, or other anatomical structures as lesions, and the incidence of such errors is notably higher than in other models. A prominent example is Case 7, where U-Net + TTA was the only model among those utilizing TTA that extensively misidentified contralateral vascular structures as part of the lesion.
The SegFormer, leveraging its Transformer-based architecture, demonstrates strong long-range dependency modeling and effectively captures the overall region of nodules, even in complex backgrounds (Figure 6). However, it shows limited precision in fine boundary alignment and preservation of small structures, especially on medical imaging datasets of limited size. Its inference speed is also relatively slow. SegFormer + TTA enhances recognition consistency and reduces over-segmentation of normal thyroid or adjacent tissues to some extent. Nevertheless, it may still include extraneous normal thyroid tissue, particularly when segmenting small nodules or those with highly blurred margins.
The TransUNet, which aims to integrate the global context modeling of Transformers with U-Net’s local detail extraction, often exhibits noticeable fragmentation or over-smoothing in nodules with blurred or irregular margins (Figure 7). It frequently over-segments adjacent normal thyroid tissue or fails to capture the complete lesion boundary. TransUNet + TTA substantially mitigates these issues, leading to more coherent segmentations. However, this improvement comes at the cost of an increased tendency to misclassify normal thyroid tissue as lesion. Although these errors are less severe than those sometimes seen with U-Net, they occur more frequently.
Similar to U-Net, the DAC-Net may occasionally mis-segment adjacent vessels as lesions or fail to include parts of a lesion in areas of ambiguous boundary transition. DAC-Net + TTA effectively alleviates many of these inaccuracies (Figure 8). Importantly, the segmentation errors produced by DAC-Net, both with and without TTA, are typically minor and predominantly confined to boundary imperfections, rarely resulting in large-scale deviations from the true lesion area.
HRNet maintains high-resolution feature representations through its parallel multi-branch architecture, providing a strong advantage for capturing fine details in medical image segmentation (Figure 9). The baseline model performs robustly on both benign and malignant nodules. HRNet + TTA offers further refinement, improving segmentation accuracy and consistency. In numerous cases, the results from HRNet + TTA are largely congruent with manual annotations. The primary remaining challenge lies in accurately delineating nodules with extremely faint or poorly defined edges.
CA-HRNet integrates the multi-resolution feature preservation of HRNet with a channel attention mechanism (Figure 10). The CA-HRNet achieves high boundary adherence for benign nodules and maintains reasonable structural consistency for malignant ones despite their challenging morphology. It delivers the best overall segmentation performance among the compared models, particularly in detail retention and malignant nodule recognition. CA-HRNet + TTA further refines the results, yielding excellent outcomes even for difficult malignant cases. Isolated errors may occur, such as the occasional misclassification of normal thyroid tissue as lesion, but these are infrequent.
Visual analysis confirms that the proposed CA-HRNet produces segmentation results most closely aligned with ground truth across most cases. It outperforms the comparative models in preserving edge details, detecting small targets, and handling irregularly shaped regions. This performance advantage suggests that the integrated channel attention mechanism effectively enhances the model’s ability to perceive and reconstruct critical anatomical features.
Ablation study and analysis
To thoroughly evaluate the individual and synergistic contributions of the proposed components, we conducted an ablation study focusing on the CFSM and the CACM. The CFSM introduced a novel dynamic channel selection paradigm at the feature fusion stage, which is directly compared against the conventional channel weighting paradigm represented by the SE module. Performance was assessed using the DC, IoU, and computational complexity measured in multiply-accumulate operations (MACs).
As shown in Table 3, the ablation study revealed three main findings. First, CFSM achieved better benign nodule segmentation than the conventional SE module (85.4% versus 84.8% DC), whereas SE showed a marginal advantage for malignant nodules (66.1% versus 65.2%). Second, combining CFSM with CACM produced the best overall segmentation performance (78.6% DC and 70.0% IoU), suggesting that dynamic channel selection and attention-based refinement are complementary. Third, this configuration reduced computational complexity to 25.376 G MACs, representing a 73% decrease compared with the CFSM-only configuration. This accuracy-efficiency balance is important for future implementation in clinical image-analysis workflows, particularly when computational resources are limited.
Table 3
| Setting | Dice, benign (%) | Dice, malignant (%) | Dice, all (%) | IoU, benign (%) | IoU, malignant (%) | IoU, all (%) | MACs (G) |
|---|---|---|---|---|---|---|---|
| CFSM | 85.4 | 65.2 | 78.2 | 77.7 | 54.9 | 69.6 | 93.764 |
| CFSM + CACM | 85.5† | 66.2† | 78.6† | 77.9† | 56.0† | 70.0† | 25.376† |
| SE | 84.8 | 66.1 | 78.1 | 77.1 | 55.7 | 69.5 | 87.364 |
| SE + CACM | 85.5 | 65.6 | 78.4 | 77.8 | 55.3 | 69.7 | 31.776 |
Ablation study results of the CFSM and the CACM within CA-HRNet. † indicates that the corresponding model achieves the best performance. CA-HRNet, Channel Attention High-Resolution Network; CACM, Channel Attention Convolution Module; CFSM, Channel Feature Selection Module; IoU, intersection over union; MACs (G), multiply-accumulate operations (in Giga); SE, squeeze-and-excitation module (representing the channel weighting paradigm).
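The reported reduction in computational complexity follows from the MACs column of Table 3, as the short calculation below illustrates. The helper name is hypothetical; the values are copied from the table.

```python
# Multiply-accumulate operations (G) per configuration, copied from Table 3.
macs = {
    "CFSM": 93.764,
    "CFSM + CACM": 25.376,
    "SE": 87.364,
    "SE + CACM": 31.776,
}


def reduction_pct(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


cfsm_drop = reduction_pct(macs["CFSM"], macs["CFSM + CACM"])
se_drop = reduction_pct(macs["SE"], macs["SE + CACM"])
print(f"CFSM -> CFSM + CACM: {cfsm_drop:.1f}% fewer MACs")
print(f"SE -> SE + CACM: {se_drop:.1f}% fewer MACs")
```

The first figure rounds to the 73% decrease quoted in the text; the same calculation for the SE-based pair gives a smaller relative saving.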
Discussion
This study developed and internally validated CA-HRNet for automatic thyroid nodule segmentation on contrast-enhanced CT images from 500 pathologically confirmed patients. The main clinical motivation was to reduce dependence on manual ROI delineation, a time-consuming and subjective step that can affect downstream radiomics and computer-aided diagnosis. On the internal test set, CA-HRNet + TTA achieved the highest overall DC and IoU among the evaluated models, supporting its ability to produce reproducible CT lesion masks.
The technical contribution of CA-HRNet is the integration of HRNet’s high-resolution feature preservation with channel attention and dynamic feature selection. Unlike approaches that only increase model complexity, the CFSM-CACM design reduced redundant feature channels while maintaining or improving segmentation accuracy. This is clinically relevant because image-analysis tools must be sufficiently efficient to be integrated into routine workflows and scalable radiomics pipelines.
The lower DC for malignant nodules (66.2%) than for benign nodules (85.5%) is clinically meaningful. This performance gap likely stems from the more complex imaging phenotypes of malignant thyroid nodules. Specifically, malignant nodules frequently present with ill-defined or infiltrative margins, irregular and spiculated contours, and heterogeneous internal architecture, all of which pose significant challenges to automated segmentation systems. Beyond shape irregularity, malignant lesions often exhibit poor contrast differentiation from adjacent thyroid parenchyma or strap muscles, complicating boundary delineation. Furthermore, internal heterogeneity, manifesting as areas of cystic degeneration, necrosis, hemorrhage, or microcalcifications, introduces additional variability in texture and intensity, which can confuse intensity-based segmentation algorithms. These characteristics not only challenge human interpreters but also limit the ability of conventional convolutional operators to capture such irregular and high-frequency structural details. In addition, the smaller size of malignant nodules in our dataset (median diameter 0.9 cm) compared with benign ones (median diameter 2.4 cm) may further exacerbate segmentation difficulty, as small objects are inherently more susceptible to partial-volume effects and require finer spatial sensitivity. Future work should therefore focus on segmentation strategies tailored to these challenging cases, including boundary-aware loss functions, texture-adaptive feature extraction, and methods that explicitly focus on ambiguous margins.
The strengths of this study include a relatively large pathologically confirmed CT cohort, balanced representation of benign and malignant nodules, patient-level division of data, and comparison with several widely used segmentation architectures. The subgroup analysis by pathological status was used to characterize segmentation difficulty across clinically distinct nodule types. Importantly, the model should be interpreted as a segmentation tool that may facilitate downstream diagnostic modeling, not as a stand-alone benign-malignant classifier.
Several limitations should be acknowledged. First, all data were obtained from a single institution using a consistent scanner and imaging protocol, so the internal test set cannot substitute for independent external validation. Second, although pathology was used as the reference standard for benign or malignant status, the present model did not output a diagnostic probability; therefore, AUC, sensitivity, specificity, and calibration metrics for malignancy prediction were not applicable. Third, the study focused on CT alone and did not integrate ultrasound, clinical laboratory data, molecular markers, or genetic profiles. Future multicenter and prospective studies should evaluate generalizability across scanners and institutions and determine whether CA-HRNet-derived masks improve radiomics-based diagnosis, prognosis prediction, or treatment planning.
Conclusions
This study constructed a pathologically confirmed CT dataset of thyroid nodules and proposed CA-HRNet, an automated segmentation model incorporating channel attention and feature selection. Experimental results showed that CA-HRNet achieved accurate and computationally efficient automated segmentation of thyroid nodules and outperformed several state-of-the-art models. These findings support its potential as a reproducible ROI-generation tool for CT-based radiomics and computer-aided diagnosis, but external validation and downstream clinical task testing are needed before routine clinical use.
Acknowledgments
The authors thank the patients and clinical staff. During the preparation of this work, the authors used DeepSeek to enhance the clarity and quality of the writing. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the published article.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0162/rc
Data Sharing Statement: Available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0162/dss
Peer Review File: Available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0162/prf
Funding: This work was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0162/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of Quzhou People’s Hospital (approval No. 2025-101), and informed consent was obtained from all individual participants.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Pizzato M, Li M, Vignat J, et al. The epidemiological landscape of thyroid cancer worldwide: GLOBOCAN estimates for incidence and mortality rates in 2020. Lancet Diabetes Endocrinol 2022;10:264-72. [Crossref] [PubMed]
- Li M, Dal Maso L, Pizzato M, et al. Thyroid cancer in adolescents and young adults: a population-based study in 185 countries worldwide. Lancet Diabetes Endocrinol 2026;14:112-22. [Crossref] [PubMed]
- Tessler FN, Middleton WD, Grant EG, et al. ACR Thyroid Imaging, Reporting and Data System (TI-RADS): White Paper of the ACR TI-RADS Committee. J Am Coll Radiol 2017;14:587-95. [Crossref] [PubMed]
- Ringel MD, Sosa JA, Baloch Z, et al. 2025 American Thyroid Association Management Guidelines for Adult Patients with Differentiated Thyroid Cancer. Thyroid 2025;35:841-985. [Crossref] [PubMed]
- Li G, Chen R, Zhang J, et al. Fusing enhanced Transformer and large kernel CNN for malignant thyroid nodule segmentation. Biomedical Signal Processing and Control 2023;83:104636.
- Alexander EK, Cibas ES. Diagnosis of thyroid nodules. Lancet Diabetes Endocrinol 2022;10:533-9. [Crossref] [PubMed]
- Boers T, Braak SJ, Rikken NET, et al. Ultrasound imaging in thyroid nodule diagnosis, therapy, and follow-up: Current status and future trends. J Clin Ultrasound 2023;51:1087-100. [Crossref] [PubMed]
- Xie Y, Yang Z, Yang Q, et al. Identification method of thyroid nodule ultrasonography based on self-supervised learning dual-branch attention learning framework. Health Inf Sci Syst 2024;12:7. [Crossref] [PubMed]
- Chen L, Chen L, Liu J, et al. Value of Qualitative and Quantitative Contrast-Enhanced Ultrasound Analysis in Preoperative Diagnosis of Cervical Lymph Node Metastasis From Papillary Thyroid Carcinoma. J Ultrasound Med 2020;39:73-81. [Crossref] [PubMed]
- Abdolali F, Kapur J, Jaremko JL, et al. Automated thyroid nodule detection from ultrasound imaging using deep convolutional neural networks. Comput Biol Med 2020;122:103871. [Crossref] [PubMed]
- Xiong S, Fu Z, Deng Z, et al. Machine learning-based CT radiomics enhances bladder cancer staging predictions: A comparative study of clinical, radiomics, and combined models. Med Phys 2024;51:5965-77. [Crossref] [PubMed]
- O'Sullivan NJ, Temperley HC, Horan MT, et al. Computed tomography (CT) derived radiomics to predict post-operative disease recurrence in gastric cancer; a systematic review and meta-analysis. Curr Probl Diagn Radiol 2024;53:717-22. [Crossref] [PubMed]
- Barry N, Kendrick J, Molin K, et al. Evaluating the impact of the Radiomics Quality Score: a systematic review and meta-analysis. Eur Radiol 2025;35:1701-13. [Crossref] [PubMed]
- Lambin P, Rios-Velazquez E, Leijenaar R, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer 2012;48:441-6. [Crossref] [PubMed]
- Cao Y, Zhong X, Diao W, et al. Radiomics in Differentiated Thyroid Cancer and Nodules: Explorations, Application, and Limitations. Cancers (Basel) 2021;13:2436. [Crossref] [PubMed]
- Zhao Q, Guo S, Zhang Y, et al. Multimodal ultrasound radiomics model combined with clinical model for differentiating follicular thyroid adenoma from carcinoma. BMC Med Imaging 2025;25:152. [Crossref] [PubMed]
- Lin S, Gao M, Yang Z, et al. CT-Based Radiomics Models for Differentiation of Benign and Malignant Thyroid Nodules: A Multicenter Development and Validation Study. AJR Am J Roentgenol 2024;223:e2431077. [Crossref] [PubMed]
- Du J, He X, Fan R, et al. Artificial intelligence-assisted precise preoperative prediction of lateral cervical lymph nodes metastasis in papillary thyroid carcinoma via a clinical-CT radiomic combined model. Int J Surg 2025;111:2453-66. [Crossref] [PubMed]
- Gao X, Ran X, Ding W. The progress of radiomics in thyroid nodules. Front Oncol 2023;13:1109319. [Crossref] [PubMed]
- Chen C, Mat Isa NA, Liu X. A review of convolutional neural network based methods for medical image classification. Comput Biol Med 2025;185:109507. [Crossref] [PubMed]
- Rayed ME, Islam SMS, Niha SI, et al. Deep learning for medical image segmentation: State-of-the-art advancements and challenges. Informatics in Medicine Unlocked 2024;47:101504.
- Kendrick J, Francis RJ, Hassan GM, et al. Fully automatic prognostic biomarker extraction from metastatic prostate lesion segmentations in whole-body [68Ga]Ga-PSMA-11 PET/CT images. Eur J Nucl Med Mol Imaging 2022;50:67-79. [Crossref] [PubMed]
- Kang W, Qiu X, Luo Y, et al. Application of radiomics-based multiomics combinations in the tumor microenvironment and cancer prognosis. J Transl Med 2023;21:598. [Crossref] [PubMed]
- Maroulis DE, Savelonas MA, Iakovidis DK, et al. Variable background active contour model for computer-aided delineation of nodules in thyroid ultrasound images. IEEE Trans Inf Technol Biomed 2007;11:537-43. [Crossref] [PubMed]
- Savelonas MA, Iakovidis DK, Legakis I, et al. Active contours guided by echogenicity and texture for delineation of thyroid nodules in ultrasound images. IEEE Trans Inf Technol Biomed 2009;13:519-27. [Crossref] [PubMed]
- Li Z, Liu C, Liu G, et al. A novel statistical image thresholding method. AEU - International Journal of Electronics and Communications 2010;64:1137-47.
- Du W, Sang N. An effective segmentation method of ultrasonic thyroid nodules. In: Liu J, editor. Proc. SPIE 9814, MIPPR 2015: Parallel Processing of Images and Optimization; and Medical Imaging Processing. Enshi, China; 2015:98140F.
- Bi L, Shuang Z. Diagnosis of Thyroid Nodules Based on Local Non-quantitative Multi-Directional Texture Descriptor with Rotation Invariant Characteristics for Ultrasound Image. J Med Syst 2019;43:231. [Crossref] [PubMed]
- Benabdallah FZ, Djerou L. Active Contour Extension Basing on Haralick Texture Features, Multi-gene Genetic Programming, and Block Matching to Segment Thyroid in 3D Ultrasound Images. Arab J Sci Eng 2023;48:2429-40.
- Peng B, Lin W, Zhou W, et al. Enhanced pediatric thyroid ultrasound image segmentation using DC-Contrast U-Net. BMC Med Imaging 2024;24:275. [Crossref] [PubMed]
- Xiang Z, Tian X, Liu Y, et al. Federated learning via multi-attention guided UNet for thyroid nodule segmentation of ultrasound images. Neural Netw 2025;181:106754. [Crossref] [PubMed]
- Dong P, Zhang R, Li J, et al. An ultrasound image segmentation method for thyroid nodules based on dual-path attention mechanism-enhanced UNet+. BMC Med Imaging 2024;24:341. [Crossref] [PubMed]
- Wang S, Liu Z, Shi G, et al. MFS-Unet: A Multi-Path Vision Mamba Network for Precise Thyroid Nodule Segmentation. IET Syst Biol 2026;20:e70044. [Crossref] [PubMed]
- Ye D, Lan K, Cheng J, et al. MFA-Net: multi-scale feature aggregation network with background-aware module for ultrasound segmentation of thyroid nodules. Quant Imaging Med Surg 2025;15:12167-89. [Crossref] [PubMed]
- Sun X, Li X, Yang Z, et al. RTS-Net: thyroid nodule segmentation network integrating dual-path attention and graph convolution. Front Med (Lausanne) 2026;13:1785796. [Crossref] [PubMed]
- Liu C, Chen S, Yang Y, et al. The value of the computer-aided diagnosis system for thyroid lesions based on computed tomography images. Quant Imaging Med Surg 2019;9:642-53. [Crossref] [PubMed]
- Peng W, Liu C, Xia S, et al. Thyroid nodule recognition in computed tomography using first order statistics. Biomed Eng Online 2017;16:67. [Crossref] [PubMed]
- Zhao Z, Ye C, Hu Y, et al. Cascade and Fusion of Multitask Convolutional Neural Networks for Detection of Thyroid Nodules in Contrast-Enhanced CT. Comput Intell Neurosci 2019;2019:7401235. [Crossref] [PubMed]
- Li W, Cheng S, Qian K, et al. Automatic Recognition and Classification System of Thyroid Nodules in CT Images Based on CNN. Comput Intell Neurosci 2021;2021:5540186. [Crossref] [PubMed]
- Wan J, Liu L, Wang H, et al. UNSX-HRNet: Modeling anatomical uncertainty for landmark detection in total hip arthroplasty. Comput Biol Med 2025;198:111146. [Crossref] [PubMed]
- Hu J, Shen L, Albanie S, et al. Squeeze-and-Excitation Networks. IEEE Trans Pattern Anal Mach Intell 2020;42:2011-23. [Crossref] [PubMed]
- Zhu YC, Jin PF, Bao J, et al. Thyroid ultrasound image classification using a convolutional neural network. Ann Transl Med 2021;9:1526. [Crossref] [PubMed]
- Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N, Hornegger J, Wells WM, editors. Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015. Cham: Springer International Publishing; 2015:234-41.
- Zheng Y, Xie J, Sain A, et al. Sketch-Segformer: Transformer-Based Segmentation for Figurative and Creative Sketches. IEEE Trans Image Process 2023;32:4595-609. [Crossref] [PubMed]
- Chen J, Mei J, Li X, et al. TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers. Med Image Anal 2024;97:103280. [Crossref] [PubMed]
- Yang Y, Huang H, Shao Y, et al. DAC-Net: A light-weight U-shaped network based efficient convolution and attention for thyroid nodule segmentation. Comput Biol Med 2024;180:108972. [Crossref] [PubMed]
- Liu L, Jiang H, He P, et al. On the Variance of the Adaptive Learning Rate and Beyond. arXiv:1908.03265 [Preprint]. 2019 [cited 2026 Jan 29]. Available online: https://arxiv.org/abs/1908.03265

