Development and validation of an automated computed tomography segmentation model for thyroid nodules using a channel attention high-resolution network

Ning Li; Mingjie Jiang; Yang Huang; Guozheng Zhang; Shufeng Xu; Weitao Huang; Xiaowei Han; Xisong Zhu

doi:10.21037/gs-2026-0162

Original Article

Ning Li¹,²#, Mingjie Jiang³#, Yang Huang⁴, Guozheng Zhang²,⁵, Shufeng Xu², Weitao Huang², Xiaowei Han², Xisong Zhu²

¹Jinhua Graduate Joint Training Base, Zhejiang Chinese Medical University, Jinhua, China; ²Department of Radiology, Wenzhou Medical University Affiliated Quzhou Hospital (Quzhou People’s Hospital), Quzhou, China; ³College of Electrical and Information Engineering, Quzhou University, Quzhou, China; ⁴Department of Radiology, The Fourth Affiliated Hospital of School of Medicine, and International School of Medicine, International Institutes of Medicine, Zhejiang University, Yiwu, China; ⁵Department of Quzhou Key Laboratory of Respiratory Mechanics in Critical Care Medicine, Wenzhou Medical University Affiliated Quzhou Hospital (Quzhou People’s Hospital), Quzhou, China

Contributions: (I) Conception and design: N Li, X Han, X Zhu; (II) Administrative support: W Huang, S Xu, X Zhu; (III) Provision of study materials or patients: N Li, X Zhu; (IV) Collection and assembly of data: N Li, Y Huang, S Xu, W Huang; (V) Data analysis and interpretation: N Li, M Jiang, G Zhang, X Han; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

^#These authors contributed equally to this work.

Correspondence to: Xisong Zhu, MD. Department of Radiology, Wenzhou Medical University Affiliated Quzhou Hospital (Quzhou People’s Hospital), No. 100 Minjiang Avenue, Kecheng District, Quzhou 324000, China. Email: zhuxisong@126.com.

Background: Thyroid nodules are highly prevalent. Accurate segmentation of thyroid nodules is a prerequisite for computed tomography (CT)-based radiomics and computer-aided diagnosis, but manual region-of-interest delineation is time-consuming, subjective, and difficult to standardize. Although ultrasound remains the first-line imaging modality for thyroid nodule assessment, contrast-enhanced CT is frequently used in selected patients who have complex anatomy, suspected local extension, or require preoperative evaluation. Current deep learning models suffer from insufficient boundary capture for irregular nodules and high computational complexity, which has led to a lack of clinically practical tools. This study aimed to develop and internally validate an efficient automated CT segmentation model for thyroid nodules.

Methods: This single-center retrospective study included CT images from 500 patients with pathologically confirmed thyroid nodules (250 benign and 250 malignant). Patients were divided at the patient level into training, validation, and internal test sets comprising 350, 50, and 100 patients, respectively. Eligible patients had thyroid CT and histopathological confirmation, and manual lesion contours served as the ground truth for segmentation. We propose the Channel Attention High-Resolution Network (CA-HRNet), which enhances the High-Resolution Network (HRNet) with a Channel Feature Selection Module (CFSM) that dynamically fuses multi-scale features by retaining discriminative channels and pruning redundant features, balancing representational capacity and computational efficiency. The model was trained using Dice loss with the RAdam optimizer and data augmentation. Segmentation performance was evaluated using the Dice coefficient (DC) and intersection over union (IoU), and CA-HRNet was compared with U-Net, SegFormer, Transformer U-Net (TransUNet), DAC-Net, and HRNet. Test-time augmentation (TTA) was applied during inference.

Results: The cohort had a median age of 48 years, 373 female patients (74.6%), and a median nodule diameter of 1.60 cm. On the internal test set, CA-HRNet + TTA achieved the best overall segmentation performance, with a DC of 78.6% and an IoU of 70.0%. Performance was higher for benign nodules (Dice, 85.5%) than for malignant nodules (Dice, 66.2%), reflecting the more irregular morphology and less distinct boundaries of malignant lesions. Ablation experiments showed that the combination of CFSM and Channel Attention Convolution Module (CACM) achieved the best accuracy-efficiency balance and reduced computational complexity by 73%, from 93.764 to 25.376 giga multiply-accumulate operations (Giga MACs). Qualitative assessment showed that CA-HRNet produced masks with improved boundary adherence and detail preservation compared with competing models.

Conclusions: CA-HRNet achieved accurate and computationally efficient automated segmentation of thyroid nodules on a single-center internal CT test set. The model may support reproducible region of interest (ROI) generation for CT-based radiomics and computer-aided diagnosis; however, multicenter external validation and downstream diagnostic or prognostic testing are required before routine clinical implementation.

Keywords: Thyroid nodule segmentation; channel attention; High-Resolution Network (HRNet); computed tomography (CT); automated diagnosis

Submitted Mar 13, 2026. Accepted for publication May 20, 2026. Published online Jun 26, 2026.

Highlight box

Key findings

• The proposed Channel Attention High-Resolution Network (CA-HRNet) achieved the best performance in thyroid nodule segmentation on computed tomography (CT) images, based on a single-center retrospective dataset of 500 patients with pathologically confirmed thyroid nodules. It attained an overall Dice coefficient of 78.6%, while reducing computational complexity by 73% compared with the Channel Feature Selection Module-only configuration.

What is known and what is new?

• Reliable CT nodule segmentation is essential for radiomics and computer-aided diagnosis, but manual region of interest delineation is subjective and time-consuming, and existing high-resolution segmentation models may be computationally demanding.

• This study introduced a channel attention mechanism and dynamic feature selection module for automated CT segmentation. The model improved boundary delineation, particularly in difficult malignant nodules, while reducing redundant computation.

What is the implication, and what should change now?

• CA-HRNet may provide reproducible CT masks for downstream quantitative analysis and radiomics workflows. Future work should focus on multicenter validation, prospective testing, and evaluation of whether automated masks improve clinically interpretable benign-malignant differentiation or outcome prediction.

Introduction

Thyroid nodules are common findings in clinical practice, and the incidence of thyroid cancer has increased substantially in many regions, partly because of the wider use of neck imaging and surveillance (1,2). Contemporary guideline frameworks, including the 2025 American Thyroid Association (ATA) guidance for differentiated thyroid cancer and the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS), emphasize risk-adapted evaluation, avoidance of unnecessary procedures, and consistent imaging-based communication (3,4). In this context, accurate and reproducible image analysis is important because overestimation or underestimation of lesion extent can affect biopsy decisions, surgical planning, radiomics feature extraction, and follow-up assessment (5).

Currently, thyroid and cervical lymph node ultrasound serves as the primary imaging method for evaluating thyroid nodules, often guiding the decision to perform fine-needle aspiration biopsy (FNAB) (6,7). However, ultrasound diagnosis is heavily reliant on operator experience, with high subjectivity and significant inter-observer variability, which cannot meet the requirements of standardized quantitative analysis (8). Data show that while the positive predictive value of ultrasound screening can reach 20–80%, its diagnostic rate for thyroid cancer is only 5–15% (9), potentially leading to unnecessary biopsies, increased patient discomfort and healthcare costs. As a complementary modality, computed tomography (CT) provides superior spatial resolution and clearer depiction of anatomical relationships, making it a valuable adjunct tool, especially for complex cases such as those with retrosternal extension, large multinodular goiters, or suspected invasion of surrounding structures (4). Therefore, CT-based quantitative tools may be useful when CT has already been obtained for clinical indications, particularly if they can generate objective and reproducible lesion masks. Nonetheless, the interpretation of both ultrasound and CT images still relies heavily on qualitative assessment by radiologists, which lacks standardized, objective, and quantifiable biomarkers for precise nodule characterization (10).

To address this clinical imperative, radiomics has emerged as a pivotal methodological framework in quantitative imaging analysis (11-13). It converts medical images into high-dimensional features that may capture lesion heterogeneity and phenotypic characteristics beyond visual assessment (14). In this context, radiomics research based on both ultrasound and CT imaging has emerged and achieved notable progress (15). Zhao et al. (16) developed a multimodal ultrasound radiomics nomogram for differentiating follicular thyroid adenoma from carcinoma, and Lin et al. (17) reported CT-based radiomics models for benign-malignant thyroid nodule differentiation in a multicenter setting. Du et al. (18) further integrated clinical features and CT radiomics to predict lateral cervical lymph node metastasis, a clinically important determinant of prognosis and surgical strategy (19). These studies highlight the potential value of quantitative thyroid imaging, while also underscoring the dependence of radiomics on reliable lesion segmentation.

Image segmentation is essential for a wide range of clinical applications beyond radiomics, including tumor detection, disease monitoring, and surgical planning (20). However, the lack of efficient automated segmentation hinders the clinical translation of CT-based thyroid radiomics, as manual region of interest (ROI) delineation is time-consuming, subjective, and poorly reproducible (21-23). This bottleneck underscores the need for efficient and accurate automated segmentation techniques. Medical image segmentation has undergone a profound transformation from traditional image processing techniques to data-driven deep learning architectures (24-29).

Convolutional Neural Networks (CNNs), particularly U-Net and its numerous variants, have become the mainstream paradigm for medical image segmentation. Peng et al. (30) developed DC-Contrast U-Net specifically for pediatric thyroid ultrasound images, achieving a mean intersection over union (mIoU) of 0.866 with an extremely low parameter count. Xiang et al. (31) proposed a federated learning-based multi-attention guided UNet (MAUNet), attaining high generalization with Dice coefficients (DCs) of 0.887–0.912 across private multi-center datasets. On the public TN3K dataset, Dong et al. (32) embedded a dual-path attention mechanism into UNet++, obtaining an IoU of 0.745 and a Recall of 0.870; Wang et al. (33) introduced a multi-path vision Mamba module, raising the Dice to 79.28%; Ye et al. (34) leveraged background-aware and multi-scale feature aggregation modules, pushing the Dice to 0.8616; and Sun et al. (35) proposed RTS-Net, which integrated cascaded graph convolution and dual-path attention, achieving a leading IoU of 71.87% with a lightweight architecture.

Compared with ultrasound, automated thyroid nodule segmentation in CT faces different challenges, including variable enhancement, complex surrounding anatomy, partial-volume effects, and artifacts related to contrast injection or motion. Traditional machine learning studies have used handcrafted CT texture features and support vector machine (SVM) classifiers to distinguish nodules from normal tissue, with reported accuracy values of 0.880 and 0.8673 in prior studies (36,37). With the development of deep learning, Zhao et al. (38) designed a fully automatic detection algorithm using an improved Dense-U-Net for ROI segmentation and a multidimensional input-fusion CNN for classification, while Li et al. (39) proposed an EfficientNet-based U-Net for automatic recognition and classification of thyroid nodules in CT images. However, available CT studies remain limited by smaller datasets, feature-engineering dependence, or insufficient optimization for computational efficiency.

Thus, there remains a need for an automated CT segmentation model that is robust to the complex morphology of malignant nodules, developed on a pathologically confirmed cohort, and efficient enough for future clinical translation. The present study focused on segmentation rather than direct benign-malignant classification. Its clinical role is to generate reliable CT masks that can support downstream radiomics, computer-aided diagnosis, and quantitative follow-up, thereby bridging pixel-level computer vision performance with clinically interpretable workflows.

The aims of this study were:

To construct a large-scale, pathologically confirmed CT dataset of 500 patients with thyroid nodules and describe its clinical characteristics.
To develop and internally validate Channel Attention High-Resolution Network (CA-HRNet), a high-resolution segmentation network that incorporates channel attention convolution to improve feature representation.
To propose and evaluate a Channel Feature Selection Module (CFSM) for multi-scale feature fusion, and to compare segmentation accuracy and computational complexity with mainstream models.

We present this article in accordance with the TRIPOD reporting checklist (available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0162/rc).

Methods

Dataset description

This single-center retrospective model-development and internal-validation study was approved by the Ethics Committee of Quzhou People’s Hospital (approval No. 2025-101), and informed consent was obtained from all individual participants. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. Data were obtained from patients with thyroid nodules who underwent CT imaging at Quzhou People’s Hospital between 2017 and 2024. The final dataset included 500 patients with pathologically confirmed thyroid nodules, comprising 250 benign and 250 malignant cases. The dataset was divided at the patient level into training, validation, and internal test sets, including 8,721, 1,121, and 2,517 images from 350, 50, and 100 patients, respectively; no patient contributed images to more than one set.

All included patients met the following inclusion criteria: (I) underwent non-contrast and contrast-enhanced thyroid CT scans; (II) underwent thyroid surgery with pathologically confirmed diagnosis; (III) presented with a single thyroid nodule with the longest diameter ≥5 mm. Exclusion criteria included: (I) time gap between postoperative pathological report and imaging examination exceeding 30 days; (II) severe artifacts or damage in CT images; (III) prior surgery, radiotherapy, chemotherapy, or FNAB before CT examination; (IV) history of malignant tumors.

Histopathological diagnosis from surgical specimens served as the reference standard for benign or malignant status. Baseline clinical variables, including age, sex, nodule diameter, and nodule location, were extracted from medical records and used for cohort description and subgroup interpretation; these variables were not incorporated as predictors in the segmentation network.

Imaging was performed using a Toshiba Aquilion ONE TSX-301C, 320-detector-row wide-body high-speed CT scanner. Scanning parameters were as follows: tube voltage 100 kV, tube current automatically controlled by CareDose4D, slice thickness 0.5 mm, slice spacing 0.5 mm, rotation time 0.275 s, reconstruction matrix 512×512, using the FC04 soft tissue algorithm. All CT images were standardized by professional physicians, with appropriate window width and window level adjustments for clear visualization of thyroid nodules.

Manual thyroid nodule contours were generated by two physicians from the same institution on the CT images. The annotation process was performed using the CT images, and pathological labels were not used as input during segmentation model training. In the internal test set, the manual annotations served as the ground truth for evaluating segmentation performance.

No formal sample size calculation was performed because this was an exploratory retrospective segmentation model-development study. Instead, all eligible patients available during the study period who met the inclusion and exclusion criteria were included. The balanced benign and malignant composition was used to permit subgroup assessment of segmentation difficulty, rather than to train a diagnostic classifier.

After application of the inclusion and exclusion criteria, the final analysis cohort comprised 500 eligible patients. These patients were allocated at the patient level to the training, validation, and internal test sets as described above.
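The patient-level allocation can be illustrated with a minimal Python sketch. The article does not report how patients were assigned (random seed, stratification by pathology, or otherwise), so the shuffling strategy, the seed, and the identifiers `P000` to `P499` below are assumptions used only for illustration.

```python
import random

def split_patients(patient_ids, n_train=350, n_val=50, n_test=100, seed=0):
    """Shuffle unique patient IDs and cut them into three disjoint groups."""
    ids = sorted(set(patient_ids))
    if len(ids) != n_train + n_val + n_test:
        raise ValueError("split sizes must sum to the number of patients")
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    rng.shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test

# Hypothetical identifiers; real IDs would come from the hospital records.
patients = [f"P{i:03d}" for i in range(500)]
train_ids, val_ids, test_ids = split_patients(patients)

# Every CT slice follows its patient's assignment, so no patient can
# contribute images to more than one set.
split_of = {pid: name
            for name, group in (("train", train_ids), ("val", val_ids), ("test", test_ids))
            for pid in group}
```

Splitting by patient rather than by image matters because adjacent slices from one patient are highly similar; an image-level split would place near-duplicates of training slices in the test set.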

Model architecture

We propose the Channel Attention High-Resolution Network (CA-HRNet), based on the High-Resolution Network (HRNet) (40), to perform thyroid nodule segmentation. Traditional segmentation networks follow a “downsample-then-upsample” pipeline, which inevitably loses important spatial details. HRNet, by contrast, starts with a high-resolution convolution stream and gradually adds parallel low-resolution streams. In addition, HRNet introduces repeated multi-scale fusion between parallel streams of different resolutions. At each stage, features from high-to-low and low-to-high resolution streams are exchanged and fused. This design enables the network to preserve spatial detail while incorporating semantic context.

This bidirectional fusion enables the network to capture high-resolution spatial details and low-resolution semantic information simultaneously. Compared with HRNet, CA-HRNet replaces the convolution modules in stages 2, 3, and 4 with the Channel Attention Convolution Module (CACM) (Figure 1). CACM computes its features as follows. Let $x \in \mathbb{R}^{C \times H \times W}$ be the input feature map. First, $x$ is fed into a convolution module composed of a 3×3 convolutional layer, a batch normalization (BN) layer, and a ReLU activation function, which reduces the number of channels from $C$ to $C/R$ and outputs the feature map $x_{c} \in \mathbb{R}^{C/R \times H \times W}$, where $R = 8$ is the reduction factor. Next, $x_{c}$ is passed through the channel attention module: the spatial dimensions are flattened to obtain the query, key, and value matrices $q \in \mathbb{R}^{C/R \times HW}$, $k = q^{T} \in \mathbb{R}^{HW \times C/R}$, and $v = q \in \mathbb{R}^{C/R \times HW}$. The channel correlation matrix is then computed by matrix multiplication, $A = qk \in \mathbb{R}^{C/R \times C/R}$. Next, the correlation matrix is inverted by subtracting each element from the maximum value of its row, $A[i, j] \leftarrow \max(A[i, :]) - A[i, j]$, which ensures $A[i, j] \geq 0$ and sets the position of the original row maximum to 0. As a result, the larger values in the inverted $A$ (corresponding to weaker correlations in the original channel correlation matrix) yield higher attention weights. This design aims to explore potentially complementary information between channels and to avoid the feature redundancy caused by over-reliance on strong correlations. The output is then calculated as $out \leftarrow \alpha \cdot \mathrm{softmax}(A) v + x_{c} \in \mathbb{R}^{C/R \times H \times W}$, where $\alpha$ is a learnable parameter. Subsequently, $out$ is passed to a convolution module consisting of a 3×3 convolutional layer and a BN layer, which restores the number of channels from $C/R$ to $C$ and yields $x_{r} = \mathrm{Conv}(\mathrm{BN}(out)) \in \mathbb{R}^{C \times H \times W}$. Finally, a residual path and ReLU produce the output $y = \mathrm{ReLU}(x_{r} + x)$. The diagram of this module is given in Figure 2.
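The attention step inside CACM can be sketched in NumPy as follows. This is an illustration of the description above, not the authors' PyTorch implementation: the 3×3 convolution and BN layers before and after the attention step are omitted, $\alpha$ is a fixed scalar rather than a learned parameter, and the softmax is assumed to act row-wise, which the text does not state explicitly.

```python
import numpy as np

def inverted_channel_attention(x_c, alpha=1.0):
    """x_c: reduced feature map of shape (C', H, W), with C' = C / R."""
    c, h, w = x_c.shape
    q = x_c.reshape(c, h * w)              # query, shape (C', HW)
    k = q.T                                # key, shape (HW, C')
    v = q                                  # value, shape (C', HW)
    A = q @ k                              # channel correlation, shape (C', C')
    A = A.max(axis=1, keepdims=True) - A   # inversion: each row maximum becomes 0
    # Row-wise softmax (assumed axis); subtracting the row max keeps exp() stable.
    e = np.exp(A - A.max(axis=1, keepdims=True))
    weights = e / e.sum(axis=1, keepdims=True)
    # Reshape the attended features back to (C', H, W) before the residual sum.
    out = alpha * (weights @ v).reshape(c, h, w) + x_c
    return out, A, weights

rng = np.random.default_rng(0)
x_c = rng.standard_normal((4, 3, 5))       # toy sizes: C/R = 4, H = 3, W = 5
out, A_inv, weights = inverted_channel_attention(x_c, alpha=0.5)
```

Because the inversion maps the strongest correlation in each row to 0, the softmax gives its smallest weight to the most correlated channel and larger weights to weakly correlated channels, which is the complementary-information behaviour the text describes.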

Figure 1 Backbone of CA-HRNet. Different colors refer to the feature maps with different resolutions. CA-HRNet, Channel Attention High-Resolution Network.

Figure 2 Diagram of CACM. BN, batch normalization; CACM, Channel Attention Convolution Module; Conv, convolution; ReLU, rectified linear unit.

The number of branches increases progressively from 1 to 4 across the stages, and every branch is composed of 4 basic blocks. CA-HRNet produces 4 outputs ($\hat{y}_{1}$, $\hat{y}_{2}$, $\hat{y}_{3}$, $\hat{y}_{4}$) at different scales. We propose a CFSM to generate the final segmentation result. The calculation is as follows. First, $\hat{y}_{2}$, $\hat{y}_{3}$, and $\hat{y}_{4}$ are up-sampled to the size of $\hat{y}_{1}$. Then, $\hat{y}_{1}$, $\hat{y}_{2}$, $\hat{y}_{3}$, and $\hat{y}_{4}$ are concatenated along the channel dimension. The concatenated feature is used to calculate channel attention with the squeeze-and-excitation (SE) block (41). The SE block is designed to enhance the representational capacity of CNNs: it explicitly models the interdependencies between feature channels and amplifies informative features while suppressing less useful ones, without drastically increasing computational cost. After the channel attention is calculated, the attention values are sorted in descending order and the top $k$ are selected. The corresponding attention maps are retained to produce the final segmentation result. The calculation is shown as follows.

$\begin{array}{l} y_{cat} = \mathrm{Concat}(\hat{y}_{1}, \mathrm{upsample}(\hat{y}_{2}), \mathrm{upsample}(\hat{y}_{3}), \mathrm{upsample}(\hat{y}_{4})) \\ CA = \sigma(\mathrm{MLP}(\mathrm{GAP}(y_{cat}))) \end{array}$ [1]

$index = \mathrm{top}(\mathrm{sort}(CA), k)$ [2]

$y_{cs} = CA(index)$ [3]

where $k = 360$; GAP refers to global average pooling, which computes the average value over the entire spatial dimension of each feature channel; and $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid function. MLP is a multi-layer perceptron containing two linear layers. The calculation flow of the MLP is as follows.

$\mathrm{MLP}(x) = W_{2}\,\mathrm{ReLU}(W_{1} x)$ [4]

where $x \in \mathbb{R}^{n}$, $W_{1} \in \mathbb{R}^{\frac{n}{r} \times n}$, $W_{2} \in \mathbb{R}^{n \times \frac{n}{r}}$; $r = 4$ is the reduction factor; $\mathrm{ReLU}(x) = \max(x, 0)$ is the rectified linear unit; and $\mathrm{top}(\cdot, k)$ collects the indices of the top $k$ values. Finally, the selected channel feature is fed into convolution (Conv), BN (42), and ReLU to produce the final segmentation result. The diagram of CFSM is shown in Figure 3.
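The channel selection in Eqs. [1] to [4] can be sketched in NumPy under several assumptions that the text does not settle. Up-sampling is done by nearest-neighbour repetition (the interpolation method is not stated), the weights `W1` and `W2` are random rather than trained, the toy channel counts and `k = 20` stand in for the real network widths and $k = 360$, and the selected indices are used to gather channels of $y_{cat}$, which is one reading of "the selected channel feature"; Eq. [3] as printed indexes $CA$ itself.

```python
import numpy as np

def upsample_nearest(y, size):
    """Nearest-neighbour upsampling of (C, h, w) to (C, H, W); H, W multiples of h, w."""
    _, h, w = y.shape
    H, W = size
    return y.repeat(H // h, axis=1).repeat(W // w, axis=2)

def channel_feature_selection(outputs, W1, W2, k):
    """outputs: list of (C_i, H_i, W_i) maps, highest resolution first."""
    H, W = outputs[0].shape[1:]
    y_cat = np.concatenate(
        [outputs[0]] + [upsample_nearest(y, (H, W)) for y in outputs[1:]], axis=0)
    gap = y_cat.mean(axis=(1, 2))              # GAP: one value per channel, shape (n,)
    mlp = W2 @ np.maximum(W1 @ gap, 0.0)       # Eq. [4]: two linear layers with ReLU
    ca = 1.0 / (1.0 + np.exp(-mlp))            # sigmoid, second line of Eq. [1]
    index = np.argsort(-ca)[:k]                # Eq. [2]: indices of the k largest weights
    return y_cat[index], ca, index             # assumed gather of feature channels

rng = np.random.default_rng(0)
# Toy branch outputs: (channels, side length) for the four resolutions.
outputs = [rng.standard_normal((c, s, s)) for c, s in [(4, 16), (8, 8), (16, 4), (32, 2)]]
n, r, k = 60, 4, 20
W1 = rng.standard_normal((n // r, n))
W2 = rng.standard_normal((n, n // r))
y_cs, ca, index = channel_feature_selection(outputs, W1, W2, k)
```

Keeping only $k$ of the $n$ concatenated channels shrinks the input of the final Conv, BN, and ReLU head, which is where the pruning of redundant features can reduce computation.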

Figure 3 Diagram of CFSM. BN, batch normalization; CFSM, Channel Feature Selection Module; Conv, convolution; ReLU, rectified linear unit; SE, squeeze-and-excitation.

The model was trained with Dice loss, a widely used metric-based loss function for evaluating and optimizing segmentation models. It was derived from the DC which computes the overlap between the predicted segmentation mask ( $\hat{y}$ ) and the ground truth mask ( $y$ ).

$L(\hat{y}, y) = 1 - \mathrm{DC}(\hat{y}, y) = 1 - \frac{2 \sum_{i,j} \hat{y}_{ij} y_{ij}}{\sum_{i,j} \hat{y}_{ij} + \sum_{i,j} y_{ij}}$ [5]

where $\hat{y}$ and $y$ are the predicted and ground truth segmentation masks, respectively. The following data augmentation techniques were used to mitigate overfitting: random rotation within $[-10^{\circ}, 10^{\circ}]$; random horizontal translation within $[-0.1W, 0.1W]$; random vertical translation within $[-0.1H, 0.1H]$; random zoom with a factor within $[0.8, 1.2]$; and random vertical and horizontal flips. Here $H \times W$ is the image size.
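As a worked illustration of Eq. [5], the Dice loss can be written in a few lines of NumPy. The small constant `eps` is an assumption added here to avoid division by zero when both masks are empty; it does not appear in Eq. [5], and the authors' PyTorch implementation may handle that case differently.

```python
import numpy as np

def dice_loss(y_hat, y, eps=1e-7):
    """Soft Dice loss of Eq. [5]; y_hat holds probabilities in [0, 1], y is binary."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    intersection = (y_hat * y).sum()
    denominator = y_hat.sum() + y.sum()
    # eps (an assumption, not part of Eq. [5]) avoids 0/0 for two empty masks.
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

mask = np.array([[0, 1], [1, 1]])
loss_perfect = dice_loss(mask, mask)        # prediction identical to ground truth
loss_disjoint = dice_loss(1 - mask, mask)   # prediction with no overlap
loss_soft = dice_loss(0.5 * mask, mask)     # correct region, half-confident
```

The loss is close to 0 for a perfect prediction and close to 1 for a prediction with no overlap; the half-confident prediction lands in between (1 − 3/4.5 ≈ 0.33), which shows that the loss rewards both correct location and confident probabilities.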

During inference, we employed test-time augmentation (TTA) to improve segmentation performance. TTA is a post-training inference technique widely used in computer vision tasks to improve model robustness and prediction accuracy without retraining the model. It involves applying a series of data augmentations to the test image, generating multiple augmented versions of the same image, and aggregating the model’s predictions on these versions to produce a final result. The calculation is as follows.

$\hat{y} = \mathrm{Vote}(M(x), D_{1}^{-1}(M(D_{1}(x))), \dots, D_{4}^{-1}(M(D_{4}(x))))$ [6]

where $M$ denotes the segmentation model; $D_{k}$ refers to the $k$-th data augmentation technique; $D_{k}^{-1}$ is the inverse operation of $D_{k}$; and $\mathrm{Vote}$ refers to the pixel-wise voting operation: when at least two of $M(x)_{ij}$ and $D_{k}^{-1}(M(D_{k}(x)))_{ij}$ ($k = 1, 2, 3, 4$) equal 1, $\hat{y}_{ij} = 1$. Specifically, we used the following data augmentation techniques: vertical flip, horizontal flip, rotation by 90°, and rotation by −90°. The diagram of TTA is shown in Figure 4.
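Eq. [6] and the two-vote rule can be sketched with NumPy array operations. The function `toy_model` is a hypothetical stand-in for the trained network, and the direction assigned to the +90° rotation (counter-clockwise, as in `np.rot90`) is an assumption; the text only names the four augmentations.

```python
import numpy as np

# Each pair is (D_k, D_k^{-1}) for the four augmentations named in the text.
AUGMENTATIONS = [
    (np.flipud, np.flipud),                                   # vertical flip
    (np.fliplr, np.fliplr),                                   # horizontal flip
    (lambda a: np.rot90(a, 1), lambda a: np.rot90(a, -1)),    # rotation by +90 degrees
    (lambda a: np.rot90(a, -1), lambda a: np.rot90(a, 1)),    # rotation by -90 degrees
]

def tta_predict(model, x, min_votes=2):
    """model maps a 2-D image to a binary mask of the same shape."""
    votes = model(x).astype(int)                  # vote from the original image
    for forward, inverse in AUGMENTATIONS:
        # Predict on the augmented image, then map the mask back to the
        # original orientation before counting its vote.
        votes += inverse(model(forward(x))).astype(int)
    return (votes >= min_votes).astype(np.uint8)  # foreground needs >= 2 of 5 votes

def toy_model(image):
    """Hypothetical stand-in for the network: a simple intensity threshold."""
    return image > 0.5

rng = np.random.default_rng(0)
image = rng.random((8, 8))
mask = tta_predict(toy_model, image)
```

A pixel-wise threshold gives the same answer in every orientation, so here all five votes agree and `mask` equals the plain prediction. A convolutional network is not exactly flip- or rotation-equivariant, which is why the five predictions can differ and the vote can change the result.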

Figure 4 Diagram of TTA. $D_{k}$, data augmentation technique; $D_{k}^{-1}$, the reversed operation of $D_{k}$; TTA, test-time augmentation.

To comprehensively evaluate the performance of the proposed CA-HRNet model, we compared it with mainstream segmentation models such as U-Net (43), SegFormer (44), Transformer U-Net (TransUNet) (45), DAC-Net (46) and HRNet on the internal test set. The DC and IoU were utilized to evaluate the segmentation performance. Because this study developed a segmentation model rather than a benign-malignant diagnostic classifier, discrimination and calibration indices such as area under the curve, sensitivity, specificity, and calibration plots were not calculated.

$\mathrm{DC}(\hat{y}, y) = \frac{2 \sum_{i,j} \hat{y}_{ij} y_{ij}}{\sum_{i,j} \hat{y}_{ij} + \sum_{i,j} y_{ij}}$ [7]

$\mathrm{IoU}(\hat{y}, y) = \frac{\sum_{i,j} \hat{y}_{ij} y_{ij}}{\sum_{i,j} \hat{y}_{ij} + \sum_{i,j} y_{ij} - \sum_{i,j} \hat{y}_{ij} y_{ij}}$ [8]
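For binary masks, Eqs. [7] and [8] reduce to pixel counts, as the short NumPy sketch below shows. Treating two empty masks as perfect agreement is an assumption; the article does not say how such slices were scored.

```python
import numpy as np

def dice_and_iou(pred, truth):
    """Return (DC, IoU) of Eqs. [7] and [8] for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:  # both masks empty: scored as perfect agreement (assumption)
        return 1.0, 1.0
    return 2.0 * inter / total, inter / (total - inter)

truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True        # 4-pixel ground truth
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True         # 6-pixel prediction covering the ground truth

dc, iou = dice_and_iou(pred, truth)   # overlap 4 pixels: DC = 8/10, IoU = 4/6
```

For a single pair of masks the two metrics are tied by IoU = DC / (2 − DC), so they always rank one prediction against another in the same way. The identity does not carry over to averages over many images: because DC / (2 − DC) is convex in DC, an averaged IoU can exceed the value computed from the averaged DC.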

Statistical analysis

All statistical analyses were performed using R statistical software and SPSS. Continuous variables are presented as mean ± standard deviation (SD) and analyzed using independent t-tests or Mann-Whitney Wilcoxon tests, depending on the distribution of the variables. Categorical variables are described as proportions and compared between groups using the Chi-square test or Fisher’s exact test. Statistical significance was defined as P<0.05.

Results

The model was implemented in PyTorch and trained on an NVIDIA RTX A5000 GPU for 100 epochs using the RAdam optimizer (47) with a learning rate of 10⁻⁴ and a batch size of 8. The baseline characteristics of the enrolled cohort are summarized in Table 1. The median age was 48 years, 373 patients (74.6%) were female, and the median nodule diameter was 1.60 cm. Malignant nodules were smaller than benign nodules (median diameter, 0.90 versus 2.40 cm), which is clinically relevant because smaller lesions are more susceptible to partial-volume effects and boundary ambiguity.

Table 1

Clinical characteristics

Characteristics	All (n=500)	Benign (n=250)	Malignant (n=250)	P value
Age (years)	48.00 [38.00–54.00]	51.00 [44.00–56.00]	43.00 [34.00–50.00]	<0.001
Gender				0.22
Male	127 (25.40)	57 (22.80)	70 (28.00)
Female	373 (74.60)	193 (77.20)	180 (72.00)
Position				0.66
Right	286 (57.20)	148 (59.20)	138 (55.20)
Left	203 (40.60)	97 (38.80)	106 (42.40)
Isthmus	11 (2.20)	5 (2.00)	6 (2.40)
Diameter (cm)	1.60 [0.80–2.60]	2.40 [1.70–3.10]	0.90 [0.60–1.48]	<0.001

Baseline characteristics of the enrolled patients. Data are presented as median [range] or n (%). P values are for comparisons between the benign and malignant nodule groups.
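The categorical comparisons in Table 1 can be recomputed from the printed counts with a short pure-Python sketch. Table 1 does not state which test variant produced each P value; the sketch assumes a chi-square test with Yates continuity correction for the 2×2 sex table and an uncorrected Pearson chi-square test for the 3×2 position table, so the agreement it shows is a plausibility check rather than a statement of the authors' procedure.

```python
import math

def chi_square(table, yates=False):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            diff = abs(observed - expected)
            if yates:  # continuity correction, meaningful for 2 x 2 tables
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    return stat

def p_value(stat, df):
    """Upper-tail chi-square probability via closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 is handled in this sketch")

# Rows are categories, columns are (benign, malignant), copied from Table 1.
sex = [[57, 70], [193, 180]]                  # male, female
position = [[148, 138], [97, 106], [5, 6]]    # right, left, isthmus

p_sex = p_value(chi_square(sex, yates=True), df=1)
p_position = p_value(chi_square(position), df=2)
```

Under these assumptions both values round to the P values printed in Table 1 (0.22 for sex and 0.66 for position).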

Quantitative comparison with mainstream models

As shown in Table 2, CA-HRNet + TTA achieved the best segmentation performance, with an overall DC of 78.6% and an IoU of 70.0% on the internal test set. TTA improved the overall DC of every tested model, by 0.8, 0.8, 2.9, 1.3, 1.1, and 1.5 percentage points for U-Net, SegFormer, TransUNet, DAC-Net, HRNet, and CA-HRNet, respectively. CA-HRNet + TTA outperformed HRNet + TTA, indicating the added value of the CFSM and channel attention design. The following visual comparisons further demonstrated performance differences between CA-HRNet and the competing models.

Table 2

Experimental results of thyroid nodule segmentation

Model	Dice, benign (%)	Dice, malignant (%)	Dice, all (%)	IoU, benign (%)	IoU, malignant (%)	IoU, all (%)
U-Net	83.1	62.3	75.7	74.8	52.3	66.7
U-Net + TTA	83.8	63.5	76.5	75.6	53.0	67.5
SegFormer	81.9	61.0	74.4	73.5	50.7	65.3
SegFormer + TTA	82.4	62.3	75.2	73.9	51.7	66.0
TransUNet	78.9	56.5	70.9	69.9	46.3	61.5
TransUNet + TTA	81.3	60.1	73.8	72.6	49.3	64.3
DAC-Net	82.3	59.3	74.0	73.5	49.4	64.9
DAC-Net + TTA	82.9	61.7	75.3	74.3	51.5	66.2
HRNet	84.2	62.8	76.6	76.4	52.9	68.0
HRNet + TTA	84.5	65.5	77.7	76.7	54.8	68.9
CA-HRNet	84.4	64.0	77.1	76.6	54.5	68.7
CA-HRNet + TTA	85.5^†	66.2^†	78.6^†	77.9^†	56.0^†	70.0^†

Performance comparison of different models on the thyroid nodule segmentation task. TTA indicates the application of test-time augmentation. ^† indicates that the corresponding model achieves the best performance. CA-HRNet, Channel Attention High-Resolution Network; HRNet, High-Resolution Network; IoU, intersection over union; TransUNet, Transformer U-Net.
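The TTA gains quoted in the text follow directly from the "All" Dice column of Table 2. The short Python check below recomputes them; the dictionary simply copies the table values, and the differences are rounded to one decimal place to match the precision of the table.

```python
# Overall Dice coefficient (%) without and with TTA, copied from Table 2.
dice_all = {
    "U-Net": (75.7, 76.5),
    "SegFormer": (74.4, 75.2),
    "TransUNet": (70.9, 73.8),
    "DAC-Net": (74.0, 75.3),
    "HRNet": (76.6, 77.7),
    "CA-HRNet": (77.1, 78.6),
}

# Gain in percentage points, rounded to the precision of the table.
gains = {model: round(with_tta - without, 1)
         for model, (without, with_tta) in dice_all.items()}

largest_gain = max(gains, key=gains.get)
best_with_tta = max(dice_all, key=lambda model: dice_all[model][1])
```

The gains are differences between two percentages, so they are percentage points rather than relative improvements. The model that benefits most from TTA (`largest_gain`) is not the model with the highest final score (`best_with_tta`).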

Visual comparison and analysis

To intuitively demonstrate the segmentation performance of different models, representative cases from the test set were selected for visual comparison. The following figures present side-by-side comparisons of the segmentation results generated by U-Net, SegFormer, TransUNet, DAC-Net, HRNet, and CA-HRNet, alongside the corresponding ground truth annotations. Cases 1–4 are malignant nodules, characterized by irregular shapes and blurred boundaries, while Cases 5–8 are benign nodules, typically exhibiting clearer margins and more regular morphology.

U-Net performs adequately on nodules with well-defined boundaries but tends to under-segment or lose fine details in regions with low contrast or complex morphology (Figure 5). U-Net + TTA enhances the model’s robustness and delineates more complete lesion contours with improved boundary adherence. It also corrects some instances where U-Net misclassified normal tissue as a lesion. However, U-Net + TTA can introduce new errors, such as mis-segmenting contralateral normal thyroid tissue, intravascular contrast heterogeneity, or other anatomical structures as lesions, and the incidence of such errors is notably higher than in other models. A prominent example is Case 7, where U-Net + TTA was the only model among those utilizing TTA that extensively misidentified contralateral vascular structures as part of the lesion.

Figure 5 Comparison among ground truth, U-Net, and U-Net + TTA. Each column represents one case, displaying from top to bottom: the ground truth manual annotation (red contour), the segmentation result from U-Net (green contour), and the refined output from U-Net with TTA (green contour). TTA, test-time augmentation.

SegFormer, leveraging its Transformer-based architecture, demonstrates strong long-range dependency modeling and effectively captures the overall region of nodules, even in complex backgrounds (Figure 6). However, it shows limited precision in fine boundary alignment and preservation of small structures, especially on medical imaging datasets of limited size. Its inference speed is also relatively slow. SegFormer + TTA enhances recognition consistency and reduces over-segmentation of normal thyroid or adjacent tissues to some extent. Nevertheless, it may still include extraneous normal thyroid tissue, particularly when segmenting small nodules or those with highly blurred margins.

Figure 6 Comparison among ground truth, SegFormer, and SegFormer + TTA. The layout follows Figure 5, presenting ground truth, SegFormer, and SegFormer + TTA results. TTA, test-time augmentation.

TransUNet, which aims to integrate the global context modeling of Transformers with U-Net’s local detail extraction, often exhibits noticeable fragmentation or over-smoothing in nodules with blurred or irregular margins (Figure 7). It frequently over-segments adjacent normal thyroid tissue or fails to capture the complete lesion boundary. TransUNet + TTA substantially mitigates these issues, leading to more coherent segmentations. However, this improvement comes at the cost of an increased tendency to misclassify normal thyroid tissue as lesion. Although these errors are less severe than those sometimes seen with U-Net, they occur more frequently.

Figure 7 Comparison among ground truth, TransUNet, and TransUNet + TTA. The layout follows Figure 5, comparing ground truth, TransUNet, and TransUNet + TTA. TTA, test-time augmentation; TransUNet, Transformer U-Net.

Similar to U-Net, DAC-Net may occasionally mis-segment adjacent vessels as lesions or fail to include parts of a lesion in areas of ambiguous boundary transition. DAC-Net + TTA effectively alleviates many of these inaccuracies (Figure 8). Importantly, the segmentation errors produced by DAC-Net, both with and without TTA, are typically minor and predominantly confined to boundary imperfections, rarely resulting in large-scale deviations from the true lesion area.

Figure 8 Comparison among ground truth, DAC-Net, and DAC-Net + TTA. The layout follows Figure 5, showing ground truth, DAC-Net, and DAC-Net + TTA. TTA, test-time augmentation.

HRNet maintains high-resolution feature representations through its parallel multi-branch architecture, providing a strong advantage for capturing fine details in medical image segmentation (Figure 9). The baseline model performs robustly on both benign and malignant nodules. HRNet + TTA offers further refinement, improving segmentation accuracy and consistency. In numerous cases, the results from HRNet + TTA are largely congruent with manual annotations. The primary remaining challenge lies in accurately delineating nodules with extremely faint or poorly defined edges.

Figure 9 Comparison among ground truth, HRNet, and HRNet + TTA. The layout follows Figure 5, presenting ground truth, HRNet, and HRNet + TTA. HRNet, High-Resolution Network; TTA, test-time augmentation.

CA-HRNet integrates the multi-resolution feature preservation of HRNet with a channel attention mechanism (Figure 10). It achieves high boundary adherence for benign nodules and maintains reasonable structural consistency for malignant ones despite their challenging morphology. It delivers the best overall segmentation performance among the compared models, particularly in detail retention and malignant nodule recognition. CA-HRNet + TTA further refines the results, yielding excellent outcomes even for difficult malignant cases. Isolated errors may occur, such as the occasional misclassification of normal thyroid tissue as lesion, but these are infrequent.

Figure 10 Comparison among ground truth, CA-HRNet, and CA-HRNet + TTA. The layout follows Figure 5, comparing ground truth, CA-HRNet, and CA-HRNet + TTA. CA-HRNet, Channel Attention High-Resolution Network; TTA, test-time augmentation.

Visual analysis confirms that the proposed CA-HRNet produces segmentation results most closely aligned with ground truth across most cases. It outperforms the comparative models in preserving edge details, detecting small targets, and handling irregularly shaped regions. This performance advantage suggests that the integrated channel attention mechanism effectively enhances the model’s ability to perceive and reconstruct critical anatomical features.

Ablation study and analysis

To evaluate the individual and synergistic contributions of the proposed components, we conducted an ablation study focusing on the CFSM and the CACM. The CFSM introduces a dynamic channel selection paradigm at the feature fusion stage, which we compared directly against the conventional channel weighting paradigm represented by the SE module. Performance was assessed using the DC, IoU, and computational complexity measured in multiply-accumulate operations (MACs).
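
The difference between the two paradigms can be illustrated with a small sketch. The SE part follows the standard squeeze (global average pooling), excitation (two fully connected layers with ReLU and sigmoid), and rescale steps. The `channel_selection` function is only a hypothetical stand-in for the selection idea: it keeps the top-k channels by score, which is an assumption for illustration and not the CFSM as implemented in this study. All weights and feature values are made up.

```python
import math
from typing import List

Channel = List[List[float]]   # one H x W feature map
FeatureMap = List[Channel]    # C channels


def squeeze(x: FeatureMap) -> List[float]:
    """Global average pooling: one scalar descriptor per channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]


def dense(v: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: weights has one row per output unit."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]


def se_gates(x: FeatureMap, w1, b1, w2, b2) -> List[float]:
    """SE excitation: FC -> ReLU -> FC -> sigmoid, one gate in (0, 1) per channel."""
    hidden = [max(0.0, h) for h in dense(squeeze(x), w1, b1)]
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, w2, b2)]


def channel_weighting(x: FeatureMap, gates: List[float]) -> FeatureMap:
    """Weighting paradigm (SE): every channel is kept and rescaled by its gate."""
    return [[[g * v for v in row] for row in ch] for ch, g in zip(x, gates)]


def channel_selection(x: FeatureMap, scores: List[float], k: int) -> FeatureMap:
    """Selection paradigm (illustrative top-k): low-scoring channels are dropped."""
    ranked = sorted(range(len(x)), key=lambda i: scores[i], reverse=True)
    return [x[i] for i in sorted(ranked[:k])]


# Four constant 2 x 2 channels and made-up weights (reduction 4 -> 2 -> 4).
x = [[[1.0, 1.0], [1.0, 1.0]],
     [[0.0, 0.0], [0.0, 0.0]],
     [[2.0, 2.0], [2.0, 2.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
w1, b1 = [[1, 0, 0, 0], [0, 0, 1, 0]], [0.0, 0.0]
w2, b2 = [[1, 0], [0, 0], [0, 1], [0, 0]], [0.0, 0.0, 0.0, 0.0]

gates = se_gates(x, w1, b1, w2, b2)
weighted = channel_weighting(x, gates)
selected = channel_selection(x, gates, k=2)
print(len(weighted), "channels after weighting;", len(selected), "after selection")
```

Weighting leaves the channel count unchanged, whereas selection passes fewer channels to later layers, which is the structural distinction the ablation compares.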

As shown in Table 3, the ablation study revealed three main findings. First, CFSM achieved better benign nodule segmentation than the conventional SE module (85.4% versus 84.8% DC), whereas SE showed a marginal advantage for malignant nodules (66.1% versus 65.2%). Second, combining CFSM with CACM produced the best overall segmentation performance (78.6% DC and 70.0% IoU), suggesting that dynamic channel selection and attention-based refinement are complementary. Third, this configuration reduced computational complexity to 25.376 G MACs, representing a 73% decrease compared with the CFSM-only configuration. This accuracy-efficiency balance is important for future implementation in clinical image-analysis workflows, particularly when computational resources are limited.

Table 3

Results of the CA-HRNet architecture under different module configurations

Setting	Dice, benign (%)	Dice, malignant (%)	Dice, all (%)	IoU, benign (%)	IoU, malignant (%)	IoU, all (%)	MACs (G)
CFSM	85.4	65.2	78.2	77.7	54.9	69.6	93.764
CFSM + CACM	85.5^†	66.2^†	78.6^†	77.9^†	56.0^†	70.0^†	25.376^†
SE	84.8	66.1	78.1	77.1	55.7	69.5	87.364
SE + CACM	85.5	65.6	78.4	77.8	55.3	69.7	31.776

Ablation study results of the CFSM and the CACM within CA-HRNet. ^† indicates that the corresponding model achieves the best performance. CA-HRNet, Channel Attention High-Resolution Network; CACM, Channel Attention Convolution Module; CFSM, Channel Feature Selection Module; IoU, intersection over union; MACs (G), multiply-accumulate operations (in Giga); SE, squeeze-and-excitation module (representing the channel weighting paradigm).
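
The 73% reduction quoted in the text can be reproduced from the MACs column of Table 3. The short calculation below uses only the tabulated values; the function and variable names are chosen for illustration.

```python
# MACs (G) per configuration, copied from Table 3.
macs_g = {
    "CFSM": 93.764,
    "CFSM + CACM": 25.376,
    "SE": 87.364,
    "SE + CACM": 31.776,
}


def relative_reduction(before: float, after: float) -> float:
    """Percentage decrease going from `before` to `after`."""
    return (before - after) / before * 100.0


cfsm_drop = relative_reduction(macs_g["CFSM"], macs_g["CFSM + CACM"])
se_drop = relative_reduction(macs_g["SE"], macs_g["SE + CACM"])

print(f"CFSM -> CFSM + CACM: {cfsm_drop:.1f}% fewer MACs")
print(f"SE   -> SE + CACM:   {se_drop:.1f}% fewer MACs")
```

The first figure rounds to the 73% reported for the CFSM-based configuration; the same calculation for the SE-based pair gives a smaller relative reduction.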

Discussion

This study developed and internally validated CA-HRNet for automatic thyroid nodule segmentation on contrast-enhanced CT images from 500 pathologically confirmed patients. The main clinical motivation was to reduce dependence on manual ROI delineation, a time-consuming and subjective step that can affect downstream radiomics and computer-aided diagnosis. On the internal test set, CA-HRNet + TTA achieved the highest overall DC and IoU among the evaluated models, supporting its ability to produce reproducible CT lesion masks.

The technical contribution of CA-HRNet is the integration of HRNet’s high-resolution feature preservation with channel attention and dynamic feature selection. Unlike approaches that only increase model complexity, the CFSM-CACM design reduced redundant feature channels while maintaining or improving segmentation accuracy. This is clinically relevant because image-analysis tools must be sufficiently efficient to be integrated into routine workflows and scalable radiomics pipelines.

The lower DC for malignant nodules (66.2%) than for benign nodules (85.5%) is clinically meaningful. This performance gap likely stems from the more complex imaging phenotypes of malignant thyroid nodules. Specifically, malignant nodules frequently present with ill-defined or infiltrative margins, irregular and spiculated contours, and heterogeneous internal architecture, all of which pose significant challenges to automated segmentation systems. Beyond shape irregularity, malignant lesions often exhibit poor contrast differentiation from adjacent thyroid parenchyma or strap muscles, complicating boundary delineation. Furthermore, internal heterogeneity, manifesting as areas of cystic degeneration, necrosis, hemorrhage, or microcalcifications, introduces additional variability in texture and intensity, which can confuse intensity-based segmentation algorithms. These characteristics not only challenge human interpreters but also limit the ability of conventional convolutional operators to capture such irregular and high-frequency structural details. In addition, the smaller size of many malignant nodules in our dataset (median diameter 0.9 cm) compared with benign ones (median diameter 2.4 cm) may further exacerbate segmentation difficulty, as small objects are inherently more susceptible to partial-volume effects and require finer spatial sensitivity. Future work should therefore focus on segmentation strategies tailored to these challenging cases, including boundary-aware loss functions, texture-adaptive feature extraction, and methods that explicitly focus on ambiguous margins.
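
The size argument can be made concrete with a toy calculation. The sketch below rasterises two discs with the median diameters quoted above and compares each with a copy whose radius is one pixel smaller, that is, a uniform one-pixel under-segmentation. The circular shape, the 0.5 mm pixel spacing, and all names are assumptions for illustration; the resulting numbers are not study results.

```python
from typing import Dict, Set, Tuple

Pixel = Tuple[int, int]
PIXEL_MM = 0.5  # assumed in-plane pixel spacing, not taken from the study


def disc(radius_px: float) -> Set[Pixel]:
    """Pixels whose centres lie within radius_px of the origin."""
    reach = int(radius_px) + 1
    return {(x, y)
            for x in range(-reach, reach + 1)
            for y in range(-reach, reach + 1)
            if x * x + y * y <= radius_px * radius_px}


def dice(a: Set[Pixel], b: Set[Pixel]) -> float:
    return 2 * len(a & b) / (len(a) + len(b))


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    return len(a & b) / len(a | b)


scores: Dict[str, Tuple[float, float]] = {}
for label, diameter_mm in (("malignant", 9), ("benign", 24)):  # median diameters
    radius_px = diameter_mm / 2 / PIXEL_MM
    truth = disc(radius_px)
    pred = disc(radius_px - 1)  # same centre, boundary pulled in by one pixel
    scores[label] = (dice(truth, pred), iou(truth, pred))
    print(f"{label}: Dice = {scores[label][0]:.3f}, IoU = {scores[label][1]:.3f}")
```

Under these assumptions the same one-pixel boundary error costs the smaller disc noticeably more overlap (Dice of roughly 0.88 versus 0.96), which is consistent with small nodules being harder to score well on even when the boundary error is identical.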

The strengths of this study include a relatively large pathologically confirmed CT cohort, balanced representation of benign and malignant nodules, patient-level division of data, and comparison with several widely used segmentation architectures. The subgroup analysis by pathological status was used to characterize segmentation difficulty across clinically distinct nodule types. Importantly, the model should be interpreted as a segmentation tool that may facilitate downstream diagnostic modeling, not as a stand-alone benign-malignant classifier.

Several limitations should be acknowledged. First, all data were obtained from a single institution using a consistent scanner and imaging protocol, so the internal test set cannot substitute for independent external validation. Second, although pathology was used as the reference standard for benign or malignant status, the present model did not output a diagnostic probability; therefore, AUC, sensitivity, specificity, and calibration metrics for malignancy prediction were not applicable. Third, the study focused on CT alone and did not integrate ultrasound, clinical laboratory data, molecular markers, or genetic profiles. Future multicenter and prospective studies should evaluate generalizability across scanners and institutions and determine whether CA-HRNet-derived masks improve radiomics-based diagnosis, prognosis prediction, or treatment planning.

Conclusions

This study constructed a pathologically confirmed CT dataset of thyroid nodules and proposed CA-HRNet, an automated segmentation model incorporating channel attention and feature selection. Experimental results demonstrated that CA-HRNet achieves accurate and computationally efficient automated segmentation of thyroid nodules, outperforming several state-of-the-art models. These findings support its potential as a reproducible ROI-generation tool for CT-based radiomics and computer-aided diagnosis, but external validation and downstream clinical task testing are needed before routine clinical use.

Acknowledgments

The authors thank the patients and clinical staff. During the preparation of this work, the authors used DeepSeek to enhance the clarity and quality of the writing. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the published article.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0162/rc

Data Sharing Statement: Available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0162/dss

Peer Review File: Available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0162/prf

Funding: This work was supported by the National Natural Science Foundation of China (General Program) (No. 82171908); the Competitive Project of Quzhou Municipal Bureau of Science and Technology (No. 2024K078); the Project of Quzhou Municipal Bureau of Science and Technology (No. 2022K65); the Project of Quzhou Municipal Bureau of Science and Technology (No. 2022020); the Scientific Research Start-up Project of Quzhou University (No. BSYJ202347); the Project of Quzhou Municipal Bureau of Science and Technology (No. 2024K168); and the Zhejiang Provincial Natural Science Foundation of China (No. LQN26F030045).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0162/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of Quzhou People’s Hospital (approval No. 2025-101), and informed consent was obtained from all individual participants.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Pizzato M, Li M, Vignat J, et al. The epidemiological landscape of thyroid cancer worldwide: GLOBOCAN estimates for incidence and mortality rates in 2020. Lancet Diabetes Endocrinol 2022;10:264-72. [Crossref] [PubMed]
Li M, Dal Maso L, Pizzato M, et al. Thyroid cancer in adolescents and young adults: a population-based study in 185 countries worldwide. Lancet Diabetes Endocrinol 2026;14:112-22. [Crossref] [PubMed]
Tessler FN, Middleton WD, Grant EG, et al. ACR Thyroid Imaging, Reporting and Data System (TI-RADS): White Paper of the ACR TI-RADS Committee. J Am Coll Radiol 2017;14:587-95. [Crossref] [PubMed]
Ringel MD, Sosa JA, Baloch Z, et al. 2025 American Thyroid Association Management Guidelines for Adult Patients with Differentiated Thyroid Cancer. Thyroid 2025;35:841-985. [Crossref] [PubMed]
Li G, Chen R, Zhang J, et al. Fusing enhanced Transformer and large kernel CNN for malignant thyroid nodule segmentation. Biomedical Signal Processing and Control 2023;83:104636.
Alexander EK, Cibas ES. Diagnosis of thyroid nodules. Lancet Diabetes Endocrinol 2022;10:533-9. [Crossref] [PubMed]
Boers T, Braak SJ, Rikken NET, et al. Ultrasound imaging in thyroid nodule diagnosis, therapy, and follow-up: Current status and future trends. J Clin Ultrasound 2023;51:1087-100. [Crossref] [PubMed]
Xie Y, Yang Z, Yang Q, et al. Identification method of thyroid nodule ultrasonography based on self-supervised learning dual-branch attention learning framework. Health Inf Sci Syst 2024;12:7. [Crossref] [PubMed]
Chen L, Chen L, Liu J, et al. Value of Qualitative and Quantitative Contrast-Enhanced Ultrasound Analysis in Preoperative Diagnosis of Cervical Lymph Node Metastasis From Papillary Thyroid Carcinoma. J Ultrasound Med 2020;39:73-81. [Crossref] [PubMed]
Abdolali F, Kapur J, Jaremko JL, et al. Automated thyroid nodule detection from ultrasound imaging using deep convolutional neural networks. Comput Biol Med 2020;122:103871. [Crossref] [PubMed]
Xiong S, Fu Z, Deng Z, et al. Machine learning-based CT radiomics enhances bladder cancer staging predictions: A comparative study of clinical, radiomics, and combined models. Med Phys 2024;51:5965-77. [Crossref] [PubMed]
O'Sullivan NJ, Temperley HC, Horan MT, et al. Computed tomography (CT) derived radiomics to predict post-operative disease recurrence in gastric cancer; a systematic review and meta-analysis. Curr Probl Diagn Radiol 2024;53:717-22. [Crossref] [PubMed]
Barry N, Kendrick J, Molin K, et al. Evaluating the impact of the Radiomics Quality Score: a systematic review and meta-analysis. Eur Radiol 2025;35:1701-13. [Crossref] [PubMed]
Lambin P, Rios-Velazquez E, Leijenaar R, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer 2012;48:441-6. [Crossref] [PubMed]
Cao Y, Zhong X, Diao W, et al. Radiomics in Differentiated Thyroid Cancer and Nodules: Explorations, Application, and Limitations. Cancers (Basel) 2021;13:2436. [Crossref] [PubMed]
Zhao Q, Guo S, Zhang Y, et al. Multimodal ultrasound radiomics model combined with clinical model for differentiating follicular thyroid adenoma from carcinoma. BMC Med Imaging 2025;25:152. [Crossref] [PubMed]
Lin S, Gao M, Yang Z, et al. CT-Based Radiomics Models for Differentiation of Benign and Malignant Thyroid Nodules: A Multicenter Development and Validation Study. AJR Am J Roentgenol 2024;223:e2431077. [Crossref] [PubMed]
Du J, He X, Fan R, et al. Artificial intelligence-assisted precise preoperative prediction of lateral cervical lymph nodes metastasis in papillary thyroid carcinoma via a clinical-CT radiomic combined model. Int J Surg 2025;111:2453-66. [Crossref] [PubMed]
Gao X, Ran X, Ding W. The progress of radiomics in thyroid nodules. Front Oncol 2023;13:1109319. [Crossref] [PubMed]
Chen C, Mat Isa NA, Liu X. A review of convolutional neural network based methods for medical image classification. Comput Biol Med 2025;185:109507. [Crossref] [PubMed]
Rayed MdE. Deep learning for medical image segmentation: State-of-the-art advancements and challenges. Informatics in Medicine Unlocked 2024;47:101504.
Kendrick J, Francis RJ, Hassan GM, et al. Fully automatic prognostic biomarker extraction from metastatic prostate lesion segmentations in whole-body [68Ga]Ga-PSMA-11 PET/CT images. Eur J Nucl Med Mol Imaging 2022;50:67-79. [Crossref] [PubMed]
Kang W, Qiu X, Luo Y, et al. Application of radiomics-based multiomics combinations in the tumor microenvironment and cancer prognosis. J Transl Med 2023;21:598. [Crossref] [PubMed]
Maroulis DE, Savelonas MA, Iakovidis DK, et al. Variable background active contour model for computer-aided delineation of nodules in thyroid ultrasound images. IEEE Trans Inf Technol Biomed 2007;11:537-43. [Crossref] [PubMed]
Savelonas MA, Iakovidis DK, Legakis I, et al. Active contours guided by echogenicity and texture for delineation of thyroid nodules in ultrasound images. IEEE Trans Inf Technol Biomed 2009;13:519-27. [Crossref] [PubMed]
Li Z, Liu C, Liu G, et al. A novel statistical image thresholding method. AEU - International Journal of Electronics and Communications 2010;64:1137-47.
Du W, Sang N. An effective segmentation method of ultrasonic thyroid nodules. In: Liu J, editor. Proc. SPIE 9814, MIPPR 2015: Parallel Processing of Images and Optimization; and Medical Imaging Processing. Enshi, China; 2015:98140F.
Bi L, Shuang Z. Diagnosis of Thyroid Nodules Based on Local Non-quantitative Multi-Directional Texture Descriptor with Rotation Invariant Characteristics for Ultrasound Image. J Med Syst 2019;43:231. [Crossref] [PubMed]
Benabdallah FZ, Djerou L. Active Contour Extension Basing on Haralick Texture Features, Multi-gene Genetic Programming, and Block Matching to Segment Thyroid in 3D Ultrasound Images. Arab J Sci Eng 2023;48:2429-40.
Peng B, Lin W, Zhou W, et al. Enhanced pediatric thyroid ultrasound image segmentation using DC-Contrast U-Net. BMC Med Imaging 2024;24:275. [Crossref] [PubMed]
Xiang Z, Tian X, Liu Y, et al. Federated learning via multi-attention guided UNet for thyroid nodule segmentation of ultrasound images. Neural Netw 2025;181:106754. [Crossref] [PubMed]
Dong P, Zhang R, Li J, et al. An ultrasound image segmentation method for thyroid nodules based on dual-path attention mechanism-enhanced UNet+. BMC Med Imaging 2024;24:341. [Crossref] [PubMed]
Wang S, Liu Z, Shi G, et al. MFS-Unet: A Multi-Path Vision Mamba Network for Precise Thyroid Nodule Segmentation. IET Syst Biol 2026;20:e70044. [Crossref] [PubMed]
Ye D, Lan K, Cheng J, et al. MFA-Net: multi-scale feature aggregation network with background-aware module for ultrasound segmentation of thyroid nodules. Quant Imaging Med Surg 2025;15:12167-89. [Crossref] [PubMed]
Sun X, Li X, Yang Z, et al. RTS-Net: thyroid nodule segmentation network integrating dual-path attention and graph convolution. Front Med (Lausanne) 2026;13:1785796. [Crossref] [PubMed]
Liu C, Chen S, Yang Y, et al. The value of the computer-aided diagnosis system for thyroid lesions based on computed tomography images. Quant Imaging Med Surg 2019;9:642-53. [Crossref] [PubMed]
Peng W, Liu C, Xia S, et al. Thyroid nodule recognition in computed tomography using first order statistics. Biomed Eng Online 2017;16:67. [Crossref] [PubMed]
Zhao Z, Ye C, Hu Y, et al. Cascade and Fusion of Multitask Convolutional Neural Networks for Detection of Thyroid Nodules in Contrast-Enhanced CT. Comput Intell Neurosci 2019;2019:7401235. [Crossref] [PubMed]
Li W, Cheng S, Qian K, et al. Automatic Recognition and Classification System of Thyroid Nodules in CT Images Based on CNN. Comput Intell Neurosci 2021;2021:5540186. [Crossref] [PubMed]
Wan J, Liu L, Wang H, et al. UNSX-HRNet: Modeling anatomical uncertainty for landmark detection in total hip arthroplasty. Comput Biol Med 2025;198:111146. [Crossref] [PubMed]
Hu J, Shen L, Albanie S, et al. Squeeze-and-Excitation Networks. IEEE Trans Pattern Anal Mach Intell 2020;42:2011-23. [Crossref] [PubMed]
Zhu YC, Jin PF, Bao J, et al. Thyroid ultrasound image classification using a convolutional neural network. Ann Transl Med 2021;9:1526. [Crossref] [PubMed]
Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N, Hornegger J, Wells WM, editors. Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015. Cham: Springer International Publishing; 2015:234-41.
Zheng Y, Xie J, Sain A, et al. Sketch-Segformer: Transformer-Based Segmentation for Figurative and Creative Sketches. IEEE Trans Image Process 2023;32:4595-609. [Crossref] [PubMed]
Chen J, Mei J, Li X, et al. TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers. Med Image Anal 2024;97:103280. [Crossref] [PubMed]
Yang Y, Huang H, Shao Y, et al. DAC-Net: A light-weight U-shaped network based efficient convolution and attention for thyroid nodule segmentation. Comput Biol Med 2024;180:108972. [Crossref] [PubMed]
Liu L, Jiang H, He P, et al. On the Variance of the Adaptive Learning Rate and Beyond. arXiv:1908.03265 [Preprint]. 2019 [cited 2026 Jan 29]. Available online: https://arxiv.org/abs/1908.03265

Cite this article as: Li N, Jiang M, Huang Y, Zhang G, Xu S, Huang W, Han X, Zhu X. Development and validation of an Automated computed tomography segmentation model for thyroid nodules using a channel attention high-resolution network. Gland Surg 2026;15(6):157. doi: 10.21037/gs-2026-0162
