Integrating multimodal ultrasound imaging for improved radiomics sentinel lymph node assessment in breast cancer
Highlight box
Key findings
• This study developed and validated a deep learning radiomics model that integrates B-mode ultrasound (BMUS) and color Doppler ultrasound (CDUS) imaging to noninvasively predict sentinel lymph node (SLN) metastasis in breast cancer (BC) patients. The combined model (COMB), which incorporates both handcrafted and deep features, performed best among all models, achieving an AUC of 0.837 in the test cohort and the highest negative predictive value of the three models, supporting its potential as a tool to identify patients with a disease-free axilla.
What is known and what is new?
• It is known that SLN biopsy is the standard for nodal staging but is invasive and may not always be necessary, particularly in patients with clinically and radiologically node-negative disease.
• Prior ultrasound-based radiomics studies have explored SLN prediction, but few have leveraged multimodal imaging and deep learning fusion. This study introduces a novel dual-branch deep learning architecture (USCD-Net) and shows that combining BMUS and CDUS features enhances predictive performance and model interpretability.
What is the implication, and what should change now?
• The COMB model may help clinicians safely identify patients with negative SLN status and reduce unnecessary invasive procedures. This noninvasive, artificial intelligence-assisted approach supports the clinical shift toward personalized, less invasive axillary management. Prospective validation and integration into preoperative workflows are warranted to further advance surgical decision-making in BC care.
Introduction
Breast cancer (BC) is the most common malignancy in women. The 5-year survival rate is 98% in patients with stage I disease but decreases to 84.4% in patients with axillary lymph node metastasis (1). Accurate assessment of nodal status is therefore crucial. For clinically node-negative patients, sentinel lymph node (SLN) biopsy (SLNB) remains the gold standard for detecting SLN metastasis. However, SLNB has disadvantages related to cost and accessibility, as it requires specialized techniques and expertise that may be unavailable in resource-limited settings (2). It also carries a low but real risk of lymphedema, which causes swelling and discomfort in the affected limb (3). Moreover, emerging evidence indicates that omitting SLNB in BC patients with a clinically and radiologically negative axilla does not affect survival (4). Hence, there is a need for a minimally invasive tool that can diagnose SLN metastasis accurately and efficiently.
Ultrasound (US) is critical for detecting BC and predicting lymph node metastasis (4). Various US characteristics, such as tumor proximity to the skin and nipple, lymphatic invasion, and tumor size, have been linked to lymph node involvement (5). However, axillary US has shown limited diagnostic accuracy for lymph node metastasis, with sensitivity ranging from 48.8% to 87.1% and specificity from 55.6% to 97.3% (6). In particular, axillary US is less sensitive for nodal metastasis in patients with certain histologic types, such as invasive lobular carcinoma (7).
As a result, several adjunctive techniques for enhancing US performance have been developed. Among these, color Doppler ultrasound (CDUS) has emerged as a clinical tool for enhancing the diagnostic accuracy of breast ultrasound by providing hemodynamic information (8). Specifically, it facilitates the clinical evaluation of blood flow velocity and the extent of vascularity within and surrounding breast lesions (BLs). It can also predict SLN metastasis by detecting microangiogenesis in axillary metastatic lymph nodes, which is stimulated by tumor-related angiogenic factors (9,10). CDUS is particularly useful for identifying abnormal peripheral blood flow that is typically observed in metastatic lymph nodes. However, both metastatic and reactive lymph nodes may exhibit diffuse hyperemia originating from the hilum (11).
Advances in artificial intelligence (AI) have revolutionized medical image analysis, particularly US imaging of BLs (12). One of the most promising developments is deep learning radiomics, a novel approach that leverages supervised machine learning to extract high-throughput quantitative features from medical images, offering improved diagnostic accuracy and efficiency (13,14). Combining US with computer-aided methods is a potential solution to challenges such as operator variability and subjective interpretation. However, research on the diagnostic performance of combining these techniques remains scarce. Hence, our study aimed to assess the added value of integrating multimodal US images, including CDUS and B-mode ultrasound (BMUS), for diagnosing SLN status in patients with BC through radiomics analysis. We present this article in accordance with the TRIPOD reporting checklist (available at https://gs.amegroups.com/article/view/10.21037/gs-2025-223/rc).
Methods
Patients
This retrospective study was approved by the ethics committee of Shanghai Pudong New Area People’s Hospital (No. prylz2020-082). Shuguang Hospital Affiliated to Shanghai University of Chinese Traditional Medicine was informed of the study and agreed to the use of its data. The requirement for written informed consent was waived owing to the retrospective nature of the study. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The inclusion criteria were as follows: (I) patients who underwent BMUS and CDUS before surgery between October 2021 and March 2025 at Shanghai Pudong New Area People’s Hospital or Shuguang Hospital Affiliated to Shanghai University of Chinese Traditional Medicine; and (II) patients with unilateral, unifocal BC who underwent SLNB with a confirmed pathological diagnosis. SLNB was performed in accordance with the ASCO guideline (15). The exclusion criteria were: (I) preoperative BC treatment, such as excisional biopsy, neoadjuvant radiotherapy, or chemotherapy; (II) a history of other malignancies; (III) a history of allogeneic hematopoietic stem cell or solid organ transplantation; and (IV) missing lymph node findings.
Clinicopathologic data, including age, histologic type, human epidermal growth factor receptor 2 (HER2) status, estrogen receptor (ER) status, progesterone receptor (PR) status, tumor proliferation rate (Ki-67 level), and metastatic SLN count [disease-free axilla (N0), low metastatic burden of axillary disease (N1–2), and high metastatic burden (N ≥3)], were retrieved from patient records. A Ki-67 cutoff of 20% was used to define positivity (16).
BMUS and CDUS examinations
Five radiologists, with between 2 and 18 years of experience in breast BMUS, performed the preoperative breast and axillary US examinations following standard departmental practice on the following ultrasound systems, each equipped with a 6- to 12-MHz transducer: EPIQ 7 (Philips Healthcare, Best, the Netherlands), ACUSON Oxana 2 (Siemens Healthineers, Erlangen, Germany), and LOGIQ E9 (GE HealthCare, Chicago, IL, USA). Each patient lay supine with both hands behind the head, fully exposing the breasts and the bilateral axillary and supraclavicular fossae. BLs were examined first, followed by the axillary and supraclavicular lymph nodes. The same radiologists then performed CDUS to assess the morphology and blood flow signals of the BLs. As per routine practice, acquisition settings were adjusted to optimize visualization while minimizing noise in the normal parenchyma; in particular, the CDUS gain was set just below the threshold at which random noise appears in the normal breast parenchyma. The imaging protocol used a medium wall filter, a pulse repetition frequency of 700 Hz, intermediate persistence for image stability, a frame rate of 16 frames per second, and a dynamic range of 50 dB, all contributing to high image quality. All examinations were conducted 1–2 weeks before surgery.
Data preprocessing
Region of interest (RoI) delineation
The RoI, i.e., the lesion area of the US image, was cropped from the full image. To facilitate this, a cropping tool was developed in MATLAB (MathWorks, Natick, MA, USA) that allowed radiologists to delineate lesion borders by marking pixel points along the border, which were stored as coordinates. The initial RoI cropping was performed manually by a radiologist with over 7 years of experience in breast US. A senior radiologist with more than 20 years of US experience then reviewed the cropped images, and inter- and intra-observer agreement was assessed with intraclass correlation coefficients (ICCs). An ICC above 0.75 was considered to indicate satisfactory reliability and reproducibility.
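The agreement analysis above can be illustrated with a short NumPy sketch. The section does not say which ICC form was used, so this example assumes the two-way random-effects, absolute-agreement, single-measurement form (ICC(2,1) in Shrout and Fleiss's notation), applied to one scalar per lesion and reader, such as a delineated RoI area. The function name `icc_2_1` and the reader data are hypothetical.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array of shape (n_subjects, k_raters), one scalar per lesion and
    reader (e.g. RoI area). The ICC form is an assumption for illustration.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)   # per-lesion means
    col_means = x.mean(axis=0)   # per-reader means

    # Mean squares from the two-way ANOVA decomposition
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between lesions
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between readers
    resid = x - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))         # residual error

    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))

# Hypothetical RoI areas (mm^2) for 5 lesions delineated by 2 readers
areas = np.array([[120, 118], [340, 352], [95, 99], [410, 400], [210, 215]])
print(round(icc_2_1(areas), 3))
print(icc_2_1(areas) > 0.75)   # the study's reliability threshold
```

With readers who differ only by a constant offset, this form falls below 1 because absolute agreement penalizes systematic bias; a consistency-type ICC would not.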
Data augmentation
Because the number of study samples was limited, data augmentation was applied during image preprocessing. Random geometric transformations, such as flipping, rotation, scaling, and shifting, can increase the dataset size up to 10-fold. All augmented images were resized to 224×224 pixels to ensure a standardized scale. This approach has been shown to mitigate overfitting and reduce the risk of memorizing specific details of training images (17). Preprocessing was conducted using Python version 3.6.6 (Python Software Foundation, Wilmington, DE, USA).
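A minimal NumPy sketch of this kind of augmentation follows. The exact transforms and their ranges are not reported, so the choices here are assumptions: rotation is restricted to multiples of 90°, scaling and shifting are approximated by a random crop covering 80–100% of each side, and resizing uses nearest-neighbour sampling. The function names are hypothetical.

```python
import numpy as np

def resize_nearest(img, size=224):
    """Nearest-neighbour resize of a 2-D image to size x size."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[np.ix_(rows, cols)]

def augment(img, n_copies=10, min_scale=0.8, seed=0):
    """Return n_copies randomly flipped/rotated/cropped views at 224x224."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_copies):
        x = img
        if rng.random() < 0.5:
            x = np.fliplr(x)                          # random horizontal flip
        x = np.rot90(x, k=int(rng.integers(0, 4)))    # rotation by 0/90/180/270 degrees
        h, w = x.shape
        s = rng.uniform(min_scale, 1.0)               # crop size -> acts as scaling
        ch, cw = max(1, int(h * s)), max(1, int(w * s))
        top = int(rng.integers(0, h - ch + 1))        # crop position -> acts as shifting
        left = int(rng.integers(0, w - cw + 1))
        x = x[top:top + ch, left:left + cw]
        out.append(resize_nearest(x))                 # standardize to 224x224
    return np.stack(out)

roi = np.random.default_rng(1).random((180, 260))    # stand-in for a cropped RoI
batch = augment(roi)
print(batch.shape)   # (10, 224, 224)
```

For paired BMUS and CDUS inputs, one would presumably apply the same random parameters to both images of a pair so that their spatial correspondence is preserved; the text does not state how this was handled.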
Feature extraction, selection, and dimension reduction
Traditional feature extraction
Radiomic features were extracted from the RoIs via the PyRadiomics toolkit (version 3.0), resulting in 107 features across seven categories: first-order statistics, 2D shape descriptors, gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRLM), gray level size zone matrix (GLSZM), gray level dependence matrix (GLDM), and neighboring gray tone difference matrix (NGTDM). First-order statistical features quantify pixel intensity distribution within the RoIs, capturing key metrics such as mean, median, entropy, energy, kurtosis, variance, and skewness. Morphological features characterize the geometric properties of the tumor, including minor and major axis lengths, surface-to-volume ratio, and sphericity. Texture features, derived from GLCM, GLRLM, GLSZM, GLDM, and NGTDM matrices, were computed using a bin width of 25 and evaluated at 0°, 45°, 90°, and 135° angles and distances of 1, 2, and 3 pixels, capturing multi-directional spatial patterns. Wavelet features were generated through discrete wavelet transform (DWT), decomposing images into low-low (LL), low-high (LH), high-low (HL), and high-high (HH) subbands, enabling multiscale analysis of texture and intensity patterns.
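PyRadiomics computes these texture matrices internally. To make one of them concrete, the sketch below computes GLCM contrast, the sum over gray-level pairs of (i − j)² p(i, j), using the bin width (25), angles (0°, 45°, 90°, 135°) and distances (1, 2, 3 pixels) stated above. It is a simplified stand-alone illustration, not the PyRadiomics implementation: it works on a rectangular crop rather than a masked RoI, numbers gray levels from 0, and simply averages over all offsets, so its values will not match the toolkit exactly. The helper names are hypothetical.

```python
import numpy as np

def discretize(img, bin_width=25):
    """Fixed-bin-width discretization; gray levels are shifted to start at 0."""
    q = np.floor(np.asarray(img, dtype=float) / bin_width).astype(int)
    return q - q.min()

def glcm(levels, dy, dx):
    """Symmetric, normalized gray level co-occurrence matrix for offset (dy, dx)."""
    h, w = levels.shape
    n = levels.max() + 1
    r0, r1 = max(0, -dy), h - max(0, dy)
    c0, c1 = max(0, -dx), w - max(0, dx)
    src = levels[r0:r1, c0:c1]                        # reference pixels
    dst = levels[r0 + dy:r1 + dy, c0 + dx:c1 + dx]    # their neighbours at (dy, dx)
    m = np.zeros((n, n))
    np.add.at(m, (src.ravel(), dst.ravel()), 1)       # count co-occurrences
    m = m + m.T                                       # count each pair in both directions
    return m / m.sum()

def glcm_contrast(img, bin_width=25, distances=(1, 2, 3)):
    """Mean GLCM contrast over 4 angles and the given pixel distances."""
    levels = discretize(img, bin_width)
    directions = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]  # 0, 45, 90, 135 degrees (row, col)
    values = []
    for d in distances:
        for dy, dx in directions:
            p = glcm(levels, dy * d, dx * d)
            i, j = np.indices(p.shape)
            values.append(np.sum((i - j) ** 2 * p))
    return float(np.mean(values))

checkerboard = 50.0 * (np.indices((8, 8)).sum(axis=0) % 2)  # alternating 0 / 50
print(glcm_contrast(checkerboard, distances=(1,)))
```

In the checkerboard, horizontal and vertical neighbours always differ by two gray levels while diagonal neighbours are identical, which is why contrast depends strongly on the direction chosen.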
Feature selection and dimension reduction
In the feature selection process, our primary objective was to eliminate redundant features exhibiting high correlations. These correlations likely arose from the same underlying data distribution. Among the remaining features, we further focused on those that had stronger correlations with the target variable, as these were expected to have higher relevance for target identification. Therefore, a two-step methodology was implemented to accomplish the feature selection. In the first step of feature dimensionality reduction, Pearson correlation analysis was employed to assess the pairwise relationships among radiomic features. A predefined correlation threshold of 0.75 was applied to identify and eliminate highly correlated features. Features with Pearson correlation coefficients exceeding this threshold were considered redundant and subsequently removed to reduce feature redundancy and mitigate multicollinearity. This step was performed prior to further feature selection to ensure that the retained features were relatively independent, minimizing the risk of overfitting in subsequent modeling.
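The first (redundancy) step can be sketched as a greedy filter. Which member of a correlated pair was dropped is not stated, so this hypothetical `drop_correlated` helper assumes a simple rule: features are scanned in column order, and any feature whose absolute Pearson correlation with an already-kept feature exceeds the threshold is discarded.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.75):
    """Keep a feature only if |r| <= threshold with every feature kept so far."""
    corr = np.abs(np.corrcoef(X, rowvar=False))   # columns are features
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([a, 2 * a, np.array([5.0, 1.0, 4.0, 2.0, 3.0])])
X_kept, kept = drop_correlated(X, ["f_a", "f_b", "f_c"])
print(kept)   # ['f_a', 'f_c']
```

Here `f_b` is an exact multiple of `f_a` (r = 1) and is removed, while `f_c` (r = −0.3 with `f_a`) is retained.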
In the second step, least absolute shrinkage and selection operator (LASSO) regression was implemented to further refine the feature set. LASSO employs L1 regularization to shrink the coefficients of less informative features to zero, retaining only those with significant predictive power. The optimal regularization parameter was determined using 10-fold cross-validation, based on the minimum mean squared error. The final set of selected features consisted of those with non-zero coefficients under the optimal value. This approach effectively reduces the feature set while preserving those most relevant to predictive modeling.
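This second step is what scikit-learn's `LassoCV` (e.g. `LassoCV(cv=10)`) automates. To keep the example dependency-free, the sketch below implements the same idea with NumPy: coordinate-descent LASSO on standardized features, with the regularization strength chosen by 10-fold cross-validated mean squared error and a final refit on all data. The helper names, the alpha grid, and the synthetic data are assumptions for illustration; treating the binary SLN label as a numeric target would mirror the MSE criterion described above.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + alpha*||b||_1.

    Assumes the columns of X are standardized and y is centred (no intercept).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                                   # residual y - Xb with b = 0
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                    # remove feature j from the fit
            rho = X[:, j] @ r / n
            # soft-thresholding: weak effects are shrunk exactly to zero
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]                    # add the updated feature back
    return b

def lasso_cv(X, y, alphas, k=10, seed=0):
    """Choose alpha by k-fold CV mean squared error, then refit on all data."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    cv_mse = []
    for alpha in alphas:
        errors = []
        for val in folds:
            tr = np.setdiff1d(np.arange(len(y)), val)
            mu, sd, y_mean = X[tr].mean(axis=0), X[tr].std(axis=0), y[tr].mean()
            b = lasso_cd((X[tr] - mu) / sd, y[tr] - y_mean, alpha)
            pred = ((X[val] - mu) / sd) @ b + y_mean
            errors.append(np.mean((y[val] - pred) ** 2))
        cv_mse.append(np.mean(errors))
    best = alphas[int(np.argmin(cv_mse))]
    mu, sd = X.mean(axis=0), X.std(axis=0)
    coef = lasso_cd((X - mu) / sd, y - y.mean(), best)
    return best, coef

# Synthetic example: only the first two of eight features carry signal
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 8))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=120)
alpha, coef = lasso_cv(X, y, alphas=[0.01, 0.1, 0.5, 1.0])
selected = np.flatnonzero(coef)                    # features with non-zero coefficients
print(alpha, selected)
```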
The Radiomics score (RS) was constructed as a linear combination of the selected LASSO features and their respective coefficients using the following formula:
$$\mathrm{RS} = \sum_{i=1}^{n} \beta_i x_i$$
where $n$ represents the number of LASSO-selected features, $x_i$ is the value of the $i$-th selected feature, and $\beta_i$ is its LASSO coefficient. Each coefficient reflects the relative importance of the corresponding feature in predicting SLN metastasis. A higher RS indicates a greater likelihood of metastasis. The RS was subsequently integrated into the dataset for further analysis and model development.
Furthermore, principal component analysis (PCA) was used to reduce the 2,048 deep transfer learning features to 256 dimensions, balancing computational efficiency against feature expressiveness while preserving essential information. Retaining the components that capture the most variance reduced the risk of overfitting and improved the robustness and performance of the model across varied data.
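A NumPy sketch of the reduction from 2,048 to 256 dimensions via singular value decomposition is shown below. Fitting PCA on the training set only and reusing the fitted mean and components for the validation and test sets is an assumption (the text does not say); the random matrix stands in for real deep features, and the function names are hypothetical.

```python
import numpy as np

def fit_pca(X_train, n_components=256):
    """Fit PCA by SVD of the mean-centred training matrix."""
    mean = X_train.mean(axis=0)
    _, s, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)            # variance ratio of each component
    return mean, vt[:n_components], explained[:n_components]

def apply_pca(X, mean, components):
    """Project (possibly unseen) samples onto the fitted components."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
deep_train = rng.normal(size=(276, 2048))          # stand-in for 2,048-D deep features
mean, components, ratio = fit_pca(deep_train)
reduced = apply_pca(deep_train, mean, components)
print(reduced.shape)                               # (276, 256)
print(round(float(ratio.sum()), 3))                # share of training variance retained
```

Because a mean-centred data matrix has rank at most n − 1, obtaining 256 informative components requires at least 257 samples in the fitting set.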
Moreover, to enhance model performance through feature-level fusion, we implemented a hybrid approach by concatenating the selected handcrafted radiomic features and the dimensionally reduced deep learning features. Prior to fusion, all features were individually standardized using z-score normalization to ensure comparability across different scales and distributions (18). The standardized features were then concatenated into a unified feature vector, which was subsequently used as input for model training. This preprocessing step mitigated potential scale inconsistencies between heterogeneous feature types and facilitated stable optimization during classifier learning.
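Feature-level fusion as described reduces to standardizing each block and concatenating. In this sketch the z-score statistics are estimated on the training set and reused for other sets (an assumption), constant columns are guarded against division by zero, and the block sizes (6 handcrafted, 256 deep) are illustrative; the helper names are hypothetical.

```python
import numpy as np

def fit_zscore(X_train, eps=1e-8):
    """Per-column mean and standard deviation from the training set."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    return mu, np.where(sd < eps, 1.0, sd)        # avoid dividing by ~0 for constant columns

def fuse_features(hand, deep, stats_hand, stats_deep):
    """z-score each feature block with its own statistics, then concatenate."""
    (mu_h, sd_h), (mu_d, sd_d) = stats_hand, stats_deep
    return np.hstack([(hand - mu_h) / sd_h, (deep - mu_d) / sd_d])

rng = np.random.default_rng(0)
hand_train = rng.normal(50.0, 10.0, size=(276, 6))   # handcrafted block (illustrative)
deep_train = rng.normal(0.0, 0.01, size=(276, 256))  # PCA-reduced deep block (illustrative)
stats_h, stats_d = fit_zscore(hand_train), fit_zscore(deep_train)
fused_train = fuse_features(hand_train, deep_train, stats_h, stats_d)
print(fused_train.shape)   # (276, 262)
```

Without the z-scoring, the handcrafted block (values near 50) would dominate the deep block (values near 0.01) in any scale-sensitive classifier.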
Deep learning model construction
We propose a dual-stream architecture, named ultrasound and color Doppler network (USCD-Net), based on MobileNetV2 for simultaneous representation learning from two complementary imaging modalities: BMUS and CDUS (Figure 1). Each branch independently encodes features using a MobileNetV2 encoder. The fusion was performed at an intermediate layer (after the 7th block), where spatial feature maps from both branches were fused via element-wise averaging. This layer was empirically chosen to balance between low-level structural features (shallow layers) and high-level semantic representations (deep layers). Fusion at earlier layers may capture redundant edge-level features with less contextual information, while deeper fusion points may risk losing modality-specific spatial patterns:
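$$F_{\mathrm{fused}} = \frac{1}{2}\left(F_{\mathrm{US}} + F_{\mathrm{CD}}\right)$$
Here, $F_{\mathrm{US}}, F_{\mathrm{CD}}, F_{\mathrm{fused}} \in \mathbb{R}^{B \times C \times H \times W}$, where $F_{\mathrm{US}}$ and $F_{\mathrm{CD}}$ are the BMUS and CDUS feature maps at the fusion layer, $B$ is the batch size, and $C$, $H$, and $W$ denote the channel, height, and width dimensions, respectively. Pixel-wise averaging provides lightweight, spatially aligned, and symmetric fusion of multimodal features, which is particularly suitable for paired medical images such as BMUS and CDUS.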
After fusion, the combined feature map is passed through the remaining layers of MobileNetV2 (i.e., block 8 to block 18), followed by global average pooling (GAP) and a fully connected (FC) classification head for binary prediction:
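$$\hat{y} = \sigma\left(W_2 W_1\, \mathrm{GAP}\left(F'\right)\right)$$
where $F'$ is the feature map produced by the remaining MobileNetV2 layers from the fused feature map, $\sigma$ is the sigmoid activation, and $W_1$ and $W_2$ are the trainable weight matrices of the FC layers.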
To enforce consistency between modality-specific features, we introduce a feature alignment constraint via L2 loss computed prior to fusion:
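$$\mathcal{L}_{\mathrm{align}} = \frac{1}{N}\left\lVert F_{\mathrm{US}} - F_{\mathrm{CD}} \right\rVert_2^2$$
where $F_{\mathrm{US}}$ and $F_{\mathrm{CD}}$ are the BMUS and CDUS feature maps entering the fusion layer and $N$ is the total number of elements in each feature map. The final loss combines the binary cross-entropy classification loss ($\mathcal{L}_{\mathrm{BCE}}$) and the alignment loss:
$$\mathcal{L} = \mathcal{L}_{\mathrm{BCE}} + \lambda\, \mathcal{L}_{\mathrm{align}}$$
where $\lambda$ is a balancing hyperparameter, empirically set to 0.1.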
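The fusion and loss equations can be checked numerically with a NumPy sketch. Toy random tensors stand in for the MobileNetV2 feature maps, the remaining backbone blocks are replaced by an identity, and biases are omitted; no activation is placed between the two FC layers because none is specified in the text, although a practical head would usually include one. All function names are hypothetical; in the actual PyTorch model these operations would act on autograd tensors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse(f_us, f_cd):
    """Element-wise average of BMUS and CDUS maps of shape (B, C, H, W)."""
    assert f_us.shape == f_cd.shape, "branches must produce aligned maps"
    return 0.5 * (f_us + f_cd)

def alignment_loss(f_us, f_cd):
    """(1/N) * squared L2 distance between the maps, averaged over the batch."""
    return float(np.mean((f_us - f_cd) ** 2))

def bce(p, y, eps=1e-7):
    """Binary cross-entropy with clipping for numerical safety."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def head(fused, w1, w2):
    """GAP over H and W, two FC layers (weights only), then a sigmoid."""
    pooled = fused.mean(axis=(2, 3))               # (B, C)
    return sigmoid(pooled @ w1.T @ w2.T)[:, 0]     # (B,)

def total_loss(f_us, f_cd, y, w1, w2, lam=0.1):
    """L = L_BCE + lambda * L_align."""
    p = head(fuse(f_us, f_cd), w1, w2)
    return bce(p, y) + lam * alignment_loss(f_us, f_cd)

rng = np.random.default_rng(0)
B, C, H, W = 4, 32, 28, 28                         # toy feature-map shape
f_us = rng.normal(size=(B, C, H, W))
f_cd = f_us + 0.1 * rng.normal(size=(B, C, H, W))  # CDUS map loosely aligned with BMUS
w1 = 0.1 * rng.normal(size=(16, C))                # first FC layer (16 units, arbitrary)
w2 = 0.1 * rng.normal(size=(1, 16))                # second FC layer -> one logit
y = np.array([0.0, 1.0, 1.0, 0.0])
print(total_loss(f_us, f_cd, y, w1, w2, lam=0.1))
```

The alignment term is zero when both branches produce identical maps and grows with their squared difference, so λ controls how strongly the two modalities are pulled toward a shared representation before averaging.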
Each sample consists of two images from the same anatomical location but from different modalities: one BMUS image and one CDUS image.
To enforce one-to-one correspondence, we built a PairedImageDataset that traverses the same subdirectory structure (Doppler/0/ and Doppler/1/ vs. ultrasound/0/ and ultrasound/1/) under the US and Doppler roots and sorts filenames to maintain ordering. This ensures that the US and Doppler images of sample i refer to the same clinical instance.
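A plain-Python sketch of this pairing logic is shown below, assuming that the BMUS and CDUS images of the same case share a file name inside `ultrasound/<label>/` and `Doppler/<label>/`. The explicit name check is an added safeguard (sorting alone pairs files correctly only if both folders contain matching names); the function `pair_images` and the accepted extensions are hypothetical.

```python
from pathlib import Path

def pair_images(us_root, cd_root, classes=("0", "1"), exts=(".png", ".jpg")):
    """Pair BMUS and CDUS files that share a class folder and a file name.

    Returns a list of (us_path, cd_path, label) tuples.
    """
    pairs = []
    for label in classes:
        us_files = sorted(p for p in (Path(us_root) / label).iterdir()
                          if p.suffix.lower() in exts)
        cd_files = sorted(p for p in (Path(cd_root) / label).iterdir()
                          if p.suffix.lower() in exts)
        # Sorting aligns the lists only if both folders hold the same names
        if [p.name for p in us_files] != [p.name for p in cd_files]:
            raise ValueError(f"unmatched BMUS/CDUS files in class {label}")
        pairs += [(u, c, int(label)) for u, c in zip(us_files, cd_files)]
    return pairs
```

Usage would look like `pairs = pair_images("data/ultrasound", "data/Doppler")` (hypothetical paths); a PyTorch `Dataset` could then wrap the returned list and load both images of a pair in its `__getitem__`.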
We trained the dual-branch MobileNetV2 network for binary classification using paired BMUS and CDUS images with shared labels. Feature extraction was conducted independently up to the 7th block, after which features were fused via element-wise averaging and passed through a shared classifier.
The model was optimized using the Adam optimizer with binary cross-entropy loss. To enhance modality alignment, an additional L2-based regularization term was applied before fusion. Training used early stopping based on validation area under the curve (AUC), and the best-performing model was retained.
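Selecting the checkpoint with the best validation AUC can be sketched with a small tracker. The patience value is hypothetical, since the text does not report one; in a PyTorch loop, `state` would typically be `model.state_dict()`. The class name is also hypothetical.

```python
import copy

class BestOnValidationAUC:
    """Keep the weights from the epoch with the highest validation AUC and
    signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_auc = float("-inf")
        self.best_epoch = None
        self.best_state = None
        self.epochs_without_improvement = 0

    def step(self, epoch, val_auc, state):
        if val_auc > self.best_auc:
            self.best_auc, self.best_epoch = val_auc, epoch
            self.best_state = copy.deepcopy(state)   # snapshot of the best weights
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience   # True -> stop

tracker = BestOnValidationAUC(patience=3)
for epoch, auc in enumerate([0.61, 0.70, 0.66, 0.68, 0.69, 0.72]):
    if tracker.step(epoch, auc, state={"epoch": epoch}):
        break
print(tracker.best_epoch, tracker.best_auc)   # 1 0.7
```

With a patience of 3, training stops at epoch 4 and the epoch-1 weights are retained, even though a later epoch (0.72) would have been better; the patience setting trades training time against this risk.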
Deep feature extraction
Using the PyTorch deep learning framework (19), we implemented a dual-branch transfer learning architecture, USCD-Net, based on MobileNetV2 backbones (20). Each MobileNetV2 branch was initialized with weights pretrained on the ImageNet dataset (21). During training, all layers were fine-tuned end to end, allowing domain-specific adaptation to ultrasound imaging. Full-network fine-tuning was selected based on preliminary trials, which showed better convergence and accuracy than partial fine-tuning (i.e., freezing early layers). To optimize performance for our binary classification task involving paired ultrasound modalities, we tuned key hyperparameters, particularly the learning rate. The initial learning rate was set to 1×10⁻⁴ and scheduled using cosine annealing. Training ran for 50 epochs with a batch size of 32, using the Adam optimizer with a weight decay of 1×10⁻⁵. These settings yielded stable convergence and effective adaptation to our domain, allowing the fused features from both BMUS and CDUS inputs to contribute to the final classification.
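The learning-rate schedule follows the closed-form cosine annealing rule, η_t = η_min + ½(η₀ − η_min)(1 + cos(πt/T)). The sketch below evaluates it for η₀ = 1×10⁻⁴ and T = 50 epochs, assuming η_min = 0 and one scheduler step per epoch (neither is stated). In PyTorch this would correspond to `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)` with `optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)`; the function `cosine_lr` is hypothetical.

```python
import math

def cosine_lr(epoch, base_lr=1e-4, total_epochs=50, eta_min=0.0):
    """Cosine-annealed learning rate for a given epoch (0-based)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / total_epochs))

schedule = [cosine_lr(e) for e in range(50)]
print(f"{schedule[0]:.1e} {schedule[25]:.1e} {schedule[49]:.1e}")
```

The rate stays near its initial value in early epochs, halves at the midpoint, and approaches zero in the final epochs, which allows small, stable updates as training converges.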
Statistical analysis
The clinicopathological differences between the training and test groups were assessed with the Mann-Whitney test or the t-test. Receiver operating characteristic (ROC) curves were generated to evaluate the performance of each model. Key metrics, including negative predictive value (NPV), positive predictive value (PPV), sensitivity, specificity, and accuracy, were calculated for both the training and test groups, and plotted on the same ROC space. Comparisons of the AUC were conducted with the DeLong method. Statistical significance was set at a P value <0.05. Statistical analyses were performed using SPSS 22.0 (IBM Corp., Armonk, NY, USA).
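The threshold-based metrics and the AUC can be computed from predicted probabilities as in this NumPy sketch. The AUC uses its Mann-Whitney interpretation (the probability that a randomly chosen metastatic case scores higher than a non-metastatic one, with ties counted as one half). Treating metastasis (N ≥1) as the positive class and using a 0.5 cutoff are assumptions, since the operating threshold is not reported; the DeLong comparison of correlated AUCs is not shown, and the function names are hypothetical.

```python
import numpy as np

def auc_mann_whitney(y_true, scores):
    """AUC = P(score_pos > score_neg) + 0.5 * P(tie) over all pos/neg pairs."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    diff = s[y][:, None] - s[~y][None, :]          # every positive vs every negative
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

def threshold_metrics(y_true, scores, threshold=0.5):
    """Confusion-matrix metrics at a fixed probability cutoff."""
    y = np.asarray(y_true).astype(bool)
    pred = np.asarray(scores) >= threshold
    tp, tn = np.sum(pred & y), np.sum(~pred & ~y)
    fp, fn = np.sum(pred & ~y), np.sum(~pred & y)
    sens, ppv = tp / (tp + fn), tp / (tp + fp)
    return {
        "sensitivity": sens,
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / y.size,
        "f1": 2 * ppv * sens / (ppv + sens),
    }

y = [0, 0, 1, 1]                 # 1 = metastatic SLN (N >= 1)
p = [0.1, 0.4, 0.35, 0.8]        # predicted probabilities
print(auc_mann_whitney(y, p))    # 0.75
print(threshold_metrics(y, p))
```

Unlike the threshold-based metrics, the AUC summarizes ranking across all cutoffs, which is why the two can disagree when models are compared.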
Results
Baseline characteristics
Our study included 539 women, each with one BL. After data evaluation, 450 women (mean age 52.28±8.36 years; range 32–80 years) with 450 malignant BLs were included in the final analysis (Figure 2). The participants were randomly assigned to three groups: a training set (276 patients), a validation set (105 patients), and a test set (69 patients). Based on SLN or axillary lymph node (ALN) dissection results, 144 women had a disease-free axilla (N0) (training: 70; validation: 36; test: 38), 171 women had a low metastatic burden of axillary disease (N1–2) (training: 111; validation: 47; test: 13), and 135 women had a high metastatic burden (N ≥3) (training: 95; validation: 22; test: 18). Patients from Shuguang Hospital Affiliated to Shanghai University of Chinese Traditional Medicine were set as an external validation set. Table 1 presents the patient characteristics, including age, US size, histologic type, ER and PR status, Ki-67 index, and HER2 status. Apart from the distribution of SLN metastasis status (P=0.002), no significant differences were found between the training and test sets (P>0.05).
Table 1
| Characteristic | All patients (n=450) | Training (n=276) | Validation (n=105) | P value (train vs. valid) | Test (n=69) | P value (train vs. test) |
|---|---|---|---|---|---|---|
| Age (years) | 52.28±8.36 | 52.54±8.27 | 51.78±8.48 | >0.990 | 51.25±8.68 | >0.99 |
| US size (mm) | 21.50±6.11 | 21.60±6.09 | 23.72±6.78 | >0.990 | 21.14±6.23 | 0.87 |
| SLN metastasis | | | | 0.162 | | 0.002 |
| N0 | 144 | 70 | 36 | | 38 | |
| N1–2 | 171 | 111 | 47 | | 13 | |
| N≥3 | 135 | 95 | 22 | | 18 | |
| Histologic type | | | | 0.803 | | 0.79 |
| Ductal | 73.91 | 205 | 74 | | 50 | |
| Other | 26.09 | 71 | 31 | | 19 | |
| ER | | | | 0.874 | | 0.59 |
| Positive | 76.52 | 209 | 78 | | 55 | |
| Negative | 23.48 | 67 | 27 | | 14 | |
| PR | | | | 0.037 | | 0.21 |
| Positive | 63.77 | 171 | 52 | | 49 | |
| Negative | 36.23 | 105 | 53 | | 20 | |
| HER2 | | | | 0.024 | | 0.55 |
| Positive | 24.06 | 64 | 37 | | 19 | |
| Negative | 75.94 | 212 | 68 | | 50 | |
| Ki-67 | | | | 0.857 | | 0.64 |
| Positive | 75.36 | 206 | 80 | | 54 | |
| Negative | 24.64 | 70 | 25 | | 15 | |
Data are presented as mean ± standard deviation or percentage or number of patients. ER, estrogen receptor; HER2, human epidermal growth factor receptor 2; PR, progesterone receptor; SLN, sentinel lymph node; US, ultrasound.
Feature selection
To evaluate the consistency of RoI annotations, two radiologists independently delineated the RoIs in the images. Inter-observer agreement was assessed by calculating the ICC across the five observers, yielding an ICC of 0.83±0.015. Intra-observer agreement was determined by having each radiologist reannotate the same RoIs after a one-week interval, yielding an ICC of 0.85±0.025. A total of 105 handcrafted features were initially extracted from the original images. After Pearson correlation analysis with a threshold of 0.70, 22 handcrafted features remained, as illustrated in the heatmap (Figure 3A). LASSO regression was then applied with an optimal alpha of 0.03199, which further reduced the set to 6 features: firstorder_Minimum, glcm_Contrast, glcm_Correlation, glcm_Imc2, glszm_SizeZoneNonUniformityNormalized, and ngtdm_Strength.
The Radiomics Score was calculated as a linear combination of these 6 selected features. The specific coefficients and clinical interpretations of each feature are summarized in Table 2. The LASSO cross-validation path (MSE) and coefficient path are visualized in Figure 3B,3C, respectively, demonstrating the selection process and stability of the chosen features. The final combined model (COMB) was based on 10 features, including 6 handcrafted features and 4 deep transfer learning features (Table 3).
Table 2
| Feature | Description | General clinical interpretation | Coefficient | Coefficient-specific interpretation |
|---|---|---|---|---|
| firstorder_Minimum | The minimum intensity value of pixels within the ROI | Lower intensity may indicate hypo-echoic, dense, or necrotic tissue, potentially associated with aggressive lesions | 0.0037 | Slightly higher intensity values may suggest metastatic potential |
| glcm_Contrast | Intensity contrast between adjacent pixels | High contrast may indicate heterogeneous echotexture, often seen in metastatic lesions | −0.0739 | Lower contrast may be associated with less pronounced texture heterogeneity, potentially indicating metastasis |
| glcm_Correlation | Linear dependency of gray-level values on neighboring pixels | Higher correlation suggests more uniform texture, potentially indicative of structured metastatic tissue | 0.0194 | Increased correlation may correspond to organized structures in metastatic lesions |
| glcm_Imc2 | Complexity of texture based on shared pixel information | Higher IMC2 values may indicate more complex tissue structures, characteristic of invasive lesions | 0.0177 | Increased complexity may suggest metastatic patterns |
| glszm_SizeZoneNonUniformityNormalized | Variability in size of homogeneous texture zones, normalized for ROI size | More varied texture zones can indicate heterogeneous structures, common in metastatic spread | 0.0429 | Higher non-uniformity in zone sizes may be predictive of metastasis |
| ngtdm_Strength | Coarseness of texture based on intensity differences | Coarser textures may indicate invasive or necrotic regions, associated with metastatic behavior | 0.0028 | Slight increase in coarseness may suggest metastatic potential |
LASSO, least absolute shrinkage and selection operator; ROI, region of interest.
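As a worked example of the RS formula, the sketch below applies the Table 2 coefficients to a single lesion. Table 2 reports neither an intercept nor the feature scaling, so the example assumes z-scored feature values and no intercept; the lesion values and the function name are hypothetical.

```python
# LASSO coefficients as reported in Table 2
COEFFICIENTS = {
    "firstorder_Minimum": 0.0037,
    "glcm_Contrast": -0.0739,
    "glcm_Correlation": 0.0194,
    "glcm_Imc2": 0.0177,
    "glszm_SizeZoneNonUniformityNormalized": 0.0429,
    "ngtdm_Strength": 0.0028,
}

def radiomics_score(features):
    """RS = sum_i beta_i * x_i over the six selected features (no intercept assumed)."""
    return sum(beta * features[name] for name, beta in COEFFICIENTS.items())

# Hypothetical z-scored feature values for one lesion
lesion = {
    "firstorder_Minimum": 0.5,
    "glcm_Contrast": -1.0,
    "glcm_Correlation": 1.2,
    "glcm_Imc2": 0.8,
    "glszm_SizeZoneNonUniformityNormalized": 1.5,
    "ngtdm_Strength": -0.3,
}
print(round(radiomics_score(lesion), 4))   # 0.1767
```

In this example the two largest contributions come from the below-average contrast (−0.0739 × −1.0) and the high zone-size non-uniformity (0.0429 × 1.5), reflecting the signs of their coefficients.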
Table 3
| Feature | Cohort | AUC | P |
|---|---|---|---|
| Local intensity statistical features (firstorder_Minimum) | Training | 0.693 | 0.04* |
| | Validation | 0.619 | 0.03* |
| | Test | 0.541 | 0.07 |
| Gray-level co-occurrence and dependency features (GLCM_Contrast + GLCM_Correlation + GLCM_IMC2) | Training | 0.627 | 0.04* |
| | Validation | 0.582 | 0.04* |
| | Test | 0.537 | 0.045* |
| Texture zone uniformity features (GLSZM_SizeZoneNonUniformityNormalized + NGTDM_Strength) | Training | 0.705 | 0.03* |
| | Validation | 0.635 | 0.03* |
| | Test | 0.689 | 0.04* |
| DL-94 | Training | 0.763 | 0.007* |
| | Validation | 0.718 | 0.006* |
| | Test | 0.759 | 0.009* |
| DL-168 | Training | 0.790 | 0.005* |
| | Validation | 0.783 | 0.007* |
| | Test | 0.766 | 0.007* |
| DL-209 | Training | 0.804 | <0.001* |
| | Validation | 0.764 | <0.001* |
| | Test | 0.791 | <0.001* |
| DL-499 | Training | 0.822 | <0.001* |
| | Validation | 0.807 | <0.001* |
| | Test | 0.793 | <0.001* |
*, a statistically significant difference. AUC, area under the curve.
Backbone architecture comparison in USCD-Net
To assess the impact of different backbone architectures on the predictive performance of the USCD-Net framework, we compared five widely adopted models: ResNet18, ResNet50, EfficientNetV2-S, Vision Transformer (ViT-B/16), and MobileNetV2 (Table 4). In the binary classification task (N0 vs. N ≥1), MobileNetV2 outperformed the other backbones, achieving the highest test AUC (0.837) and accuracy (ACC) (0.859). ResNet50 (AUC =0.828, ACC =0.814) and ResNet18 (AUC =0.822, ACC =0.834) followed closely, and ViT-B/16 (AUC =0.818, ACC =0.816) and EfficientNetV2-S (AUC =0.806, ACC =0.812) also showed competitive results.
Table 4
| Prediction task | Model | Fusion layer (PyTorch notation) | Training AUC (95% CI) | Training ACC (95% CI) | Validation AUC (95% CI) | Validation ACC (95% CI) | Test AUC (95% CI) | Test ACC (95% CI) |
|---|---|---|---|---|---|---|---|---|
| (I) SLN metastasis prediction (N0 and N≥1) | ResNet18 | model.layer3 | 0.883 (0.87–0.90) | 0.846 (0.82–0.86) | 0.834 (0.81–0.85) | 0.815 (0.79–0.83) | 0.822 (0.80–0.84) | 0.834 (0.81–0.85) |
| | ResNet50 | model.layer3 | 0.875 (0.85–0.89) | 0.823 (0.80–0.84) | 0.815 (0.79–0.83) | 0.803 (0.78–0.82) | 0.828 (0.80–0.84) | 0.814 (0.79–0.83) |
| | EfficientNetV2-S | model.features[:5] | 0.866 (0.84–0.88) | 0.818 (0.79–0.83) | 0.794 (0.77–0.81) | 0.785 (0.76–0.80) | 0.806 (0.78–0.82) | 0.812 (0.79–0.83) |
| | ViT (ViT-B/16) | model.encoder.layer[:6] | 0.876 (0.85–0.89) | 0.833 (0.81–0.85) | 0.825 (0.80–0.84) | 0.816 (0.79–0.83) | 0.818 (0.79–0.83) | 0.816 (0.79–0.83) |
| | MobileNetV2 | model.features[:7] | 0.888 (0.79–0.95) | 0.855 (0.84–0.88) | 0.861 (0.79–0.92) | 0.864 (0.84–0.88) | 0.837 (0.82–0.86) | 0.859 (0.84–0.88) |
| (II) SLN status prediction (N0, N1–2, and N ≥3) | ResNet18 | model.layer3 | 0.942 (0.92–0.96) | 0.874 (0.86–0.89) | 0.727 (0.70–0.75) | 0.710 (0.68–0.73) | 0.734 (0.71–0.76) | 0.702 (0.67–0.73) |
| | ResNet50 | model.layer3 | 0.898 (0.87–0.92) | 0.892 (0.87–0.90) | 0.694 (0.67–0.72) | 0.685 (0.66–0.70) | 0.728 (0.71–0.75) | 0.735 (0.70–0.75) |
| | EfficientNetV2-S | model.features[:5] | 0.928 (0.91–0.95) | 0.889 (0.86–0.91) | 0.682 (0.66–0.70) | 0.729 (0.71–0.75) | 0.689 (0.67–0.71) | 0.674 (0.65–0.70) |
| | ViT (ViT-B/16) | model.encoder.layer[:6] | 0.873 (0.85–0.89) | 0.854 (0.83–0.87) | 0.673 (0.65–0.70) | 0.682 (0.66–0.71) | 0.645 (0.62–0.67) | 0.651 (0.63–0.67) |
| | MobileNetV2 | model.features[:7] | 0.895 (0.88–0.91) | 0.876 (0.86–0.89) | 0.739 (0.72–0.76) | 0.738 (0.72–0.75) | 0.763 (0.74–0.78) | 0.710 (0.69–0.73) |
ACC, accuracy; AUC, area under the curve; CI, confidence interval; N, node; SLN, sentinel lymph node; USCD-Net, ultrasound and color Doppler network.
In the three-class classification task (N0, N1–2, or N ≥3), MobileNetV2 again achieved the highest test AUC (0.763; ACC =0.710), indicating the best discrimination of SLN status. ResNet18 (AUC =0.734, ACC =0.702) and ResNet50 (AUC =0.728, ACC =0.735) followed, with ResNet50 achieving the highest test accuracy. Notably, EfficientNetV2-S showed high training performance (AUC =0.928) but a marked decline on the test set (AUC =0.689), suggesting potential overfitting. ViT-B/16 showed the lowest test performance in this task (AUC =0.645, ACC =0.651).
Thus, MobileNetV2 was selected as the final backbone of USCD-Net because it achieved the highest test AUC in both classification tasks.
Prediction of SLN metastasis with N0 or N≥1
The COMB model demonstrated the highest predictive performance among all models, achieving an AUC of 0.888 (95% CI: 0.788–0.951) in the training set, 0.861 (95% CI: 0.793–0.922) in the validation set, and 0.837 (95% CI: 0.788–0.879) in the test set. The only handcrafted features (ONLY_HF) model, composed entirely of handcrafted features, achieved higher AUCs than the deep feature-based model [only deep-learning features (ONLY_DF)] in both the training and test sets, with AUC values of 0.792 (95% CI: 0.677–0.880) and 0.739 (95% CI: 0.683–0.789), respectively (Table 5).
Table 5
| Modality | Cohort | AUC (95% CI) | Sensitivity | Specificity | Accuracy | PPV | NPV | F1 |
|---|---|---|---|---|---|---|---|---|
| ONLY_HF | Training | 0.792 (0.677–0.880) | 80.65% | 78.95% | 79.71% | 75.76% | 83.33% | 78.13% |
| | Validation | 0.765 (0.690–0.830) | 73.81% | 76.19% | 75.00% | 77.78% | 72.00% | 75.77% |
| | Test | 0.739 (0.683–0.789) | 67.96% | 71.43% | 68.64% | 87.50% | 43.10% | 76.50% |
| ONLY_DF | Training | 0.781 (0.666–0.872) | 74.20% | 76.30% | 75.36% | 71.86% | 78.38% | 73.01% |
| | Validation | 0.748 (0.673–0.815) | 76.19% | 71.43% | 74.00% | 79.17% | 67.57% | 77.66% |
| | Test | 0.717 (0.660–0.770) | 78.64% | 70.00% | 76.45% | 88.52% | 52.69% | 83.29% |
| COMB | Training | 0.888 (0.788–0.951) | 87.10% | 84.21% | 85.51% | 81.82% | 88.89% | 84.38% |
| | Validation | 0.861 (0.793–0.922) | 84.13% | 78.57% | 86.40% | 85.71% | 76.60% | 84.91% |
| | Test | 0.837 (0.788–0.879) | 89.81% | 74.29% | 85.87% | 91.13% | 71.23% | 90.46% |
AUC, area under the curve; CI, confidence interval; COMB, combined both traditional and deep features via the USCD-Net; N, node; NPV, negative predictive value; ONLY_DF, only deep-learning features; ONLY_HF, only handcrafted features; PPV, positive predictive value; USCD-Net, ultrasound and color Doppler network.
Prediction of SLN metastasis with N0, N1–2, or N ≥3
The models were adapted to predict SLN status in three categories (N0, N1–2, and N ≥3) in the test set. The overall accuracy of the ONLY_HF, ONLY_DF, and COMB models in differentiating these categories was 60.9%, 59.4%, and 71.0%, respectively (Figure 4). All three models had difficulty distinguishing low from high metastatic burden, as reflected in their lower performance for the N1–2 and N ≥3 categories.
Interpretability of the deep learning radiomics model
Gradient-weighted class activation mapping (Grad-CAM) was used to visually investigate the interpretability of the deep learning radiomics model. This method generates a coarse localization map that highlights the regions contributing most to the classification (22). For SLN status prediction, the final convolutional layer in the last residual block was visualized to enhance transparency and interpretability (Figure 5). Two imaging features showed significant diagnostic value for predicting SLN metastasis: the lesion boundary and the lobulated areas.
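Grad-CAM weights each feature map of the chosen convolutional layer by the spatially averaged gradient of the class score. It then sums the weighted maps and keeps only positive evidence. The NumPy sketch below implements that step for a single image. It assumes the activations and gradients of the last residual block have already been captured, for example with framework forward and backward hooks. Upsampling the map to the ultrasound image size and rendering it as a red-to-blue overlay are omitted.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map for one image and one target class.

    activations: feature maps A^k of the target layer, shape (K, H, W)
    gradients:   d(class score)/dA^k at the same layer, shape (K, H, W)
    Returns an (H, W) map scaled to [0, 1].
    """
    if activations.shape != gradients.shape or activations.ndim != 3:
        raise ValueError("expected matching (K, H, W) arrays")
    # alpha_k: global-average-pooled gradient for each channel.
    alpha = gradients.mean(axis=(1, 2))                  # (K,)
    # Weighted combination of feature maps over the channel axis.
    cam = np.tensordot(alpha, activations, axes=1)       # (H, W)
    # ReLU keeps only regions that increase the class score.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example: channel 0 fires at the center and raises the score,
# channel 1 is uniform and lowers it.
A = np.zeros((2, 3, 3))
A[0, 1, 1] = 2.0
A[1] = 1.0
G = np.zeros((2, 3, 3))
G[0] = 1.0
G[1] = -0.5
print(grad_cam(A, G))  # only the center cell is non-zero (1.0)
```

The ReLU step is why Grad-CAM maps highlight evidence *for* the predicted class. Regions that argue against the class are suppressed rather than shown as negative values.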
In the class activation map (CAM) visualizations of non-metastatic lesions (bottom two rows), red activation regions are notably more localized. They are predominantly confined to the tumor interior or to specific peripheral areas rather than diffusely distributed. Green and blue zones dominate the surrounding tissue, indicating that the model paid little attention to adjacent regions. This reflects the localized nature and well-defined boundaries typical of non-metastatic lesions (23).
Clinically, this localized activation may reflect relatively low intratumoral heterogeneity, that is, a more homogeneous internal structure, which is often observed in benign or non-metastatic lesions. The clear demarcation of red activation along the tumor boundary, with minimal spillover into adjacent tissue, may correspond to well-circumscribed, less invasive lesions. Additionally, the predominance of blue and green areas suggests limited texture variation and uniform gray-level intensity, potentially reflecting lower cellular density and reduced aggressiveness.
Discussion
This study developed and validated three models for determining SLN metastasis status in patients with BC: one based on handcrafted radiomic features extracted from BMUS, one based on deep features derived from BMUS and CDUS, and one combining both feature types. All three models showed moderate-to-good discrimination in the test set (AUCs of 0.717–0.837), suggesting potential for clinical application.
Previous studies employing radiomics, deep learning, or machine learning approaches have demonstrated their value in BC analysis, particularly for differentiating malignant from benign lesions (24,25). US data, in particular, have become a valuable source for radiomics analysis (26,27). Using a cohort of 426 patients, Yu et al. established a US-based radiomics nomogram to predict ALN status in early-stage invasive BC. Their model showed moderate accuracy, with AUCs of 0.78 and 0.71 in the training and validation sets, respectively (26). Our handcrafted radiomics feature-based model (ONLY_HF) achieved comparable performance, with AUCs of 0.792 and 0.739 in the training and test sets, respectively. Yao et al. (28) further investigated US radiomics for determining SLN metastasis in early-stage invasive BC. In 278 patients, they compared four machine learning classifiers, of which the support vector machine performed best (AUC =0.920). Their study underscored the potential of machine learning-based US radiomics, especially when combined with clinicopathological factors, for the precise application of treatment strategies in BC. However, such radiomics pipelines rely heavily on manual steps such as feature extraction and selection, and variations in these steps can affect prediction accuracy and model stability (23). Yao et al. (28) used only handcrafted radiomic features extracted from BMUS images. Our study advances this work by integrating handcrafted features with deep features from both BMUS and CDUS images, providing a more comprehensive and robust predictive model for SLN metastasis.
Deep learning streamlines this process by automatically extracting predictive features and directly yielding class probability vectors. This reduces dependence on handcrafted pipelines and improves overall efficiency. The strong performance of deep learning in radiological studies is well documented. For instance, a comparative study showed that a convolutional neural network outperformed radiomics in classifying breast lesions as benign or malignant on magnetic resonance imaging, achieving a significantly higher AUC (0.88 vs. 0.81; P<0.001) (29). Building on this evidence, we integrated deep features extracted from both conventional BMUS and CDUS to enhance predictive performance. This decision was motivated by the established correlation between tumor vascularity and lymph node metastasis: microvessel formation within tumors, a hallmark of angiogenesis, has been shown to be strongly predictive of lymphatic spread. A Doppler sonography study demonstrated that tumor vascularity detected with power Doppler imaging correlated significantly with lymph node involvement (30). This association was particularly pronounced in small tumors, emphasizing the value of vascular imaging for predicting lymph node metastasis (31). Cong et al. (30) further identified ultrasonographic vascularity as an independent predictor of non-SLN involvement and tumor burden, consistent with earlier research. By leveraging CDUS data, which capture vascular characteristics associated with lymph node involvement, our models aimed to exploit this additional source of predictive information. The deep features from CDUS complemented the radiomic features derived from BMUS and provided a more comprehensive representation of tumor biology, ultimately improving diagnostic accuracy in predicting SLN metastasis.
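The fusion rationale above can be illustrated with a generic feature-level (concatenation) fusion. This sketch is not USCD-Net. The block names, feature dimensions, per-block z-scoring, and L2-regularized logistic regression head are assumptions chosen for brevity, and the data are synthetic. It shows the basic mechanics: each block (BMUS deep, CDUS deep, handcrafted) is standardized with training-set statistics, the blocks are concatenated per patient, and one classifier is fitted on the joint vector.

```python
import numpy as np

BLOCKS = ("bmus_deep", "cdus_deep", "handcrafted")  # hypothetical block names


def fit_block_stats(features: dict) -> dict:
    """Per-block mean/std estimated on the training cohort only."""
    return {k: (features[k].mean(axis=0), features[k].std(axis=0)) for k in BLOCKS}


def fuse(features: dict, stats: dict) -> np.ndarray:
    """Z-score each block with training statistics, then concatenate per patient."""
    parts = []
    for k in BLOCKS:
        mean, std = stats[k]
        parts.append((features[k] - mean) / np.where(std > 0, std, 1.0))
    return np.concatenate(parts, axis=1)


def fit_logistic(X, y, lr=0.1, epochs=500, l2=1e-2):
    """L2-regularized logistic regression trained by batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


# Synthetic cohort: the label depends on one CDUS and one handcrafted feature.
rng = np.random.default_rng(0)
n = 200
train = {
    "bmus_deep": rng.normal(size=(n, 8)),
    "cdus_deep": rng.normal(size=(n, 4)),
    "handcrafted": rng.normal(size=(n, 5)),
}
y = (train["cdus_deep"][:, 0] + train["handcrafted"][:, 0]
     + 0.5 * rng.normal(size=n) > 0).astype(float)

stats = fit_block_stats(train)
X = fuse(train, stats)
w, b = fit_logistic(X, y)
acc = np.mean((predict_proba(X, w, b) >= 0.5) == y)
print(X.shape, f"training accuracy {acc:.2f}")
```

The z-score statistics are estimated on the training cohort only and then reused for validation and test patients. This avoids leaking information from held-out data into the fused representation.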
In our study, we assessed the effectiveness of models based on radiomic features, deep features, and their combination for predicting SLN metastasis. The combined model, COMB, achieved the highest diagnostic performance in discriminating patients with SLN metastasis from those without. In the test set, COMB achieved an AUC of 0.837 (95% CI: 0.788–0.879), significantly higher than those of the radiomics feature-based model (ONLY_HF) (P=0.031) and the deep feature-based model (ONLY_DF) (P=0.029).
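This section does not state which test produced P=0.031 and P=0.029. DeLong's nonparametric test for two correlated AUCs measured on the same patients is a common choice for such comparisons. It is sketched here as an assumption, not as the authors' confirmed method. The AUC is computed as the Mann–Whitney statistic, with ties counting one half. The variance of the AUC difference comes from per-case placement values.

```python
import math

import numpy as np


def placements(pos: np.ndarray, neg: np.ndarray):
    """DeLong placement values; their means equal the Mann-Whitney AUC."""
    gt = (pos[:, None] > neg[None, :]).astype(float)
    eq = (pos[:, None] == neg[None, :]).astype(float)
    psi = gt + 0.5 * eq                        # 1 if positive ranked higher, 0.5 if tied
    return psi.mean(axis=1), psi.mean(axis=0)  # V10 per positive, V01 per negative


def delong_paired(y_true, score_a, score_b):
    """Two-sided DeLong test for AUC_a vs AUC_b on the same cases.

    Returns (auc_a, auc_b, z, p).
    """
    y = np.asarray(y_true).astype(bool)
    v10, v01, aucs = [], [], []
    for s in (np.asarray(score_a, dtype=float), np.asarray(score_b, dtype=float)):
        p10, p01 = placements(s[y], s[~y])
        v10.append(p10)
        v01.append(p01)
        aucs.append(p10.mean())
    m, n = y.sum(), (~y).sum()
    s10 = np.cov(np.vstack(v10))               # 2x2 covariance over positives
    s01 = np.cov(np.vstack(v01))               # 2x2 covariance over negatives
    var = ((s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / m
           + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / n)
    if var <= 0:
        raise ValueError("AUC difference has zero variance (identical rankings?)")
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided standard-normal p-value
    return aucs[0], aucs[1], z, p


# Synthetic scores: model A separates the classes better than model B.
rng = np.random.default_rng(1)
y = np.r_[np.ones(60), np.zeros(60)]
score_a = 1.5 * y + rng.normal(size=120)
score_b = 0.3 * y + rng.normal(size=120)
auc_a, auc_b, z, p = delong_paired(y, score_a, score_b)
print(f"AUC A {auc_a:.3f}, AUC B {auc_b:.3f}, z {z:.2f}, p {p:.4f}")
```

Because both AUCs are computed on the same patients, the covariance terms matter. Treating the two AUCs as independent would generally misstate the variance of their difference.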
In addition to binary classification, our models were extended to predict SLN status across three categories: N0 (disease-free axilla), N1–2 (low axillary burden), and N ≥3 (high axillary burden). The overall accuracy of the COMB model for these categories was 71.0% in the test cohort. Zheng et al. (32) developed a deep learning radiomics model combining conventional US and shear wave elastography to predict ALN status preoperatively. Their model achieved an AUC of 0.905 (95% CI: 0.814–0.996) in discriminating between low and high metastatic burden in patients with early-stage BC. Their model appeared to outperform ours. This may be attributable to differences in sample size and in the US techniques employed, although the metrics are not directly comparable (a binary AUC versus a three-class overall accuracy). Our findings highlight the difficulty of distinguishing between low and high metastatic burden and the need for larger datasets and more advanced techniques in future studies.
Notably, the NPVs of the handcrafted feature-based (ONLY_HF), deep feature-based (ONLY_DF), and combined (COMB) models were 83.33%, 78.38%, and 88.89%, respectively, in the training cohort. In the test cohort, they were 43.10%, 52.69%, and 71.23%, respectively (Table 5). COMB had the highest NPV in every cohort, indicating its potential to help clinicians avoid unnecessary invasive procedures. For patients predicted to have no SLN metastasis (N0), a high NPV could support safely omitting SLN biopsy or ALN dissection in a substantial proportion of cases (33). This is supported by the COMB model’s performance in distinguishing patients without SLN metastasis (N0) from those with any level of SLN metastasis (N ≥1). All models showed limited accuracy in differentiating between low (N1–2) and high (N ≥3) metastatic burden. Even so, the COMB model could still play a valuable role in guiding axillary treatment strategies, particularly for initial risk stratification.
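NPV depends on the proportion of SLN-positive patients in a cohort, so NPVs from cohorts with different case mixes are not directly comparable. The short sketch below applies Bayes' rule to the COMB test-set sensitivity (89.81%) and specificity (74.29%) from Table 5. The prevalence values are hypothetical. They only show how strongly NPV shifts with case mix when sensitivity and specificity are held fixed.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) implied by Bayes' rule at a given prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    tp = sens * prevalence                 # expected true-positive fraction
    fn = (1.0 - sens) * prevalence         # missed metastases
    tn = spec * (1.0 - prevalence)         # correctly cleared node-negative cases
    fp = (1.0 - spec) * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


SENS, SPEC = 0.8981, 0.7429   # COMB, test cohort (Table 5)
for prev in (0.30, 0.45, 0.75):  # hypothetical prevalences
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At a fixed operating point, NPV rises as metastasis becomes less common. A cohort with a high proportion of SLN-positive patients will therefore show a lower NPV even if the model's sensitivity and specificity are unchanged.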
Our findings also align with recent evidence from large prospective trials such as SOUND (4) and INSEMA (34), which support de-escalation of axillary surgery in selected patients with early-stage BC. These trials demonstrated that, in patients with clinically and sonographically negative axillae, omitting SLNB did not compromise oncologic outcomes. This growing body of evidence reinforces the clinical relevance of our study, because the COMB model, owing to its high NPV, may help identify node-negative patients who are candidates for omitting SLNB. As a noninvasive and reproducible imaging-based risk stratification tool, our approach may support clinical decision-making.
This study has several limitations. First, its retrospective design may have introduced selection bias. Second, patients with multifocal BLs or bilateral disease were excluded, because it was difficult to determine which BL contributed to SLN metastasis. This exclusion may limit the model’s applicability to a broader range of clinical scenarios. Third, overfitting was observed in the three-class prediction, likely because of high model complexity relative to the moderate dataset size. To address this, we plan to use larger, more diverse datasets and more advanced regularization techniques, so that the models can reach their full potential without compromising generalizability. To further improve predictive performance, we also plan to refine the model by integrating additional imaging modalities and clinical parameters, allowing it to learn from a more extensive and heterogeneous dataset. Finally, to improve the robustness and generalizability of the COMB model, we aim to acquire data from multiple institutions in future studies. This multicenter approach should help ensure that the model performs well across varying clinical and demographic settings and facilitate its integration into routine clinical practice.
Conclusions
In conclusion, this study developed and tested three models based on radiomic features, deep learning features, and their combination for predicting SLN metastasis in patients with BC. Among these, the combined model (COMB) demonstrated the best predictive performance, highlighting the potential of integrating BMUS and CDUS imaging with advanced deep learning techniques. The high NPV of the combined model suggests that it can assist clinicians in making informed decisions about axillary treatment, specifically in avoiding unnecessary invasive procedures for patients without SLN metastasis. By addressing the limitations of radiomics and deep learning approaches used alone, the combined model provides a noninvasive and interpretable solution for the preoperative assessment of SLN involvement. These findings support the continued exploration and clinical translation of multimodal imaging and machine learning strategies for personalized BC management.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://gs.amegroups.com/article/view/10.21037/gs-2025-223/rc
Data Sharing Statement: Available at https://gs.amegroups.com/article/view/10.21037/gs-2025-223/dss
Peer Review File: Available at https://gs.amegroups.com/article/view/10.21037/gs-2025-223/prf
Funding: The study was supported by grants from
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://gs.amegroups.com/article/view/10.21037/gs-2025-223/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This retrospective study was approved by the ethics committee of Shanghai Pudong New Area People’s Hospital (No. prylz2020-082). Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine was informed of the study and agreed to the use of its data. Written informed consent was waived because of the retrospective nature of the study. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- DeSantis CE, Ma J, Gaudet MM, et al. Breast cancer statistics, 2019. CA Cancer J Clin 2019;69:438-51.
- Ferrarazzo G, Nieri A, Firpo E, et al. The Role of Sentinel Lymph Node Biopsy in Breast Cancer Patients Who Become Clinically Node-Negative Following Neo-Adjuvant Chemotherapy: A Literature Review. Curr Oncol 2023;30:8703-19.
- Chang JY, Wang W, Shen JL, et al. Impact of sentinel lymph node biopsy through the axillary cribriform fascia approach on intraoperative indicators and postoperative complications. Updates Surg 2023;75:757-67.
- Gentilini OD, Botteri E, Sangalli C, et al. Sentinel Lymph Node Biopsy vs No Axillary Surgery in Patients With Small Breast Cancer and Negative Results on Ultrasonography of Axillary Lymph Nodes: The SOUND Randomized Clinical Trial. JAMA Oncol 2023;9:1557-64.
- Zhang W, Wang S, Wang Y, et al. Ultrasound-based radiomics nomogram for predicting axillary lymph node metastasis in early-stage breast cancer. Radiol Med 2024;129:211-21.
- Bai X, Wang Y, Song R, et al. Ultrasound and clinicopathological characteristics of breast cancer for predicting axillary lymph node metastasis. Clin Hemorheol Microcirc 2023;85:147-62.
- Chung HL, Tso HH, Middleton LP, et al. Axillary Nodal Metastases in Invasive Lobular Carcinoma Versus Invasive Ductal Carcinoma: Comparison of Node Detection and Morphology by Ultrasound. AJR Am J Roentgenol 2022;218:33-41.
- Niu J, Ma J, Guan X, et al. Correlation Between Doppler Ultrasound Blood Flow Parameters and Angiogenesis and Proliferation Activity in Breast Cancer. Med Sci Monit 2019;25:7035-41.
- Alvarez S, Añorbe E, Alcorta P, et al. Role of sonography in the diagnosis of axillary lymph node metastases in breast cancer: a systematic review. AJR Am J Roentgenol 2006;186:1342-8.
- Upadhyaya VS, Lim GH, Chan EYK, et al. Evaluating the preoperative breast cancer characteristics affecting the accuracy of axillary ultrasound staging. Breast J 2020;26:162-7.
- Yiming A, Wubulikasimu M, Yusuying N. Analysis on factors behind sentinel lymph node metastasis in breast cancer by color ultrasonography, molybdenum target, and pathological detection. World J Surg Oncol 2022;20:72.
- Jin H, Gao Y. Prediction of axillary lymph node metastasis in breast cancer using an ultrasonic feature- and clinical data-based model. Am J Cancer Res 2024;14:5987-98.
- Fischerova D, Garganese G, Reina H, et al. Terms, definitions and measurements to describe sonographic features of lymph nodes: consensus opinion from the Vulvar International Tumor Analysis (VITA) group. Ultrasound Obstet Gynecol 2021;57:861-79.
- Qian X, Zhang B, Liu S, et al. A combined ultrasonic B-mode and color Doppler system for the classification of breast masses using neural network. Eur Radiol 2020;30:3023-33.
- Lyman GH, Giuliano AE, Somerfield MR, et al. American Society of Clinical Oncology guideline recommendations for sentinel lymph node biopsy in early-stage breast cancer. J Clin Oncol 2005;23:7703-20.
- Goldhirsch A, Winer EP, Coates AS, et al. Personalizing the treatment of women with early breast cancer: highlights of the St Gallen International Expert consensus on the primary therapy of early breast cancer. Ann Oncol 2013;24:2206-23.
- Mikolajczyk A, Grochowski M. Data augmentation for improving deep learning in image classification problem. 2018 International Interdisciplinary PhD Workshop; 2018:117-22.
- Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference and prediction. 2nd ed. New York: Springer; 2009.
- Romero M, Interian Y, Solberg T, et al. Targeted transfer learning to improve performance in small medical physics datasets. Med Phys 2020;47:6246-56.
- Al-Gaashani MSAM, Samee NA, Alnashwan R, et al. Using a Resnet50 with a Kernel Attention Mechanism for Rice Disease Diagnosis. Life (Basel) 2023;13:1277.
- Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition; Miami, FL, USA; 2009:248-55. doi: 10.1109/CVPR.2009.5206848
- Selvaraju RR, Cogswell M, Das A, et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. 2017 IEEE International Conference on Computer Vision (ICCV); Venice, Italy; 2017:618-26. doi: 10.1109/ICCV.2017.74
- Sun Q, Lin X, Zhao Y, et al. Deep Learning vs. Radiomics for Predicting Axillary Lymph Node Metastasis of Breast Cancer Using Ultrasound Images: Don't Forget the Peritumoral Region. Front Oncol 2020;10:53.
- Wang X, Xie T, Luo J, et al. Radiomics predicts the prognosis of patients with locally advanced breast cancer by reflecting the heterogeneity of tumor cells and the tumor microenvironment. Breast Cancer Res 2022;24:20.
- Tagliafico AS, Piana M, Schenone D, et al. Overview of radiomics in breast cancer diagnosis and prognostication. Breast 2020;49:74-80.
- Yu FH, Wang JX, Ye XH, et al. Ultrasound-based radiomics nomogram: A potential biomarker to predict axillary lymph node metastasis in early-stage invasive breast cancer. Eur J Radiol 2019;119:108658.
- Zhao M, Zheng Y, Chu J, et al. Ultrasound-based radiomics combined with immune status to predict sentinel lymph node metastasis in primary breast cancer. Sci Rep 2023;13:16918.
- Yao J, Zhou W, Xu S, et al. Machine Learning-Based Breast Tumor Ultrasound Radiomics for Pre-operative Prediction of Axillary Sentinel Lymph Node Metastasis Burden in Early-Stage Invasive Breast Cancer. Ultrasound Med Biol 2024;50:229-36.
- Truhn D, Schrading S, Haarburger C, et al. Radiomic versus Convolutional Neural Networks Analysis for Classification of Contrast-enhancing Lesions at Multiparametric Breast MRI. Radiology 2019;290:290-7.
- Cong Y, Wang S, Zou H, et al. Imaging Predictors for Nonsentinel Lymph Node Metastases in Breast Cancer Patients. Breast Care (Basel) 2020;15:372-9.
- Zhu Y, Lv W, Wu H, et al. A preoperative nomogram for predicting the risk of sentinel lymph node metastasis in patients with T1-2N0 breast cancer. Jpn J Radiol 2022;40:595-606.
- Zheng X, Yao Z, Huang Y, et al. Deep learning radiomics can predict axillary lymph node status in early-stage breast cancer. Nat Commun 2020;11:1236.
- Giuliano AE, Ballman KV, McCall L, et al. Effect of Axillary Dissection vs No Axillary Dissection on 10-Year Overall Survival Among Women With Invasive Breast Cancer and Sentinel Node Metastasis: The ACOSOG Z0011 (Alliance) Randomized Clinical Trial. JAMA 2017;318:918-26.
- Reimer T, Stachs A, Veselinovic K, et al. Axillary Surgery in Breast Cancer - Primary Results of the INSEMA Trial. N Engl J Med 2025;392:1051-64.


