Preoperative prediction of axillary lymph node metastasis in breast invasive ductal carcinoma patients using a deep learning model based on dynamic contrast-enhanced magnetic resonance imaging images: a multicenter study
Original Article


Changcong Gu1, Yuqing He2, Jinshi Lin3, Zilong Wang3, Shuai Guo2, Huang Yang4, Wenxi Wang1, Junyi Sun1, Huishu Gan5, Haoxiang Li3

1Department of Radiology, The First Hospital of Qinhuangdao, Qinhuangdao, China; 2Department of Ultrasound, The First Hospital of Qinhuangdao, Qinhuangdao, China; 3Department of Radiology, Guangdong Provincial People’s Hospital, Zhuhai Hospital (Jinwan Central Hospital of Zhuhai), Zhuhai, China; 4Department of Emergency, The First Hospital of Qinhuangdao, Qinhuangdao, China; 5Department of Ultrasound, Maternal and Child Health Hospital of Qinhuangdao, Qinhuangdao, China

Contributions: (I) Conception and design: H Li, C Gu; (II) Administrative support: H Li, Y He; (III) Provision of study materials or patients: H Gan, W Wang, J Sun; (IV) Collection and assembly of data: J Lin, Z Wang; (V) Data analysis and interpretation: C Gu, H Yang, W Wang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Haoxiang Li, MD. Department of Radiology, Guangdong Provincial People’s Hospital, Zhuhai Hospital (Jinwan Central Hospital of Zhuhai), Road, Zhuhai, China. Email: lihx89@mail.sysu.edu.cn.

Background: Invasive ductal carcinoma (IDC) is the most common histological subtype of breast cancer, and axillary lymph node metastasis (ALNM) is a pivotal factor in clinical staging, prognostic assessment, and treatment planning. This study aims to develop and validate a deep learning (DL) model based on dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) for the prediction of ALNM in IDC patients.

Methods: This multicenter study conducted a retrospective analysis of DCE-MRI images from 520 patients diagnosed with IDC of the breast. The training and internal validation sets consisted of 411 patients from The First Hospital of Qinhuangdao, while the external testing set included 109 patients from the Maternal and Child Health Hospital of Qinhuangdao. Radiomics and DL features were extracted separately from the DCE-MRI images. We evaluated five models (Clinical, Radiomics, Radiomics-Clinical, DL, DL-Clinical) using radiomics features, DL features, and clinical features. Finally, the predictive performance of the models was evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC).

Results: In the training set, the AUCs of the Clinical and Radiomics models (both machine learning models) and the DL model were 0.807, 0.840, and 0.865, respectively. The combined models incorporating clinical features, namely the Radiomics-Clinical and DL-Clinical models, achieved AUCs of 0.824 and 0.935, respectively. Among the five models, the DL-Clinical model showed the best performance in predicting ALNM. This model also performed robustly in the internal validation and external testing sets, with AUCs of 0.946 and 0.951, respectively.

Conclusions: The DCE-MRI-based DL-Clinical model provides a non-invasive adjunct tool for preoperative identification of ALNM in patients with breast IDC, thereby enhancing the efficacy of personalized treatment strategies and improving patient quality of life.

Keywords: Breast cancer; axillary lymph node metastasis (ALNM); radiomics; machine learning; deep learning (DL)


Submitted Aug 10, 2025. Accepted for publication Oct 12, 2025. Published online Nov 25, 2025.

doi: 10.21037/gs-2025-365


Highlight box

Key findings

• This study developed and validated a deep learning (DL) model based on dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to predict axillary lymph node metastasis (ALNM) in patients with breast invasive ductal carcinoma (IDC).

• Five models were compared, combining clinical, radiomics, and DL features. The combined DL and clinical features (DL-Clinical) model showed superior predictive performance for ALNM.

What is known and what is new?

• Preoperative prediction of ALNM is crucial for treatment planning in breast cancer patients. Currently, accurate and non-invasive methods for this prediction are limited.

• This study introduces a novel DCE-MRI-based DL model that effectively integrates both DL and clinical features. It offers a powerful, non-invasive tool to predict ALNM with high accuracy.

What is the implication, and what should change now?

• The DCE-MRI-based DL-Clinical model provides a highly accurate and robust method for the preoperative prediction of ALNM. Its strong performance in both internal and external validation sets suggests its high clinical utility.

• This model can serve as a valuable adjunct tool for clinicians, helping to enhance the efficacy of personalized treatment strategies and improve the quality of life for patients with breast IDC.


Introduction

According to the 2022 global cancer statistics, breast cancer has the second-highest incidence after lung cancer and is one of the most common cancers among women (1). Breast cancer encompasses various types, with invasive ductal carcinoma (IDC) being the most common histological subtype, accounting for approximately 70% to 80% of all breast cancer cases (2,3). Axillary lymph node metastasis (ALNM) is the most common route of breast cancer spread, and the presence or absence of ALNM is crucial for clinical staging, prognostic assessment, and the development of personalized treatment plans for breast cancer patients (4,5). Currently, axillary lymph node (ALN) dissection and sentinel lymph node biopsy (SLNB) are the standard methods for determining ALNM status (6). However, both methods are invasive and may lead to postoperative morbidity, such as upper limb edema, shoulder joint dysfunction, and loss of shoulder muscle strength (7,8). SLNB also has a known false-negative rate; however, this has not translated into compromised oncological outcomes, and SLNB remains commonly practiced (9). Therefore, a non-invasive and accurate diagnostic method for ALNs would aid the preoperative evaluation of ALN status and reduce unnecessary lymph node surgery. This is especially important after the publication of the SOUND (Sentinel Node vs. Observation After Axillary Ultra-Sound) trial, which showed that omission of SLNB in patients with early breast cancer did not result in inferior oncological outcomes (10).

Magnetic resonance imaging (MRI) is a radiation-free technique with high soft-tissue contrast and high sensitivity, and it plays a critical role in assessing ALNM status (11). Dynamic contrast-enhanced MRI (DCE-MRI) provides morphological and hemodynamic features that reflect tumor angiogenesis, and it is essential for breast cancer diagnosis, staging, treatment, post-treatment monitoring, and detection of tumor recurrence, as well as for the assessment of high-risk patients (12,13).

Radiomics is an important method for predicting tumor biological characteristics, as it extracts high-throughput features from medical images to quantify tumor shape, intensity, and texture (14). However, radiomics has several limitations. For example, the stability of radiomics features varies with parameters such as pixel size, region of interest (ROI) delineation, and signal-to-noise ratio, and these features may not fully characterize tumor heterogeneity (15-17). Deep learning (DL), by contrast, can automatically learn image information and has been widely applied in image recognition and classification. Compared with radiomics features, DL features extracted by convolutional neural networks (CNNs) contain more abstract image information (18,19), and DL has accordingly been applied to the diagnosis, treatment, and prognostic assessment of diseases.

To our knowledge, most existing studies focus on single-center data with small sample sizes and do not adequately address the heterogeneity of multicenter data, which may affect the generalizability of models. Moreover, existing research often concentrates on a single type of feature and lacks a systematic approach to integrating imaging-derived features, such as radiomics or DL features, with clinical features to enhance predictive accuracy. Therefore, this study aims to address these insufficiently explored questions, with the following specific objectives: (I) evaluating the predictive performance of Clinical, Radiomics, and DL models for ALNM in IDC; (II) developing a model that integrates DL features with clinical characteristics; and (III) validating this model using an internal validation set and an external testing set. We present this article in accordance with the TRIPOD reporting checklist (available at https://gs.amegroups.com/article/view/10.21037/gs-2025-365/rc).


Methods

Patients

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the ethics committees of The First Hospital of Qinhuangdao (No. 202301A199) and the Maternal and Child Health Hospital of Qinhuangdao (No. 202301A157). Informed consent was waived in this retrospective study. Inclusion criteria were as follows: (I) pathological diagnosis of IDC with complete ALN status records; (II) breast MRI performed before surgery and biopsy, with adequate image quality; (III) lesion presenting as a mass; and (IV) no neoadjuvant therapy administered before surgery. Exclusion criteria included: (I) occult or recurrent IDC; (II) multifocal lesions or insufficient image quality; and (III) incomplete clinical or histopathological information. Multifocal lesions were excluded because of the difficulty of identifying the lesion responsible for metastasis.

Figure 1 displays the patient selection flowchart. A total of 520 patients with histologically confirmed primary IDC from two hospitals were included in this study. At The First Hospital of Qinhuangdao, 411 patients seen between December 1, 2017, and March 30, 2024, were reviewed and randomly allocated in a 7:3 ratio to the training set (n=287) and the internal validation set (n=124). The external testing set comprised 109 patients from the Maternal and Child Health Hospital of Qinhuangdao seen between December 30, 2020, and January 30, 2024.
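The 7:3 random allocation can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' code: the seed is hypothetical, and the study does not state whether the split was stratified by ALN status. Rounding the validation share up reproduces the reported 287/124 split of 411 patients.

```python
import math

import numpy as np


def random_split(patient_ids, val_fraction=0.3, seed=42):
    """Randomly allocate patients to a training set and an internal validation set."""
    ids = np.asarray(patient_ids)
    rng = np.random.default_rng(seed)  # hypothetical seed; none is reported in the study
    shuffled = rng.permutation(ids)
    n_val = math.ceil(val_fraction * len(ids))  # 0.3 * 411 = 123.3, rounded up to 124
    return shuffled[n_val:], shuffled[:n_val]


train_ids, val_ids = random_split(np.arange(411))
print(len(train_ids), len(val_ids))  # 287 124
```

Stratifying the shuffle by ALN status would be a common variant when class balance between the two sets must be guaranteed.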

Figure 1 Patient inclusion and exclusion process. ALN, axillary lymph node; DCE-MRI, dynamic contrast-enhanced magnetic resonance imaging; IDC, invasive ductal carcinoma; MRI, magnetic resonance imaging.

Clinical parameters

The preoperative clinicopathological features collected included age, maximum tumor diameter, tumor volume [defined as the three-dimensional size of the tumor in cubic centimeters (cm³), calculated with imaging software from the delineated and segmented tumor ROI], menopausal status, and a series of biomarkers, namely estrogen receptor (ER) status, progesterone receptor (PR) status, human epidermal growth factor receptor 2 (HER2) status, Ki-67 index, and molecular subtype. According to the standards of the 2021 St. Gallen Consensus Conference (20), ER and PR were considered positive if the positivity rate was ≥1%. HER2 status was defined by the immunohistochemical (IHC) staining score: tumors scoring 0 or 1+ were HER2-negative, and those scoring 3+ were HER2-positive. Tumors with an IHC score of 2+ underwent further fluorescence in situ hybridization (FISH) testing, with no amplification considered HER2-negative and amplification considered HER2-positive. A Ki-67 nuclear staining positivity rate of ≥20% was considered high expression, and a rate <20% low expression. Breast cancers were categorized into molecular subtypes based on IHC markers: luminal A-like (ER+, PR ≥20%, HER2−, Ki-67 <20%), luminal B-like (ER+, HER2− with Ki-67 ≥20% or PR <1%; or ER+, HER2+), HER2-enriched (ER−, PR−, HER2+), and triple-negative breast cancer (ER−, PR−, HER2−).
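The biomarker rules above can be expressed as small decision functions. This is an illustrative sketch, not software used in the study; the function names are hypothetical, and Ki-67 is dichotomized at 20% with exactly 20% counted as high, matching the <20/≥20 categories of Table 1.

```python
from typing import Optional


def receptor_positive(percent_positive: float) -> bool:
    """ER or PR status: positive when the positivity rate is at least 1%."""
    return percent_positive >= 1.0


def her2_status(ihc_score: str, fish_amplified: Optional[bool] = None) -> str:
    """HER2 status from the IHC score, with FISH resolving equivocal (2+) cases."""
    if ihc_score in ("0", "1+"):
        return "negative"
    if ihc_score == "3+":
        return "positive"
    if ihc_score == "2+":
        if fish_amplified is None:
            raise ValueError("IHC 2+ requires a FISH result")
        return "positive" if fish_amplified else "negative"
    raise ValueError(f"unrecognized IHC score: {ihc_score!r}")


def ki67_high(percent_positive: float) -> bool:
    """Ki-67 dichotomized at 20%, with 20% itself counted as high."""
    return percent_positive >= 20.0


print(her2_status("2+", fish_amplified=True), her2_status("1+"), ki67_high(20.0))
# positive negative True
```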

MRI examination

MRI scanning for the training and internal validation sets was performed using two scanners (Siemens Avanto 1.5 T and GE Architect 3.0 T) at The First Hospital of Qinhuangdao. The external testing set was acquired using one scanner (1.5 T Siemens ESSENZA) at the Maternal and Child Health Hospital of Qinhuangdao. All patients were scanned in the prone position using dedicated bilateral breast coils. A gadolinium-based contrast agent (Gadolinium-DTPA; Magnevist, Schering, Germany, 0.1 mmol/kg) was administered intravenously as a bolus injection, followed by a 20 mL saline flush for DCE scans. MRI characteristics were evaluated on axial DCE-MRI. Details of the MRI acquisition parameters are provided in Table S1. We analyzed the first post-contrast phase because this time window corresponds to the peak distribution of the contrast agent within the tissues; during this phase, the contrast between malignant lesions and normal tissue is most pronounced, allowing clearer differentiation and better diagnostic value.
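The weight-based contrast dose translates into an injection volume through a short calculation. The sketch assumes the standard Magnevist formulation of 0.5 mmol/mL, which the text does not state, and the 60 kg body weight is purely illustrative.

```python
def contrast_volume_ml(weight_kg: float,
                       dose_mmol_per_kg: float = 0.1,
                       concentration_mmol_per_ml: float = 0.5) -> float:
    """Injected volume (mL) of a gadolinium agent for a weight-based dose."""
    dose_mmol = weight_kg * dose_mmol_per_kg  # total gadolinium dose in mmol
    return dose_mmol / concentration_mmol_per_ml  # assumed 0.5 mmol/mL formulation


print(round(contrast_volume_ml(60.0), 2))  # 12.0
```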

Image preprocessing and tumor segmentation

Prior to image segmentation, comprehensive preprocessing steps were performed on the preoperative MRI images to ensure data consistency and quality. These steps included N4 bias field correction to mitigate low-frequency intensity inhomogeneities arising from magnetic field and coil sensitivity nonuniformity, B-spline interpolation to resample images to a 1 mm × 1 mm × 1 mm isotropic voxel size, and histogram standardization to normalize intensity values for consistent comparisons across datasets. Subsequently, an ROI encompassing the entire volume of the primary breast tumor was delineated semi-automatically on the first post-contrast phase of the DCE-MRI using ITK-SNAP (open-source software, version 3.8, available at http://www.itksnap.org). The segmentation contours were then manually adjusted as needed to refine the accuracy of the ROI. All delineations were performed by a radiologist with five years of specialized experience in imaging diagnostics who was blinded to the clinical and pathological information of the patients to minimize potential bias. The overall study design, including these procedural steps, is schematically illustrated in Figure 2.
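Two of these preprocessing steps, isotropic resampling and intensity standardization, can be sketched with NumPy and SciPy. This is a simplified stand-in rather than the study's pipeline: `scipy.ndimage.zoom` with `order=3` performs cubic B-spline interpolation, while the percentile-based linear rescaling below is a simplified substitute for histogram standardization. N4 correction is omitted (SimpleITK provides it as `N4BiasFieldCorrectionImageFilter`), and the voxel spacing in the example is hypothetical.

```python
import numpy as np
from scipy import ndimage


def resample_isotropic(volume, spacing_mm, target_mm=1.0):
    """Resample a 3D array to isotropic voxels with cubic B-spline interpolation.

    spacing_mm must give the voxel size of each array axis, in the same axis order.
    """
    factors = [s / target_mm for s in spacing_mm]  # e.g. 2.0 mm -> 1.0 mm doubles that axis
    return ndimage.zoom(volume, zoom=factors, order=3)


def percentile_standardize(volume, mask=None, low=1.0, high=99.0):
    """Linearly map the [low, high] intensity percentiles to [0, 1] and clip outliers."""
    values = volume[mask] if mask is not None else volume.ravel()
    p_low, p_high = np.percentile(values, [low, high])
    return np.clip((volume - p_low) / (p_high - p_low), 0.0, 1.0)


rng = np.random.default_rng(0)
phase1 = rng.random((20, 40, 40)) * 1000.0  # hypothetical first post-contrast volume
iso = resample_isotropic(phase1, spacing_mm=(2.0, 0.5, 0.5))
std = percentile_standardize(iso)
print(iso.shape)  # (40, 20, 20)
```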

Figure 2 Schematic diagram of the research workflow. AUC, area under the curve; LASSO, least absolute shrinkage and selection operator; ROC, receiver operating characteristic.

Radiomics feature extraction

Radiomics features included geometric shape, intensity, and texture features. Feature selection was conducted within the training set. To assess the stability of features extracted from the ROI, 50 patients were randomly selected, and their tumor ROIs were independently delineated by a second radiologist with 8 years of experience in imaging diagnostics; when tumor boundaries were indeterminate, a senior radiologist with 10 years of experience made the final decision. The intraclass correlation coefficient (ICC) was then calculated for each feature to assess the reliability and consistency of the imaging features, and features with an ICC of 0.75 or higher were considered robust and retained. All retained features were normalized using Z-score normalization and tested with t-tests, and only those with P<0.05 were kept. To reduce collinearity, Pearson’s correlation coefficients between features were examined, and for any feature pair with a coefficient above 0.9, one feature of the pair was excluded. Finally, least absolute shrinkage and selection operator (LASSO) regression was applied within a 10-fold cross-validation framework to determine the optimal regularization parameter λ and select the most discriminative radiomic features.
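The filtering chain described above can be sketched with SciPy and scikit-learn. Several choices are assumptions because the text does not specify them: the ICC form (two-way random effects, absolute agreement, single rater, ICC(2,1)), a greedy correlation filter that keeps the earlier column of a correlated pair, and `LassoCV` fitted directly to the 0/1 ALNM label (an L1-penalized logistic regression would be an equally common alternative). The synthetic data are illustrative only.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def icc_2_1(ratings):
    """ICC(2,1) for one feature; ratings has shape (n_patients, k_readers)."""
    n, k = ratings.shape
    grand = ratings.mean()
    ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def select_radiomics_features(X, y, icc, icc_min=0.75, p_max=0.05, r_max=0.9):
    """Return column indices of X surviving the ICC, t-test, correlation and LASSO filters."""
    keep = np.flatnonzero(icc >= icc_min)                   # reproducible features only
    Xz = StandardScaler().fit_transform(X[:, keep])         # Z-score normalization
    _, p = stats.ttest_ind(Xz[y == 1], Xz[y == 0], axis=0)  # ALNM-positive vs -negative
    keep, Xz = keep[p < p_max], Xz[:, p < p_max]
    corr = np.atleast_2d(np.abs(np.corrcoef(Xz, rowvar=False)))
    retained = []
    for j in range(Xz.shape[1]):  # greedy: keep j unless |r| > r_max with a kept feature
        if all(corr[j, i] <= r_max for i in retained):
            retained.append(j)
    keep, Xz = keep[retained], Xz[:, retained]
    lasso = LassoCV(cv=10).fit(Xz, y)                        # lambda chosen by 10-fold CV
    return keep[lasso.coef_ != 0]


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 20))
X[:, :3] += 2.0 * y[:, None]                      # three informative features
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=200)   # near-duplicate of feature 0
icc = np.full(20, 0.9)
icc[5] = 0.5                                      # an unreliable feature
print(select_radiomics_features(X, y, icc))
```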

DL feature extraction

This study employed a CNN to extract DL features from DCE-MRI data. The CNN model was designed to integrate convolutional layers for feature extraction, followed by dense layers for classification, iteratively optimizing weights to enhance predictive accuracy.

The CNN architecture was constructed using the Keras Sequential API. The model begins with a 1D convolutional layer (Conv1D) comprising 64 filters with a kernel size of 2, utilizing the ReLU activation function to extract initial feature vectors. The convolutional output was flattened to a 1D vector using a Flatten layer, followed by a dense layer with 64 units and ReLU activation to capture complex patterns. The output layer consisted of a single unit with a sigmoid activation function for binary classification, producing a probability score for each input sample.

The model was compiled using the Adam optimizer with default parameters (learning rate =0.001) and binary cross-entropy as the loss function, with accuracy as the evaluation metric. Imaging data were standardized using Z-score normalization, adjusted to fit the model’s input dimensions, and then fed into the model. The model was trained on the training dataset for 10 epochs with a batch size of 32. Data augmentation and early stopping mechanisms were implemented to enhance model generalizability and prevent overfitting.

The model processed the data layer by layer through forward propagation, and feature vectors were extracted from the penultimate layer, i.e., the layer immediately preceding the output layer. The output layer applied the sigmoid activation function to these feature vectors to produce a probability score, and a threshold of 0.5 was used to assign the final predicted class.
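To make the data flow concrete, the following NumPy sketch reproduces the forward pass of the described architecture (Conv1D with 64 filters and kernel size 2, Flatten, Dense(64, ReLU), Dense(1, sigmoid)) and shows where the penultimate-layer features come from. It is not the authors' Keras model: the weights are random and untrained, and the input length of 16 with one channel is a hypothetical stand-in for the unreported input dimensions. Keras `Conv1D` defaults to 'valid' padding and stride 1, which the sketch mirrors.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(seq_len, n_channels=1, n_filters=64, kernel=2, n_hidden=64):
    """Random (untrained) weights for Conv1D(64, 2) -> Flatten -> Dense(64) -> Dense(1)."""
    conv_len = seq_len - kernel + 1  # 'valid' padding, stride 1
    return {
        "conv_w": rng.normal(0.0, 0.1, size=(kernel, n_channels, n_filters)),
        "conv_b": np.zeros(n_filters),
        "dense_w": rng.normal(0.0, 0.1, size=(conv_len * n_filters, n_hidden)),
        "dense_b": np.zeros(n_hidden),
        "out_w": rng.normal(0.0, 0.1, size=(n_hidden, 1)),
        "out_b": np.zeros(1),
    }


def forward(x, params):
    """x: (batch, seq_len, n_channels). Returns penultimate-layer features and probabilities."""
    k = params["conv_w"].shape[0]
    # Sliding windows of length k: (batch, conv_len, k, n_channels)
    windows = np.stack([x[:, t:t + k, :] for t in range(x.shape[1] - k + 1)], axis=1)
    conv = relu(np.einsum("btkc,kcf->btf", windows, params["conv_w"]) + params["conv_b"])
    flat = conv.reshape(conv.shape[0], -1)                              # Flatten
    features = relu(flat @ params["dense_w"] + params["dense_b"])       # Dense(64, ReLU)
    prob = sigmoid(features @ params["out_w"] + params["out_b"])[:, 0]  # Dense(1, sigmoid)
    return features, prob


x = rng.normal(size=(5, 16, 1))     # 5 hypothetical inputs of length 16, one channel
features, prob = forward(x, init_params(seq_len=16))
labels = (prob >= 0.5).astype(int)  # 0.5 threshold for the final class
print(features.shape, prob.shape)   # (5, 64) (5,)
```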

To select the most discriminative DL-derived features and optimize model performance, this study employed a Gradient Boosting Machine (GBM) model for feature selection. GBM, an ensemble learning algorithm, was specifically chosen for its inherent capabilities in handling complex, high-dimensional datasets, its robust feature importance evaluation, and its interpretability (21-23). This method can automatically calculate each feature’s importance score, providing a data-driven approach to dimensionality reduction. This characteristic is particularly advantageous when dealing with the high-dimensional nature of DL features, as it effectively identifies those with the strongest predictive power. Furthermore, the interpretability offered by GBM’s feature importance scores helps elucidate which DL features contribute most significantly to the predictive outcome.

Following the GBM evaluation, features were sorted by their importance scores, low-importance features were discarded, and the top k features were retained for subsequent model training and evaluation. This feature selection step reduced feature dimensionality and model complexity and improved the model’s performance and stability in predicting lymph node metastasis in breast IDC.
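A minimal scikit-learn sketch of this ranking and top-k step follows. It assumes impurity-based `feature_importances_` from `GradientBoostingClassifier` and chooses k by 5-fold cross-validated AUC of a GBM refitted on the top-k columns; the study does not report which classifier or resampling scheme underlies its k curve, and the 64-column feature matrix here is synthetic. In practice both the ranking and the choice of k should use the training set only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def rank_by_gbm(X, y, seed=0):
    """Rank feature columns by GBM impurity-based importance, most important first."""
    gbm = GradientBoostingClassifier(random_state=seed).fit(X, y)
    order = np.argsort(gbm.feature_importances_)[::-1]
    return order, gbm.feature_importances_[order]


def choose_top_k(X, y, order, max_k=10, seed=0):
    """Cross-validated AUC of a GBM trained on the top-k features, for k = 1..max_k."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    aucs = []
    for k in range(1, max_k + 1):
        model = GradientBoostingClassifier(random_state=seed)
        aucs.append(cross_val_score(model, X[:, order[:k]], y, cv=cv, scoring="roc_auc").mean())
    return int(np.argmax(aucs)) + 1, aucs


rng = np.random.default_rng(0)
X_dl = rng.normal(size=(300, 64))              # hypothetical deep feature matrix
y = (X_dl[:, 0] + X_dl[:, 1] > 0).astype(int)  # outcome driven by features 0 and 1
order, importance = rank_by_gbm(X_dl, y)
best_k, aucs = choose_top_k(X_dl, y, order, max_k=5)
print(order[:3], best_k)
```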

Model development

To assess the predictive ability for ALNM in breast IDC, this study initially constructed and compared three machine learning models. The Clinical model, based on univariate and multivariate logistic regression, identified key clinical features; the Radiomics model employed LASSO regression analyses for feature selection from radiomic features; the Radiomics-Clinical model integrated these selected clinical and radiomic features. This study utilized 5-fold cross-validation to select the optimal machine learning algorithm for each model, with details on algorithm selection and performance comparisons available in Table S2 and Figure S1. Subsequently, we developed a DL model and conducted comparisons with the machine learning models. Next, we combined selected clinical and DL features to construct the DL-Clinical model to assess whether it could further enhance predictive performance. Finally, the models were validated on an internal validation set and an external testing set.
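The 5-fold cross-validated choice of algorithm, and the construction of a combined model by concatenating feature columns, can be sketched as follows. The three candidate classifiers are hypothetical placeholders (the algorithms actually compared are listed in Table S2), and the clinical and DL feature matrices are synthetic stand-ins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical candidate pool; not the study's actual list of algorithms.
CANDIDATES = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}


def select_algorithm(X, y, seed=0):
    """Return the candidate with the highest mean 5-fold cross-validated AUC."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    mean_auc = {name: cross_val_score(est, X, y, cv=cv, scoring="roc_auc").mean()
                for name, est in CANDIDATES.items()}
    return max(mean_auc, key=mean_auc.get), mean_auc


# A combined model (e.g. DL-Clinical) simply concatenates the selected feature columns.
X_clin, y = make_classification(n_samples=287, n_features=3, n_informative=2,
                                n_redundant=0, random_state=0)
X_dl = np.random.default_rng(0).normal(size=(287, 3))  # stand-in for 3 selected DL features
best, scores = select_algorithm(np.hstack([X_clin, X_dl]), y)
print(best, {name: round(auc, 3) for name, auc in scores.items()})
```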

Performance evaluation and model comparison

Receiver operating characteristic (ROC) curves and the area under the curve (AUC), together with sensitivity, specificity, and accuracy at the optimal ROC cutoff, were used to evaluate the predictive performance for ALNM in breast IDC. The models compared across all cohorts were the Clinical, Radiomics, DL, Radiomics-Clinical, and DL-Clinical models. The entire study design and pipeline are shown in Figure 2.
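The evaluation metrics can be computed with scikit-learn as sketched below. Two details are assumptions, since the text does not specify them: the "optimal ROC point" is taken as the threshold maximizing Youden's index on the training set and then applied unchanged to other sets, and the 95% CIs are percentile bootstrap intervals.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def youden_cutoff(y_true, y_score):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    return thresholds[np.argmax(tpr - fpr)]


def scalar_metrics(y_true, y_score, cutoff):
    """AUC plus sensitivity, specificity and accuracy at a fixed cutoff."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_score) >= cutoff).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return {
        "auc": roc_auc_score(y_true, y_score),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, seed=0):
    """Percentile bootstrap 95% CI for the AUC, skipping single-class resamples."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, len(y_true), len(y_true))
        if np.unique(y_true[idx]).size == 2:
            aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    return np.percentile(aucs, [2.5, 97.5])


y_train, s_train = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
cut = youden_cutoff(y_train, s_train)  # chosen on the training set, reused elsewhere
print(cut, scalar_metrics(y_train, s_train, cut))
```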

Statistical analysis

Statistical analyses were conducted using the Storm Statistics Platform or Zstats software (www.zstats.net) and R version 4.4.0 (2024-04-24). Unless otherwise specified, normally distributed continuous variables are expressed as mean ± standard deviation (SD), non-normally distributed continuous variables as median (Q1, Q3), and categorical variables as frequencies (percentages). The Kolmogorov-Smirnov test was used to assess the normality of data distributions. Continuous variables were compared using the Mann-Whitney U test (two groups) or the Kruskal-Wallis test (three groups), and categorical variables using the Chi-squared test or Fisher’s exact test. Data processing, feature extraction, feature selection, and machine learning model prediction were performed using standard Python (version 3.9) libraries (scipy, numpy, pandas, scikit-learn, matplotlib, tensorflow).
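As a worked check of the cohort comparison, the Chi-squared test for ALN status can be recomputed from the counts in Table 1 with SciPy. For a 2×3 table, `chi2_contingency` applies no continuity correction (Yates' correction is used only when there is one degree of freedom), and the result matches the reported statistic and P value.

```python
from scipy.stats import chi2_contingency

# ALN status counts from Table 1: rows = negative / positive,
# columns = training, internal validation, external testing set.
aln_counts = [
    [176, 75, 68],
    [111, 49, 41],
]
stat, p, dof, expected = chi2_contingency(aln_counts)
print(f"chi2 = {stat:.2f}, dof = {dof}, P = {p:.2f}")  # chi2 = 0.09, dof = 2, P = 0.96
```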


Results

Clinical features

Table 1 presents the characteristics of all patients. The training set, internal validation set, and external testing set included 287, 124, and 109 IDC patients, respectively, with ALNM rates of 38.68% (111/287), 39.52% (49/124), and 37.61% (41/109). There were no significant differences among the cohorts in age, maximum tumor diameter, tumor volume, ALN status, molecular subtype, ER status, PR status, HER2 status, Ki-67 index, or menopausal status (all P>0.05).

Table 1

Baseline clinical-radiological features of the datasets

Variables Total (n=520) Training set (n=287) Internal validation set (n=124) External testing set (n=109) Statistic P
Age, years 54.00 (47.00, 61.00) 54.00 (46.50, 62.00) 55.00 (49.00, 61.00) 53.00 (45.00, 59.00) 1.77 0.41
Max diameter, cm 2.20 (1.70, 2.90) 2.10 (1.70, 2.85) 2.20 (1.60, 3.20) 2.30 (1.70, 2.90) 1.10 0.58
Volume, cm³ 3.42 (1.75, 6.75) 3.42 (1.83, 6.72) 3.67 (1.83, 7.59) 3.25 (1.45, 6.03) 1.75 0.42
ALN status 0.09 0.96
   Negative 319 (61.35) 176 (61.32) 75 (60.48) 68 (62.39)
   Positive 201 (38.65) 111 (38.68) 49 (39.52) 41 (37.61)
Molecular subtype 11.85 0.07
   Luminal A 56 (10.77) 27 (9.41) 12 (9.68) 17 (15.60)
   Luminal B 366 (70.38) 201 (70.03) 95 (76.61) 70 (64.22)
   HER2-enriched 32 (6.15) 21 (7.32) 8 (6.45) 3 (2.75)
   TNBC 66 (12.69) 38 (13.24) 9 (7.26) 19 (17.43)
ER status 3.83 0.15
   Negative 113 (21.73) 62 (21.60) 21 (16.94) 30 (27.52)
   Positive 407 (78.27) 225 (78.40) 103 (83.06) 79 (72.48)
PR status 4.12 0.13
   Negative 171 (32.88) 101 (35.19) 43 (34.68) 27 (24.77)
   Positive 349 (67.12) 186 (64.81) 81 (65.32) 82 (75.23)
HER2 status 0.59 0.75
   Negative 409 (78.65) 226 (78.75) 95 (76.61) 88 (80.73)
   Positive 111 (21.35) 61 (21.25) 29 (23.39) 21 (19.27)
Ki-67 index 3.83 0.15
   <20 83 (15.96) 42 (14.63) 17 (13.71) 24 (22.02)
   ≥20 437 (84.04) 245 (85.37) 107 (86.29) 85 (77.98)
Menopausal status 0.01 >0.99
   Premenopausal 211 (40.58) 117 (40.77) 50 (40.32) 44 (40.37)
   Postmenopausal 309 (59.42) 170 (59.23) 74 (59.68) 65 (59.63)

Data are presented as number (%) or M (Q1, Q3). Statistics: Kruskal-Wallis test for continuous variables (age, max diameter, volume); Chi-squared test for categorical variables. ALN, axillary lymph node; ER, estrogen receptor; HER2, human epidermal growth factor receptor 2; M, median; PR, progesterone receptor; Q1, 1st quartile; Q3, 3rd quartile; TNBC, triple-negative breast cancer.

In the univariate logistic analysis, factors associated with ALNM included maximum diameter, volume, molecular subtype, PR status, HER2 status and Ki-67 index. Multivariate logistic analysis indicated that maximum diameter, HER2 status and Ki-67 index are independent risk factors for ALNM in breast IDC. We developed a Clinical model based on these predictive variables (Table 2).

Table 2

Univariate and multivariable logistic regression

Variables Univariate analysis Multivariable analysis
β S.E Z P OR (95% CI) β S.E Z P OR (95% CI)
Age 0.01 0.01 0.52 0.61 1.01 (0.98–1.03)
Max diameter 1.00 0.16 6.19 <0.001 2.71 (1.97–3.71) 0.93 0.16 5.71 <0.001 2.53 (1.84–3.48)
Volume 0.14 0.03 4.83 <0.001 1.15 (1.09–1.22)
Menopausal status 0.12
   Premenopausal 1.00 (reference)
   Postmenopausal 0.38 0.25 1.54 1.47 (0.90–2.40)
Molecular subtype
   Luminal A 1.00 (reference)
   Luminal B 2.15 0.75 2.88 0.004 8.61 (1.99–37.37)
   HER2-enriched 2.43 0.85 2.84 0.004 11.36 (2.13–60.71)
   TNBC 2.31 0.80 2.88 0.004 10.12 (2.09–48.92)
ER status 0.08
   Negative 1.00 (reference)
   Positive −0.51 0.29 −1.76 0.60 (0.34–1.06)
PR status <0.001
   Negative 1.00 (reference)
   Positive −0.89 0.25 −3.50 0.41 (0.25–0.68)
HER2 status <0.001 0.05
   Negative 1.00 (reference) 1.00 (reference)
   Positive 0.98 0.29 3.32 2.66 (1.49–4.73) 0.65 0.33 1.98 1.92 (1.01–3.64)
Ki-67 index <0.001 0.01
   <20 1.00 (reference) 1.00 (reference)
   ≥20 1.73 0.49 3.51 5.64 (2.14–14.85) 1.44 0.56 2.57 4.24 (1.41–12.76)

CI, confidence interval; ER, estrogen receptor; HER2, human epidermal growth factor receptor 2; OR, odds ratio; PR, progesterone receptor; S.E, standard error; TNBC, triple-negative breast cancer.
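The OR, 95% CI, Z, and P columns of Table 2 follow from β and S.E. via the Wald approximation: OR = exp(β), CI = exp(β ± 1.96·S.E.), Z = β/S.E., and P is the two-sided normal tail probability. The short sketch below recomputes the multivariable Ki-67 row; small differences from the tabulated CI (e.g. 12.65 vs. 12.76) are expected because β and S.E. are printed to two decimals.

```python
import math


def wald_summary(beta: float, se: float, z_crit: float = 1.96):
    """Odds ratio, 95% CI, Wald Z and two-sided P for one logistic regression coefficient."""
    odds_ratio = math.exp(beta)
    ci_low = math.exp(beta - z_crit * se)
    ci_high = math.exp(beta + z_crit * se)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return odds_ratio, ci_low, ci_high, z, p


# Multivariable Ki-67 (>=20 vs <20) row of Table 2: beta = 1.44, S.E = 0.56
or_, lo, hi, z, p = wald_summary(1.44, 0.56)
print(f"OR {or_:.2f} ({lo:.2f}-{hi:.2f}), Z = {z:.2f}, P = {p:.3f}")
# OR 4.22 (1.41-12.65), Z = 2.57, P = 0.010
```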

Features selection

LASSO regression analysis selected 11 key radiomic features, and a Radiomics model was constructed from these features using machine learning algorithms. Figure 3A shows the coefficient path diagram of the LASSO regression model, while Figure 3B details the relative importance of each feature in the model. Subsequently, the GBM algorithm was used to select DL features. Figure 3C illustrates the relationship between the number of selected features and model performance (accuracy and AUC). As the number of features increased from 1 to 3, both the AUC and accuracy progressively improved, peaking at three features. We therefore retained these three most important features for subsequent model training, validation, and testing; the features and their importance scores are displayed in Figure 3D.

Figure 3 Radiomics feature and DL feature selection. (A) LASSO regression coefficient path plot. (B) The histogram of the Rad score based on the selected features. (C) The performance of the CNN model in relation to the number of selected features after feature extraction from DCE-MRI images of breast IDC patients. (D) DL feature and corresponding importance scores filtered using GBM. AUC, area under the curve; CNN, convolutional neural network; DCE-MRI, dynamic contrast-enhanced magnetic resonance imaging; DL, deep learning; GBM, Gradient Boosting Machine; IDC, invasive ductal carcinoma; LASSO, least absolute shrinkage and selection operator.

Model performance evaluation

We evaluated the performance of the DL model in predicting ALNM in patients with breast IDC and compared it with models based on conventional radiomics and clinical features. In the training set, the DL model showed superior discriminative ability, with an AUC of 0.865, compared with the Radiomics model (AUC =0.840) and the Clinical model (AUC =0.807). In the internal validation set, the DL model again performed better, with an AUC of 0.894, versus 0.741 for the Radiomics model and 0.725 for the Clinical model. The DL-Clinical model, which integrates DL and clinical features, performed even better in both the training and internal validation sets, with AUCs of 0.935 and 0.946, respectively. These values were markedly higher than those of the Radiomics-Clinical model built from radiomics and clinical features (AUC =0.824 and 0.757), highlighting the potential of DL to improve predictive accuracy. We further assessed the generalizability of the five models in the external testing set, where the AUCs of the Clinical, Radiomics, Radiomics-Clinical, DL, and DL-Clinical models were 0.702, 0.711, 0.724, 0.883, and 0.951, respectively. The DL and DL-Clinical models thus maintained stable discrimination in the external testing set, demonstrating good generalization, although their specificity at the selected cutoff was lower in this set (0.353 and 0.632, respectively) (Table 3 and Figure 4).

Table 3

The performance comparison of different models

Model AUC (95% CI) Accuracy Sensitivity Specificity
Clinical model
   Training set 0.807 (0.749–0.858) 0.739 0.459 0.915
   Internal validation set 0.725 (0.629–0.814) 0.685 0.408 0.867
   External testing set 0.702 (0.599–0.804) 0.706 0.366 0.912
Radiomics model
   Training set 0.840 (0.791–0.888) 0.784 0.613 0.892
   Internal validation set 0.741 (0.647–0.834) 0.718 0.571 0.813
   External testing set 0.711 (0.594–0.819) 0.716 0.415 0.897
Radiomics-Clinical model
   Training set 0.824 (0.770–0.871) 0.77 0.622 0.864
   Internal validation set 0.757 (0.666–0.843) 0.694 0.551 0.787
   External testing set 0.724 (0.622–0.828) 0.734 0.341 0.971
DL model
   Training set 0.865 (0.819–0.909) 0.819 0.739 0.869
   Internal validation set 0.894 (0.833–0.948) 0.847 0.816 0.867
   External testing set 0.883 (0.801–0.947) 0.56 0.902 0.353
DL-Clinical model
   Training set 0.935 (0.900–0.965) 0.895 0.829 0.938
   Internal validation set 0.946 (0.900–0.979) 0.895 0.878 0.907
   External testing set 0.951 (0.910–0.982) 0.752 0.951 0.632

AUC, area under the curve; CI, confidence interval; DL, deep learning; DL-Clinical, deep learning and clinical features.

Figure 4 Performance evaluation of all models in ALNM prediction. The ROC curves of all models in the training set (A), internal validation set (B) and external testing set (C). ALNM, axillary lymph node metastasis; AUC, area under the curve; ROC, receiver operating characteristic.

Discussion

This study developed and evaluated a DL model based on DCE-MRI for the preoperative prediction of ALNM in breast IDC. Using multicenter data, we demonstrated that the DL model outperformed traditional machine learning models in predictive accuracy. By integrating clinical features with DL features, the DL-Clinical model further improved predictive performance. Additionally, we employed the GBM algorithm to optimize DL feature selection, which enhanced the model’s interpretability and generalizability.

This study identified maximum tumor diameter, HER2 status, and Ki-67 index as independent risk factors for ALNM in patients with breast IDC through univariate and multivariate logistic regression analyses. High HER2 expression and a high Ki-67 index have been confirmed as strong indicators of poor prognosis and high metastatic potential in breast cancer (24-26). These findings are consistent with previous studies and emphasize the crucial role of these factors in predicting the likelihood of metastasis. Additionally, maximum tumor diameter has long been considered a key determinant of the aggressiveness and metastatic risk of IDC (27). The Clinical model developed in this study, based on these three independent variables, provides a useful tool for predicting ALNM risk in IDC patients. However, its predictive capacity is limited because it relies solely on clinical features. Although traditional machine learning models have shown promise (28), models relying solely on clinical data may lack sufficient accuracy because such data cannot capture the full complexity of tumor biology.

Radiomics has been widely applied in medical image analysis: it extracts high-throughput quantitative features from images such as X-ray, MRI, and computed tomography (CT) scans and uses machine learning algorithms to build predictive models (29-31). In breast cancer research, radiomics has been used to predict tumor malignancy, molecular subtypes, and disease-free survival (28,32,33). Previous studies have shown that DCE-MRI-based radiomics models predict ALNM in breast cancer with high accuracy. For instance, Song et al. (35) and Li et al. (34) developed prediction models with AUCs of 0.847 and 0.82, closely aligning with the findings of this study. However, radiomics methods rely on manual or semi-automatic ROI segmentation, which introduces human error and inter-observer variability. Moreover, radiomics features have limited interpretability and are strongly influenced by scanning parameters, imaging equipment, and data preprocessing methods, which may compromise model generalizability (36).
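The AUC values compared throughout this discussion can be read as the probability that a randomly chosen ALNM-positive patient receives a higher predicted score than a randomly chosen ALNM-negative patient. A minimal sketch of that Mann-Whitney formulation follows; the function name is hypothetical, and this is not the software used in the study.

```python
from typing import Sequence


def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct ranking."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
print(auc_mann_whitney([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is O(n_pos × n_neg), which is adequate for cohorts of this size; confidence intervals such as those in the tables would require an additional method (for example, DeLong or bootstrap), which is not shown.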

Across various clinical applications, DL has been incorporated into numerous radiomics studies with clear advantages. For instance, a DL model based on CT images predicted the response of patients with locally advanced gastric cancer to neoadjuvant chemotherapy more accurately than Radiomics and Clinical models (37). In the preoperative prediction of sentinel lymph node metastasis (SLNM) in breast cancer, CNN-based models have outperformed both clinical and radiomics models (38,39). The CNN model used in this study achieved higher AUCs for predicting ALNM in breast IDC than conventional models based on clinical features or radiomics. This result supports the capability of DL models in medical image analysis, particularly for high-dimensional imaging data: a CNN can capture complex spatial patterns and multidimensional feature interactions that traditional algorithms may overlook. Additionally, automatic CNN-based image segmentation not only reduces human error but also allows the model to be continuously optimized through training, thereby improving the accuracy and reliability of predictions (40-42).
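The claim that a CNN captures spatial patterns rests on its basic building block: a small kernel slid across the image (convolution), a nonlinearity, and pooling that turns each feature map into a number. The toy sketch below shows only that mechanism on a 4×4 array. The hand-set edge kernels stand in for weights a real CNN learns during training, and the function names are hypothetical; the architecture used in this study is not described in this section.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d_valid(image: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]]) -> Matrix:
    """2D cross-correlation without padding ('valid'), the operation CNN conv layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out: Matrix = []
    for i in range(oh):
        row = []
        for j in range(ow):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out


def relu(m: Matrix) -> Matrix:
    """Element-wise max(0, x) nonlinearity."""
    return [[max(0.0, x) for x in row] for row in m]


def global_avg_pool(m: Matrix) -> float:
    """Collapse a feature map to a single number (one feature per kernel)."""
    values = [x for row in m for x in row]
    return sum(values) / len(values)


def extract_features(image, kernels) -> List[float]:
    """conv -> ReLU -> global average pooling, one output feature per kernel."""
    return [global_avg_pool(relu(conv2d_valid(image, k))) for k in kernels]


# Hand-set kernels stand in for weights a real CNN would learn during training.
VERTICAL_EDGE = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
HORIZONTAL_EDGE = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

# Toy 4x4 "slice": dark left half, bright right half (a vertical boundary).
toy_slice = [[0, 0, 1, 1] for _ in range(4)]
print(extract_features(toy_slice, [VERTICAL_EDGE, HORIZONTAL_EDGE]))  # [3.0, 0.0]
```

Only the kernel aligned with the boundary responds, which is the sense in which learned kernels encode spatial structure; a real network stacks many such layers and learns the kernel values by backpropagation.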

Existing research shows that predictive models integrating imaging and clinical features outperform unimodal models because they leverage the complementary strengths of both feature types. However, most such studies have used ultrasound rather than MRI. For instance, Zheng et al. combined DL with clinical features to build a model predicting ALNM from breast cancer ultrasound images, achieving high predictive accuracy (AUC =0.902) (43). Similarly, Guo et al. combined DL and clinical features to predict SLN metastasis from breast cancer ultrasound images, also with good accuracy (AUC =0.869) (44). These studies demonstrate that combining DL with clinical features improves predictive accuracy. However, although ultrasound is less expensive than MRI, axillary ultrasound has limitations in identifying ALN metastasis (45).
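One common way to combine the two feature types is feature-level (early) fusion: concatenate each patient's DL and clinical feature vectors, then standardize every column using statistics from the training set only, so that no information leaks from the validation or test cohorts. The sketch below assumes this early-fusion design; how the DL-Clinical model actually combined its inputs is not described in this section, and the helper names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

Row = List[float]


def fuse(dl_rows: Sequence[Sequence[float]], clin_rows: Sequence[Sequence[float]]) -> List[Row]:
    """Early fusion: append each patient's clinical features to their DL features."""
    if len(dl_rows) != len(clin_rows):
        raise ValueError("DL and clinical rows must describe the same patients")
    return [list(d) + list(c) for d, c in zip(dl_rows, clin_rows)]


def fit_scaler(rows: Sequence[Sequence[float]]) -> Tuple[Row, Row]:
    """Per-column mean and population SD, estimated on the training set only."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # A constant column would give SD 0; fall back to 1 to avoid division by zero.
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, sds


def apply_scaler(rows, means, sds) -> List[Row]:
    """z-score each column with the stored training statistics."""
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


# Toy example: one DL feature and one clinical feature per patient.
train = fuse([[1.0], [3.0]], [[0.0], [1.0]])
means, sds = fit_scaler(train)
print(apply_scaler(train, means, sds))                   # [[-1.0, -1.0], [1.0, 1.0]]
print(apply_scaler(fuse([[2.0]], [[1.0]]), means, sds))  # [[0.0, 1.0]]
```

The fused, standardized rows would then be passed to a single classifier; late fusion (combining the outputs of separate imaging and clinical models) is an alternative design.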

Although our study used MRI, the results are consistent with those of previous ultrasound studies, highlighting the value of integrating different data sources to improve predictive models. Unlike previous studies, this research used the GBM algorithm to optimize DL feature selection. GBM handles high-dimensional datasets well and provides reliable feature importance scores, enabling effective dimensionality reduction. By identifying the key features related to ALNM, GBM enhances model interpretability and supports the development of a robust predictive model (21-23).
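To make the feature-importance idea concrete, the following is a minimal, from-scratch sketch of how a gradient boosting machine can rank features: depth-one trees (stumps) are repeatedly fitted to the negative gradient of the log-loss, each split's reduction in squared error is credited to the feature it used, and the normalized totals serve as importance scores for keeping the top-k features. Stump depth, learning rate, number of rounds, the mean-residual leaf values (a simplification; many implementations use a Newton-step leaf value instead), and all function names are assumptions for illustration. The authors' GBM implementation and settings are not described in this section.

```python
import math
from typing import List, Optional, Sequence, Tuple

Split = Tuple[float, int, float, float, float]  # (gain, feature, threshold, left value, right value)


def _best_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Optional[Split]:
    """Depth-1 regression tree on residuals r: pick the split with the largest
    reduction in squared error, which equals L^2/nL + R^2/nR - S^2/n."""
    n, p = len(X), len(X[0])
    total = sum(r)
    best: Optional[Split] = None
    for j in range(p):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum, left_n = 0.0, 0
        for k in range(n - 1):
            i = order[k]
            left_sum += r[i]
            left_n += 1
            a, b = X[i][j], X[order[k + 1]][j]
            if a == b:  # cannot split between equal values
                continue
            right_sum, right_n = total - left_sum, n - left_n
            gain = left_sum ** 2 / left_n + right_sum ** 2 / right_n - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (a + b) / 2, left_sum / left_n, right_sum / right_n)
    return best


def gbm_feature_importance(X, y, n_rounds: int = 50, lr: float = 0.1) -> List[float]:
    """Boost stumps on the log-loss and return normalized split-gain importances."""
    n, p = len(X), len(X[0])
    f = [0.0] * n  # current raw (log-odds) score for each patient
    importance = [0.0] * p
    for _ in range(n_rounds):
        # Negative gradient of the log-loss w.r.t. the raw score: y - sigmoid(f)
        r = [yi - 1.0 / (1.0 + math.exp(-fi)) for yi, fi in zip(y, f)]
        split = _best_stump(X, r)
        if split is None:
            break
        gain, j, thr, left_val, right_val = split
        importance[j] += gain
        # Leaf value = mean residual (simplified; not a Newton-step update)
        f = [fi + lr * (left_val if xi[j] <= thr else right_val) for fi, xi in zip(f, X)]
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance


def select_top_k(importance: Sequence[float], k: int) -> List[int]:
    """Indices of the k most important features, e.g. DL features to retain."""
    return sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)[:k]
```

In a real pipeline, each row of `X` would hold one patient's DL features and `select_top_k` would return the column indices to keep before building the final model.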

Although our model performed robustly, its performance, particularly specificity, differed between the internal validation and independent external test cohorts. Such variation is common in medical artificial intelligence (AI) models and often stems from heterogeneity in patient populations, imaging protocols, or equipment across clinical settings. Importantly, these differences do not suggest overfitting: the integrated DL-Clinical model achieved an AUC of 0.951 on the external test set (Figure 4C), slightly higher than its AUCs on the internal validation set (AUC =0.946) and the training set (AUC =0.935). This sustained performance on unseen data, together with the model’s consistent superiority over unimodal models, supports its generalizability. However, AUC reflects discrimination rather than calibration, and reliable clinical deployment also requires well-calibrated probability estimates; future work will therefore focus on improving model calibration to maximize clinical utility across diverse settings.
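Calibration asks whether, among patients assigned a predicted ALNM probability of about 0.7, roughly 70% actually have metastasis; the AUC does not capture this. The planned calibration work could start from standard checks such as the Brier score and a binned reliability (calibration) curve, sketched below. The bin count, the expected calibration error summary, and the function names are illustrative assumptions rather than metrics reported in this study; clinical prediction studies also commonly report the calibration slope and intercept or a Hosmer-Lemeshow test.

```python
from typing import List, Sequence, Tuple


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (0/1)."""
    if len(y_true) != len(probs) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(probs)


def reliability_bins(y_true, probs, n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """(mean predicted probability, observed ALNM rate, count) for each
    non-empty equal-width probability bin: the points of a calibration curve."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    out = []
    for b in bins:
        if b:
            mean_pred = sum(p for p, _ in b) / len(b)
            observed = sum(y for _, y in b) / len(b)
            out.append((mean_pred, observed, len(b)))
    return out


def expected_calibration_error(y_true, probs, n_bins: int = 10) -> float:
    """Count-weighted mean gap between predicted and observed rates across bins."""
    n = len(probs)
    return sum(cnt / n * abs(mp - obs)
               for mp, obs, cnt in reliability_bins(y_true, probs, n_bins))
```

A perfectly calibrated model places every bin on the diagonal (mean predicted probability equal to observed rate), giving an expected calibration error of 0; equal-width bins are a simple choice, and quantile bins are a common alternative when predictions cluster.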

This study has several limitations that warrant further research. First, it focused solely on breast IDC and did not include other histological types of breast cancer; future studies should include these types to broaden the model’s applicability. Second, although this study used multicenter cohort data, its retrospective design may have introduced selection bias and limited sample representativeness; prospective studies with larger cohorts are needed to further validate the DL-Clinical model and confirm the reliability and generalizability of the results. Third, only patients with unifocal breast IDC were included, because in multifocal and bilateral disease the lesion responsible for lymph node metastasis is difficult to identify precisely, and ROI segmentation of non-tumor lesions is more complex. The current DL model is therefore applicable only to predicting ALN involvement in patients with a single lesion, and future work should develop models that assess ALN status in patients with multifocal and bilateral breast lesions. Finally, this study used only DCE-MRI. We plan to combine multimodal MRI with pathological images in future multi-omics studies to improve the model’s predictive power and clinical value.


Conclusions

The DL-Clinical prediction model developed in this study, which combines clinical features with DL features from DCE-MRI images, provides a non-invasive and practical approach for the preoperative prediction of ALN involvement in patients with breast IDC. By enabling individualized predictions, the model can assist physicians in preoperative assessment and clinical decision-making. Future research will aim to build larger databases of breast cancer patients, further refine the prediction model, and conduct multicenter collaborative studies.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://gs.amegroups.com/article/view/10.21037/gs-2025-365/rc

Data Sharing Statement: Available at https://gs.amegroups.com/article/view/10.21037/gs-2025-365/dss

Peer Review File: Available at https://gs.amegroups.com/article/view/10.21037/gs-2025-365/prf

Funding: This study was supported by the Qinhuangdao Science and Technology Research and Development Guidance Program (No. 202301A199).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://gs.amegroups.com/article/view/10.21037/gs-2025-365/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the ethics committees of The First Hospital of Qinhuangdao (No. 202301A199) and the Maternal and Child Health Hospital of Qinhuangdao (No. 202301A157). Informed consent was waived in this retrospective study.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63. [Crossref] [PubMed]
  2. Neagu AN, Whitham D, Seymour L, et al. Proteomics-Based Identification of Dysregulated Proteins and Biomarker Discovery in Invasive Ductal Carcinoma, the Most Common Breast Cancer Subtype. Proteomes 2023;11:13. [Crossref] [PubMed]
  3. Kaushikan R. Early Recurrence Detection of Invasive Ductal Carcinoma Utilizing High-Dimensional Genomic Data. 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC); 2024:1990-5.
  4. Chang JM, Leung JWT, Moy L, et al. Axillary Nodal Evaluation in Breast Cancer: State of the Art. Radiology 2020;295:500-15. [Crossref] [PubMed]
  5. Zhang X, Liu Y, Luo H, et al. PET/CT and MRI for Identifying Axillary Lymph Node Metastases in Breast Cancer Patients: Systematic Review and Meta-Analysis. J Magn Reson Imaging 2020;52:1840-51. [Crossref] [PubMed]
  6. Voinea SC, Sandru A, Bordea CI, et al. A Better Understanding of Axillary Lymph Node Dissection in the Era of Sentinel Lymph Node Biopsy. Chirurgia (Bucur) 2021;116:162-9. [Crossref] [PubMed]
  7. Pilger TL, Francisco DF, Candido Dos Reis FJ. Effect of sentinel lymph node biopsy on upper limb function in women with early breast cancer: A systematic review of clinical trials. Eur J Surg Oncol 2021;47:1497-506. [Crossref] [PubMed]
  8. Che Bakri NA, Kwasnicki RM, Khan N, et al. Impact of Axillary Lymph Node Dissection and Sentinel Lymph Node Biopsy on Upper Limb Morbidity in Breast Cancer Patients: A Systematic Review and Meta-Analysis. Ann Surg 2022;277:572-80. [Crossref] [PubMed]
  9. Park KU, Somerfield MR, Anne N, et al. Sentinel Lymph Node Biopsy in Early-Stage Breast Cancer: ASCO Guideline Update. J Clin Oncol 2025;43:1720-41. [Crossref] [PubMed]
  10. Gentilini OD, Botteri E, Sangalli C, et al. Sentinel Lymph Node Biopsy vs No Axillary Surgery in Patients With Small Breast Cancer and Negative Results on Ultrasonography of Axillary Lymph Nodes: The SOUND Randomized Clinical Trial. JAMA Oncol 2023;9:1557-64. [Crossref] [PubMed]
  11. Aktaş A, Gürleyik MG, Aydın Aksu S, et al. Diagnostic Value of Axillary Ultrasound, MRI, and (18)F-FDG-PET/CT in Determining Axillary Lymph Node Status in Breast Cancer Patients. Eur J Breast Health 2022;18:37-47. [Crossref] [PubMed]
  12. Kataoka M, Iima M, Miyake KK, et al. Multiparametric Approach to Breast Cancer With Emphasis on Magnetic Resonance Imaging in the Era of Personalized Breast Cancer Treatment. Invest Radiol 2024;59:26-37. [Crossref] [PubMed]
  13. Yamaguchi K, Ichinohe K, Iyadomi M, et al. Abbreviated and Ultrafast Dynamic Contrast-enhanced (DCE) MR Imaging. Magn Reson Med Sci 2025;24:315-31. [Crossref] [PubMed]
  14. Ma C, Zhao Y, Song Q, et al. Multi-parametric MRI-based radiomics for preoperative prediction of multiple biological characteristics in endometrial cancer. Front Oncol 2023;13:1280022. [Crossref] [PubMed]
  15. Fontaine P, Acosta O, Castelli J, et al. The importance of feature aggregation in radiomics: a head and neck cancer study. Sci Rep 2020;10:19679. [Crossref] [PubMed]
  16. Bera K, Braman N, Gupta A, et al. Predicting cancer outcomes with radiomics and artificial intelligence in radiology. Nat Rev Clin Oncol 2022;19:132-46. [Crossref] [PubMed]
  17. Demircioğlu A. Reproducibility and interpretability in radiomics: a critical assessment. Diagn Interv Radiol 2025;31:321-8. [Crossref] [PubMed]
  18. Ziegelmayer S, Reischl S, Harder F, et al. Feature Robustness and Diagnostic Capabilities of Convolutional Neural Networks Against Radiomics Features in Computed Tomography Imaging. Invest Radiol 2022;57:171-7. [Crossref] [PubMed]
  19. He Z, McMillan AB. Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography. J Imaging Inform Med 2025; Epub ahead of print. [Crossref]
  20. Burstein HJ, Curigliano G, Thürlimann B, et al. Customizing local and systemic therapies for women with early breast cancer: the St. Gallen International Consensus Guidelines for treatment of early breast cancer 2021. Ann Oncol 2021;32:1216-35. [Crossref] [PubMed]
  21. Adler AI, Painsky A. Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection. Entropy (Basel) 2022;24:687. [Crossref] [PubMed]
  22. Zheng R, Jia Y, Ullagaddi C, et al. Optimizing feature selection with gradient boosting machines in PLS regression for predicting moisture and protein in multi-country corn kernels via NIR spectroscopy. Food Chem 2024;456:140062. [Crossref] [PubMed]
  23. Rikta ST, Uddin KMM, Biswas N, et al. XML-GBM lung: An explainable machine learning-based application for the diagnosis of lung cancer. J Pathol Inform 2023;14:100307. [Crossref] [PubMed]
  24. Davey MG, Hynes SO, Kerin MJ, et al. Ki-67 as a Prognostic Biomarker in Invasive Breast Cancer. Cancers (Basel) 2021;13:4455. [Crossref] [PubMed]
  25. Ma Q, Liu YB, She T, et al. The Role of Ki-67 in HR+/HER2- Breast Cancer: A Real-World Study of 956 Patients. Breast Cancer (Dove Med Press) 2024;16:117-26. [Crossref] [PubMed]
  26. Cheng X. A Comprehensive Review of HER2 in Cancer Biology and Therapeutics. Genes (Basel) 2024;15:903. [Crossref] [PubMed]
  27. Siegel RL, Giaquinto AN, Jemal A. Cancer statistics, 2024. CA Cancer J Clin 2024;74:12-49. [Crossref] [PubMed]
  28. Yu Y, Tan Y, Xie C, et al. Development and Validation of a Preoperative Magnetic Resonance Imaging Radiomics-Based Signature to Predict Axillary Lymph Node Metastasis and Disease-Free Survival in Patients With Early-Stage Breast Cancer. JAMA Netw Open 2020;3:e2028086. [Crossref] [PubMed]
  29. Tabassum M, Suman AA, Suero Molina E, et al. Radiomics and Machine Learning in Brain Tumors and Their Habitat: A Systematic Review. Cancers (Basel) 2023;15:3845. [Crossref] [PubMed]
  30. Godbin AB, Jasmine SG. Leveraging Radiomics and Genetic Algorithms to Improve Lung Infection Diagnosis in X-Ray Images Using Machine Learning. IEEE Access 2024;12:47656-71.
  31. Yu J, Velichko Y, Kim H, et al. Machine learning models based on radiomics features to predict treatment response, biomarker status, and bone metastasis in patients with non-small cell lung cancer (NSCLC) treated with immune checkpoint inhibitors (ICIs). J Clin Oncol 2023;41:e21217.
  32. Gong X, Li Q, Gu L, et al. Conventional ultrasound and contrast-enhanced ultrasound radiomics in breast cancer and molecular subtype diagnosis. Front Oncol 2023;13:1158736. [Crossref] [PubMed]
  33. Conti A, Duggento A, Indovina I, et al. Radiomics in breast cancer classification and prediction. Semin Cancer Biol 2021;72:238-50. [Crossref] [PubMed]
  34. Li L, Yu T, Sun J, et al. Prediction of the number of metastatic axillary lymph nodes in breast cancer by radiomic signature based on dynamic contrast-enhanced MRI. Acta Radiol 2022;63:1014-22. [Crossref] [PubMed]
  35. Song D, Yang F, Zhang Y, et al. Dynamic contrast-enhanced MRI radiomics nomogram for predicting axillary lymph node metastasis in breast cancer. Cancer Imaging 2022;22:17. [Crossref] [PubMed]
  36. Mayerhoefer ME, Materka A, Langs G, et al. Introduction to Radiomics. J Nucl Med 2020;61:488-95. [Crossref] [PubMed]
  37. Cui Y, Zhang J, Li Z, et al. A CT-based deep learning radiomics nomogram for predicting the response to neoadjuvant chemotherapy in patients with locally advanced gastric cancer: A multicenter cohort study. EClinicalMedicine 2022;46:101348. [Crossref] [PubMed]
  38. Chen M, Kong C, Lin G, et al. Development and validation of convolutional neural network-based model to predict the risk of sentinel or non-sentinel lymph node metastasis in patients with breast cancer: a machine learning study. EClinicalMedicine 2023;63:102176. [Crossref] [PubMed]
  39. Cattell R, Ying J, Lei L, et al. Preoperative prediction of lymph node metastasis using deep learning-based features. Vis Comput Ind Biomed Art 2022;5:8. [Crossref] [PubMed]
  40. Sarvamangala DR, Kulkarni RV. Convolutional neural networks in medical image understanding: a survey. Evol Intell 2022;15:1-22. [Crossref] [PubMed]
  41. Salehi AW, Khan S, Gupta G, et al. A Study of CNN and Transfer Learning in Medical Imaging: Advantages, Challenges, Future Scope. Sustainability 2023;15:5930.
  42. Huynh BN, Groendahl AR, Tomic O, et al. Head and neck cancer treatment outcome prediction: a comparison between machine learning with conventional radiomics features and deep learning radiomics. Front Med (Lausanne) 2023;10:1217037. [Crossref] [PubMed]
  43. Zheng X, Yao Z, Huang Y, et al. Deep learning radiomics can predict axillary lymph node status in early-stage breast cancer. Nat Commun 2020;11:1236. [Crossref] [PubMed]
  44. Guo X, Liu Z, Sun C, et al. Deep learning radiomics of ultrasonography: Identifying the risk of axillary non-sentinel lymph node involvement in primary breast cancer. EBioMedicine 2020;60:103018. [Crossref] [PubMed]
  45. Upadhyaya VS, Lim GH, Chan EYK, et al. Evaluating the preoperative breast cancer characteristics affecting the accuracy of axillary ultrasound staging. Breast J 2020;26:162-7. [Crossref] [PubMed]
Cite this article as: Gu C, He Y, Lin J, Wang Z, Guo S, Yang H, Wang W, Sun J, Gan H, Li H. Preoperative prediction of axillary lymph node metastasis in breast invasive ductal carcinoma patients using a deep learning model based on dynamic contrast-enhanced magnetic resonance imaging images: a multicenter study. Gland Surg 2025;14(11):2288-2301. doi: 10.21037/gs-2025-365
