Development and validation of explainable machine learning models for the prediction of survival in patients with M1 breast cancer

Long Jin; Qifan Zhao; Shenbo Fu; Zheng Chao; Fei Cao; Jie Wu; Dede Ma; Xulong Zhu; Yuan Zhang

doi:10.21037/gs-2025-350

Original Article

Long Jin¹,², Qifan Zhao³, Shenbo Fu⁴, Zheng Chao⁵, Fei Cao⁶, Jie Wu¹, Dede Ma²,⁷, Xulong Zhu²,⁷, Yuan Zhang⁶

¹Department of Radiation Oncology, Shaanxi Provincial People’s Hospital, Xi’an, China; ²The Project of Shaanxi Breast Disease Clinical Medical Research Center, Xi’an, China; ³Department of Computer Science, The University of Hong Kong, Hong Kong, China; ⁴Department of Radiation Oncology, Shaanxi Provincial Cancer Hospital, Xi’an, China; ⁵Department of Medical Imaging Diagnosis, Hanzhong Central Hospital, Hanzhong, China; ⁶Department of Oncology, Shaanxi Provincial People’s Hospital, Xi’an, China; ⁷Department of Surgical Oncology, Shaanxi Provincial People’s Hospital, Xi’an, China

Contributions: (I) Conception and design: L Jin, Q Zhao; (II) Administrative support: None; (III) Provision of study materials or patients: L Jin; (IV) Collection and assembly of data: L Jin; (V) Data analysis and interpretation: Q Zhao; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Yuan Zhang, MD, PhD. Department of Oncology, Shaanxi Provincial People’s Hospital, No. 256 Youyi West Road, Beilin District, Xi’an 710068, China. Email: 2015141241065@stu.scu.edu.cn.

Background: The prognosis of patients with metastatic (M1) breast cancer is controversial, and the prognostic value of local therapy has not been well established. We aimed to develop and validate explainable machine learning (ML)-based survival models to predict overall survival (OS) in this population.

Methods: We retrospectively identified 10,214 female patients with histologically confirmed M1 breast cancer diagnosed between January 2013 and December 2018 from the Surveillance, Epidemiology, and End Results (SEER) database, each with a single malignant lesion. Patients with ambiguous or incomplete metastasis data were excluded. Candidate predictors included age; sex; laterality; American Joint Committee on Cancer (AJCC) Tumor, Node, and Metastasis stage; surgery of the primary site; breast subtype; estrogen receptor and progesterone receptor status; marital status; radiotherapy; chemotherapy; tumor grade; histology; and metastasis to the bone, brain, liver, and lung. Two time-to-OS prediction models (a neural network and a Cox proportional hazards model) were trained, internally validated, and externally tested in a cohort of 100 patients with M1 breast cancer from China. Model interpretability was assessed through global and individual feature importance analyses.

Results: In total, 10,314 patients were enrolled in the study. The median follow-up time was 42 months in the training dataset and 36 months in the test dataset. The deep learning network demonstrated greater stability and accuracy than did the Cox proportional hazards model in predicting patient survival, both on the internal test dataset (concordance index: 0.771 vs. 0.682) and in the external validation (concordance index: 0.782 vs. 0.690). Several important prognostic factors were identified by the deep learning model, including breast subtype, metastatic site, and surgery status. Surgery was associated with improved OS in patients with bone metastases selected after propensity score matching, with 5-year OS rates of 76.9% and 27.2% in the surgery and nonsurgery groups, respectively (P=0.001).

Conclusions: We developed and externally validated ML models that accurately predict survival in patients with M1 breast cancer. Breast subtype, metastatic site, and surgery status were the most important factors for survival prediction in this population. Patients with non-triple-negative breast cancer and metastasis to the bone may benefit from surgery, while those with metastasis to the brain, lung, or liver may not.

Keywords: M1 breast cancer; local therapy; machine learning (ML); triple-negative

Submitted Aug 05, 2025. Accepted for publication Dec 10, 2025. Published online Feb 11, 2026.

Highlight box

Key findings

• We developed and externally validated explainable machine learning (ML) survival models for patients with M1 (stage IV) breast cancer using large-scale population data.

• The deep learning model outperformed the Cox proportional hazards model in both internal and external validation.

• Breast cancer subtype, metastatic site, and surgery of the primary tumor were identified as the most influential prognostic factors.

• Surgery was associated with improved overall survival in patients with bone metastases, but not in those with brain, lung, or liver metastases.

What is known and what is new?

• Survival outcomes in patients with M1 breast cancer are highly heterogeneous. Previous prognostic models, mainly based on traditional statistical methods, have demonstrated limited predictive accuracy and often lack interpretability. The survival benefit of local surgery in patients with stage IV breast cancer remains controversial, particularly across different metastatic patterns.

• This study developed an explainable ML-based survival prediction framework for patients with M1 breast cancer that integrates clinicopathological features, treatment variables, and metastatic patterns. With Shapley value-based explainability, the model provides transparent global- and individual-level interpretations of prognostic factors, enabling both accurate prediction and clinical insight into survival risk.

What is the implication, and what should change now?

• Explainable ML models may support individualized prognostic assessment and treatment decision-making for patients with M1 breast cancer. The findings suggest that local surgery should be selectively considered for patients with bone metastases, particularly those with non-triple-negative subtypes, while surgery may not be useful for patients with visceral metastases. This approach promotes more precise and transparent clinical management of patients with stage IV breast cancer.

Introduction

Breast cancer is the most prevalent malignant tumor globally. According to the latest data from the World Health Organization’s International Agency for Research on Cancer (IARC), there were an estimated 19.29 million new cancer cases worldwide in 2020, of which breast cancer accounted for the largest share (2.26 million cases). Breast cancer is also the most prevalent malignant tumor among Chinese women, and the 416,000 new cases arising in China each year account for 18.4% of the global number of breast cancer cases (1). About 6% of patients with newly diagnosed breast cancer already have distant metastasis, which constitutes stage IV breast cancer (2).

Metastatic (M1) breast cancer (MBC) involves substantial clinical heterogeneity and variable survival outcomes. Accurate prognostic assessment is critical for guiding therapeutic decisions and counseling patients. Over the past decade, several prognostic models have been developed for MBC. For instance, Barcenas et al. constructed a prognostic model for de novo and recurrent MBC using a large sample (N=10,655) that incorporated clinical and tumor variables, achieving a concordance index (C-index) of 0.731 and outperforming simpler models based only on hormone receptor (HR) and HER2 status (C-index 0.617) (3). A retrospective population-based study (N=2,151) focused on patients with HER2-positive MBC developed nomograms for overall survival (OS) and breast-cancer-specific survival (BCSS), reporting C-indices of 0.702 and 0.707, respectively, in the training cohort (4). Similarly, prognostic nomograms for de novo metastatic triple-negative breast cancer achieved C-indices of 0.69–0.70 (5), while a recent Surveillance, Epidemiology, and End Results (SEER)-based nomogram for patients with de novo MBC reported a C-index of 0.688 in the training set and 0.875 in the external validation set (6).

Despite these advances, existing models exhibit critical limitations. First, their discriminative performance remains moderate, leaving substantial variation in outcomes unexplained. Second, many models rely on limited or outdated variables, often including only demographic and basic clinicopathological factors such as age, molecular subtype, and metastatic site, but lack information on treatment details, histological grade, and emerging biomarkers (7,8), which can significantly affect survival outcomes (9,10).

Given these deficiencies, there is a clear need for improved prognostic models for MBC. To address this, we developed a deep learning-based model tailored for stage IV breast cancer, leveraging both treatment records and clinicopathological information. The model features an explainable component that not only provides treatment recommendations but also interprets the predicted log hazard ratio. By creating an interactive interface between clinicians and the model, we improved transparency and trust through explainable artificial intelligence. This approach helps physicians better comprehend the algorithm’s suggestions, enabling more informed decisions regarding personalized local treatment strategies. We present this article in accordance with the TRIPOD reporting checklist (available at https://gs.amegroups.com/article/view/10.21037/gs-2025-350/rc).

Methods

Eligibility criteria and patient information

A total of 10,214 cases were extracted from the SEER database and used as the training cohort. The data were derived from the National Cancer Institute’s Surveillance Research Program (Division of Cancer Control and Population Sciences) and were based on the November 2020 submission, covering total US incidence data from 1969 to 2019 at the county level (SEER Research Plus Data). Cases were included if they met the following criteria: (I) histological diagnosis of M1 breast cancer in a female patient diagnosed between January 2013 and December 2018 and (II) the presence of one malignant lesion. Cases with ambiguous or incomplete data on metastasis location were excluded. Demographic data (age and marital status at diagnosis), characteristics specific to breast cancer [TNM stage, histology type, tumor laterality, breast subtype, grade, estrogen receptor (ER), progesterone receptor (PR), and metastasis location], and information regarding treatment (primary site surgery, radiation, and chemotherapy) were all recorded. The key outcomes were survival time and vital status.

We randomly selected data from 100 patients with M1 breast cancer who were diagnosed between January 2013 and December 2018 at Shaanxi Provincial People’s Hospital in China to form the external validation cohort. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments, and was approved by the Medical Ethics Committee of Shaanxi Provincial People’s Hospital (approval No. 2024-R703). These participants provided their written informed consent to participate in this study.

Explainable machine learning (ML) survival model design

This section presents the training and evaluation of two ML models developed to predict survival outcomes. Both algorithms aim to model the relationship between patient-specific variables and the log hazard function. The deep learning model architecture includes an input layer that receives baseline clinical data, followed by fully connected hidden layers interspersed with dropout layers. The network outputs the log hazard, and the rectified linear unit (ReLU) function is used as the activation function. Model parameters were updated with the Adam optimizer over multiple training epochs (11). For hyperparameter optimization, a random search was conducted across the following ranges: learning rate [0.00001, 0.01], dropout rate [0.5, 0.8], number of hidden layers [1, 10], and number of neurons per hidden layer [10, 80]. Similarly, a random search was applied to train the penalized Cox proportional hazards model, with the learning rate [0.001, 0.01] and penalization term [0.0001, 1] being tuned.
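The random search described above can be sketched as follows. This is a minimal illustration, not the authors' code: the ranges are those stated in the text, but the sampling distributions (log-uniform for the learning rate, uniform elsewhere), the per-layer widths, and the names `sample_config`, `random_search`, and `evaluate` are assumptions introduced here.

```python
import math
import random

# Search ranges as stated in the text; the sampling distributions are assumptions.
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-2),
    "dropout_rate": (0.5, 0.8),
    "n_hidden_layers": (1, 10),
    "n_neurons": (10, 80),
}


def sample_config(rng):
    """Draw one candidate configuration from the search space."""
    lr_low, lr_high = SEARCH_SPACE["learning_rate"]
    # Assumption: log-uniform sampling, because the range spans three orders of magnitude.
    learning_rate = 10 ** rng.uniform(math.log10(lr_low), math.log10(lr_high))
    dropout_rate = rng.uniform(*SEARCH_SPACE["dropout_rate"])
    n_layers = rng.randint(*SEARCH_SPACE["n_hidden_layers"])
    # Assumption: each hidden layer draws its own width.
    widths = [rng.randint(*SEARCH_SPACE["n_neurons"]) for _ in range(n_layers)]
    return {
        "learning_rate": learning_rate,
        "dropout_rate": dropout_rate,
        "hidden_widths": widths,
    }


def random_search(evaluate, n_trials, seed=0):
    """Return the best (score, config) pair.

    `evaluate` is a hypothetical callback that trains a model with the given
    configuration and returns a validation score (higher is better).
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if best is None or score > best[0]:
            best = (score, config)
    return best
```

In practice `evaluate` would wrap model training and return, for example, a validation C-index; here it is left abstract so the sketch stays independent of any deep learning library.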

To enhance interpretability, we incorporated an explainable module using Shapley values to assess the contribution of each clinical feature to the neural network’s predictions. For individual-level explanations, Shapley values were calculated for each feature, and waterfall plots were generated to visualize their impact on specific predictions. To assess global feature importance, we averaged the absolute Shapley values across the full training dataset and ranked the features by descending importance. A schematic overview of the workflow is presented in Figure 1.

Figure 1 Flowchart of participant inclusion and exclusion. BC, breast cancer; SEER, Surveillance, Epidemiology, and End Results.
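The two Shapley-based summaries described in this section can be sketched as follows, assuming the per-patient Shapley values have already been computed by an explainer. The function names and the input layout (one row of values per patient) are hypothetical. The sketch relies only on the additivity property of Shapley values: a base value plus the per-feature contributions equals the model output for that patient.

```python
def global_importance(shap_rows, feature_names):
    """Rank features by mean absolute Shapley value across patients.

    shap_rows: list of per-patient lists, aligned with feature_names.
    """
    n_patients = len(shap_rows)
    means = [
        sum(abs(row[j]) for row in shap_rows) / n_patients
        for j in range(len(feature_names))
    ]
    ranked = sorted(zip(feature_names, means), key=lambda pair: pair[1], reverse=True)
    return ranked


def waterfall(base_value, contributions):
    """Accumulate one patient's contributions into a running log hazard.

    contributions: dict mapping feature name -> Shapley value.
    Returns (steps, final), where each step is
    (feature, contribution, running_total), largest magnitude first.
    """
    ordered = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    total = base_value
    steps = []
    for name, value in ordered:
        total += value
        steps.append((name, value, total))
    return steps, total
```

A waterfall plot draws the `steps` as stacked bars; the last running total is the predicted log hazard for that patient.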

Statistical analysis

All covariates were converted into categorical variables for statistical analysis. Patients with missing data were excluded. The Kaplan-Meier method was used for survival analysis, and survival curves were plotted. The log-rank test was used to compare survival between groups. Propensity score matching (PSM) was performed based on clinical and pathological characteristics. Propensity scores were estimated with a logistic regression model that included baseline covariates such as age, sex, and clinical characteristics. Patients were matched via nearest-neighbor matching at a 1:1 ratio with a caliper of 0.3. The matched cohort was used for subsequent analyses. Model development was conducted in Python 3.9 (Python Software Foundation, Wilmington, DE, USA). The deep learning model was implemented in PyTorch version 1.11.0, while the penalized Cox proportional hazards model was built with PySurvival version 0.1.2. The code and dataset for reproducing the deep learning model are available online (https://github.com/snowflake-Zhao/brca-IV-surgery).
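The matching step can be illustrated with a short sketch that takes propensity scores as given (for example, from a fitted logistic regression). Several details are assumptions, because the text does not specify them: matching is greedy and without replacement, treated patients are processed in identifier order, and the caliper of 0.3 is applied to the absolute difference in propensity scores. The function name is hypothetical.

```python
def match_nearest_neighbor(treated, controls, caliper=0.3):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping patient id -> propensity score.
    Returns a list of (treated_id, control_id) pairs whose scores differ
    by at most `caliper`.
    """
    available = dict(controls)  # controls not yet used in a pair
    pairs = []
    # Fixed processing order keeps the result reproducible.
    for t_id in sorted(treated):
        if not available:
            break
        score = treated[t_id]
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda cid: abs(available[cid] - score))
        if abs(available[c_id] - score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs
```

Treated patients with no control inside the caliper stay unmatched and are dropped from the matched cohort, which is why matched subgroups are smaller than the original ones.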

Results

Patients’ baseline characteristics

Based on the inclusion criteria, a total of 10,314 patients diagnosed with M1 breast cancer were enrolled. For assessment of model performance, the cohort was divided into two groups: a training set of 10,214 patients from the SEER database and a test set of 100 patients from Shaanxi Provincial People’s Hospital. Baseline clinical characteristics for both cohorts are summarized in Table 1. In the SEER cohort, infiltrating ductal carcinoma was the most common histological type, accounting for 76.69% of cases, followed by lobular carcinoma at 9.38%. Regarding molecular subtypes, luminal A breast cancer accounted for 58.29% of patients, while triple-negative cases accounted for 14.70%. With respect to metastatic sites, bone metastases were present in 65.14% of patients, brain in 93.29%, liver in 74.11%, and lung in 69.32%. Additionally, 37.34% of patients underwent surgery, and 34.49% received beam radiation as part of local therapy. In the test cohort, nearly three-quarters of patients were diagnosed with infiltrating ductal carcinoma, with most exhibiting moderately differentiated tumors. Among this group, 62% had bone metastasis, 87% had brain metastasis, 76% had liver metastasis, and 61% had lung metastasis. Approximately half of the patients (52%) were classified with the luminal B subtype, while 21% had triple-negative breast cancer. In terms of local treatment, over half underwent surgery, and nearly one-quarter received beam radiation.

Table 1

Baseline clinical characteristics of patients

Characteristic	Training, n (%)	Testing, n (%)
Age, years
≥85	455 (4.45)	1 (1.00)
80–84	551 (5.39)	4 (4.00)
75–79	689 (6.74)	2 (2.00)
70–74	867 (8.48)	8 (8.00)
65–69	1,183 (11.58)	9 (9.00)
60–64	1,383 (13.54)	8 (8.00)
55–59	1,430 (14.00)	11 (11.00)
50–54	1,274 (12.47)	15 (15.00)
45–49	928 (9.08)	8 (8.00)
40–44	610 (5.97)	15 (15.00)
35–39	435 (4.25)	14 (14.00)
30–34	273 (2.67)	4 (4.00)
25–29	113 (1.10)	1 (1.00)
20–24	22 (0.21)	0
15–19	1 (0.01)	0
T stage
T0	17 (0.16)	0
T1	0	5 (5.00)
T1a	48 (0.46)	1 (1.00)
T1b	187 (1.83)	1 (1.00)
T1c	847 (8.29)	0
T2	3,192 (31.25)	45 (45.00)
T3	1,764 (17.27)	14 (14.00)
T4	87 (0.85)	8 (8.00)
T4a	404 (3.95)	1 (1.00)
T4b	1,721 (16.84)	16 (16.00)
T4c	161 (1.57)	5 (5.00)
T4d	1,006 (9.84)	5 (5.00)
N stage
N0	1,962 (19.20)	7 (7.00)
N1	3,735 (36.56)	18 (18.00)
N1a	613 (6.00)	1 (1.00)
N2	228 (2.23)	25 (25.00)
N2a	981 (9.60)	1 (1.00)
N3	94 (0.92)	27 (27.00)
N3a	675 (6.60)	1 (1.00)
N3c	578 (5.65)	18 (18.00)
NX	514 (5.03)	2 (2.00)
Breast subtype
Luminal A	5,954 (58.29)	10 (10.00)
Luminal B	1,778 (17.40)	52 (52.00)
Triple-negative	1,502 (14.70)	21 (21.00)
HER2-positive	980 (9.59)	17 (17.00)
ER status
Positive	7,563 (74.04)	60 (60.00)
Negative	2,651 (25.95)	40 (40.00)
PR status
Positive	6,164 (60.34)	37 (37.00)
Negative	4,050 (39.65)	63 (63.00)
Mets to bone
Yes	6,654 (65.14)	62 (62.00)
No	3,560 (34.85)	38 (38.00)
Mets to brain
Yes	9,529 (93.29)	87 (87.00)
No	685 (6.70)	13 (13.00)
Mets to liver
Yes	7,570 (74.11)	76 (76.00)
No	2,644 (25.88)	24 (24.00)
Mets to lung
Yes	7,081 (69.32)	61 (61.00)
No	3,133 (30.67)	39 (39.00)
Grade
Well-differentiated	780 (7.63)	5 (5.00)
Moderately differentiated	4,223 (41.34)	81 (81.00)
Poorly differentiated	5,133 (50.25)	14 (14.00)
Undifferentiated	78 (0.76)	0
Radiation
Beam radiation	3,523 (34.49)	24 (24.00)
Combination of beam with implants or isotopes	5 (0.04)	0
None	6,271 (61.39)	76 (76.00)
Radiation, NOS method, or source not specified	50 (0.48)	0
Recommended, unknown if administered	256 (2.50)	0
Refused	99 (0.96)	0
Radioactive implants (including brachytherapy)	4 (0.03)	0
Chemotherapy
Yes	6,183 (60.53)	97 (97.00)
No/unknown	4,031 (39.46)	3 (3.00)

ER, estrogen receptor; HER2, human epidermal growth factor receptor 2; Mets, metastasis; N, node; NOS, not otherwise specified; PR, progesterone receptor; T, tumor.

Survival feature importance

Global feature importance was calculated from the model and the patient data (Figure 2). The three variables most strongly associated with OS, ranked in descending order of importance, were triple-negative (HR-negative and HER2-negative) status, presence of brain metastases at diagnosis, and HER2-positive (HR-negative and HER2-positive) status. Triple-negative status was the most important factor influencing a patient’s overall prognosis, and this relationship was also observed between each feature’s slope and OS. A patient with triple-negative cancer may have worse OS because this subtype lacks specific biomarkers that can guide treatment decisions and predict response to therapy. Similarly, metastases to the brain can lead to neurological complications due to the infiltration and compression of brain tissue, which is ultimately detrimental to the patient’s OS. Conversely, if the patient’s subtype is HER2-positive, HER2-directed therapy may prolong OS. Other metastatic sites, including the bone and liver, are also important features and may shorten OS. Finally, surgery of the primary site ranked fourth in feature importance for survival.

Figure 2 The average influence of the top 10 neural network features on the output magnitude. The x-axis represents the mean Shapley value, while the y-axis displays the names of the neural network’s input features. ER, estrogen receptor; HER2, human epidermal growth factor receptor 2; HR, hormone receptor; SHAP, SHapley Additive exPlanations.

With respect to individual patient feature importance (Figure 3), one patient was selected at random (HER2-positive, female, metastasis to the liver, and no surgery) from our test dataset to demonstrate how the model begins with an initial log-partial hazard of zero, reflecting no prior knowledge about the patient, and subsequently arrives at the patient’s predicted outcome (0.493). In this example, the combined effect of the 144 remaining, less influential features initially raises the log hazard by 0.14. Since her breast subtype is not triple-negative, this feature decreases the outcome by 0.07. Furthermore, her ER status (0), grade (poorly differentiated), and surgical status (0) indicate that her tumor has a higher degree of malignancy or aggressiveness, and thus hormonal therapies could be a viable treatment strategy; moreover, these features increase her risk by 0.009, 0.13, and 0.12, respectively. The liver as the metastatic site increases the risk by 0.13, as liver metastasis can be associated with liver function impairment and complications such as jaundice, ascites, and liver failure that can significantly lower a patient’s quality of life. Finally, her HER2-positive subtype lowers her risk by 0.36, leading to a final result of 0.493.

Figure 3 The individual impact of the top 8 neural network features on output magnitude. The x-axis represents the Shapley value for each feature, and the y-axis indicates the names of the neural network’s input features. ER, estrogen receptor; HER2, human epidermal growth factor receptor 2.

Survival analysis of the subgroups

Surgery on the primary site ranked as the fourth most important prognostic factor (Figure 2). Metastasis may also be associated with worse OS depending on the metastatic site (12,13). Therefore, OS was compared between the surgery and nonsurgery groups within each metastatic subgroup after PSM, in which the propensity score was estimated according to all available clinical covariates to minimize baseline imbalances. After PSM, no significant survival differences were observed in the brain, lung, and liver subgroups. Interestingly, in the bone subgroup, patients who received surgery had markedly longer survival than their matched nonsurgery counterparts (5-year OS rate: 76.9% vs. 27.2%; P=0.01; Figure 4A). In the brain subgroup, the 5-year OS rates were 0% and 0%, respectively (P=0.24; Figure 4B); in the lung subgroup, they were 61.1% and 28.3% (P=0.28; Figure 4C); and in the liver subgroup, they were 41.1% and 2.3% (P=0.07; Figure 4D).

Figure 4 Overall survival comparison between the surgery and nonsurgery groups in the metastatic subgroups after propensity score matching. (A) Bone metastasis. (B) Brain metastasis. (C) Lung metastasis. (D) Liver metastasis.
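The 5-year OS rates reported for each subgroup are Kaplan-Meier estimates read off at 60 months. A minimal sketch of that calculation is given below; the function names are hypothetical, and the code assumes follow-up times in months with an event indicator of 1 for death and 0 for censoring.

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as a list of (time, survival probability).

    times: follow-up in months; events: 1 = death observed, 0 = censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_at(curve, horizon):
    """Read the step function: estimated survival probability at `horizon`."""
    prob = 1.0
    for t, s in curve:
        if t > horizon:
            break
        prob = s
    return prob
```

Calling `survival_at(kaplan_meier(times, events), 60)` separately for the matched surgery and nonsurgery groups gives the two rates that are then compared with the log-rank test.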

Training curve and model performance

The final deep learning model comprises five hidden layers with 10, 24, 47, and 22 neurons, respectively, and dropout layers inserted between them. The model was trained with a learning rate of 0.00601 and a dropout rate of 0.736. Throughout the training process, the loss curves for both the training and validation sets steadily declined. After 96 epochs, the validation loss plateaued at 6.0229, while the training loss continued to decrease from 5.1683. To avoid overfitting, training was halted at this point, and the model was preserved for testing. For comparison, the Cox proportional hazards model was implemented with a penalizer of 0.002 and a learning rate of 0.002. In cross-validation, the deep learning model achieved a mean C-index of 0.771, outperforming the Cox proportional hazards model (C-index =0.682) (Table 2). In external validation, the deep learning model again achieved a higher C-index than the Cox model (0.782 vs. 0.690).
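The text does not name the loss that is tracked in these curves. For a network that outputs a log hazard, a common choice is the negative log Cox partial likelihood, and the sketch below assumes that choice purely for illustration. It is written in plain Python rather than PyTorch so that it runs without dependencies; the function name and the averaging over observed events are assumptions.

```python
import math


def neg_log_partial_likelihood(log_hazards, times, events):
    """Mean negative log Cox partial likelihood (Breslow-style handling of ties).

    log_hazards: model outputs, one per patient.
    times: follow-up times; events: 1 = death observed, 0 = censored.
    """
    n_events = sum(events)
    if n_events == 0:
        return 0.0
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # censored patients only appear in risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [h for h, t in zip(log_hazards, times) if t >= t_i]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(risk)
        log_sum = m + math.log(sum(math.exp(h - m) for h in risk))
        total += log_hazards[i] - log_sum
    return -total / n_events
```

The loss falls when patients who die earlier receive higher log hazards than those still at risk, which is the same ordering that the C-index rewards.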

Table 2

Performance of the survival models in predicting the hazard ratio

Model	Cross-validation concordance index mean	External validation concordance index mean
Deep learning	0.771	0.782
Cox proportional hazards model	0.682	0.690
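For readers less familiar with the metric in Table 2, the concordance index can be sketched as the fraction of comparable patient pairs in which the patient who died earlier was assigned the higher predicted risk. The version below is Harrell's C in its simplest form, with ties in predicted risk counted as half; it is an illustration with a hypothetical function name, not the implementation used in the study.

```python
def concordance_index(risk_scores, times, events):
    """Harrell's C: share of comparable pairs ordered correctly by risk.

    risk_scores: higher means higher predicted hazard.
    times: follow-up times; events: 1 = death observed, 0 = censored.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a pair is anchored on a patient with an observed death
        for j in range(n):
            # Comparable when patient i died before patient j's last follow-up.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the values in Table 2.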

Calibration curves

To further evaluate the reliability of the 5-year OS predictions, we examined the calibration curves for both models (Figure 5). The neural network showed the closest agreement with the 45° reference line across most probability ranges, indicating well-calibrated and stable probability estimates. In contrast, the Cox proportional hazards model demonstrated larger deviations, particularly in the lower and mid-probability regions, suggesting moderate underestimation of observed survival. Overall, the neural network achieved superior calibration performance at 5 years, providing more accurate alignment between predicted and observed survival probabilities.

Figure 5 Calibration curves of the predicted probabilities with observed outcomes. KM, Kaplan-Meier.
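One common way to build such a calibration curve is to group patients by predicted 5-year survival probability and compare the mean prediction in each group with the Kaplan-Meier estimate at 60 months. The sketch below follows that recipe; the equal-size binning, the number of bins, and the function names are assumptions, since the text does not describe how the curves were constructed.

```python
def km_survival_at(times, events, horizon):
    """Kaplan-Meier survival probability at `horizon` (same units as times)."""
    at_risk = len(times)
    survival = 1.0
    for t in sorted(set(times)):
        if t > horizon:
            break
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        survival *= 1.0 - deaths / at_risk
        at_risk -= sum(1 for ti in times if ti == t)
    return survival


def calibration_points(predicted, times, events, horizon=60, n_bins=2):
    """Return (mean predicted, KM-observed) survival for each bin.

    predicted: model-estimated survival probability at `horizon` per patient.
    Patients are sorted by prediction and split into equal-size bins;
    the last bin absorbs any remainder.
    """
    if len(predicted) < n_bins:
        raise ValueError("need at least one patient per bin")
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    size = len(order) // n_bins
    points = []
    for b in range(n_bins):
        start = b * size
        idx = order[start:] if b == n_bins - 1 else order[start:start + size]
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = km_survival_at(
            [times[i] for i in idx], [events[i] for i in idx], horizon
        )
        points.append((mean_pred, observed))
    return points
```

Plotting the returned pairs against the 45° line gives a curve of the kind described above; points below the line indicate predicted survival higher than observed, and points above it indicate the reverse.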

Discussion

This study sought to resolve the ongoing debate surrounding the potential benefits of local therapy for patients with stage IV breast cancer who have already received systemic treatment. Our findings suggest that patients with bone metastases may derive additional benefit from local therapy. In particular, local treatment is strongly recommended for patients with bone metastases who do not have triple-negative breast cancer. Beyond developing a deep learning model with high accuracy, we also integrated a module for explainability that can clarify how the model’s predictions are influenced by the relative significance of clinical features. Using the model and training data, we identified the global feature importance influencing survival predictions. To the best of our knowledge, this represents the first explainable, data-driven framework to support local therapy decision-making for patients with stage IV breast cancer.

In line with earlier research, such as that done by Li et al. (14), we discovered that patients with bone metastasis may benefit from local treatment. Compared with visceral metastasis, bone metastasis itself progresses slowly and can be prevented and treated with drugs such as zoledronic acid. A portion of symptomatic lesions can be improved by radiotherapy. In the prospective MF07-01 study (6), the enrolled population mainly included patients with bone metastasis and a small tumor burden. The 5-year OS rate after resection of the primary lesion was significantly improved. The results of subgroup analysis also indicated that patients with newly treated stage IV breast cancer with only bone metastasis can benefit from surgery that removes the primary lesion. We further found that patients with triple-negative breast cancer and bone metastasis could not benefit from local therapy. Triple-negative breast cancer (15) is a heterogeneous disease with no endocrine or targeted treatment options. Its clinical manifestations are the most aggressive among the subtypes of breast cancer, and it is associated with recurrence, metastasis, and poor outcomes. In contrast, we strongly believe that patients with non-triple-negative breast cancer and bone metastasis should receive local therapy to improve OS.

The model and training dataset were used to determine the global feature importance. The significance of the site of metastasis in the M1 patient population has been widely recognized in previous studies. A large number of studies have used bone metastasis as the reference, as patients with brain metastasis typically have the worst prognosis. This finding aligns with our results, which identified brain metastasis as one of the most significant factors influencing patient survival and decision-making regarding local therapy. The breast cancer subtype is also a critical factor. Previous research has repeatedly confirmed that breast cancer subtype was correlated with the selection of surgery type (12,13,16). The luminal A and luminal B subtypes are linked to a better prognosis, while the HER2-positive and triple-negative subtypes are associated with a worse prognosis (17-20). Similarly, in our study, the triple-negative and HER2-positive subtypes were found to be highly significant factors for patient survival. Triple-negative breast cancer is associated with poor survival outcomes. In contrast, patients with the HER2-positive subtype have a better prognosis than do those with the triple-negative subtype, which is attributable to the introduction of the first anti-HER2 monoclonal antibody, trastuzumab, marking the beginning of the targeted therapy era for breast cancer. A real-world study reported that the overall median OS of patients with advanced HER2-positive breast cancer increased from 2008 to 2017 (21). Compared to these four features, the other clinical characteristics investigated in our study were not found to be significant.

To improve the method’s relevance in real-world clinical applications, causal inference should be integrated into both the model training and interpretability processes (22). Enhancing the clarity of feature attributions could involve embedding causal model assumptions and applying sample reweighting within the loss function. This strategy could then be evaluated against the performance of our current deep learning model (22).

Conclusions

We developed and externally validated ML models that could accurately predict survival in patients with M1 breast cancer. Breast subtype, metastatic site, and surgery status were the most important factors for survival prediction in this population. Patients with non-triple-negative breast cancer and metastasis to the bone may benefit from surgery, while those with metastasis to the brain, lung, or liver may not.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://gs.amegroups.com/article/view/10.21037/gs-2025-350/rc

Data Sharing Statement: Available at https://gs.amegroups.com/article/view/10.21037/gs-2025-350/dss

Peer Review File: Available at https://gs.amegroups.com/article/view/10.21037/gs-2025-350/prf

Funding: This research was supported by grants from The Science and Technology Elite Talent Project of Shaanxi Provincial People’s Hospital (No. 2021JY-39), The Science and Technology Development Incubation Fund project of Shaanxi Provincial People’s Hospital (Nos. 2020YXM-05 and 2023YJY-35), Natural Science Foundation of Shaanxi Province (No. 2024JC-YBQN-0988), and Xi’an Science and Technology Planning project (No. 2022JH-YBYJ-0170).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://gs.amegroups.com/article/view/10.21037/gs-2025-350/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments, and was approved by the Medical Ethics Committee of Shaanxi Provincial People’s Hospital (approval No. 2024-R703). These participants provided their written informed consent to participate in this study.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49.
Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. CA Cancer J Clin 2019;69:7-34.
Barcenas CH, Song J, Murthy RK, et al. Prognostic Model for De Novo and Recurrent Metastatic Breast Cancer. JCO Clin Cancer Inform 2021;5:789-804.
Chen Y, Qiu Y, Shen H, et al. Development of prognostic models for HER2-positive metastatic breast cancer in females: a retrospective population-based study. BMC Womens Health 2024;24:675.
Dong ZX, Ou-Yang Y, Fang L, et al. Intratumoral disulfidptosis heterogeneity in triple-negative breast cancer, a multiomics integration analysis. Transl Cancer Res 2025;14:6653-66.
Gao HF, Lin YY, Lv XZ, et al. ErbB signaling and cell cycle pathways associated with trastuzumab deruxtecan resistance in HER2-positive metastatic breast cancer: a case report. AME Case Rep 2025;9:118.
Jin N, Tian M, Zha M, et al. Combined treatment of inetetamab plus pyrotinib and vinorelbine in managing advanced HER2-positive breast cancer patients (ILLUMINE): a multicenter, retrospective, real-world study. Transl Breast Cancer Res 2025;6:31.
Pons-Tostivint E, Alouani E, Kirova Y, et al. Is there a role for locoregional treatment of the primary tumor in de novo metastatic breast cancer in the era of tailored therapies?: Evidences, unresolved questions and a practical algorithm. Crit Rev Oncol Hematol 2021;157:103146.
Yu Y, Hong H, Wang Y, et al. Clinical Evidence for Locoregional Surgery of the Primary Tumor in Patients with De Novo Stage IV Breast Cancer. Ann Surg Oncol 2021;28:5059-70.
Soran A, Dogan L, Isik A, et al. The Effect of Primary Surgery in Patients with De Novo Stage IV Breast Cancer with Bone Metastasis Only (Protocol BOMET MF 14-01): A Multi-Center, Prospective Registry Study. Ann Surg Oncol 2021;28:5048-57.
Kingma DP, Ba J. Adam: A method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015); 2015.
Rouzier R, Mathieu MC, Sideris L, et al. Breast-conserving surgery after neoadjuvant anthracycline-based chemotherapy for large breast tumors. Cancer 2004;101:918-25.
Tryfonidis K, Senkus E, Cardoso MJ, et al. Management of locally advanced breast cancer-perspectives and future directions. Nat Rev Clin Oncol 2015;12:147-62.
Li K, Zhou C, Yu Y, et al. Metastatic Pattern Discriminates Survival Benefit of Type of Surgery in Patients With De Novo Stage IV Breast Cancer Based on SEER Database. Front Surg 2021;8:696628.
Jiang Z, Ouyang Q, Sun T, et al. Toripalimab plus nab-paclitaxel in metastatic or recurrent triple-negative breast cancer: a randomized phase 3 trial. Nat Med 2024;30:249-56.
Li C, Liu M, Li J, et al. Machine learning predicts the prognosis of breast cancer patients with initial bone metastases. Front Public Health 2022;10:1003976.
Leone BA, Vallejo CT, Romero AO, et al. Prognostic impact of metastatic pattern in stage IV breast cancer at initial diagnosis. Breast Cancer Res Treat 2017;161:537-48.
Liu H, Kamarthi H, Kong L, et al. Time-Series Forecasting for Out-of-Distribution Generalization Using Invariant Learning. In: Proceedings of the 41st International Conference on Machine Learning (PMLR 235). 2024; pp. 31312-25.
Naser MZ. Causality and causal inference for engineers: Beyond correlation, regression, prediction and artificial intelligence. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2024;14:e1533.
Bockstedt JC, Buckman JR. Humans’ use of AI assistance: The effect of loss aversion on willingness to delegate decisions. Management Science 2025;72:1-782. iv-vi.
Izmailov P, Podoprikhin D, Garipov T, et al. Averaging weights leads to wider optima and better generalization. In: Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence (UAI). 2018; pp. 876-85.
Jiao L, Wang Y, Liu X, et al. Causal Inference Meets Deep Learning: A Comprehensive Survey. Research (Wash D C) 2024;7:0467.

(English Language Editor: J. Gray)

Cite this article as: Jin L, Zhao Q, Fu S, Chao Z, Cao F, Wu J, Ma D, Zhu X, Zhang Y. Development and validation of explainable machine learning models for the prediction of survival in patients with M1 breast cancer. Gland Surg 2026;15(2):50. doi: 10.21037/gs-2025-350
