Explainable machine learning based on habitat analysis of axillary lymph node ultrasound images for preoperatively noninvasive evaluation of axillary lymph node metastasis in breast cancer: a bi-center study

Mi Zhou; Yu-Long Tang; Yu-Yuan Chen; Fu-Li Chen; Qiaoxin Zhong; Shu-Ting Nie; Aijiao Yi; Bin Wang

doi:10.21037/gs-2026-0173

Original Article

Mi Zhou¹#, Yu-Long Tang²#, Yu-Yuan Chen¹, Fu-Li Chen¹, Qiaoxin Zhong³, Shu-Ting Nie¹, Aijiao Yi¹, Bin Wang¹

¹Department of Medical Ultrasound, Yueyang Central Hospital, Yueyang, China; ²Department of Thyroid Surgery, Hunan Cancer Hospital & The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China; ³Department of Artificial Intelligence, Julei Technology Company, Wuhan, China

Contributions: (I) Conception and design: ; (II) Administrative support: ; (III) Provision of study materials or patients: ; (IV) Collection and assembly of data: ; (V) Data analysis and interpretation: ; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Bin Wang, MD. Professor of Medicine, Department of Medical Ultrasound, Yueyang Central Hospital, 39 DongMaoLing Road, Yueyang 414000, China. Email: wangb58@mail3.sysu.edu.cn.

Background: Accurate evaluation of axillary lymph node (ALN) status is crucial for guiding staging and treatment strategies in patients with breast cancer (BC). This study aimed to develop an optimal machine learning model for predicting ALN status using both conventional radiomics and habitat analysis of axillary B-mode ultrasound (BMUS) images. Habitat analysis offers a non-invasive means to quantify and map ALN heterogeneity and may provide deeper insight into the biological behavior of ALN metastasis.

Methods: This study retrospectively analyzed the preoperative BMUS images of ALNs in 297 patients with BC from Hunan Cancer Hospital and Yueyang Central Hospital. Patients were divided into training (n=172), test (n=45), and external validation (n=80) sets. Each ALN was partitioned into habitat sub-regions using K-means clustering. Habitat and conventional radiomics models were each developed using twelve widely adopted machine learning algorithms: least absolute shrinkage and selection operator (LASSO), support vector machine (SVM), K-nearest neighbors (KNN), random forest (RF), ExtraTrees, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), naive Bayes (NB), adaptive boosting (AdaBoost), gradient boosting (GB), logistic regression (LR) and multilayer perceptron (MLP). Model performance was assessed using receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) values were used to identify key features and enhance model transparency and interpretability.

Results: Most habitat models achieved higher area under the curve (AUC) values in the external validation set than the corresponding conventional radiomics models. A comparative analysis of the diagnostic efficacy of these models demonstrated that the habitat-based MLP model exhibited superior efficacy, achieving the highest AUC value of 0.913 [95% confidence interval (CI): 0.842–0.967], an accuracy of 87.62% (95% CI: 79.97–93.75%), and an F1 score of 0.84 (95% CI: 0.73–0.93) in the external validation set. SHAP provided further insight into the contributions of each feature to the model's outcomes.

Conclusions: We developed and validated machine learning models based on habitat analysis of ALN ultrasound images, which showed better predictive performance for ALN metastasis in BC than conventional radiomics models. Among all machine learning models, the habitat-based MLP model showed the best predictive efficacy. The prediction process was visualized using SHAP. The model holds promise as a non-invasive tool for preoperative assessment of ALN status and may support surgeons in developing evidence-based, risk-stratified surgical strategies.

Keywords: Habitat analysis; radiomics; breast cancer (BC); machine learning; axillary lymph node (ALN)

Submitted Mar 20, 2026. Accepted for publication May 13, 2026. Published online May 27, 2026.

Introduction

The axillary lymph nodes are the predominant site of spread in breast cancer (BC), and axillary lymph node metastasis (ALNM) represents a crucial independent prognostic indicator (1). Accurately evaluating axillary lymph node (ALN) status is essential for guiding both the staging and treatment strategies applied to BC patients (1-3). Results from the ACOSOG Z0011 trial indicated that survival was not adversely affected by avoiding ALN dissection (ALND) in early-stage BC patients harboring fewer than three metastatic ALNs. Thus, accurate preoperative assessment of ALN status in patients with BC could potentially obviate the need for ALND in selected cases (4,5), thereby minimizing unnecessary invasive exploration, reducing postoperative complications and improving prognosis.

B-mode ultrasound (BMUS) is a radiation-free, cost-effective modality for real-time evaluation of lymph node morphology, including cortical thickness and hilar architecture, and is widely used to evaluate ALN status preoperatively in patients with BC (6). However, a substantial proportion of BMUS-negative lesions prove to be malignant on pathological examination (7,8), indicating that BMUS alone is insufficient for accurate preoperative evaluation of ALN status (9). Thus, a more reliable assessment method is urgently needed.

Radiomics is an analytical technology that enables high-throughput extraction of quantitative imaging features, thereby enhancing diagnostic accuracy (10,11). Because the scale and complexity of radiomic features make manual classification impractical, machine learning algorithms, which learn from and process such features automatically, provide powerful tools for diagnosis and treatment (12,13). However, conventional ultrasound radiomics is usually performed on the whole lesion, which overlooks important regional variations and might fail to capture the underlying biological complexity of the lesion.

Habitat analysis is a radiomics approach based on lesion sub-regional analysis, in which a lesion is partitioned into distinct voxel clusters based on phenotypic similarity rather than being treated as a homogeneous entity. These phenotypically distinct sub-regions are hypothesized to reflect underlying biological heterogeneity, such as variations in cellularity, vascularity, and necrosis (14,15). By extracting and analyzing features from these individual sub-regions, habitat imaging offers a non-invasive means to quantify and map intralesional heterogeneity, providing deeper insights into nodal lesion biology beyond what conventional radiomics can achieve (14). In a previous study, our research group developed machine learning models based on multi-modal ultrasound with habitat analysis to enhance the prediction of ALN burden in BC patients (15); those findings demonstrated the feasibility and diagnostic potential of applying habitat analysis to ultrasound imaging.

In this study, we sought to develop an optimized machine learning model for predicting ALN status in BC by integrating conventional and habitat-derived radiomic features from axillary BMUS images. Our findings provide foundational insights for advancing habitat analysis and establishing a non-invasive diagnostic tool to discriminate metastatic from benign ALNs. We present this article in accordance with the TRIPOD reporting checklist (available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0173/rc).

Methods

This retrospective study was approved by the Ethics Committee of Yueyang Central Hospital (approval No.). The other institution was informed of and agreed to the study. The ethics committee waived the requirement for written informed consent for participation. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Patients

We enrolled a total of 297 patients from two medical centers: 217 cases were collected at Hunan Cancer Hospital (Center 1) from March 2018 to March 2023, and 80 cases were obtained from Yueyang Central Hospital (Center 2) between August 2019 and May 2025 (Figure 1). The inclusion criteria were as follows: (I) all patients were diagnosed with BC through surgical pathology, and ALN status was confirmed by ALN biopsy or ALND; (II) ultrasound examinations were conducted within two weeks prior to lymph node biopsy or surgery. The exclusion criteria were as follows: (I) radiotherapy or chemotherapy before ultrasound examination; (II) unsatisfactory BMUS image quality of the ALN; (III) ALN biopsy prior to ultrasound assessment.

Figure 1 Patient enrollment workflow of this study. ALN, axillary lymph node.

Ultrasound examinations and image acquisition

Ultrasound examinations were performed with an Aixplorer ultrasound system (SuperSonic Imaging, France) using L15-4 or L10-2 linear array transducers. All patients underwent axillary BMUS for ALN screening. Suspicious ALNs were identified based on established criteria (16,17), including: (I) diffuse/eccentric cortical thickening (>3 mm) or focal cortical bulge; (II) rounded hypoechoic morphology; (III) complete/partial fatty hilum effacement; (IV) non-hilar blood flow on Doppler imaging; (V) nodal replacement by ill-defined/irregular masses; or (VI) intranodal microcalcifications. For each patient, a single target axillary lymph node (TALN) was selected from ipsilateral nodes based on BMUS morphological characteristics. Selection priority was given to nodes demonstrating complete or partial replacement by ill-defined/irregular masses. In the absence of such lesions, preference was given to rounded hypoechoic nodes, followed by nodes exhibiting maximal cortical thickness. Patients with positive TALN biopsy findings were considered metastatic. For patients with negative findings, the biopsy result alone was not considered conclusive; all of these patients subsequently underwent ALND with pathological examination of the entire specimen to confirm the absence of nodal metastasis, which avoided misclassifying false-negative biopsy results as true negatives.

Segmentation and habitat generation

In this study, the BMUS image of each ALN was used for both conventional radiomics and habitat analysis. Regions of interest (ROIs) on the BMUS images were manually delineated using ITK-SNAP software (3.8.0; http://www.itksnap.org). Figure 2 outlines the image processing workflow.

Figure 2 The overall radiomics and habitat analysis workflow of this study: (I) imaging segmentation, (II) habitat generation, (III) feature extraction, (IV) feature selection, (V) model construction and evaluation and (VI) clinical use. AdaBoost, adaptive boosting; AUC, area under the curve; DCA, decision curve analysis; KNN, k-nearest neighbors; LASSO, least absolute shrinkage and selection operator; LightGBM, light gradient boosting machine; MLP, multilayer perceptron; MSE, mean squared error; PCA, principal component analysis; ROI, region of interest; SVM, support vector machine; XGBoost, extreme gradient boosting.

Feature extraction and selection

Voxel-wise local feature extraction and quantification

Unlike conventional radiomics approaches that extract features from the entire lymph node region at once, our method focuses on characterizing each individual voxel within it by computing a set of localized descriptors. For each voxel, we define a 2D neighborhood window (25×25 pixels) centered at the voxel of interest and extract a 10-dimensional feature vector that captures local texture, patterns, and microstructural properties. The feature vector comprises the following components: (I) second-order and entropy features: these features quantify spatial relationships between pixel intensities (i.e., textural information) using the Gray-Level Co-occurrence Matrix (GLCM). Specifically, we compute: Contrast: measures local intensity variations. Dissimilarity: quantifies the degree of disparity in neighboring pixel pairs. Homogeneity: reflects the uniformity of texture. Energy [angular second moment (ASM)]: indicates the orderliness of pixel distribution. Correlation: assesses linear dependencies between adjacent pixels. Additionally, Shannon Entropy is calculated to evaluate the complexity and uncertainty of local texture patterns. (II) Pattern and morphological features: mean and standard deviation (SD) of local binary patterns (LBP): LBP is a powerful texture descriptor that encodes microstructures (e.g., edges, corners, and flat regions) by comparing the central voxel with its neighbors. The mean and SD of LBP maps summarize the overall distribution of these microstructural patterns. Mean intensity after morphological closing: by applying a morphological closing operation (dilation followed by erosion), small gaps and holes in bright regions are filled. The mean intensity of the post-closing image reflects structural integrity and “solidity” within the local region. Through this process, each voxel within the lesion is transformed into a high-dimensional feature vector that comprehensively characterizes its local texture, patterns, and microstructural properties.
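As a rough illustration of the voxel-wise step described above, the following Python sketch computes a few GLCM-style descriptors and Shannon entropy for a 25×25 window centered on each pixel inside a mask. It is a simplified, NumPy-only approximation under stated assumptions, not the pipeline used in this study: the function names (glcm_features, voxelwise_features), the 8-level gray quantization, the single horizontal offset, and the reflect padding at image borders are all illustrative choices, and the LBP and morphological-closing descriptors are omitted for brevity.

```python
import numpy as np

def glcm_features(patch, levels=8):
    """Contrast, homogeneity, energy and entropy of one 2D patch.

    Assumes 8-bit intensities (0..255) and a single horizontal offset of 1 pixel;
    both are illustrative choices, not settings reported in the study.
    """
    # Quantize intensities into `levels` gray levels.
    q = np.clip((patch.astype(float) / 256.0 * levels).astype(int), 0, levels - 1)
    # Count co-occurrences of each pixel with its right-hand neighbor.
    glcm = np.zeros((levels, levels), dtype=float)
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1.0)
    glcm = glcm + glcm.T                 # symmetric co-occurrence counts
    p = glcm / glcm.sum()                # normalize to joint probabilities
    i, j = np.indices(p.shape)
    contrast = float(np.sum(p * (i - j) ** 2))
    homogeneity = float(np.sum(p / (1.0 + (i - j) ** 2)))
    energy = float(np.sqrt(np.sum(p ** 2)))   # square root of the ASM
    # Shannon entropy of the quantized intensity histogram.
    hist = np.bincount(q.ravel(), minlength=levels) / q.size
    nz = hist[hist > 0]
    entropy = float(-np.sum(nz * np.log2(nz)))
    return np.array([contrast, homogeneity, energy, entropy])

def voxelwise_features(image, mask, half=12):
    """Feature vector for every masked pixel, using a (2*half+1)^2 window."""
    padded = np.pad(image, half, mode="reflect")
    coords = np.argwhere(mask)
    size = 2 * half + 1                  # half=12 gives the 25x25 window
    feats = np.array([
        glcm_features(padded[r:r + size, c:c + size]) for r, c in coords
    ])
    return coords, feats
```

Each row of the returned array corresponds to one pixel in the mask, which is the layout assumed by the global feature matrix described in the next section.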

Construction and normalization of the global feature matrix

All voxel-wise feature vectors from both the lesion and surrounding regions were aggregated to construct a high-dimensional global feature matrix. In this matrix, each row represents an individual voxel, while each column corresponds to one of the ten extracted features (as detailed in the previous section). To mitigate the influence of feature scale variations, Min-Max normalization was applied, rescaling each feature to a standardized range of [0, 1].
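The Min-Max rescaling described above can be sketched as follows. This is a minimal illustration in Python with NumPy; the function name min_max_normalize and the handling of constant columns (mapped to 0) are assumptions made for the example rather than details reported in the study.

```python
import numpy as np

def min_max_normalize(X, eps=1e-12):
    """Rescale each column of X (rows = voxels, columns = features) to [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    # Constant columns have zero span; divide by 1 so they map to 0 (assumption).
    return (X - lo) / np.where(span > eps, span, 1.0)
```

For example, a column with values 10, 20 and 30 becomes 0, 0.5 and 1, so that features with large numeric ranges do not dominate the distance computations in the subsequent clustering step.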

K-means clustering and sub-region definition

Using the normalized global feature matrix as input, the K-means clustering algorithm was applied. This algorithm groups voxels that are similar in feature space (that is, voxels with comparable local texture, pattern, and microstructure) into the same cluster. The optimal number of clusters was determined by the Calinski-Harabasz (CH) score (18), evaluated for cluster numbers from 2 to 10. The intranodal region was then divided into sub-regions according to the optimal k value: each voxel was assigned a cluster label, and voxels sharing the same label constituted a sub-region characterized by a high degree of feature similarity.
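The selection of the cluster number by the CH score can be sketched as follows, assuming the scikit-learn implementations of K-means (KMeans) and the CH index (calinski_harabasz_score). The helper name choose_k_by_ch, the random seed, and the n_init setting are illustrative assumptions; the study does not report which software or settings were used for this step.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

def choose_k_by_ch(X, k_range=range(2, 11), seed=0):
    """Fit K-means for each k and keep the k with the highest CH score."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = calinski_harabasz_score(X, labels)
    best_k = max(scores, key=scores.get)
    # Refit with the selected k to obtain the final sub-region labels.
    labels = KMeans(n_clusters=best_k, n_init=10, random_state=seed).fit_predict(X)
    return best_k, labels, scores
```

Mapping the returned labels back to the pixel coordinates of the ROI yields the habitat map, in which pixels with the same label form one sub-region.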

Feature engineering in sub-regions

Feature extraction: using the PyRadiomics tool, a large number of radiomics features, including shape, first-order statistics, and texture features, were independently extracted from each sub-region of the BMUS image. Features extracted from each sub-region were integrated, with missing values imputed using the k-nearest neighbors (KNN) algorithm during the fusion process.

Feature selection: to select the most effective and non-redundant feature combinations from the vast pool of extracted features, a series of feature selection procedures was implemented: (I) correlation analysis: redundant features with high correlation (correlation coefficient >0.95) were removed; (II) least absolute shrinkage and selection operator (LASSO) regression: as the final selection step, LASSO regression selected the most predictive features and constructed a radiomics signature through the incorporation of a penalty term.
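The two selection steps can be sketched as follows. This is a minimal illustration assuming NumPy and scikit-learn (LassoCV, StandardScaler); the helper names drop_correlated and lasso_select, the greedy keep-first rule for correlated pairs, and the default settings are assumptions for the example, since the study does not describe its implementation at this level of detail.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

def drop_correlated(X, threshold=0.95):
    """Greedy filter: keep a column only if its absolute Pearson correlation
    with every previously kept column does not exceed the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in keep):
            keep.append(j)
    return keep

def lasso_select(X, y, cv=5, seed=0):
    """Select features with non-zero LASSO coefficients; the penalty strength
    is chosen by cross-validation as the value with the lowest mean squared error."""
    Xs = StandardScaler().fit_transform(X)   # z-score each feature
    model = LassoCV(cv=cv, random_state=seed).fit(Xs, y)
    return np.flatnonzero(model.coef_), model.alpha_
```

In this sketch the indices returned by lasso_select refer to the columns that survived the correlation filter, so they need to be mapped back through the list returned by drop_correlated to recover the original feature names.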

Prediction model development and validation

To comprehensively validate the effectiveness of the method, two prediction models were compared: the conventional radiomics model, which was constructed using features extracted from the entire lymph node and serves as a benchmark for traditional approaches; and the habitat analysis model, which utilizes only the features extracted from the intranodal habitat. The latter serves as the core approach for evaluating the innovative contribution of this study. These two ALNM prediction models were formulated by employing twelve commonly utilized machine learning models: LASSO, the support vector machine (SVM), KNN, random forest (RF), ExtraTrees, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), naive Bayes (NB), adaptive boosting (AdaBoost), gradient boosting (GB), logistic regression (LR) and multilayer perceptron (MLP). Each of these models was trained using the selected features, and also underwent testing and external validation.

Prediction model evaluation and SHapley Additive exPlanations (SHAP) interpretability analysis

A comprehensive evaluation framework was implemented, consisting of the following components: (I) receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC): evaluate the model’s discriminative performance; (II) calibration curve: evaluates the accuracy of the model’s predicted probabilities; (III) decision curve analysis (DCA): evaluates the clinical utility and net benefit of the model; (IV) SHAP (19): explains model predictions by calculating the contribution of each feature to individual predictions. SHAP values help identify key features, improve model transparency, and enhance interpretability.
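For readers unfamiliar with DCA, the net benefit at a threshold probability p_t is conventionally defined as TP/n − (FP/n) × p_t/(1 − p_t), where TP and FP are the numbers of true and false positives when patients with predicted probability at or above p_t are classified as positive. A minimal sketch in Python follows; the function names and the toy data in the usage note are hypothetical and are not taken from the study.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on predictions at a given threshold probability."""
    y_true = np.asarray(y_true).astype(bool)
    pred = np.asarray(y_prob, dtype=float) >= threshold
    n = y_true.size
    tp = np.sum(pred & y_true)       # true positives
    fp = np.sum(pred & ~y_true)      # false positives
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy in which every patient is treated as positive."""
    prev = np.mean(np.asarray(y_true, dtype=float))
    return prev - (1.0 - prev) * threshold / (1.0 - threshold)
```

With hypothetical labels [1, 1, 0, 0] and predicted probabilities [0.9, 0.6, 0.7, 0.1], the net benefit at a threshold of 0.5 is 0.25, compared with 0 for the treat-all strategy. A decision curve is obtained by plotting these quantities over a range of thresholds.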

Statistical analysis

Statistical analyses were performed using R software (version 3.6.1) and SPSS (version 23.0, SPSS Inc.). Continuous variables are presented as mean ± SD and were analyzed using independent t-tests, while categorical variables were compared using χ² tests or Fisher’s exact tests. The training cohort served to develop machine learning-based ultrasound radiomics/habitat models for ALN metastasis prediction, with the test set used for performance assessment and the external validation set used for independent validation. Model performance was assessed using accuracy, sensitivity (SEN), specificity (SPE), F1-score and AUC. A two-sided P value <0.05 was considered statistically significant.
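The performance metrics listed above follow their standard definitions, which can be sketched in Python as below. The function names are illustrative, the AUC is computed with the pairwise (rank-based) formulation, and the counts used in the usage note are hypothetical and do not correspond to any confusion matrix in this study.

```python
import numpy as np

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and F1-score from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}

def auc_from_scores(y_true, y_score):
    """AUC as the probability that a positive case scores higher than a negative
    case, counting ties as one half."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    diff = y_score[y_true][:, None] - y_score[~y_true][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))
```

For a hypothetical matrix with 40 true positives, 10 false positives, 45 true negatives and 5 false negatives, this gives an accuracy of 0.85, a sensitivity of about 0.889, a specificity of about 0.818 and an F1-score of about 0.842.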

Results

Patients’ characteristics

A total of 297 patients were enrolled in this study. The 217 patients from Center 1 [median age, 54 years; interquartile range (IQR), 44–59 years] were randomly assigned to the training set (n=172) and the test set (n=45) at a ratio of 8:2. The 80 patients from Center 2 (median age, 50 years; IQR, 41–53 years) formed the external validation set.

Feature extraction and selection

For the conventional radiomics signature, a set of 288 radiomic features was extracted from the entire lymph node region. For the habitat-based radiomics signature, features were extracted from each sub-region. The optimal number of clusters was determined to be four, as indicated by the CH score (Figure 3), resulting in four sub-regions for each BMUS image. A total of 1,152 habitat radiomic features (288 × 4 sub-regions) were extracted. Data standardization was carried out using z-score normalization, and the synthetic minority over-sampling technique (SMOTE) was applied to address data imbalance. Pearson correlation analysis with a threshold of 0.95 reduced the feature set to 412. LASSO regression was then used to select the most valuable features, resulting in a final set of 28 features. The optimal λ value was identified through fivefold cross-validation as the value corresponding to the lowest mean squared error (Figure 4). These 28 habitat features were used to construct the habitat analysis model, whereas the conventional radiomics model was developed using features extracted from the entire lymph node region. Conventional radiomics models and habitat models were evaluated on the training, test, and external validation sets, and the final results were selected based on performance in the external validation set.

Figure 3 Plots of cluster number evaluation.

Figure 4 LASSO regression, the final selection step, which selects the most valuable features for prediction and constructs a radiomics signature through the penalty term. (A,D) The coefficients obtained from LASSO during fivefold cross-validation; (B,E) the MSE path; (C,F) the retained features after performing feature dimension reduction in the predictive model. The top and bottom sections depict the conventional and habitat radiomic methodologies, respectively. 2D, two-dimensional; LASSO, least absolute shrinkage and selection operator; MCC; MSE, mean squared error.

Model performance

Twelve machine learning algorithms were employed to develop the conventional radiomics models and the habitat analysis models. Confusion matrices were generated for the training, test and external validation sets (Figure 5). Tables 1 and 2 list the diagnostic performance of the twelve conventional radiomics models and the twelve habitat models, respectively, in the external validation set, including AUC, accuracy, SEN, SPE and F1-score. The F1-score is the harmonic mean of precision and recall; a higher value indicates better overall classification performance.

Figure 5 Confusion matrix yielded by the habitat-based MLP model in the training, testing and external validation sets, respectively. MLP, multilayer perceptron.

Table 1

Comparison of the AUC values and predictive efficacy metrics in the external validation set yielded by the conventional radiomics models

Model	AUC	Accuracy (%)	Sensitivity (%)	Specificity (%)	F1-score
LASSO	0.802 (0.702–0.889)	69.92 (58.75–80.00)	90.28 (78.12–100.00)	57.20 (43.39–70.84)	0.70 (0.56–0.81)
SVM	0.762 (0.652–0.857)	77.62 (68.75–86.25)	41.94 (25.71–60.00)	100.00 (100.00–100.00)	0.59 (0.41–0.75)
KNN	0.775 (0.666–0.875)	73.88 (63.75–82.53)	58.11 (40.61–76.00)	83.80 (72.41–93.75)	0.63 (0.47–0.77)
RF	0.897 (0.820–0.957)	80.11 (70.00–88.75)	90.28 (78.12–100.00)	73.76 (61.40–85.71)	0.78 (0.65–0.88)
Extra Trees	0.812 (0.698–0.903)	77.71 (68.72–86.25)	61.61 (43.32–79.50)	87.77 (78.00–96.00)	0.68 (0.51–0.82)
XGBoost	0.828 (0.721–0.917)	74.88 (65.00–83.75)	86.98 (73.33–96.87)	67.30 (53.19–80.39)	0.72 (0.59–0.83)
LightGBM	0.799 (0.692–0.894)	73.81 (64.97–82.50)	83.90 (69.23–96.30)	67.49 (54.24–80.39)	0.71 (0.57–0.82)
NB	0.758 (0.661–0.848)	71.23 (61.25–81.25)	90.44 (77.78–100.00)	59.23 (46.00–72.00)	0.70 (0.58–0.81)
AdaBoost	0.803 (0.697–0.892)	76.16 (66.25–85.00)	83.74 (70.00–96.00)	71.47 (58.00–84.31)	0.73 (0.60–0.83)
GB	0.810 (0.704–0.899)	73.83 (63.75–82.50)	87.22 (73.08–97.06)	65.42 (51.92–78.72)	0.72 (0.59–0.82)
LR	0.849 (0.761–0.923)	77.52 (67.50–86.25)	87.16 (73.33–96.97)	71.52 (58.49–83.93)	0.75 (0.62–0.85)
MLP	0.807 (0.705–0.895)	74.03 (63.75–83.75)	87.31 (74.19–97.06)	65.72 (52.00–79.17)	0.72 (0.59–0.83)

95% confidence intervals are provided in parentheses. F1-score represents the harmonic mean of precision and recall. A higher value indicates better overall classification performance. AdaBoost, adaptive boosting; AUC, area under the curve; GB, gradient boosting; KNN, k-nearest neighbors; LASSO, least absolute shrinkage and selection operator; LightGBM, light gradient boosting machine; LR, logistic regression; MLP, multilayer perceptron; NB, naive Bayes; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.

Table 2

Comparison of the AUC values and predictive efficacy metrics in the external validation set yielded by the habitat models

Model	AUC	Accuracy (%)	Sensitivity (%)	Specificity (%)	F1-score
LASSO	0.909 (0.840–0.964)	85.08 (76.25–92.50)	80.69 (65.38–92.86)	87.81 (77.78–96.00)	0.80 (0.68–0.90)
SVM	0.837 (0.742–0.918)	78.76 (70.00–86.25)	74.10 (58.62–88.89)	81.68 (70.36–91.67)	0.73 (0.60–0.85)
KNN	0.788 (0.685–0.882)	78.92 (70.00–87.50)	64.67 (48.27–80.00)	87.82 (77.78–96.00)	0.70 (0.56–0.83)
RF	0.782 (0.670–0.880)	72.63 (61.25–82.50)	87.14 (74.07–97.06)	63.57 (49.02–76.93)	0.71 (0.58–0.82)
Extra Trees	0.832 (0.735–0.923)	81.27 (72.50–88.75)	77.26 (62.85–90.48)	83.76 (73.47–93.48)	0.76 (0.63–0.86)
XGBoost	0.853 (0.752–0.931)	81.40 (72.50–90.00)	67.63 (50.00–83.33)	90.01 (80.70–97.68)	0.73 (0.59–0.86)
LightGBM	0.878 (0.790–0.952)	83.76 (75.00–91.25)	90.26 (78.57–100.00)	79.70 (68.00–90.39)	0.81 (0.69–0.91)
NB	0.882 (0.797–0.947)	82.55 (73.75–90.00)	80.55 (65.38–93.55)	83.76 (72.88–93.62)	0.78 (0.66–0.88)
AdaBoost	0.786 (0.678–0.882)	77.80 (68.75–87.50)	55.19 (36.67–72.74)	91.95 (83.72–98.11)	0.65 (0.48–0.79)
GB	0.816 (0.714–0.903)	79.00 (68.75–88.75)	74.31 (59.26–88.46)	81.92 (70.45–92.31)	0.73 (0.59–0.85)
LR	0.888 (0.812–0.953)	83.87 (75.00–91.25)	90.51 (78.56–100.00)	79.70 (68.18–91.11)	0.81 (0.69–0.91)
MLP	0.913 (0.842–0.967)	87.62 (79.97–93.75)	87.31 (74.19–97.06)	87.79 (77.36–96.00)	0.84 (0.73–0.93)

95% confidence intervals are provided in parentheses. F1-score represents the harmonic mean of precision and recall. A higher value indicates better overall classification performance. AdaBoost, adaptive boosting; AUC, area under the curve; GB, gradient boosting; KNN, k-nearest neighbors; LASSO, least absolute shrinkage and selection operator; LightGBM, light gradient boosting machine; LR, logistic regression; MLP, multilayer perceptron; NB, naive Bayes; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.

Most habitat models (10 of 12 algorithms) achieved higher AUC values than their conventional radiomics counterparts in the external validation set. A comparative analysis of the diagnostic efficacy of these models revealed that the habitat-based MLP model outperformed all others, with the highest AUC value, accuracy and F1-score (Figure 6).

Figure 6 The ROC curves of twelve different machine learning models for the external validation set based on conventional and habitat radiomic signatures, respectively. AdaBoost, adaptive boosting; AUC, area under the curve; KNN, k-nearest neighbors; LASSO, least absolute shrinkage and selection operator; LightGBM, light gradient boosting machine; MLP, multilayer perceptron; ROC, receiver operating characteristic; SVM, support vector machine; XGBoost, extreme gradient boosting.

Model evaluation and interpretation

Calibration curves and DCA

The calibration performance of each predictive model was evaluated in the training, test, and external validation sets. We present the calibration curves of the best-performing habitat model. In the calibration plots (Figure 7A-7C), the curves of the habitat-based MLP model aligned with the diagonal line across most probability ranges, indicating generally good agreement between predicted probabilities and observed outcomes.

Figure 7 Calibration curves (A-C) and decision curve analysis (D-F) for the habitat-based MLP model in the training, testing and external validation sets, respectively. MLP, multilayer perceptron.

DCA was employed to evaluate the clinical utility of the predictive signatures by estimating the net benefit of applying each model over a range of threshold probabilities relevant to clinical decision-making. The habitat-based MLP model demonstrated good potential for achieving net benefit, supporting its potential usefulness in clinical decision-making scenarios (Figure 7D-7F).

SHAP analysis

SHAP analysis was performed to evaluate the importance of each feature in the MLP model. The SHAP value scatter plot (Figure 8) demonstrates the directional influence of individual features on model predictions, where red hues denote positive contributions (increasing predicted probability) and blue hues represent negative contributions (decreasing predicted probability). Among all features, the wavelet_H_gldm_DependenceVariance_h1 had the strongest influence on the model’s decision-making. The density plot of wavelet_H_gldm_DependenceVariance_h1 showed variation in SHAP values across the set, with the color gradient indicating that the model’s output increased as the wavelet_H_gldm_DependenceVariance_h1 value decreased.
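To make the notion of a feature contribution concrete, the sketch below computes exact Shapley values for a small toy model by enumerating all feature subsets, with features outside a subset replaced by baseline values. It illustrates the quantity that SHAP estimates; it is not the SHAP library and not the procedure used in this study, and the function name, toy linear model, and baseline are hypothetical. Exact enumeration is feasible only for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Features in the subset take their observed value, others the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Toy linear model (hypothetical): each contribution equals w_i * (x_i - baseline_i).
toy_model = lambda z: 2.0 * z[0] - 3.0 * z[1] + 0.5 * z[2] + 1.0
contributions = shapley_values(toy_model, [1.0, 2.0, 4.0], [0.0, 1.0, 0.0])
```

The contributions sum to the difference between the prediction at x and the prediction at the baseline, which is the additivity property that allows SHAP plots to decompose an individual prediction into per-feature effects.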

Figure 8 SHAP summary plots of the habitat-based MLP model in the test set. 2D, two-dimensional; MCC; MLP, multilayer perceptron; SHAP, SHapley Additive exPlanations.

Discussion

In this study, we established both habitat and conventional radiomics models to assess ALN status in BC patients through machine learning. Most habitat models achieved higher AUC values in the external validation set than the corresponding conventional radiomics models. A comparative analysis of the diagnostic efficacy of these models demonstrated that the MLP model based on the habitat signature exhibited superior efficacy, achieving the highest AUC value of 0.913 [95% confidence interval (CI): 0.842–0.967], an accuracy of 87.62% (95% CI: 79.97–93.75%), and an F1 score of 0.84 (95% CI: 0.73–0.93) in the external validation set. Moreover, we found that a cluster number of four was optimal for the clustering analysis, which might enhance the understanding of heterogeneity in the ALN metastasis process. The generalizability of the model is supported by its validation on an independent external set from a different center.

ALN status is essential for the staging of BC and affects clinical treatment strategies. Based on the results of the ACOSOG Z0011 trial, the clinical practice guidelines were updated to state that women with early-stage primary invasive BC and fewer than three metastatic ALNs who undergo breast-conserving surgery with whole-breast radiotherapy no longer need ALND (4,20,21). Thus, accurate preoperative lymph node assessment is of paramount importance in clinical practice.

In recent years, radiomics and deep learning-based radiomics have emerged as critical research directions in medical imaging, demonstrating significant potential for clinical translation and decision support (22). Zhou et al. (23) extracted 9 features from the gray-scale ultrasound image of each breast lesion to predict ALN metastasis and constructed a model that outperformed the ultrasound-reported ALN status assessed by the radiologist. Zhang et al. (24) built a radiomics nomogram incorporating clinical factors that showed favorable predictive efficacy for the response of node-positive BC to neoadjuvant chemotherapy. Ma et al. (25) developed and validated an automated breast volume ultrasound (ABVS)-based nomogram to assess ALN metastasis in axillary ultrasound-negative early BC, achieving an AUC of 0.768 and further supporting the wide applicability of ultrasound radiomics across various clinical contexts. However, these studies mainly used features of the breast lesion to predict ALN status and therefore might lack the capability to directly determine metastatic status at the level of specific lymph nodes.

Previous studies utilizing radiomics based on ALN ultrasound images for ALN metastasis classification remain limited. Tang et al. (26) developed a radiomics model that performed better than two experts’ prediction of ALN status in both cohorts, with AUCs of 0.929 and 0.919 in the training and validation cohorts, respectively. However, conventional radiomics approaches have predominantly relied on whole-lesion or whole-lymph node feature extraction, failing to account for the intrinsic spatial heterogeneity of breast tumors and ALNs. In contrast, habitat analysis enables identification of distinct sub-regions characterized by homogeneous imaging phenotypes. These spatially defined habitats may correspond to unique tumor biological profiles, providing more accurate characterization of tumor microenvironment heterogeneity and metastatic potential. In our prior work (15), we developed machine learning models leveraging shear wave elastography (SWE) images combined with habitat analysis to quantify intratumoral heterogeneity in breast lesions. This approach enabled spatial visualization of tumor heterogeneity and improved prediction of ALN burden in BC patients, demonstrating the clinical potential of ultrasound-based habitat analysis. Thus, habitat analysis might help to visualize the heterogeneity of specific lymph nodes and to evaluate ALN status non-invasively before surgery.

The metastatic cascade in lymph nodes begins with tumor cell transport through afferent lymphatic vessels into the cortical region, followed by characteristic accumulation at the marginal sinus and subsequent drainage via hilar efferent vessels (27). This pathophysiology generates distinct spatial patterns, including progressive cortical thickening due to metastatic infiltration, a distinctive periphery-to-center (centripetal) spread pattern, and the development of regional intranodal heterogeneity rather than homogeneous involvement. Conventional radiomics approaches often overlook this critical spatial heterogeneity (28). By contrast, the metastatic cascade in lymph nodes and its underlying pathophysiology provide a theoretical basis for habitat analysis.

Our study introduces a novel method for capturing and characterizing ALN heterogeneity based on BMUS images. Using the K-means algorithm, an unsupervised partitioning clustering technique that iteratively minimizes the distance between each sample and its assigned cluster centroid, we systematically segmented each ALN into four biologically distinct sub-regions (29). The K-means clustering algorithm is widely adopted in habitat analysis for delineating spatially and biologically distinct intranodal sub-regions. These sub-regions capture essential facets of lymph node heterogeneity, which emerge through dynamic interactions between micro-environmental selection pressures and clone-specific cellular fitness adaptations, ultimately driving phenotypic and functional diversification within the intranodal ecosystem (30). These phenotypically distinct sub-regions are mechanistically linked to fundamental biological heterogeneity, such as differences in cellularity, vascularity, and necrosis. This biological grounding likely explains the robust predictive performance demonstrated by the habitat analysis model developed in this study (Figure 9). Moreover, habitat analysis of BMUS images enables direct visualization of ALN heterogeneity through quantitative sub-region mapping. Notably, a reduced number of cortical sub-regions might correlate with diminished ALN heterogeneity, a phenotypic signature that might serve as a non-invasive indicator of benign ALN status.
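The clustering step described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the two per-pixel features (intensity and local contrast), the toy 4×4 "node", the function names `kmeans` and `habitat_map`, and the deterministic farthest-first initialization are all assumptions made for illustration. The sketch only shows how K-means assigns each pixel inside a node contour to one of k=4 sub-regions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption): start from the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def habitat_map(features, mask, k=4):
    """Cluster the pixels inside the mask; pixels outside are labelled -1."""
    coords = [
        (r, c)
        for r, row in enumerate(mask)
        for c, inside in enumerate(row)
        if inside
    ]
    points = [features[r][c] for r, c in coords]
    labels, _ = kmeans(points, k)
    label_map = [[-1] * len(row) for row in mask]
    for (r, c), lab in zip(coords, labels):
        label_map[r][c] = lab
    return label_map


# Toy 4x4 "node": each quadrant has its own [intensity, local contrast] level.
base = {(0, 0): (0.1, 0.1), (0, 1): (0.9, 0.1), (1, 0): (0.1, 0.9), (1, 1): (0.9, 0.9)}
features = [
    [
        [base[(r // 2, c // 2)][0] + 0.01 * ((r + c) % 2), base[(r // 2, c // 2)][1]]
        for c in range(4)
    ]
    for r in range(4)
]
mask = [[True] * 4 for _ in range(4)]
mask[3][3] = False  # one pixel lies outside the hypothetical contour

label_map = habitat_map(features, mask, k=4)
for row in label_map:
    print(row)
```

In this toy case each quadrant ends up as its own sub-region. On real images the feature vectors would come from voxel-wise local features, and the number of clusters would be chosen with a cluster-validity index rather than fixed by hand.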

Figure 9 Habitat analysis of ALNs in two HER2-positive breast cancer patients. (A) Native BMUS of Patient 1. (B) Habitat segmentation overlay from (A). (C) Native BMUS of Patient 2. (D) Habitat mapping derived from (C). (A,C) BMUS revealed an elliptical, sharp-bordered lymph node with diffuse cortical thickening and a longitudinal-to-transverse (L/T) ratio >2, which was diagnosed as a suspicious metastatic ALN. (B) Habitat analysis revealed markedly higher heterogeneity in the cortical region. The final pathological result was metastatic ALN. (D) The cortical region showed a homogeneous color, indicating low heterogeneity. The final pathological result was benign ALN. ALN, axillary lymph node; BMUS, B-mode ultrasound; HER2, human epidermal growth factor receptor 2.

In this study, we established twelve machine learning models based on habitat analysis of BMUS images. Among them, the MLP model performed best, with the highest AUC of 0.913 (95% CI: 0.842–0.967) in the validation set, outperforming the other machine learning models for the prediction of ALN metastasis in patients with BC. Our results suggest that MLP is a robust algorithm with strong generalization ability for building habitat-based ultrasound models, providing a promising tool for predicting ALN metastasis in patients with BC.
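For readers less familiar with how an AUC and its confidence interval are obtained, the following sketch may help. It computes the AUC as the proportion of metastatic/benign pairs that the model ranks correctly and derives a percentile bootstrap interval. The discussion above does not state how the reported interval was calculated, so the bootstrap is an assumption here, and the labels and scores are made-up toy values rather than study data.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (one common choice, assumed here)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that contain a single class
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy predicted probabilities for 12 hypothetical nodes (1 = metastatic, 0 = benign).
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
toy_scores = [0.05, 0.20, 0.30, 0.35, 0.55, 0.60, 0.40, 0.65, 0.70, 0.80, 0.90, 0.95]

point_estimate = auc(toy_labels, toy_scores)
ci_lower, ci_upper = bootstrap_auc_ci(toy_labels, toy_scores)
print(f"AUC = {point_estimate:.3f} (95% CI: {ci_lower:.3f}-{ci_upper:.3f})")
```

With a validation set of limited size the interval is wide, which is why the point estimate should be read together with its bounds when models are compared.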

Historically, the inherent opacity of traditional machine learning architectures has hindered clinical adoption because decision-making pathways could not be traced to biologically or clinically meaningful rationales (31). To the best of our knowledge, this is the first study to use the SHAP method to interpret a model predicting ALN metastasis in BC patients with habitat analysis based on ALN ultrasound images. In this study, the most important feature for predicting ALN status was wavelet_H_gldm_DependenceVariance_h1. This interpretable machine learning model may provide a more comprehensive understanding of ALN metastasis biology and increase transparency, fostering trust among clinicians and helping them comprehend model predictions for informed decision-making.
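The quantity that SHAP estimates can be made concrete with a small worked sketch. The code below computes exact Shapley values by enumerating all feature subsets, which is feasible only for a handful of features. It does not use the SHAP library and is not the analysis performed in this study. The toy model, its three unnamed features, and the choice to represent an absent feature by a baseline value are assumptions for illustration.

```python
from itertools import combinations
from math import exp, factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one case `x`.

    A feature that is 'absent' from a coalition takes its value from
    `baseline` (an assumed reference case, e.g. the cohort mean)."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                total += weight * (with_i - without_i)
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical classifier: logistic output with one interaction term."""
    logit = 1.5 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]
    return 1.0 / (1.0 + exp(-logit))


case = [1.2, 0.4, 2.0]       # feature values of one hypothetical node
reference = [0.0, 0.0, 0.0]  # assumed baseline case

phi = shapley_values(toy_model, case, reference)
for index, contribution in enumerate(phi):
    print(f"feature {index}: {contribution:+.4f}")
```

The contributions sum to the difference between the prediction for the case and the prediction for the baseline. This additivity is what allows a per-feature ranking to be read directly from a SHAP summary plot.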

Several methodological and conceptual limitations of this study should be acknowledged. First, the current analysis was restricted to the evaluation of BMUS images, without incorporating complementary functional imaging modalities such as SWE for tissue stiffness quantification or contrast-enhanced ultrasound (CEUS) for micro-vascular perfusion characterization, which might provide additional discriminative value in assessing ALN metastatic potential. Second, the manual segmentation of region of interest contours (RC) remains a time-intensive and operator-dependent process. Third, although SHAP analysis offered preliminary feature importance rankings for radiomics-lymph node heterogeneity correlations in our machine learning framework, translating these computational insights into biologically meaningful explanations, particularly for complex deep neural networks (DNNs) operating on medical images, poses persistent challenges. Future studies should prioritize: (I) expansion of sample sizes to enhance statistical power; and (II) implementation of deep learning-based automated segmentation architectures (e.g., nnUNet or Swin-Transformer) to improve reproducibility and throughput in clinical workflows. Finally, future investigations should prioritize multi-omics integration to elucidate the mechanistic linkages between radiomic phenotypes and their underlying molecular drivers, particularly through systematic correlation of imaging heterogeneity patterns with genomics and proteomics.

Conclusions

We developed and validated machine learning models based on habitat analysis of ALN ultrasound images, which demonstrated superior predictive performance for ALN metastasis in BC compared with conventional radiomics models. Among all machine learning models, the habitat-based MLP model showed the best predictive efficacy, and its prediction process could be visualized using SHAP. This approach holds promise as a non-invasive tool for preoperative assessment of ALN status, potentially supporting surgeons in developing evidence-based, risk-stratified surgical strategies.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0173/rc

Data Sharing Statement: Available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0173/dss

Peer Review File: Available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0173/prf

Funding: This work was supported by the general funding project of Hunan Provincial Health Commission (No. B202309029532) and the Natural Science Foundation of Hunan Province (No. 2023JJ50304).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://gs.amegroups.com/article/view/10.21037/gs-2026-0173/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This retrospective study was approved by the Ethics Committee of Yueyang Central Hospital (approval No.). The other institution was informed of and agreed to the study. The ethics committee waived the requirement for written informed consent for participation. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Chang JM, Leung JWT, Moy L, et al. Axillary Nodal Evaluation in Breast Cancer: State of the Art. Radiology 2020;295:500-15.
2. Bartels SAL, Donker M, Poncet C, et al. Radiotherapy or Surgery of the Axilla After a Positive Sentinel Node in Breast Cancer: 10-Year Results of the Randomized Controlled EORTC 10981-22023 AMAROS Trial. J Clin Oncol 2023;41:2159-65.
3. Johnston SRD, Toi M, O'Shaughnessy J, et al. Abemaciclib plus endocrine therapy for hormone receptor-positive, HER2-negative, node-positive, high-risk early breast cancer (monarchE): results from a preplanned interim analysis of a randomised, open-label, phase 3 trial. Lancet Oncol 2023;24:77-90.
4. Giuliano AE, Ballman KV, McCall L, et al. Effect of Axillary Dissection vs No Axillary Dissection on 10-Year Overall Survival Among Women With Invasive Breast Cancer and Sentinel Node Metastasis: The ACOSOG Z0011 (Alliance) Randomized Clinical Trial. JAMA 2017;318:918-26.
5. Lyman GH, Somerfield MR, Bosserman LD, et al. Sentinel Lymph Node Biopsy for Patients With Early-Stage Breast Cancer: American Society of Clinical Oncology Clinical Practice Guideline Update. J Clin Oncol 2017;35:561-4.
6. Senkus E, Kyriakides S, Penault-Llorca F, et al. Primary breast cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol 2013;24:vi7-23.
7. Özler İ, Aydin H, Güler OC, et al. Can preoperative axillary ultrasound and biopsy of suspicious lymph nodes be an alternative to sentinel lymph node biopsy in clinical node negative early breast cancer? Int J Clin Pract 2021;75:e14332.
8. Jiang M, Li CL, Luo XM, et al. Radiomics model based on shear-wave elastography in the assessment of axillary lymph node status in early-stage breast cancer. Eur Radiol 2022;32:2313-25.
9. Wang B, Yang J, Tang YL, et al. The value of microvascular Doppler ultrasound technique, qualitative or quantitative shear-wave elastography of breast lesions for predicting axillary nodal burden in patients with breast cancer. Quant Imaging Med Surg 2024;14:408-20.
10. Lambin P, Leijenaar RTH, Deist TM, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol 2017;14:749-62.
11. Liu Z, Wang S, Dong D, et al. The Applications of Radiomics in Precision Diagnosis and Treatment of Oncology: Opportunities and Challenges. Theranostics 2019;9:1303-22.
12. Whitney HM, Taylor NS, Drukker K, et al. Additive Benefit of Radiomics Over Size Alone in the Distinction Between Benign Lesions and Luminal A Cancers on a Large Clinical Breast MRI Dataset. Acad Radiol 2019;26:202-9.
13. Li J, Qiao H, Wu F, et al. A novel hypoxia- and lactate metabolism-related signature to predict prognosis and immunotherapy responses for breast cancer by integrating machine learning and bioinformatic analyses. Front Immunol 2022;13:998140.
14. Wu PQ, Guo FL, Wang J, et al. Development and validation of a dynamic contrast-enhanced magnetic resonance imaging-based habitat and peritumoral radiomic model to predict axillary lymph node metastasis in patients with breast cancer: a retrospective study. Quant Imaging Med Surg 2024;14:8211-26.
15. Xu J, Qi P, Ou X, et al. Bi-modal ultrasound radiomics and habitat analysis enhanced the pre-operative prediction of axillary lymph node burden in patients with early-stage breast cancer. Front Oncol 2025;15:1607442.
16. Abe H, Schacht D, Kulkarni K, et al. Accuracy of axillary lymph node staging in breast cancer patients: an observer-performance study comparison of MRI and ultrasound. Acad Radiol 2013;20:1399-404.
17. Zhu Y, Zhou W, Zhou JQ, et al. Axillary Staging of Early-Stage Invasive Breast Cancer by Ultrasound-Guided Fine-Needle Aspiration Cytology: Which Ultrasound Criteria for Classifying Abnormal Lymph Nodes Should Be Adopted in the Post-ACOSOG Z0011 Trial Era? J Ultrasound Med 2016;35:885-93.
18. Zhang W, Yue Z, Ye J, et al. Modulation format identification using the Calinski-Harabasz index. Appl Opt 2022;61:851-7.
19. Nohara Y, Matsumoto K, Soejima H, et al. Explanation of machine learning models using shapley additive explanation and application for real data in hospital. Comput Methods Programs Biomed 2022;214:106584.
20. Giuliano AE, Ballman K, McCall L, et al. Locoregional Recurrence After Sentinel Lymph Node Dissection With or Without Axillary Dissection in Patients With Sentinel Lymph Node Metastases: Long-term Follow-up From the American College of Surgeons Oncology Group (Alliance) ACOSOG Z0011 Randomized Trial. Ann Surg 2016;264:413-20.
21. Shao H, Sun Y, Na Z, et al. Diagnostic value of applying preoperative breast ultrasound and clinicopathologic features to predict axillary lymph node burden in early invasive breast cancer: a study of 1247 patients. BMC Cancer 2024;24:112.
22. Wang K, Lu X, Zhou H, et al. Deep learning Radiomics of shear wave elastography significantly improved diagnostic performance for assessing liver fibrosis in chronic hepatitis B: a prospective multicentre study. Gut 2019;68:729-41.
23. Zhou WJ, Zhang YD, Kong WT, et al. Preoperative prediction of axillary lymph node metastasis in patients with breast cancer based on radiomics of gray-scale ultrasonography. Gland Surg 2021;10:1989-2001.
24. Zhang H, Cao W, Liu L, et al. Noninvasive prediction of node-positive breast cancer response to presurgical neoadjuvant chemotherapy therapy based on machine learning of axillary lymph node ultrasound. J Transl Med 2023;21:337.
25. Ma Q, Wang J, Tu Z, et al. Prediction model of axillary lymph node status using an automated breast volume ultrasound radiomics nomogram in early breast cancer with negative axillary ultrasound. Front Immunol 2025;16:1460673.
26. Tang YL, Wang B, Ou-Yang T, et al. Ultrasound radiomics based on axillary lymph nodes images for predicting lymph node metastasis in breast cancer. Front Oncol 2023;13:1217309.
27. Chung HL, Le-Petross HT, Leung JWT. Imaging Updates to Breast Cancer Lymph Node Management. Radiographics 2021;41:1283-99.
28. Gong X, Li Q, Gu L, et al. Conventional ultrasound and contrast-enhanced ultrasound radiomics in breast cancer and molecular subtype diagnosis. Front Oncol 2023;13:1158736.
29. Yang Y, Han Y, Zhao S, et al. Spatial heterogeneity of edema region uncovers survival-relevant habitat of Glioblastoma. Eur J Radiol 2022;154:110423.
30. Gatenby RA, Grove O, Gillies RJ. Quantitative imaging in cancer evolution and ecology. Radiology 2013;269:8-15.
31. The Lancet Respiratory Medicine. Opening the black box of machine learning. Lancet Respir Med 2018;6:801.

Cite this article as: Zhou M, Tang YL, Chen YY, Chen FL, Zhong Q, Nie ST, Yi A, Wang B. Explainable machine learning based on habitat analysis of axillary lymph node ultrasound images for preoperatively noninvasive evaluation of axillary lymph node metastasis in breast cancer: a bi-center study. Gland Surg 2026;15(6):160. doi: 10.21037/gs-2026-0173
