A nomogram based on CENPP expression for survival prediction in breast cancer
Introduction
Breast cancer (BC) is the most common malignancy and the leading cause of cancer-related death among women worldwide (1). BC is a highly heterogeneous disease, and its prognosis varies with clinical stage, molecular subtype and histologic type. Even among BC patients with the same clinical stage, histologic type or molecular subtype, prognosis differs substantially, indicating that the variation in outcome cannot be explained by clinicopathological parameters alone. It is therefore important to explore the heterogeneity of gene expression in BC, to search for appropriate molecular biomarkers, and to establish prognostic prediction models that combine clinical and genetic data. At present, prognostic models based on multi-gene panels, such as OncotypeDX, EndoPredict, PAM50 (Prosigna breast cancer prognosis markers) and the Breast Cancer Index, are increasingly used in the clinic to complement T, N, M and biomarker information (2). Unfortunately, high cost, insufficient technology and poor reproducibility (3) have greatly limited the clinical application of these polygenic prediction models. It is therefore necessary to find new potential biomarkers for BC and to develop more practical and affordable prognostic assessment tools.
The constitutive centromere-associated network (CCAN) underlies the centromere specificity and stability of the kinetochore during mitosis in human cancer cells (4-8). To date, 17 members of the CCAN have been identified in humans, namely CENPA/C/H/I/K/L/M/N/O/P/Q/R/S/T/U/W/X, and they are closely connected and interact with one another (9). Centromere protein abnormalities are essential for cancer development (10). Previous studies have shown that CENPA/H/U/I/O are associated with BC (11-16), lung cancer (17), bladder cancer (18) and gastric cancer (19). With regard to BC, CENPK down-regulation in triple-negative breast cancer cells was reported to inhibit cell proliferation and invasion (11), and down-regulation of CENPU and CENPH expression inhibited breast cancer cell proliferation through cell cycle arrest and induction of apoptosis (12). The expression level of CENPA was higher in ER− tumors than in ER+ tumors, and CENPA is an important independent prognostic indicator in ER+ BC patients who have not received systemic therapy (endocrine therapy or chemotherapy) (14). Thangavelu et al. found that CENPI mRNA and protein levels were significantly increased in ER+ tumors and showed that CENPI overexpression promoted chromosomal instability in ER+ BC patients, leading to poor prognosis (16). These studies on CENPA/H/I/K/U indicate that CENPs play a vital role in the diagnosis and potential targeted therapy of BC. However, many members of this family have not been studied, which prompted us to investigate the prognostic value of CENPs in BC.
Because members of the CENP family may play a role in the occurrence and development of BC, and because there is an urgent clinical need for a simple, economical and useful unified model for BC prognosis prediction, this study aimed to explore the prognostic role of CENPs in BC and to establish a prognostic prediction model based on CENP expression and prognostic clinicopathological parameters.
We present the following article in accordance with the TRIPOD reporting checklist (available at https://dx.doi.org/10.21037/gs-21-30).
Methods
Data sources and patient selection
We downloaded the mRNA expression profiles of CENPs in the TCGA Breast Cancer cohort from the Xena system (https://xenabrowser.net/datapages/) for statistical analysis. The dataset comprised 1,215 BC samples with RNAseq expression data (raw counts) for CENPs and corresponding clinicopathological features. The clinicopathological indicators included in this study were age, gender, ER status, PR status, Her2 status, histological type, pathological stage, survival time and survival status. Cases with any of these indicators missing were excluded, so that all included patients had complete RNAseq expression data, clinicopathological characteristics and prognostic information. In total, 800 of the 1,215 BC patients from the TCGA database met the inclusion criteria. Because CENPC expression was not available on the Xena platform, CENPC was excluded from this study. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
On the Xena website, gene expression had been normalized with the FPKM-UQ (upper quartile fragments per kilobase of transcript per million mapped reads) method provided by the TCGA database. In this study, the expression level of each CENP was then classified as high or low according to the median of its mRNA expression.
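The complete-case selection described above can be sketched as follows. This is a minimal Python illustration, not the authors' workflow (which used R); the field names and the two toy records are hypothetical and do not correspond to actual Xena column names.

```python
# Hypothetical field names; the real Xena/TCGA column names may differ.
REQUIRED_FIELDS = (
    "age", "gender", "er_status", "pr_status", "her2_status",
    "histological_type", "pathologic_stage", "os_time", "os_status",
    "cenpp_expression",
)

def is_complete(record):
    """Return True when every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)

def select_complete_cases(records):
    """Keep only the records with no missing required field."""
    return [record for record in records if is_complete(record)]

# Two invented patients: one complete, one with Her2 status missing.
complete = dict.fromkeys(REQUIRED_FIELDS, "x")
missing_her2 = dict(complete, her2_status="")
kept = select_complete_cases([complete, missing_her2])
print(len(kept), "of 2 toy records retained")
```

Applied to the real cohort, a filter of this kind would correspond to the step that reduced 1,215 cases to 800.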
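The median-based grouping can be illustrated with a short Python sketch. The sample identifiers and expression values below are invented, and assigning values equal to the median to the high group is an assumption, made here because the Results describe the low group as expression "lower than the median".

```python
from statistics import median

def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the cohort median.

    `expression` maps a sample ID to a normalized expression value.
    Assumption: values equal to the median are labelled 'high'.
    """
    cutoff = median(expression.values())
    return {
        sample: ("low" if value < cutoff else "high")
        for sample, value in expression.items()
    }

# Invented values for four hypothetical samples
toy_expression = {"S1": 1.0, "S2": 2.0, "S3": 3.0, "S4": 4.0}
groups = split_by_median(toy_expression)
print(groups)
```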
Follow-up
The clinical endpoint of this study was overall survival (OS), defined as the time from the diagnosis of BC to death from any cause. Follow-up referred to the period from the diagnosis of BC to the occurrence of an outcome event. Patients lost to follow-up, whose survival status is recorded as “blank” in the TCGA database, were excluded.
GEPIA and bcGenExMiner v4.4
GEPIA is a web-based tool for analyzing the mRNA expression data of 8,587 normal and 9,736 tumor samples from the TCGA and GTEx projects (20). In this study, we used GEPIA to verify the mRNA expression of the candidate CENP family genes in BC (tumor vs. normal). bcGenExMiner v4.4 is a database of published annotated BC transcriptomic data. Its statistical analyses are divided into three modules: “expression”, “prognosis” and “correlation” (21,22). The expression module can be used to compare the expression of candidate genes across clinical features such as receptor status (ER+ vs. ER−, PR+ vs. PR−, HER2+ vs. HER2− by IHC), nodal status, SBR grade, age and molecular subtype.
Statistical analysis
The 800 BC patients from the TCGA database were randomly divided into a training set and a validation set at a ratio of 3:2 using the “caret” package of R software (https://CRAN.R-project.org/package=caret). The chi-square test or Fisher’s exact test was used to compare the distributions of categorical variables between the two groups. The Kaplan-Meier method was used to plot survival curves, and the log-rank test was used to compare survival between groups. Cox regression analysis, performed with the “survival” package (23) of R software, was used to identify independent prognostic factors.
Based on the results of the multivariate Cox analysis, a nomogram was constructed using the “rms” package (24) of R software version 3.5.2 (https://www.r-project.org/). Receiver operating characteristic (ROC) curves and calibration plots were then used to validate the performance of the nomogram. The area under the ROC curve (AUC) was used to evaluate the predictive ability of the model; it ranges from 0 to 1, with 0 indicating complete discordance, 0.5 indicating random prediction and 1 indicating perfect discrimination. The calibration plot (with 1,000 bootstrap resamples and 5-fold cross-validation) was used to evaluate the accuracy of the nomogram; for a perfectly calibrated model, the calibration curve falls on the 45-degree diagonal line.
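Two of the steps above, the 3:2 random split and the Kaplan-Meier estimate, can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the study used the R packages “caret” and “survival”, whereas the functions below (`split_3_to_2`, `kaplan_meier`) are hypothetical helpers, the split is a simple unstratified shuffle, and the survival times are invented.

```python
import random

def split_3_to_2(patient_ids, seed=0):
    """Shuffle the IDs and return (training, validation) lists in a 3:2 ratio."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = (len(ids) * 3) // 5
    return ids[:n_train], ids[n_train:]

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    `events` holds 1 for a death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every subject whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

train, valid = split_3_to_2(range(800))
print(len(train), len(valid))

# Invented follow-up times (years); the second subject is censored.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

With 800 patients the split yields 480 and 320 IDs, matching the set sizes reported in the Results. A log-rank comparison between two such curves is omitted here for brevity.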
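The interpretation of the AUC given above can be made concrete with a small Python sketch. It computes the AUC as the probability that a randomly chosen patient with the event receives a higher risk score than a randomly chosen patient without it, counting ties as one half. This is a simplification: it assumes a binary outcome at a fixed time point and ignores censoring, so it is not the time-dependent ROC analysis a survival study would normally require, and the scores below are invented.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    `labels` holds 1 for patients with the event and 0 for those without.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0   # event patient correctly ranked higher
            elif p == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(positives) * len(negatives))

perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
reversed_order = auc([0.9, 0.8, 0.3, 0.1], [0, 0, 1, 1])
uninformative = auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
print(perfect, reversed_order, uninformative)
```

The three toy cases reproduce the reference values mentioned in the text: perfect discrimination, complete discordance and random prediction.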
All P values were two-sided and the level of significance was set at P<0.05.
Results
Clinicopathologic characteristics of the patients
The TCGA Breast Cancer (BRCA) dataset on the Xena platform cataloged 1,215 BC patients. After excluding 415 patients with incomplete information, 800 eligible BC patients were included and randomly divided into a training set (N=480) and a validation set (N=320) at a ratio of 3:2. The clinicopathological characteristics of the training set and validation set were comparable (Table 1).
Independent prognostic factors of BC
Age, Her2 status, pathologic_T stage, pathologic_N stage, pathologic_M stage and CENPP expression were identified as predictive factors for OS of BC in the univariate analysis (Table 2). All of these variables except pathologic_N stage were further confirmed as independent predictive factors in the multivariate analysis (Table 3). BC patients aged 65–75 years (P=0.015, HR =2.67; 95% CI: 1.21–5.86) and >75 years (P<0.001, HR =3.63; 95% CI: 1.87–7.03) had worse OS than those aged <65 years. In addition, Her2 positive patients (P=0.027, HR =2.04; 95% CI: 1.09–3.82) had worse OS than Her2 negative patients. BC patients with pathologic_T4 stage (P=0.003, HR =5.401; 95% CI: 1.78–16.38) and pathologic_M1 stage (P=0.040, HR =4.45; 95% CI: 0.07–18.47) had worse OS than those with pathologic_T1 stage and pathologic_M0 stage, respectively. Additionally, BC patients with CENPP expression lower than the median (P=0.005, HR =2.35; 95% CI: 1.30–4.23) had significantly worse OS.
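The relationship between a hazard ratio, its 95% confidence interval and the P value can be sketched as follows. The sketch assumes a Wald-type interval, HR = exp(β) with limits exp(β ± 1.96·SE), which is the usual default in Cox regression output but is not stated in the text. The function names are hypothetical; the input values are the Her2 estimates quoted above.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def wald_summary(hr, ci_low, ci_high):
    """Recover the log hazard ratio, its standard error, z and two-sided P."""
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p

def hr_with_ci(beta, se):
    """Return the hazard ratio and its Wald 95% confidence limits."""
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
    )

# Her2 positive vs. negative, as quoted in the text
beta, se, z, p = wald_summary(2.04, 1.09, 3.82)
hr, low, high = hr_with_ci(beta, se)
print(round(beta, 3), round(se, 3), round(p, 3))
```

Under this assumption the recovered P value lies in the same range as the reported P=0.027, and the reconstructed limits agree with the quoted interval to within rounding.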
The expression pattern of CENPP in BC
Because CENPP was the only CENP whose expression was identified as an independent prognostic factor for BC patients, we analyzed its expression pattern with GEPIA and found that CENPP was overexpressed in BC tissues compared with normal tissues (P<0.01) (Figure 1A). Data from bcGenExMiner v4.4 showed that the CENPP mRNA level was highest in the Luminal A subtype among the 5 subtypes classified by PAM50 (P<0.0001) (Figure 1B). Further analysis showed that CENPP expression was higher in ER+ and PR+ tumors (P<0.0001; P<0.0001) and lower in Her2+ tumors (P<0.0001) (Figure 1C,D,E). To investigate the correlation between CENPP expression and clinicopathological features, patients in the training set were divided into CENPP-high and CENPP-low groups by the median expression of CENPP. The CENPP-high group had higher proportions of ER+ and PR+ tumors (P<0.001; P<0.001) and a lower death rate than the CENPP-low group (P=0.003) (Table 4).
Prognostic values of CENPP in BC
Kaplan-Meier curves showed that higher CENPP expression was associated with better OS in BC patients (P=0.0019) (Figure 2A). With regard to histological type, higher CENPP expression was associated with better OS in invasive ductal carcinoma (IDC) and invasive lobular carcinoma (ILC) (P=0.0031; P=0.046) (Figure 2B,C). In addition, high CENPP expression indicated better OS in ER+ or PR+ BC (P=0.0059; P=0.011), whereas it showed no significant correlation with prognosis in ER− or PR− tumors (P=0.31; P=0.14) (Figure 2D,E,F,G). Moreover, higher CENPP expression was associated with better OS regardless of Her2 status (Her2+, P=0.017; Her2−, P=0.039) (Figure 2H,I).
Construction and validation of the nomogram for OS
Based on the results of the multivariate analysis, a nomogram was established with the independent prognostic predictors for BC: age, Her2 status, pathologic_T stage, pathologic_M stage and CENPP expression (Figure 3). Each variable value of a patient corresponds to a score on the top scale, and the scores are summed to give a total score. The total score is then located on the bottom scale to estimate the 3- and 5-year survival probabilities. Next, ROC curves were used to evaluate the performance of the nomogram (Figure 4). In the training set, the AUCs for 3- and 5-year survival prediction were 0.757 and 0.797, respectively (Figure 4A,B). In the validation set, the AUCs for 3- and 5-year survival prediction were 0.727 and 0.71, respectively (Figure 4C,D). The calibration plots (Figure 5A,B,C,D) suggested that the nomogram was well calibrated. These results suggested that the nomogram had good accuracy in predicting both 3- and 5-year OS for patients with BC.
To clarify the necessity of including CENPP expression in the nomogram, we established a second prognostic model that included only the four traditional clinicopathological features (age, Her2 status, pathological T stage and pathological M stage) and excluded CENPP expression (Figure S1). In the training set, the AUCs for 3- and 5-year OS prediction were 0.676 and 0.677, respectively (Figure S2A,B). In the validation set, the AUCs for 3- and 5-year OS prediction were 0.646 and 0.615, respectively (Figure S2C,D). These results suggested that the model with CENPP expression performed better than the model with only conventional clinicopathological features in predicting 3- and 5-year OS of BC patients.
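How a nomogram turns Cox model estimates into points can be sketched as follows. The sketch uses a common convention, assumed here rather than taken from Figure 3: each category receives points proportional to its log hazard ratio, and the largest effect is scaled to 100 points. The hazard ratios are those quoted in the multivariate results. Intermediate T categories are omitted because their estimates are not quoted in the text, and the mapping from total points to 3- and 5-year survival is left out because it requires the baseline survival function, which is not reported.

```python
import math

# Hazard ratios quoted in the multivariate results; reference levels have HR 1.
HAZARD_RATIOS = {
    "age": {"<65": 1.0, "65-75": 2.67, ">75": 3.63},
    "her2": {"negative": 1.0, "positive": 2.04},
    "pathologic_T": {"T1": 1.0, "T4": 5.401},
    "pathologic_M": {"M0": 1.0, "M1": 4.45},
    "CENPP": {"high": 1.0, "low": 2.35},
}

def build_point_table(hazard_ratios):
    """Scale log hazard ratios so that the largest effect equals 100 points."""
    betas = {
        variable: {level: math.log(hr) for level, hr in levels.items()}
        for variable, levels in hazard_ratios.items()
    }
    max_beta = max(b for levels in betas.values() for b in levels.values())
    return {
        variable: {level: 100 * (b / max_beta) for level, b in levels.items()}
        for variable, levels in betas.items()
    }

def total_points(point_table, patient):
    """Sum the points of one patient over all predictors."""
    return sum(point_table[variable][level] for variable, level in patient.items())

points = build_point_table(HAZARD_RATIOS)
# Hypothetical patient, not taken from the study data
patient = {"age": ">75", "her2": "positive", "pathologic_T": "T1",
           "pathologic_M": "M0", "CENPP": "low"}
print(round(total_points(points, patient), 1))
```

In this scaling pathologic_T4 carries the maximum of 100 points because it has the largest quoted hazard ratio, while every reference category contributes 0 points.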
Discussion
In this study, CENP expression data and clinicopathological features of BC patients were downloaded from the TCGA database and analyzed, with the aim of discovering additional genes and molecular indexes that accurately predict the prognosis of BC. Based on the Cox regression analysis, we found that older age, Her2 positivity, advanced T stage, advanced M stage and lower CENPP expression were associated with worse OS. High CENPP expression was associated with better OS in ER+ or PR+ BC, and higher CENPP expression indicated better OS regardless of Her2 status. Finally, we constructed a nomogram based on CENPP expression and the other independent predictors. The AUCs for 3- and 5-year OS were 0.757 and 0.797 in the training set and 0.727 and 0.71 in the validation set.
Mitosis is the process by which genetic information is transferred from the parent cell to the daughter cells. During mitosis, CENPs not only provide energy for the separation of sister chromatids but also monitor genomic information. If this process is dysregulated or errors occur, malignant tumors may arise (25,26). CENPs thus play a pivotal role in maintaining normal mitosis. In this study, CENPP was identified as the only prognosis-related gene among the CENPs for patients with BC. Some studies have reported that CENPP is associated with mixed uterine carcinosarcoma and that its related pathways include mitotic metaphase and anaphase and signaling by G protein-coupled receptors (GPCR) (4,27). However, its role in BC is unknown. To our knowledge, this is the first study to investigate the prognostic value of CENPP in BC.
Hormone receptor status plays a key role in the formation and development of BC. ER and PR status is widely recognized and accepted as an important biological indicator for choosing treatment in BC patients. Compared with ER− BC, ER+ BC is better differentiated, less invasive and has a higher long-term survival rate. According to data from the American Registry of Cancer Research, 20% of patients with ER+ BC were PR− (28). Studies have shown that ER+PR− BC is a more invasive subtype of ER+ BC (29,30), with lower overall survival and disease-free survival than ER+PR+ BC. Purdie et al. reported that PR was an independent predictor of prognosis in early breast cancer (31). Our study found that the mRNA level of CENPP was positively correlated with ER status and PR status, and that higher CENPP mRNA expression indicated better OS in ER+ or PR+ BC. This suggests a close relationship between CENPP and the hormone receptor pathway, and the underlying mechanism requires further investigation. In addition, recent studies have shown that Her2 overexpression not only indicates invasiveness and poor prognosis in BC but also predicts the sensitivity of BC to systemic treatment. In our study, higher CENPP expression was associated with better OS whether Her2 status was positive or negative, which suggests that the association between CENPP and prognosis does not depend on the Her2 pathway. Collectively, these findings suggest that CENPP is an effective prognostic predictor of BC and might also be a potential target for HR+ BC.
Nomograms have been developed and shown to predict prognosis more accurately than conventional staging systems in some cancers (32-34). This study attempted to establish a prognostic nomogram for BC and to determine whether the model can accurately predict the survival of patients with BC. Age, Her2 status, pathologic_T stage, pathologic_M stage and CENPP expression were identified as predictive factors for OS of BC in the multivariate Cox analysis. This is the first study to build a nomogram based on CENPP expression and conventional prognostic predictors to predict OS in patients with BC. On validation, the nomogram showed good performance in predicting survival, and its accuracy was supported by the ROC curves and the calibration curves. When a patient is diagnosed with BC and the above clinicopathological results are available, the clinical prognosis can be predicted from the patient’s own features and CENPP expression level. If the predicted prognosis is poor, intensive treatment might be recommended in the hope of achieving a better outcome.
This study has some limitations. First, the demographic and clinical information provided by the TCGA database was incomplete. For example, the database lacked detailed records of surgery, marital status and insurance status, all of which may influence the outcome of BC patients. Second, this was a retrospective study, and all data were obtained from publicly available databases. More prospective studies are needed to confirm our conclusions.
Conclusions
In this study, we found that higher CENPP expression was associated with better prognosis and established a prognostic nomogram with good performance based on CENPP expression and clinicopathological features. Our study provides a novel method for clinical evaluation and a potential biomarker and target for BC, which need to be validated in prospective studies and investigated in further basic research.
Acknowledgments
We would like to thank Ivan Chen for his help in polishing our paper.
Funding: This work was supported by Natural Science Foundation of Shaanxi Province, China (No. 2018JQ8004). We thank The Cancer Genome Atlas (TCGA) Database for sharing the large amount of data.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://dx.doi.org/10.21037/gs-21-30
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://dx.doi.org/10.21037/gs-21-30). All authors report that this work was supported by Natural Science Foundation of Shaanxi Province, China (No. 2018JQ8004).
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Bray F, Ferlay J, Soerjomataram I, et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018;68:394-424.
2. Harris LN, Ismaila N, McShane LM, et al. Use of biomarkers to guide decisions on adjuvant systemic therapy for women with early-stage invasive breast cancer: American Society of Clinical Oncology clinical practice guideline. J Clin Oncol 2016;34:1134-50.
3. Kwa M, Makris A, Esteva FJ. Clinical utility of gene-expression signatures in early stage breast cancer. Nat Rev Clin Oncol 2017;14:595-610.
4. Eskat A, Deng W, Hofmeister A, et al. Step-wise assembly, maturation and dynamic behavior of the human CENP-P/O/R/Q/U kinetochore sub-complex. PLoS One 2012;7:e44717.
5. Hoischen C, Yavas S, Wohland T, et al. CENP-C/H/I/K/M/T/W/N/L and hMis12 but not CENP-S/X participate in complex formation in the nucleoplasm of living human interphase cells outside centromeres. PLoS One 2018;13:e0192572.
6. Lera RF, Norman RX, Dumont M, et al. Plk1 protects kinetochore-centromere architecture against microtubule pulling forces. EMBO Rep 2019;20:e48711.
7. Logsdon GA, Gambogi CW, Liskovykh MA, et al. Human Artificial Chromosomes that Bypass Centromeric DNA. Cell 2019;178:624-39.e19.
8. Nechemia-Arbely Y, Miga KH, Shoshani O, et al. DNA replication acts as an error correction mechanism to maintain centromere identity by restricting CENP-A to centromeres. Nat Cell Biol 2019;21:743-54.
9. McKinley KL, Sekulic N, Guo LY, et al. The CENP-L-N Complex Forms a Critical Node in an Integrated Meshwork of Interactions at the Centromere-Kinetochore Interface. Mol Cell 2015;60:886-98.
10. Beh TT, Kalitsis P. The Role of Centromere Defects in Cancer. Prog Mol Subcell Biol 2017;56:541-54.
11. Komatsu M, Yoshimaru T, Matsuo T, et al. Molecular features of triple negative breast cancer cells by genome-wide gene expression profiling analysis. Int J Oncol 2013;42:478-506.
12. Liao WT, Feng Y, Li ML, et al. Overexpression of centromere protein H is significantly associated with breast cancer progression and overall patient survival. Chin J Cancer 2011;30:627-37.
13. Lin SY, Lv YB, Mao GX, et al. The effect of centromere protein U silencing by lentiviral mediated RNA interference on the proliferation and apoptosis of breast cancer. Oncol Lett 2018;16:6721-8.
14. McGovern SL, Qi Y, Pusztai L, et al. Centromere protein-A, an essential centromere protein, is a prognostic marker for relapse in estrogen receptor-positive breast cancer. Breast Cancer Res 2012;14:R72.
15. Rajput AB, Hu N, Varma S, et al. Immunohistochemical Assessment of Expression of Centromere Protein-A (CENPA) in Human Invasive Breast Cancer. Cancers (Basel) 2011;3:4212-27.
16. Thangavelu PU, Lin CY, Vaidyanathan S, et al. Overexpression of the E2F target gene CENPI promotes chromosome instability and predicts poor prognosis in estrogen receptor-positive breast cancer. Oncotarget 2017;8:62167-82.
17. Wang X, Chen D, Gao J, et al. Centromere protein U expression promotes non-small-cell lung cancer cell proliferation through FOXM1 and predicts poor survival. Cancer Manag Res 2018;10:6971-84.
18. Wang S, Liu B, Zhang J, et al. Centromere protein U is a potential target for gene therapy of human bladder cancer. Oncol Rep 2017;38:735-44.
19. Cao Y, Xiong J, Li Z, et al. CENPO expression regulates gastric cancer cell proliferation and is associated with poor patient prognosis. Mol Med Rep 2019;20:3661-70.
20. Tang Z, Li C, Kang B, et al. GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses. Nucleic Acids Res 2017;45:W98-W102.
21. Jezequel P, Campone M, Gouraud W, et al. bc-GenExMiner: an easy-to-use online platform for gene prognostic analyses in breast cancer. Breast Cancer Res Treat 2012;131:765-75.
22. Jezequel P, Frenel JS, Campion L, et al. bc-GenExMiner 3.0: new mining module computes breast cancer gene expression correlation analyses. Database (Oxford) 2013;2013:bas060.
23. Li JCA. Modeling survival data: Extending the Cox model. Sociol Method Res 2003;32:117-20.
24. Nunez E, Steyerberg EW, Nunez J. Regression modeling strategies. Rev Esp Cardiol 2011;64:501-7.
25. Beaver JA, Gustin JP, Yi KH, et al. PIK3CA and AKT1 mutations have distinct effects on sensitivity to targeted pathway inhibitors in an isogenic luminal breast cancer model system. Clin Cancer Res 2013;19:5413-22.
26. Lee H, Kim SJ, Jung KH, et al. A novel imidazopyridine PI3K inhibitor with anticancer activity in non-small cell lung cancer cells. Oncol Rep 2013;30:863-9.
27. Foltz DR, Jansen LE, Black BE, et al. The human CENP-A centromeric nucleosome-associated complex. Nat Cell Biol 2006;8:458-69.
28. Okada M, Cheeseman IM, Hori T, et al. The CENP-H-I complex is required for the efficient incorporation of newly synthesized CENP-A into centromeres. Nat Cell Biol 2006;8:446-57.
29. Cancello G, Maisonneuve P, Rotmensz N, et al. Progesterone receptor loss identifies Luminal B breast cancer subgroups at higher risk of relapse. Ann Oncol 2013;24:661-8.
30. Dunnwald LK, Rossing MA, Li CI. Hormone receptor status, tumor characteristics, and prognosis: a prospective cohort of breast cancer patients. Breast Cancer Res 2007;9:R6.
31. Purdie CA, Quinlan P, Jordan LB, et al. Progesterone receptor expression is an independent prognostic variable in early breast cancer: a population-based study. Br J Cancer 2014;110:565-72.
32. International Bladder Cancer Nomogram Consortium. Postoperative nomogram predicting risk of recurrence after radical cystectomy for bladder cancer. J Clin Oncol 2006;24:3967-72.
33. Karakiewicz PI, Briganti A, Chun FK, et al. Multi-institutional validation of a new renal cancer-specific survival nomogram. J Clin Oncol 2007;25:1316-22.
34. Touijer K, Scardino PT. Nomograms for staging, prognosis, and predicting treatment outcomes. Cancer 2009;115:3107-11.