Abstract
Pseudogenes have long been considered nonfunctional because, unlike protein-coding genes (PCGs), they lack protein-coding ability. However, pseudogenes can be transcribed into functional long non-coding RNAs (lncRNAs) that are involved in cancer development and progression. These lncRNAs may regulate mRNA levels by acting as competing endogenous RNAs (ceRNAs). Nevertheless, a systematic pan-cancer analysis of pseudogene ceRNA networks is still lacking. Here, we curated 9455 pseudogenes and constructed ceRNA networks for 14 cancer types. We discovered that pseudogenes in ceRNA networks were the most cancer type-specific and that ~20% of cePCGs were cancer hallmark genes, supporting a close relationship between pseudogenes and cancer. Notably, in breast cancer (BRCA), we found that the ceRNA subnetwork of ZNF252P/AL390726.5 (pseudogenes)-miRNAs-ESR1 may enhance the proto-oncogene MYC, and we experimentally validated the oncogenic effects of the two pseudogenes. Moreover, pseudogene-based subtyping of BRCA tumors revealed a new subtype characterized by immunoglobulin pseudogenes. Collectively, our findings could pave the way for more pseudogene research in cancer. We provide our results in a user-friendly database, cePseudo, available at http://bioinfo-sysu.com/cePseudo2.
Data availability
GTEx RNA-seq read files were downloaded from dbGaP (phs000424.v8.p2). TCGA RNA-seq BAM files were downloaded from GDC (https://portal.gdc.cancer.gov). Expression levels of PCGs (HTSeq-FPKM and HTSeq-Counts) and miRNAs (mirbase21.isoforms.quantification.txt), together with clinical information, were downloaded from the UCSC Xena database (https://xenabrowser.net). CPTAC proteomics data are available at https://proteomics.cancer.gov/programs/cptac.
Acknowledgements
This research was supported by the Integrated Project of Major Research Plan of the National Natural Science Foundation of China (NSFC) (92249303) and the Guangdong Basic and Applied Basic Research Foundation (2021A1515110972). The results shown here are in whole or in part based upon data generated by the TCGA or CPTAC Research Network.
Funding
Integrated Project of Major Research Plan of National Natural Science Foundation of China, 92249303, Yuanyan Xiong; Basic and Applied Basic Research Foundation of Guangdong Province, 2021A1515110972, Mengbiao Guo.
Author information
Contributions
YYX and MBG conceived the project. QLL, MBG, and JKZ analyzed the data and interpreted the results. JXZ, QW, MHX, and ZWF performed experimental validation. MBG wrote the manuscript. MBG and JKZ revised the manuscript. YYX, MBG, QW, and ZSY reviewed the manuscript. All authors approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guo, M., Zhang, J., Liang, Q. et al. Pan-cancer pseudogene RNA analysis reveals a regulatory network promoting cancer cell proliferation. GENOME INSTAB. DIS. 4, 85–97 (2023). https://doi.org/10.1007/s42764-023-00097-2
DOI: https://doi.org/10.1007/s42764-023-00097-2