Abstract
Pseudogenes have long been considered nonfunctional elements. However, the influx of data from large-scale sequencing projects over the last decade has provided rich evidence that pseudogenes can play key evolutionary and regulatory roles, highlighting the need for high-quality pseudogene annotation in both human and key model organisms. To date, GENCODE has completed the manual annotation of pseudogenes in human and has undertaken the task of curating and characterizing pseudogenes in the mouse reference genome. Capitalizing on these high-quality annotations, as well as on functional genomics, evolutionary, and phenotypic data, we were able to build a comprehensive picture of the creation, development, and activity of both the human and mouse pseudogene complements. We found that, while human pseudogenes were created through a single burst of retrotransposition events, the active transposable element content of the mouse genome allows for continuous renewal of the pseudogene pool. Despite these differences, the two organisms share a number of similarities in pseudogene activity, with ~10% of pseudogenes being transcribed. Finally, we highlight a variety of resources, developed from the available GENCODE annotations, that help shed light on pseudogene biology.
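As a rough illustration of how GENCODE pseudogene annotation can be queried, the sketch below tallies pseudogene biotypes in a GENCODE-style GTF file and reports the share of pseudogene loci whose biotype carries the `transcribed_` prefix (e.g. `transcribed_processed_pseudogene`). It assumes the `gene_type` attribute used in GENCODE GTF releases; the function name `pseudogene_summary` and the toy records are hypothetical. A biotype-based count is only a proxy for transcriptional activity and is not necessarily how the ~10% figure above was derived.

```python
import re
from collections import Counter

# Matches the gene_type attribute in GTF column 9, e.g. gene_type "processed_pseudogene";
GENE_TYPE_RE = re.compile(r'gene_type "([^"]+)"')


def pseudogene_summary(gtf_lines):
    """Count pseudogene biotypes among gene records of a GENCODE-style GTF.

    Hypothetical helper. Returns (biotype_counts, transcribed_fraction), where
    transcribed_fraction is the share of pseudogene loci whose biotype starts
    with 'transcribed_'.
    """
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene-level records (one per locus)
        match = GENE_TYPE_RE.search(fields[8])
        if match and match.group(1).endswith("pseudogene"):
            counts[match.group(1)] += 1
    total = sum(counts.values())
    transcribed = sum(n for biotype, n in counts.items()
                      if biotype.startswith("transcribed_"))
    fraction = transcribed / total if total else 0.0
    return counts, fraction


# Toy records (hypothetical); the transcript line is ignored, and the
# protein-coding gene is not counted as a pseudogene.
example = [
    "##description: toy example",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "processed_pseudogene";',
    'chr1\tHAVANA\tgene\t1000\t1900\t.\t-\t.\tgene_id "G2"; gene_type "transcribed_processed_pseudogene";',
    'chr1\tHAVANA\ttranscript\t1000\t1900\t.\t-\t.\tgene_id "G2"; gene_type "transcribed_processed_pseudogene";',
    'chr2\tHAVANA\tgene\t50\t500\t.\t+\t.\tgene_id "G3"; gene_type "protein_coding";',
]
counts, fraction = pseudogene_summary(example)
print(dict(counts), fraction)
```

Restricting the count to `gene` records avoids counting the same locus once per transcript; on a real annotation file the lines would simply be read from the (decompressed) GTF.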
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Sisu, C. (2021). GENCODE Pseudogenes. In: Poliseno, L. (eds) Pseudogenes. Methods in Molecular Biology, vol 2324. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1503-4_5
DOI: https://doi.org/10.1007/978-1-0716-1503-4_5
Published:
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1502-7
Online ISBN: 978-1-0716-1503-4
eBook Packages: Springer Protocols