Gene inactivation and its implications for annotation in the era of personal genomics
- Suganthi Balasubramanian1,
- Lukas Habegger2,
- Adam Frankish3,
- Daniel G. MacArthur3,
- Rachel Harte4,
- Chris Tyler-Smith3,
- Jennifer Harrow3 and
- Mark Gerstein1,2,5,6
- 1Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA;
- 2Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA;
- 3The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom;
- 4Department of Biomolecular Engineering, University of California at Santa Cruz, Santa Cruz, California 95064, USA;
- 5Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA
Abstract
The first wave of personal genomes documents how no single individual genome contains the full complement of functional genes. Here, we describe the extent of variation in gene and pseudogene numbers between individuals arising from inactivation events such as premature termination or aberrant splicing due to single-nucleotide polymorphisms. This variation highlights the inadequacy of the current reference sequence and gene set. We present a proposal to define a reference gene set that will remain stable as more individuals are sequenced. In particular, we recommend that the ancestral allele be used to define the reference sequence from which a core human reference gene annotation set can be derived. In addition, we call for the development of an expanded gene set to include human-specific genes that have arisen recently and are absent from the ancestral set.
Footnotes
- 6Corresponding author. E-MAIL mark.gerstein{at}yale.edu; FAX (203) 432-6946.
- Article is online at http://www.genesdev.org/cgi/doi/10.1101/gad.1968411.
- Copyright © 2011 by Cold Spring Harbor Laboratory Press
Freely available online through the Genes & Development Open Access option.