Trends in Genetics
Timing and mechanism of ancient vertebrate genome duplications – the adventure of a hypothesis
Introduction
For some time, differences in morphological complexity between animals have been associated directly with the number of genes. Vertebrates almost consistently have more genes than invertebrates and have unique anatomical structures that are characteristic of their phylum. Did this increasing complexity occur through more genes arising following genome duplication?
According to Ohno [1], gene and especially genome duplications are of enormous importance because they can generate large amounts of raw genetic material in a short time that can be exploited by the mutation and positive selection processes to evolve novel gene function. Based on the genome size of the cephalochordate amphioxus, which is three times as large as the genome of the urochordate (see Glossary) Ciona, Ohno argued in favor of a genome duplication following the divergence of urochordates. Isozyme studies, and the analysis of orthologous genes from amphioxus and Ciona, showed that most genes are present as single copies, whereas the genomes of jawless vertebrates, such as lamprey and hagfish, contained at least two orthologs and mammals contained three orthologs or more [2]. This evidence together with the identification of a single Hox cluster in amphioxus (the invertebrate closest to vertebrates phylogenetically) [3], compared with four clusters in mammals, enabled a refinement of the proposed time of duplication to the period following the split of the cephalochordate and vertebrate lineages and before the emergence of gnathostomes (Figure 1). Based on the apparent stepwise increase in the gene copy-number from invertebrates to jawless vertebrates to mammals, it was suggested that two episodes of complete or whole genome duplication (WGD) occurred [2], one before and one after the jawless fish diverged, which is estimated at 500–430 million years ago (Mya) (i.e. the 2R hypothesis; see Ref. [4] for a summary of proposals for the timing of duplication events).
The identification of three ‘large’ quadrupled regions in the unfinished human genome, namely the major histocompatibility complex (MHC; human chromosome (Hsa) 1, 6, 9 and 19), an extended Hox (Hsa 2, 7, 12 and 17) and the fibroblast growth factor receptor (FGFR; Hsa 4, 5, 8 and 10) regions, which included genes duplicated ∼530–738 Mya, strongly supported tetraploidy [5, 6, 7, 8, 9]. These rounds of duplication could have happened in short succession within 90–106 Mya [10]. Proponents of the 2R hypothesis argued that this short interval could explain the incongruent tree topologies of neighbor genes within the described paralogons [11] (Box 1), whereas opponents cited it as proof that these paralogons did not arise through the duplication of an ancestral block. To explain the numerous paralogs in vertebrates, an alternative scenario of a continuous mode of small-scale (tandem or segmental) gene duplications was suggested [12].
Before the completion of the human genome, gene estimates were in the range of ∼70 000 for humans (±20 000) and ∼20 000 for invertebrates [12, 13, 14]. This fourfold difference and the observed 1:4 relationship between many Drosophila and human genes (1:4 rule) [15, 16, 17] were an additional argument in favor of two rounds of WGD under the assumption that no subsequent gene loss had happened. The estimate that the human genome might contain as few as 25 000 genes [18, 19, 20, 21, 22] signaled that if there had been WGDs, they must have been followed by extensive gene loss; therefore, finding evidence for old duplications might not be as straightforward as originally thought.
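The gene-count argument can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the pre-duplication ancestor carried about as many genes as the ∼20 000 estimated for invertebrates and that each WGD exactly doubles the gene count with no other gains; the function names are hypothetical and do not come from the original analysis.

```python
def expected_genes_after_wgd(ancestral_genes: int, rounds: int) -> int:
    """Gene count if every gene copy is retained after `rounds` whole genome duplications."""
    if ancestral_genes < 0 or rounds < 0:
        raise ValueError("ancestral_genes and rounds must be non-negative")
    return ancestral_genes * 2 ** rounds


def implied_loss_fraction(ancestral_genes: int, rounds: int, observed_genes: int) -> float:
    """Fraction of the expected post-WGD gene set that must have been lost
    to arrive at the observed gene count (illustrative assumption: no other gains)."""
    expected = expected_genes_after_wgd(ancestral_genes, rounds)
    if expected == 0:
        raise ValueError("expected gene count is zero; loss fraction is undefined")
    return 1.0 - observed_genes / expected
```

Under these assumptions, 20 000 ancestral genes and two rounds of WGD give an expected 80 000 genes, close to the early human estimate, whereas an observed count of 25 000 implies that roughly 69% of the expected gene set was lost.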
What is the evidence for 2R duplications produced from the analysis of the complete human genome and teleost fish genomes? In this article, we will review this evidence in the light of similar data generated from the genome analysis of more recent polyploids such as Arabidopsis and Saccharomyces cerevisiae.
2R genes in vertebrates and the extent of gene loss
According to the 2R hypothesis, each invertebrate gene is expected to have at least four vertebrate orthologs (in keeping with the 1:4 rule). The human genome shares 1308 gene families with the genomes of Caenorhabditis elegans, D. melanogaster and S. cerevisiae, 43.1% of which are single copy genes in these organisms and in humans [23, 24, 25, 26]. If yeast is excluded from this comparison, the number of families shared between the human genome and the genomes of C. elegans and D. melanogaster
How many vertebrate duplicates date at the origin of vertebrates?
The molecular-clock-based calculation (Box 1) of the age of human duplicates within 191 gene families that have a single invertebrate ortholog (i.e. genes likely to have duplicated on the vertebrate lineage) and the arthropod–chordate divergence estimate of either 833 Myr [38] or 993 Myr [10] showed that most of these human duplicates arose ∼333–583 Mya or 397–695 Mya (Figure 2) [25, 26]. The dating of numerous vertebrate gene families (749 vertebrate gene families, 1739 gene-duplication events)
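Box 1 is not reproduced here, so the following is only a minimal sketch of how a strict molecular clock turns a calibration date into duplication ages. It assumes that sequence distance accumulates linearly with time, so that the age of a duplication scales with the ratio of the paralog distance to the distance between the calibrating orthologs. The function name and the ratios 0.4 and 0.7 are illustrative assumptions, chosen only because they roughly reproduce the ranges quoted above; they are not values taken from the cited studies.

```python
def duplication_age_myr(paralog_distance: float,
                        ortholog_distance: float,
                        calibration_myr: float) -> float:
    """Estimate a duplication age under a strict molecular clock.

    paralog_distance  : distance between the two duplicates
    ortholog_distance : distance between orthologs that span the calibration split
    calibration_myr   : assumed age of the calibration split (Myr)
    """
    if paralog_distance < 0 or ortholog_distance <= 0 or calibration_myr <= 0:
        raise ValueError("distances and calibration age must be positive")
    return calibration_myr * paralog_distance / ortholog_distance


# Two alternative arthropod-chordate divergence estimates from the text (Myr).
for calibration in (833, 993):
    youngest = duplication_age_myr(0.4, 1.0, calibration)  # hypothetical ratio
    oldest = duplication_age_myr(0.7, 1.0, calibration)    # hypothetical ratio
    print(f"calibration {calibration} Myr: {youngest:.0f}-{oldest:.0f} Mya")
```

The sketch makes one point of the dating argument explicit: under a strict clock, changing the calibration date rescales every duplication age by the same factor, which is why the two divergence estimates yield two shifted but proportionally identical age ranges.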
The search for 2R traces in the human genome
Stronger evidence for the type and number of duplication events can be obtained from the presence and arrangement of paralogons in the duplicated genome.
First proof for WGD in vertebrates
Additional Hox clusters have been identified in teleost fish occupying different taxonomic positions (Figure 1). The mapping of Hox clusters and many duplicated genes in zebrafish [49, 50], pufferfish [51] and medaka [52] suggested an extra WGD in ray-finned fish. The analysis of the Fugu genome revealed 159 statistically significant paralogons that contained 544 paralogous gene pairs (3.4 anchor points per block) [32]. Seventy percent of duplicated genes in these paralogons (that carry 406
Evidence for 2R from early vertebrates
The definitive proof that a more recent WGD occurred in teleost fish has important consequences for the 2R hypothesis because it indicates that WGD and not segmental duplication was the duplication mechanism responsible for the origin of the additional Hox clusters in this clade. Therefore, one could now accept that the Hox clusters are reliable markers of WGDs (Figure 1).
Both hagfish and lamprey genomes have been sampled so far mainly for Hox genes. Lamprey has at least four Hox clusters [53, 54]
Concluding remarks
Although polyploidy is a drastic event for a genome, it is not rare. It has long been known that natural polyploids are widespread in animal and plant genomes: 50% to >70% of angiosperms are thought to have experienced chromosome doubling [64]. Many amphibian [65] and fish [66] species are known for frequent recent polyploidy. Furthermore, the same amphibian species can be found with various ploidy levels [67]. Although the genome analysis of representative organisms of several of the
What next?
The complete genome sequence of lamprey or hagfish will help to resolve the timing of the duplications. The definitive answer to whether there were one or two rounds of ancient vertebrate genome duplications primarily rests in the upcoming amphioxus genome, which will serve as an unduplicated reference genome. Importantly, in addition to the complete sequence of these genomes, high-resolution genomic maps that will enable genes to be anchored to the chromosomes are required to tackle the
Acknowledgements
We thank Steffen Hennig, Detlef Groth, James Adjaye and especially Hans Lehrach for stimulating discussions. This work was supported by the Max-Planck Gesellschaft zur Förderung der Wissenschaften e.V.
Glossary
- (AB)(CD) topology measure: the nodes of the phylogenetic tree of four duplicates generated from two duplication events should have the (AB)(CD) topology where the dates of duplication for the (AB) and (CD) nodes are the same. Neighbor genes within paralogons that have the same topology are assumed to have been generated through the same event.
- Agnathans: jawless vertebrates.
- Aneuploidy: the loss or addition of one or more specific chromosomes to the normal set of chromosomes of an organism (e.g. a
References (82)
- et al. Eukaryote genome duplication – where’s the evidence? Curr. Opin. Genet. Dev. (1998)
- Paralogy mapping: identification of a region in the human MHC triplicated onto human chromosomes 1 and 9 allows the prediction and isolation of novel PBX and NOTCH loci. Genomics (1996)
- Evolution of the vertebrate genome as reflected in paralogous chromosomal regions in man and the house mouse. Genomics (1993)
- et al. Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr. Opin. Cell Biol. (1999)
- Gen(om)e duplications in the evolution of early vertebrates. Curr. Opin. Genet. Dev. (1996)
- Vertebrate evolution by interspecific hybridization – are we polyploid? FEBS Lett. (1997)
- et al. Evidence for 14 homeobox gene clusters in human genome ancestry. Curr. Biol. (2000)
- et al. Otx expression during lamprey embryogenesis provides insights into the evolution of the vertebrate head and jaw. Dev. Biol. (1999)
- Evidence for independent Hox gene duplications in the hagfish lineage: a PCR-based gene inventory of Eptatretus stoutii. Mol. Phylogenet. Evol. (2004)
- et al. Reading the entrails of chickens: molecular timescales of evolution and the illusion of precision. Trends Genet. (2004)
- Molecular timescales and the fossil record: a paleontological perspective. Trends Genet.
- Dealing with saturation at the amino acid level: a case study based on anciently duplicated zebrafish genes. Gene
- Genomic clocks and evolutionary timescales
- Evolution by Gene Duplication
- Gene duplications and the origins of vertebrate development. Dev. Suppl.
- Archetypal organization of the amphioxus Hox gene cluster. Nature
- Gene loss and gain in the evolution of vertebrates. Dev.
- Ancient chromosomal duplication involving the major histocompatibility complex. Seikagaku
- Ancient large-scale genome duplications: phylogenetic and linkage analyses shed light on chordate genome evolution. Mol. Biol. Evol.
- Evolutionary patterns of gene families generated in the early stage of vertebrates. J. Mol. Evol.
- Evidence in favour of ancient octaploidy in the vertebrate genome. Biochem. Soc. Trans.
- Phylogenetic tests of the hypothesis of block duplication of homologous genes on human chromosomes 6, 9, and 1. Mol. Biol. Evol.
- CpG islands. EXS
- How many genes in the human genome? Nat. Genet.
- Finishing the euchromatic sequence of the human genome. Nature
- The sequence of the human genome. Science
- Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence. Nat. Genet.
- The DNA sequence of human chromosome 22. Nature
- The draft sequences. Filling in the gaps. Nature
- Initial sequencing and analysis of the human genome. Nature
- Ancient genome duplications did not structure the human Hox-bearing chromosomes. Genome Res.
- New evidence for genome-wide duplications at the origin of vertebrates using an amphioxus gene set and completed animal genomes. Genome Res.
- Extensive genomic duplication during early chordate evolution. Nat. Genet.
- The evolutionary demography of duplicate genes. J. Struct. Funct. Genomics
- Molecular evidence for an ancient duplication of the entire yeast genome. Nature
- Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature
- The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome. Science
- A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res.
- Major events in the genome evolution of vertebrates: paranome age and size differ considerably between ray-finned fishes and land vertebrates. Proc. Natl. Acad. Sci. U. S. A.
- Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature
- Pattern and timing of gene duplication in animal genomes. Genome Res.