Trends in Genetics
Volume 21, Issue 10, October 2005, Pages 559-567

Timing and mechanism of ancient vertebrate genome duplications – the adventure of a hypothesis

https://doi.org/10.1016/j.tig.2005.08.004

Complete genome doubling has long-term consequences for the genome structure and the subsequent evolution of an organism. It has been suggested that two genome duplications occurred at the origin of vertebrates (known as the 2R hypothesis). However, there has been considerable debate as to whether these were two successive duplications, or whether a single duplication occurred, followed by large-scale segmental duplications. In this article, we review and compare the evidence for the 2R duplications from vertebrate genomes with similar data from other, more recent polyploids.

Introduction

For some time, differences in morphological complexity between animals have been linked directly to gene number. Vertebrates almost consistently have more genes than invertebrates, and they have unique anatomical structures that are characteristic of their phylum. Did this increase in complexity arise from additional genes generated by genome duplication?

According to Ohno [1], gene and especially genome duplications are of enormous importance because they can generate large amounts of raw genetic material in a short time, which mutation and positive selection can then exploit to evolve novel gene functions. Based on the genome size of the cephalochordate amphioxus, which is three times as large as that of the urochordate (see Glossary) Ciona, Ohno argued for a genome duplication following the divergence of urochordates. Isozyme studies and the analysis of orthologous genes from amphioxus and Ciona showed that most genes are present as single copies in these organisms, whereas the genomes of jawless vertebrates, such as lamprey and hagfish, contain at least two orthologs and mammals contain three or more [2]. This evidence, together with the identification of a single Hox cluster in amphioxus (phylogenetically the invertebrate closest to vertebrates) [3] compared with four clusters in mammals, narrowed the proposed time of duplication to the period after the split of the cephalochordate and vertebrate lineages and before the emergence of gnathostomes (Figure 1). Based on the apparent stepwise increase in gene copy number from invertebrates to jawless vertebrates to mammals, it was suggested that two episodes of complete, or whole-genome, duplication (WGD) occurred [2], one before and one after the divergence of the jawless fish, which is estimated at 500–430 million years ago (Mya) (i.e. the 2R hypothesis; see Ref. [4] for a summary of proposals for the timing of duplication events).

The identification of three ‘large’ quadrupled regions in the unfinished human genome sequence, namely the major histocompatibility complex (MHC; human chromosomes (Hsa) 1, 6, 9 and 19), the extended Hox (Hsa 2, 7, 12 and 17) and the fibroblast growth factor receptor (FGFR; Hsa 4, 5, 8 and 10) regions, which include genes duplicated ∼530–738 Mya, strongly supported tetraploidy [5–9]. These rounds of duplication could have happened in short succession, within 90–106 million years [10]. Proponents of the 2R hypothesis argued that this short interval could explain the incongruent tree topologies of neighboring genes within the described paralogons [11] (Box 1), whereas opponents cited the incongruence as proof that these paralogons did not arise through the duplication of an ancestral block. To explain the numerous paralogs in vertebrates, an alternative scenario of continuous small-scale (tandem or segmental) gene duplication was suggested [12].
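The (AB)(CD) criterion defined in the Glossary can be made concrete with a small check on a four-gene tree. This is an illustrative sketch only: the nested-tuple tree encoding, the helper names `n_leaves` and `is_ab_cd`, and the `tolerance_myr` parameter are hypothetical choices, and a real analysis would compare estimated node ages together with their confidence intervals rather than against a fixed tolerance.

```python
from typing import Tuple, Union

# Hypothetical encoding: a leaf is a gene name (str); an internal node is a
# (left, right, age_in_Myr) tuple.
Tree = Union[str, Tuple["Tree", "Tree", float]]


def n_leaves(t: Tree) -> int:
    """Count the genes under a node."""
    if isinstance(t, str):
        return 1
    return n_leaves(t[0]) + n_leaves(t[1])


def is_ab_cd(t: Tree, tolerance_myr: float = 0.0) -> bool:
    """True if a four-gene tree has the symmetric (AB)(CD) shape and the
    (AB) and (CD) nodes have the same date within tolerance_myr."""
    if isinstance(t, str) or n_leaves(t) != 4:
        raise ValueError("expected a rooted tree of four duplicates")
    left, right, _root_age = t
    # Symmetric shape: both children of the root are two-gene clades.
    if n_leaves(left) != 2 or n_leaves(right) != 2:
        return False
    # Both children are internal nodes here, so index 2 holds their ages.
    return abs(left[2] - right[2]) <= tolerance_myr


symmetric = (("A", "B", 400.0), ("C", "D", 400.0), 550.0)
ladder = ("A", ("B", ("C", "D", 300.0), 450.0), 550.0)
print(is_ab_cd(symmetric))  # True
print(is_ab_cd(ladder))     # False
```

A ladder-shaped (asymmetric) tree, or an (AB)(CD) tree whose two cherry nodes have clearly different ages, is the kind of incongruent topology over which proponents and opponents of 2R disagreed.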

Before the completion of the human genome, gene-number estimates were ∼70 000 (±20 000) for humans and ∼20 000 for invertebrates [12–14]. This roughly fourfold difference, and the observed 1:4 relationship between many Drosophila and human genes (the 1:4 rule) [15–17], was an additional argument in favor of two rounds of WGD, under the assumption that no subsequent gene loss had occurred. The estimate that the human genome might contain as few as 25 000 genes [18–22] signaled that, if there had been WGDs, they must have been followed by extensive gene loss; therefore, finding evidence for old duplications might not be as straightforward as originally thought.
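The arithmetic behind the gene-loss argument can be written out explicitly. This is a back-of-the-envelope sketch under strong, hypothetical assumptions: the ∼20 000 invertebrate genes stand in for the ancestral gene count, every gene is doubled in each round, and no genes arise or disappear by any other route. The helper name `retained_fraction` is hypothetical.

```python
def retained_fraction(ancestral_genes: int, rounds: int, observed_genes: int) -> float:
    """Observed genes as a fraction of the copies expected after `rounds`
    whole-genome doublings, assuming no other gain or loss of genes."""
    expected = ancestral_genes * 2 ** rounds
    return observed_genes / expected


# Assumed ancestral count: ~20 000 genes; 2R gives 80 000 expected copies.
print(retained_fraction(20_000, 2, 70_000))  # 0.875  (old human estimate)
print(retained_fraction(20_000, 2, 25_000))  # 0.3125 (revised estimate)
```

Under these assumptions, the earlier ∼70 000 estimate corresponds to 87.5% of the 80 000 copies expected after 2R, consistent with little loss, whereas 25 000 genes correspond to about 31%, which would imply that roughly 69% of the duplicated copies were lost.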

What evidence for the 2R duplications has emerged from analyses of the complete human and teleost fish genomes? In this article, we review this evidence in the light of similar data from the genome analysis of more recent polyploids such as Arabidopsis and Saccharomyces cerevisiae.


2R genes in vertebrates and the extent of gene loss

According to the 2R hypothesis, each invertebrate gene is expected to have at least four vertebrate orthologs (in keeping with the 1:4 rule). The human genome shares 1308 gene families with the genomes of Caenorhabditis elegans, D. melanogaster and S. cerevisiae, 43.1% of which are single-copy genes both in these organisms and in humans [23–26]. If yeast is excluded from this comparison, the number of families shared between the human genome and the genomes of C. elegans and D. melanogaster
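One simple way to confront gene families with the 1:4 rule is to tabulate human copy numbers for families that have exactly one invertebrate gene. The sketch assumes a hypothetical input format, a mapping from family name to (invertebrate copies, human copies); the helper names and the family counts in the example are invented for illustration and are not data from this study.

```python
from collections import Counter
from typing import Dict, Tuple


def human_copy_spectrum(families: Dict[str, Tuple[int, int]]) -> Counter:
    """Count families by human copy number, keeping only families with a
    single invertebrate gene (the families informative for the 1:4 rule)."""
    return Counter(human for inv, human in families.values() if inv == 1)


def share_one_to_one(families: Dict[str, Tuple[int, int]]) -> float:
    """Fraction of all families that are single copy in both groups."""
    single = sum(1 for inv, human in families.values() if inv == 1 and human == 1)
    return single / len(families)


# Hypothetical example data: family -> (invertebrate copies, human copies)
toy_families = {
    "fam1": (1, 1),
    "fam2": (1, 4),
    "fam3": (1, 2),
    "fam4": (2, 3),  # excluded from the spectrum: two invertebrate genes
    "fam5": (1, 1),
}
spectrum = human_copy_spectrum(toy_families)
print(sorted(spectrum.items()))        # [(1, 2), (2, 1), (4, 1)]
print(share_one_to_one(toy_families))  # 0.4
```

With no gene loss after 2R, every single-invertebrate-gene family would fall into class 4; a spectrum dominated by classes 1 and 2 is what one would expect if many duplicates were lost.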

How many vertebrate duplicates date to the origin of vertebrates?

The molecular-clock-based calculation (Box 1) of the age of human duplicates within 191 gene families that have a single invertebrate ortholog (i.e. genes likely to have duplicated on the vertebrate lineage), calibrated with an arthropod–chordate divergence estimate of either 833 Myr [38] or 993 Myr [10], showed that most of these human duplicates arose ∼333–583 Mya or ∼397–695 Mya, respectively (Figure 2) [25,26]. The dating of numerous vertebrate gene families (749 vertebrate gene families, 1739 gene-duplication events)
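Under a strict molecular clock, the distance between two genes grows linearly with the time since they diverged, so the age of a duplication is the calibration age scaled by the ratio of the paralog distance to the calibration distance. The sketch below assumes a strict clock (the calculation in Box 1 may include corrections not shown here); `duplication_age` is a hypothetical helper, and the relative distances 0.4 and 0.7 are hypothetical values chosen because they reproduce the ranges quoted above.

```python
def duplication_age(d_dup: float, d_calib: float, t_calib_myr: float) -> float:
    """Strict-clock age (Myr) of a duplication.

    d_dup: distance between the two paralogs
    d_calib: distance between arthropod and chordate orthologs (same units)
    t_calib_myr: assumed age of the arthropod-chordate split
    """
    if d_calib <= 0:
        raise ValueError("calibration distance must be positive")
    return t_calib_myr * d_dup / d_calib


# Hypothetical relative distances (paralog distance / calibration distance)
for t_calib in (833.0, 993.0):
    young = duplication_age(0.4, 1.0, t_calib)
    old = duplication_age(0.7, 1.0, t_calib)
    print(f"calibration {t_calib:.0f} Myr: {young:.0f}-{old:.0f} Mya")
# calibration 833 Myr: 333-583 Mya
# calibration 993 Myr: 397-695 Mya
```

Because every age is proportional to the calibration, switching from 833 to 993 Myr stretches all estimates by the same factor, 993/833 ≈ 1.19, which is why the two ranges differ by that ratio at both ends.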

The search for 2R traces in the human genome

Stronger evidence for the type and number of duplication events can be obtained from the presence and arrangement of paralogons in the duplicated genome.
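Paralogons are usually detected by finding runs of paralogous gene pairs ("anchor points") that keep their relative order on two genomic segments. The sketch below is a deliberately simplified greedy chaining, not the statistical method used in the studies cited here: the helper `chain_anchors`, its `max_gap` and `min_anchors` parameters, and the example pairs are all hypothetical, and no significance test is applied.

```python
from typing import List, Tuple

# (gene index on segment 1, gene index on segment 2) for one paralogous pair
Pair = Tuple[int, int]


def chain_anchors(pairs: List[Pair], max_gap: int = 5,
                  min_anchors: int = 3) -> List[List[Pair]]:
    """Greedily group paralogous gene pairs into colinear blocks.

    Pairs are scanned in order along segment 1. A pair extends the current
    block if it lies within max_gap genes of the block's last pair on both
    segments; the gap on segment 2 is an absolute difference, so inverted
    blocks are kept. Blocks with fewer than min_anchors pairs are dropped.
    """
    blocks: List[List[Pair]] = []
    current: List[Pair] = []
    for x, y in sorted(pairs):
        if current:
            last_x, last_y = current[-1]
            if x - last_x <= max_gap and abs(y - last_y) <= max_gap:
                current.append((x, y))
                continue
            if len(current) >= min_anchors:
                blocks.append(current)
        current = [(x, y)]
    if len(current) >= min_anchors:
        blocks.append(current)
    return blocks


# Hypothetical pairs: a forward block, an isolated pair, an inverted block
pairs = [(1, 10), (3, 12), (4, 13), (20, 50), (40, 5), (42, 3), (45, 1), (46, 0)]
blocks = chain_anchors(pairs)
print([len(b) for b in blocks])                   # [3, 4]
print(sum(len(b) for b in blocks) / len(blocks))  # 3.5
```

The last line computes the mean number of anchor points per block; for the Fugu analysis cited in this article, 544 pairs in 159 blocks give 544/159 ≈ 3.4.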

First proof of WGD in vertebrates

Additional Hox clusters have been identified in teleost fish occupying different taxonomic positions (Figure 1). The mapping of Hox clusters and many duplicated genes in zebrafish [49,50], pufferfish [51] and medaka [52] suggested an extra WGD in ray-finned fish. The analysis of the Fugu genome revealed 159 statistically significant paralogons that contained 544 paralogous gene pairs (3.4 anchor points per block) [32]. Seventy percent of duplicated genes in these paralogons (that carry 406

Evidence for 2R from early vertebrates

The definitive proof that a more recent WGD occurred in teleost fish has important consequences for the 2R hypothesis, because it indicates that WGD, and not segmental duplication, was the mechanism responsible for the origin of the additional Hox clusters in this clade. Hox clusters can therefore now reasonably be accepted as reliable markers of WGD (Figure 1).
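If Hox clusters arise only by WGD, each round can at most double their number, so a cluster count gives a lower bound on the number of rounds: n clusters require at least ⌈log2 n⌉ doublings (lost clusters can only lower the observed count). The sketch below is an illustration under that assumption; `min_rounds` is a hypothetical helper, and the seven zebrafish Hox clusters used in the example come from the wider literature rather than from this article.

```python
def min_rounds(n_clusters: int) -> int:
    """Lower bound on the number of genome doublings needed to turn one
    ancestral Hox cluster into n_clusters, assuming new clusters arise only
    by WGD (losses can only reduce the count, hence a lower bound)."""
    if n_clusters < 1:
        raise ValueError("need at least one cluster")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, in exact integer math
    return (n_clusters - 1).bit_length()


for taxon, clusters in [("amphioxus", 1), ("mammals", 4), ("zebrafish", 7)]:
    print(taxon, clusters, min_rounds(clusters))
# amphioxus 1 0
# mammals 4 2
# zebrafish 7 3
```

This reading holds only if no cluster arose by segmental duplication, which is precisely the assumption the teleost result is taken to support.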

So far, hagfish and lamprey genomes have been sampled mainly for Hox genes. Lamprey has at least four Hox clusters [53,54]

Concluding remarks

Although polyploidy is a drastic event for a genome, it is not rare. It has long been known that natural polyploids are widespread among animals and plants: 50% to >70% of angiosperms are thought to have experienced chromosome doubling [64]. Many amphibian [65] and fish [66] species are known for frequent, recent polyploidy. Furthermore, a single amphibian species can occur at various ploidy levels [67]. Although the genome analysis of representative organisms of several of the

What next?

The complete genome sequence of lamprey or hagfish will help to resolve the timing of the duplications. The definitive answer to whether there were one or two rounds of ancient vertebrate genome duplication rests primarily on the upcoming amphioxus genome, which will serve as an unduplicated reference. Importantly, in addition to the complete sequence of these genomes, high-resolution genomic maps that enable genes to be anchored to chromosomes are required to tackle the

Acknowledgements

We thank Steffen Hennig, Detlef Groth, James Adjaye and especially Hans Lehrach for stimulating discussions. This work was supported by the Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V.

Glossary

(AB)(CD) topology measure:
the nodes of the phylogenetic tree of four duplicates generated by two duplication events should have the (AB)(CD) topology, in which the (AB) and (CD) nodes have the same duplication date. Neighboring genes within paralogons that share this topology are assumed to have been generated by the same event.
Agnathans:
jawless vertebrates.
Aneuploidy:
the loss or addition of one or more specific chromosomes to the normal set of chromosomes of an organism (e.g. a

References (82)

  • R.R. Reisz et al. Molecular timescales and the fossil record: a paleontological perspective. Trends Genet. (2004)
  • Y. Van de Peer. Dealing with saturation at the amino acid level: a case study based on anciently duplicated zebrafish genes. Gene (2002)
  • S. Blair Hedges et al. Genomic clocks and evolutionary timescales (2003)
  • S. Ohno. Evolution by Gene Duplication (1970)
  • P.W. Holland. Gene duplications and the origins of vertebrate development. Dev. Suppl. (1994)
  • J. Garcia-Fernandez et al. Archetypal organization of the amphioxus Hox gene cluster. Nature (1994)
  • F.H. Ruddle. Gene loss and gain in the evolution of vertebrates. Dev. (1994)
  • M. Kasahara. Ancient chromosomal duplication involving the major histocompatibility complex. Seikagaku (1996)
  • M.J. Pebusque. Ancient large-scale genome duplications: phylogenetic and linkage analyses shed light on chordate genome evolution. Mol. Biol. Evol. (1998)
  • Y. Wang et al. Evolutionary patterns of gene families generated in the early stage of vertebrates. J. Mol. Evol. (2000)
  • T.J. Gibson et al. Evidence in favour of ancient octaploidy in the vertebrate genome. Biochem. Soc. Trans. (2000)
  • A.L. Hughes. Phylogenetic tests of the hypothesis of block duplication of homologous genes on human chromosomes 6, 9, and 1. Mol. Biol. Evol. (1998)
  • F. Antequera et al. CpG islands. EXS (1993)
  • C. Fields. How many genes in the human genome? Nat. Genet. (1994)
  • Finishing the euchromatic sequence of the human genome. Nature (2004)
  • J.C. Venter. The sequence of the human genome. Science (2001)
  • H. Roest Crollius. Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence. Nat. Genet. (2000)
  • I. Dunham. The DNA sequence of human chromosome 22. Nature (1999)
  • P. Bork et al. The draft sequences. Filling in the gaps. Nature (2001)
  • E.S. Lander. Initial sequencing and analysis of the human genome. Nature (2001)
  • A.L. Hughes. Ancient genome duplications did not structure the human Hox-bearing chromosomes. Genome Res. (2001)
  • G. Panopoulou. New evidence for genome-wide duplications at the origin of vertebrates using an amphioxus gene set and completed animal genomes. Genome Res. (2003)
  • A. McLysaght. Extensive genomic duplication during early chordate evolution. Nat. Genet. (2002)
  • M. Lynch et al. The evolutionary demography of duplicate genes. J. Struct. Funct. Genomics (2003)
  • K.H. Wolfe et al. Molecular evidence for an ancient duplication of the entire yeast genome. Nature (1997)
  • M. Kellis. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature (2004)
  • F.S. Dietrich. The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome. Science (2004)
  • G. Blanc. A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res. (2003)
  • K. Vandepoele. Major events in the genome evolution of vertebrates: paranome age and size differ considerably between ray-finned fishes and land vertebrates. Proc. Natl. Acad. Sci. U. S. A. (2004)
  • O. Jaillon. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature (2004)
  • R. Friedman et al. Pattern and timing of gene duplication in animal genomes. Genome Res. (2001)