Journal of Molecular Biology
Volume 316, Issue 3, 22 February 2002, Pages 409-419

Communication
A small reservoir of disabled ORFs in the yeast genome and its implications for the dynamics of proteome evolution

https://doi.org/10.1006/jmbi.2001.5343

Abstract

We surveyed the sequenced Saccharomyces cerevisiae genome (strain S288C) comprehensively for open reading frames (ORFs) that could encode full-length proteins but contain obvious mid-sequence disablements (frameshifts or premature stop codons). These pseudogenic features are termed disabled ORFs (dORFs). Using homology to annotated yeast ORFs and non-yeast proteins plus a simple region extension procedure, we have found 183 dORFs. Combined with the 38 existing annotations for potential dORFs, we have a total pool of up to 221 dORFs, corresponding to less than ∼3% of the proteome. Additionally, we found 20 pairs of annotated yeast ORFs that could be merged into a single ORF (termed a mORF) by read-through of the intervening stop codon; each such pair may constitute a complete ORF in other yeast strains. Focussing on a core pool of 98 dORFs with a verifying protein homology, we find that most dORFs are substantially decayed, with ∼90% having two or more disablements, and ∼60% having four or more. dORFs are much more specific to the yeast proteome than live yeast genes are (they are only about half as likely to be related to a non-yeast protein). They show a dramatically increased density at the telomeres of chromosomes, relative to genes. A microarray study shows that some dORFs are expressed even though they carry multiple disablements, and thus may be more resistant to nonsense-mediated decay. Many of the dORFs may be involved in responding to environmental stresses, as the largest functional groups include growth inhibition, flocculation, and the SRP/TIP1 family. Our results have important implications for proteome evolution. The characteristics of the dORF population suggest the sorts of genes that are likely to fall in and out of usage (and vary in copy number) in a strain-specific way, and highlight the role of subtelomeric regions in engendering this diversity. Our results also have important implications for the effects of the [PSI+] prion.
The dORFs disabled by only a single stop codon and the mORFs (together totalling 35) provide an estimate of the extent of the sequence population that can be resurrected readily through the demonstrated ability of the [PSI+] prion to cause nonsense-codon read-through. Also, the dORFs and mORFs that we find have properties (e.g. growth inhibition, flocculation, vanadate resistance, stress response) that are potentially related to the ability of [PSI+] to engender substantial phenotypic variation in yeast strains under different environmental conditions. (See genecensus.org/pseudogene for further information.)
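
The single-stop dORFs and the mORFs described above share one property: a single in-frame nonsense codon separates them from a continuous ORF, so one read-through event would restore it. A minimal Python sketch of that check follows, assuming the sequence is already supplied in its correct reading frame. The function names are hypothetical, and this is not the authors' homology-based pipeline; it only counts in-frame stop codons and does not detect frameshifts, the other kind of disablement.

```python
# Hypothetical sketch: count premature in-frame stops and test whether a
# single nonsense-codon read-through would yield one continuous ORF.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def internal_stop_positions(seq: str, frame: int = 0) -> list[int]:
    """Return 0-based offsets of in-frame stop codons, excluding a terminal stop."""
    seq = seq.upper()
    positions = [i for i in range(frame, len(seq) - 2, 3)
                 if seq[i:i + 3] in STOP_CODONS]
    # A stop in the last complete codon is the normal terminator, not a disablement.
    last_codon_start = frame + ((len(seq) - frame) // 3 - 1) * 3
    if positions and positions[-1] == last_codon_start:
        positions.pop()
    return positions

def resurrectable_by_single_readthrough(seq: str, frame: int = 0) -> bool:
    """True if exactly one premature stop interrupts the frame."""
    return len(internal_stop_positions(seq, frame)) == 1

print(internal_stop_positions("ATGAAATAGCCCGGGTAA"))             # [6]
print(resurrectable_by_single_readthrough("ATGAAATAGCCCGGGTAA"))  # True
```

Here `ATGAAATAGCCCGGGTAA` reads as ATG AAA TAG CCC GGG TAA; the internal TAG is the only premature stop, so reading through it would leave one uninterrupted frame ending at TAA.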

Section snippets

Finding dORFs in the sequenced yeast genome

Since the full extent of the dORF complement in yeast is not known at present, here we have defined the yeast dORF pool using a simple homology-based procedure. As described in detail in Figure 1(a), the yeast genome was scanned for significant protein homologies that contain at least one disablement and that do not rely on alignment to a previously annotated ORF in the genomic DNA. That is, if the dORF includes an annotated ORF, the disabled extension to the ORF arises from a significant span

Properties of yeast dORFs

We examined four properties of the core pool of dORFs: (1) their distribution of disablements; (2) their homology trends; (3) their prevalent families; and (4) their chromosomal distribution.
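
The first property, the distribution of disablements, reduces to simple counting once each dORF has a disablement count (frameshifts plus premature stops). The sketch below assumes such a list of per-dORF counts is already available; the function names and the example counts are hypothetical and are not the paper's data. Applied to the real core pool, the fractions at thresholds 2 and 4 correspond to the ∼90% and ∼60% figures given in the abstract.

```python
from collections import Counter

def disablement_summary(counts: list[int], thresholds=(2, 4)) -> dict[int, float]:
    """Fraction of dORFs carrying at least k disablements, for each threshold k."""
    if not counts:
        raise ValueError("need at least one dORF")
    n = len(counts)
    return {k: sum(1 for c in counts if c >= k) / n for k in thresholds}

def disablement_histogram(counts: list[int]) -> dict[int, int]:
    """Number of dORFs with each disablement count, ordered by count."""
    return dict(sorted(Counter(counts).items()))

counts = [1, 2, 2, 3, 4, 5, 6, 7, 1, 4]  # hypothetical per-dORF counts
print(disablement_summary(counts))    # {2: 0.8, 4: 0.5}
print(disablement_histogram(counts))  # {1: 2, 2: 2, 3: 1, 4: 2, 5: 1, 6: 1, 7: 1}
```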

Expression of dORFs

We tested a small random sample of 11 dORFs for expression (Figure 2(d)). Four of these showed appreciable expression, even though one carries two disablements and the other three carry five or more. Two of these four dORFs are subtelomeric (within 20 kb of chromosome ends) and homologous to putative hypothetical ORFs, representing dORF families of nine or more members. The other two are single dORFs with moderate sequence similarity to two annotated ORFs, both with five or more

A dynamically evolving subtelomeric subproteome and its role in strain-specific variation

The total pool of dORFs and pseudogenic fragments corresponds to only a very small percentage of the total annotated proteome (∼3%). However, the distribution of these dORFs, both in terms of homology and chromosomal position, offers an important perspective on yeast proteome evolution.
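
The chromosomal-position comparison can be framed as a density ratio: features per base pair within a fixed window at each chromosome end, divided by features per base pair in the interior. The sketch below assumes 1-based inclusive coordinates, uses the 20 kb window the text applies to subtelomeric dORFs, and classifies each feature by its midpoint; that midpoint rule, the function names, and the example coordinates are assumptions for illustration only. Computing the ratio separately for dORFs and for genes would show whether dORFs are more concentrated toward chromosome ends.

```python
# Hypothetical sketch: enrichment of features near chromosome ends.
SUBTELOMERIC_WINDOW = 20_000  # bp from each end; the text's 20 kb cut-off

def is_subtelomeric(start: int, end: int, chrom_length: int,
                    window: int = SUBTELOMERIC_WINDOW) -> bool:
    """Classify a feature (1-based, inclusive coordinates) by its midpoint.
    The midpoint rule is an assumption made for this sketch."""
    midpoint = (start + end) // 2
    return midpoint <= window or midpoint > chrom_length - window

def subtelomeric_enrichment(features, chrom_lengths: dict[str, int],
                            window: int = SUBTELOMERIC_WINDOW) -> float:
    """Features per bp in the end windows divided by features per bp elsewhere.
    `features` is an iterable of (chromosome, start, end) tuples."""
    sub_bp = inner_bp = 0
    for length in chrom_lengths.values():
        if length <= 2 * window:
            raise ValueError("chromosome too short for two non-overlapping windows")
        sub_bp += 2 * window              # one window at each end
        inner_bp += length - 2 * window
    sub = inner = 0
    for chrom, start, end in features:
        if is_subtelomeric(start, end, chrom_lengths[chrom], window):
            sub += 1
        else:
            inner += 1
    if inner == 0:
        return float("inf") if sub else float("nan")
    # (sub / sub_bp) / (inner / inner_bp), cross-multiplied to stay in integers
    return (sub * inner_bp) / (inner * sub_bp)

# Hypothetical example: one 200 kb chromosome with three features
chroms = {"chrA": 200_000}
feats = [("chrA", 1_000, 2_500), ("chrA", 195_000, 196_000), ("chrA", 100_000, 101_000)]
print(subtelomeric_enrichment(feats, chroms))  # 8.0
```

In the example, two of three features fall in 40 kb of end windows and one falls in 160 kb of interior, giving a per-base-pair density ratio of (2/40,000)/(1/160,000) = 8.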

In the present study, we have found that dORFs are only half as likely as the average known yeast protein to be related to a non-yeast protein (∼40% of dORFs versus 80% of annotated ORFs). This comparison implies that

Website

The dORF annotation data and sequences are available at the website http://genecensus.org/pseudogene (or http://bioinfo.mbb.yale.edu/genome/pseudogene).

Acknowledgements

We thank Tricia Serio and Zhaolei Zhang for comments on the manuscript. A.K. is supported by a postdoctoral fellowship from the American Cancer Society. M.G. acknowledges support from the NIH protein structure initiative (P50 grant GM62413-01).

Edited by F. Cohen