Large-Scale Protein Annotation through Gene Ontology

Hanqing Xie (1,3), Alon Wasserman (1), Zurit Levine (2), Amit Novik (2), Vladimir Grebinskiy (1), Avi Shoshan (1), and Liat Mintz (1,3)

(1) Compugen Inc., Jamesburg, New Jersey 08831, USA; (2) Compugen Ltd., Tel Aviv 69512, Israel

Abstract

Recent progress in genomic sequencing, computational biology, and ontology development has presented an opportunity to investigate biological systems from a unique perspective, that is, examining genomes and transcriptomes through the multiple and hierarchical structure of Gene Ontology (GO). We report here our development of GO Engine, a computational platform for GO annotation, and analysis of the resultant GO annotations of human proteins. Protein annotation was centered on sequence homology with GO-annotated proteins and protein domain analysis. Text information analysis and a multiparameter cellular localization predictive tool were also used to increase the annotation accuracy, and to predict novel annotations. The majority of proteins corresponding to full-length mRNA in GenBank, and the majority of proteins in the NR database (nonredundant database of proteins) were annotated with one or more GO nodes in each of the three GO categories. The annotations of GenBank and SWISS-PROT proteins are available to the public at the GO Consortium web site.

Footnotes

  • 3 Corresponding authors.

  • E-mail: han@cgen.com, liat@cgen.com; FAX: (609) 655-5114.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.86902.

    • Received January 8, 2002.
    • Accepted March 15, 2002.