A graphical model for predicting protein molecular function
Source ACM International Conference Proceeding Series; Vol. 148 archive
Proceedings of the 23rd international conference on Machine learning table of contents
Pittsburgh, Pennsylvania
Pages: 297 - 304  
Year of Publication: 2006
ISBN:1-59593-383-2
Authors
Barbara E. Engelhardt, University of California, Berkeley, CA
Michael I. Jordan, University of California, Berkeley, CA
Steven E. Brenner, University of California, Berkeley, CA
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1143844.1143882

ABSTRACT

We present a simple statistical model of molecular function evolution to predict protein function. The model encodes general knowledge of how molecular function evolves along a phylogenetic tree built from the proteins' sequences. Inputs are a phylogeny for a set of evolutionarily related protein sequences and any available function characterizations for those proteins. The posterior probabilities computed for each protein are used to predict that protein's molecular function. We present results from applying our model to three protein families, and compare our predictions for the extant proteins with those of other available protein function prediction methods. For the deaminase family, our method achieves 93.9% prediction accuracy, compared with 72.7% for BLAST, 87.9% for GOtcha, and 72.7% for Orthostrapper.
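
To make the idea of posterior-based prediction on a phylogeny concrete, the following is a minimal sketch and not the authors' model. It assumes a single binary function state (absent or present), a symmetric two-state Markov process along each branch, a uniform prior at the root, and a hypothetical five-node tree. The names `TREE`, `switch_prob`, `joint_prob`, and `posterior` are illustrative only. The posterior is computed by brute-force enumeration, which is practical only for toy trees.

```python
import itertools
import math

# Hypothetical toy phylogeny: child -> (parent, branch length).
TREE = {
    "A": ("root", 0.3),
    "p1": ("A", 0.1),
    "p2": ("A", 0.1),
    "p3": ("root", 0.8),
}
NODES = ["root", "A", "p1", "p2", "p3"]


def switch_prob(t, rate=1.0):
    """Chance that a symmetric two-state Markov process ends a branch of
    length t in a different state from the one it started in."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate * t))


def joint_prob(assign, rate=1.0):
    """Joint probability of one complete 0/1 assignment to every node."""
    p = 0.5  # assumed uniform prior on the root state
    for child, (parent, t) in TREE.items():
        s = switch_prob(t, rate)
        p *= s if assign[child] != assign[parent] else 1.0 - s
    return p


def posterior(query, evidence, rate=1.0):
    """P(query node has the function | observed annotations), obtained by
    summing the joint probability over all unobserved node states."""
    hidden = [n for n in NODES if n not in evidence]
    num = den = 0.0
    for values in itertools.product((0, 1), repeat=len(hidden)):
        assign = dict(evidence)
        assign.update(zip(hidden, values))
        p = joint_prob(assign, rate)
        den += p
        if assign[query] == 1:
            num += p
    return num / den


if __name__ == "__main__":
    # p1 is annotated with the function, p3 without it; p2 is unannotated.
    print(posterior("p2", {"p1": 1, "p3": 0}))
```

In this toy setting the unannotated protein `p2` sits next to the annotated `p1` on short branches, so its posterior for having the function should exceed 0.5 even though the distant `p3` lacks it. A realistic implementation would replace the enumeration with message passing over the tree and would need a richer state space than a single binary function.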


