A graphical model for predicting protein molecular function
Source: ACM International Conference Proceeding Series; Vol. 148
Proceedings of the 23rd International Conference on Machine Learning
Pittsburgh, Pennsylvania
Pages: 297-304
Year of Publication: 2006
ISBN: 1-59593-383-2
ABSTRACT
We present a simple statistical model of molecular function evolution for predicting protein function. The model encodes general knowledge of how molecular function evolves within a phylogenetic tree built from the proteins' sequences. The inputs are a phylogeny for a set of evolutionarily related protein sequences and any available functional characterizations of those proteins. The posterior probabilities computed for each protein are used to predict its molecular function. We present results from applying the model to three protein families and compare our predictions for the extant proteins with those of other available protein function prediction methods. For the deaminase family, our method achieves 93.9% prediction accuracy, compared with 72.7% for BLAST, 87.9% for GOtcha, and 72.7% for Orthostrapper.
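To make the idea of posterior probabilities on a phylogeny concrete, here is a minimal sketch. It is not the model from the paper. It assumes a single binary function state per protein, a symmetric two-state Markov chain along each branch, and a uniform prior at the root. The names `Node`, `transition`, `predict`, and the `rate` parameter are hypothetical and chosen only for illustration. Posteriors are computed with a standard upward and downward message pass over the tree.

```python
import math

def transition(t, rate=1.0):
    """2x2 transition matrix of a symmetric two-state chain after time t (assumed model)."""
    change = 0.5 * (1.0 - math.exp(-2.0 * rate * t))
    return [[1.0 - change, change], [change, 1.0 - change]]

class Node:
    """Hypothetical tree node: branch is the edge length to the parent."""
    def __init__(self, name, branch=0.0, children=(), observed=None):
        self.name = name
        self.branch = branch
        self.children = list(children)
        self.observed = observed  # 0, 1, or None when no annotation is available

def upward(node, rate):
    """Compute node.up[s] = P(evidence in the subtree of node | node state = s)."""
    if node.observed is None:
        node.evidence = [1.0, 1.0]
    else:
        node.evidence = [1.0 if s == node.observed else 0.0 for s in (0, 1)]
    node.msgs = []
    node.up = list(node.evidence)
    for child in node.children:
        upward(child, rate)
        P = transition(child.branch, rate)
        msg = [sum(P[s][c] * child.up[c] for c in (0, 1)) for s in (0, 1)]
        node.msgs.append(msg)
        node.up = [node.up[s] * msg[s] for s in (0, 1)]

def downward(node, down, rate, out):
    """Combine evidence from outside the subtree (down) with node.up into a posterior."""
    joint = [down[s] * node.up[s] for s in (0, 1)]
    out[node.name] = joint[1] / sum(joint)
    for i, child in enumerate(node.children):
        # Evidence reaching the child from everywhere except its own subtree.
        outside = [down[s] * node.evidence[s] for s in (0, 1)]
        for j, msg in enumerate(node.msgs):
            if j != i:
                outside = [outside[s] * msg[s] for s in (0, 1)]
        P = transition(child.branch, rate)
        child_down = [sum(outside[s] * P[s][c] for s in (0, 1)) for c in (0, 1)]
        downward(child, child_down, rate, out)

def predict(root, rate=1.0):
    """Return {node name: posterior probability that the node has the function}."""
    out = {}
    upward(root, rate)
    downward(root, [0.5, 0.5], rate, out)  # uniform root prior (assumption)
    return out

# Toy phylogeny: B is unannotated, its sibling A has the function, C and D do not.
root = Node("root", children=[
    Node("X", 0.2, [Node("A", 0.1, observed=1), Node("B", 0.1)]),
    Node("Y", 0.2, [Node("C", 0.1, observed=0), Node("D", 0.1, observed=0)]),
])
posteriors = predict(root)
print(posteriors["B"])
```

In this toy tree the unannotated protein B sits next to an annotated sibling, so its posterior leans toward the sibling's function even though the more distant proteins C and D lack it. A real application would need a richer state space and a transition model fitted to the data.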
REFERENCES
1. Altschul, S. F. et al. (1990). Basic local alignment search tool. J Mol Biol, 215, 403-410.
2. Altschul, S. F. et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25, 3389-3402.
3. Ashburner, M. et al. (2000). Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25, 25-29.
4. Bateman, A. et al. (2002). The Pfam protein families database. Nucleic Acids Res, 30, 276-280.
5. Camon, E. et al. (2004). The Gene Ontology Annotation (GOA) database: sharing knowledge in UniProt with Gene Ontology. Nucleic Acids Res, 32, 262-266.
6. Durbin, R. et al. (1999). Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press.
7. Edgar, R., & Sjolander, K. (2003). SATCHMO: sequence alignment and tree construction using hidden Markov models. Bioinformatics, 19, 1404-1411.
8. Eisen, J. A. (1998). Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res, 8, 163-167.
9. Engelhardt, B. E., Jordan, M. I., Muratore, K., & Brenner, S. E. (2005). Protein molecular function prediction by Bayesian phylogenomics. PLoS Comp Biol, 1, e45.
10. Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol, 17, 368-376.
11. Felsenstein, J. (1989). PHYLIP: phylogeny inference package (version 3.2). Cladistics, 5, 164-166.
12. Goodman, M. et al. (1979). Fitting the gene lineage into its species lineage: a parsimony strategy illustrated by cladograms constructed from globin sequences. Syst Zool, 28, 132-168.
13. Martin, D. M. A. et al. (2004). GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes. BMC Bioinformatics, 5, 178-195.
14. Ohno, S. (1972). Evolution by gene duplication. Springer-Verlag.
15. Ribard, C. et al. (2003). Sub-families of alpha/beta barrel enzymes: a new adenine deaminase family. J Mol Biol, 334, 1117-1131.
16. Stajich, J. E. et al. (2002). The BioPerl toolkit: Perl modules for the life sciences. Genome Res, 12, 1611-1618.
17. Storm, C. E., & Sonnhammer, E. L. (2002). Automated ortholog inference from phylogenetic trees and calculation of ortholog reliability. Bioinformatics, 18, 92-99.
18. Swofford, D. (2001). PAUP*: Phylogenetic analysis using parsimony. Sinauer Associates.
19. Zmasek, C. M., & Eddy, S. R. (2001). A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics, 17, 821-828.
20. Zmasek, C. M., & Eddy, S. R. (2002). RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs. BMC Bioinformatics, 3, 14.