Context-Specific Independence Mixture Modelling for Protein Families

Georgi, Benjamin; Schultz, Jörg; Schliep, Alexander

doi:10.1007/978-3-540-74976-9_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4702)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Protein families can be divided into subgroups with functional differences. Analysing these subgroups and determining which residues convey substrate specificity are central questions in the study of these families. We present a clustering procedure, based on the context-specific independence (CSI) mixture framework with a Dirichlet mixture prior, for the simultaneous inference of subgroups and prediction of specificity-determining residues from multiple sequence alignments of protein families. Applied to several well-studied families, the method achieved good clustering performance, and the predicted positions have ample biological support. The software we developed to carry out this analysis, PyMix (the Python mixture package), is available from http://www.algorithmics.molgen.mpg.de/pymix.html.
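To make the general idea concrete, here is a minimal sketch under stated assumptions. It clusters aligned sequences with a mixture model fitted by EM. Each component is a product of independent per-column categorical distributions, and the components are estimated under a Dirichlet prior. Several parts are deliberate simplifications:

- A single symmetric Dirichlet pseudo-count `alpha` stands in for the paper's Dirichlet mixture prior.
- The context-specific independence structure, which decides per column which components share parameters, is not learned.
- The per-column score `position_divergence` is a hypothetical heuristic (the maximum total-variation distance between components). It is a crude stand-in for how the CSI model flags candidate specificity-determining positions.

None of this is PyMix's API. All function names, the toy alphabet, and the toy alignment are hypothetical.

```python
import math

ALPHABET = "ACDK"  # toy alphabet (assumption); real alignments use 20 residues plus a gap symbol


def hamming(a, b):
    """Number of alignment columns at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def init_responsibilities(seqs, k):
    """Hard farthest-first initialisation (a hypothetical choice, not PyMix's)."""
    centers = [0]
    while len(centers) < k:
        # Pick the sequence farthest from its nearest already-chosen center.
        centers.append(max(range(len(seqs)),
                           key=lambda i: min(hamming(seqs[i], seqs[c]) for c in centers)))
    resp = []
    for s in seqs:
        best = min(range(k), key=lambda j: hamming(s, seqs[centers[j]]))
        resp.append([1.0 if j == best else 0.0 for j in range(k)])
    return resp


def fit_mixture(seqs, k, alphabet=ALPHABET, alpha=1.0, n_iter=100):
    """MAP-EM for a mixture of per-column independent categorical distributions.

    The symmetric pseudo-count `alpha` is a simplified stand-in for a
    Dirichlet mixture prior; no CSI parameter sharing is modelled."""
    n, length = len(seqs), len(seqs[0])
    idx = {a: i for i, a in enumerate(alphabet)}
    resp = init_responsibilities(seqs, k)
    for _ in range(n_iter):
        # M-step: add-one smoothed mixture weights and per-column distributions.
        weights = [(sum(r[j] for r in resp) + 1.0) / (n + k) for j in range(k)]
        theta = []
        for j in range(k):
            comp = []
            for p in range(length):
                counts = [alpha] * len(alphabet)
                for s, r in zip(seqs, resp):
                    counts[idx[s[p]]] += r[j]
                total = sum(counts)
                comp.append([c / total for c in counts])
            theta.append(comp)
        # E-step: posterior component membership per sequence, in log space.
        resp = []
        for s in seqs:
            logp = [math.log(weights[j])
                    + sum(math.log(theta[j][p][idx[s[p]]]) for p in range(length))
                    for j in range(k)]
            top = max(logp)
            w = [math.exp(v - top) for v in logp]
            resp.append([x / sum(w) for x in w])
    return weights, theta, resp


def position_divergence(theta):
    """Hypothetical heuristic: max total-variation distance between any two
    components at each column; high values mark columns that separate groups."""
    k, length = len(theta), len(theta[0])
    scores = []
    for p in range(length):
        best = 0.0
        for a in range(k):
            for b in range(a + 1, k):
                tv = 0.5 * sum(abs(x - y) for x, y in zip(theta[a][p], theta[b][p]))
                best = max(best, tv)
        scores.append(best)
    return scores


seqs = ["ACDA", "ACDC", "ACDK", "ACDD",  # hypothetical subgroup 1: C at column 1, D at column 2 (0-based)
        "AKAA", "AKAC", "AKAK", "AKAD"]  # hypothetical subgroup 2: K at column 1, A at column 2
weights, theta, resp = fit_mixture(seqs, k=2)
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
print([round(x, 2) for x in position_divergence(theta)])
```

The first column is conserved and the last column varies in the same way within both subgroups, so both score near zero. Columns 1 and 2 separate the two hypothetical subgroups and score highest.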

Author information

Authors and Affiliations

Max Planck Institute for Molecular Genetics, Dept. of Computational Molecular Biology, Ihnestrasse 73, 14195 Berlin, Germany
Benjamin Georgi & Alexander Schliep
Universität Würzburg, Dept. of Bioinformatics, 97074 Wuerzburg, Germany
Jörg Schultz

Editor information

Joost N. Kok, Jacek Koronacki, Ramon Lopez de Mantaras, Stan Matwin, Dunja Mladenič, Andrzej Skowron

About this paper

Cite this paper

Georgi, B., Schultz, J., Schliep, A. (2007). Context-Specific Independence Mixture Modelling for Protein Families. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds) Knowledge Discovery in Databases: PKDD 2007. PKDD 2007. Lecture Notes in Computer Science, vol 4702. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74976-9_11

DOI: https://doi.org/10.1007/978-3-540-74976-9_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74975-2
Online ISBN: 978-3-540-74976-9
eBook Packages: Computer Science, Computer Science (R0)