Abstract
Protein families can be divided into subgroups with functional differences. Analysing these subgroups and identifying the residues that confer substrate specificity are central questions in the study of these families. We present a clustering procedure based on the context-specific independence (CSI) mixture framework with a Dirichlet mixture prior. From a multiple sequence alignment of a protein family, it simultaneously infers subgroups and predicts specificity-determining residues. Applied to several well-studied families, the method showed good clustering performance, and we found ample biological support for the predicted positions. The software we developed for this analysis, PyMix (the Python mixture package), is available from http://www.algorithmics.molgen.mpg.de/pymix.html.
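The abstract does not spell out the model, so the following is only a minimal plain-Python sketch of the general idea. It is not the PyMix API and not the authors' CSI method. It fits a mixture of position-independent categorical distributions to aligned sequences by EM. It replaces the Dirichlet mixture prior with a simplification: a single symmetric Dirichlet prior, applied as a pseudocount `alpha`. It then ranks alignment columns by the weighted Jensen-Shannon divergence between components. This ranking is a stand-in for the CSI structure learning that decides which positions distinguish subgroups. All names are hypothetical, as are the farthest-point initialisation, the value of `alpha`, and the toy alignment.

```python
import math
import random

# Assumed alphabet: 20 amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def _farthest_point_seeds(idx, k, rng):
    """Hypothetical init: one random seed sequence, then repeatedly the
    sequence with the largest Hamming distance to the seeds chosen so far."""
    seeds = [rng.randrange(len(idx))]
    while len(seeds) < k:
        def dist_to_seeds(i):
            return min(sum(a != b for a, b in zip(idx[i], idx[s])) for s in seeds)
        seeds.append(max(range(len(idx)), key=dist_to_seeds))
    return seeds


def fit_mixture(seqs, k, n_iter=50, alpha=0.01, seed=0):
    """EM for a k-component mixture where each alignment column is an independent
    categorical distribution. alpha is a pseudocount added to every residue count
    (a single symmetric Dirichlet prior, NOT a Dirichlet mixture)."""
    rng = random.Random(seed)
    idx = [[ALPHABET.index(c) for c in s] for s in seqs]
    n, length, size = len(idx), len(idx[0]), len(ALPHABET)

    # Initialise each component's column distributions from one seed sequence.
    weights = [1.0 / k] * k
    theta = [[[(alpha + (a == idx[s][pos])) / (size * alpha + 1) for a in range(size)]
              for pos in range(length)] for s in _farthest_point_seeds(idx, k, rng)]

    def e_step():
        # Posterior component memberships, computed in log space for stability.
        resp = []
        for row in idx:
            logp = [math.log(weights[j])
                    + sum(math.log(theta[j][pos][a]) for pos, a in enumerate(row))
                    for j in range(k)]
            top = max(logp)
            ex = [math.exp(x - top) for x in logp]
            total = sum(ex)
            resp.append([x / total for x in ex])
        return resp

    for _ in range(n_iter):
        resp = e_step()
        # M-step: mixture weights (floored to avoid log(0)) ...
        weights = [max(sum(r[j] for r in resp) / n, 1e-12) for j in range(k)]
        total_w = sum(weights)
        weights = [w / total_w for w in weights]
        # ... and pseudocount-smoothed column distributions per component.
        theta = []
        for j in range(k):
            comp = []
            for pos in range(length):
                counts = [alpha] * size
                for i, row in enumerate(idx):
                    counts[row[pos]] += resp[i][j]
                total = sum(counts)
                comp.append([c / total for c in counts])
            theta.append(comp)
    return weights, theta, e_step()


def _entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)


def column_scores(weights, theta):
    """Weighted Jensen-Shannon divergence between components at each column;
    high values flag columns whose residue distribution differs between groups."""
    scores = []
    for pos in range(len(theta[0])):
        dists = [comp[pos] for comp in theta]
        mixed = [sum(w * d[a] for w, d in zip(weights, dists)) for a in range(len(dists[0]))]
        scores.append(_entropy(mixed) - sum(w * _entropy(d) for w, d in zip(weights, dists)))
    return scores


# Toy alignment (invented): two subgroups differing at columns 2, 5 and 7.
group_a = ["MGKTLVDA"] * 9 + ["MGKTLVEA"]
group_b = ["MGETLIDS"] * 10
weights, theta, resp = fit_mixture(group_a + group_b, k=2)
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
scores = column_scores(weights, theta)
top3 = sorted(range(len(scores)), key=lambda p: -scores[p])[:3]
print(labels)
print(sorted(top3))  # [2, 5, 7]
```

In the CSI framework, components may instead share a single parameter set at a column where their distributions do not differ. The divergence ranking above only approximates that decision and applies no model-selection criterion.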
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Georgi, B., Schultz, J., Schliep, A. (2007). Context-Specific Independence Mixture Modelling for Protein Families. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds) Knowledge Discovery in Databases: PKDD 2007. PKDD 2007. Lecture Notes in Computer Science, vol 4702. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74976-9_11
DOI: https://doi.org/10.1007/978-3-540-74976-9_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74975-2
Online ISBN: 978-3-540-74976-9
eBook Packages: Computer Science, Computer Science (R0)