Abstract
In this paper we examine the issues involved in finding consensus patterns from biosequence data of very small sample sizes, by searching for so-called minimal multiple generalization (mmg), that is, a set of syntactically minimal patterns that accounts for all the samples. The data we use are the sigma regulons with more conserved consensus patterns for the bacteria B. subtilis. By comparing between the mmgs found over different search spaces, we found that it is possible to derive patterns close to the known consensus patterns by simply making some reasonable requirements on the kinds of patterns to obtain. We also propose some simple measures to evaluate the patterns in an mmg.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Arimura, H., Fujino, R., Shinohara, T., Arikawa, S.: Protein motif discovery from positive examples by Minimal Multiple Generalization over regular patterns. In: Proceedings of the Genome Informatics Workshop, pp. 39–48 (1994)
Arimura, H., Shinohara, T., Otsuki, S.: Finding minimal generalizations for unions of pattern languages and its application to inductive inference from positive data. In: Enjalbert, P., Mayr, E.W., Wagner, K.W. (eds.) STACS 1994. LNCS, vol. 775, pp. 649–660. Springer, Heidelberg (1994)
Brāzma, A., Jonassen, I., Eidhammer, I., Gilbert, D.: Approaches to the automatic discovery of patterns in biosequences. J. Comp. Biol. 5(2), 277–304 (1998)
Helmann, J.D., Moran, C.P.: RNA Polymerase and Sigma Factors, ch 21, pp. 289–312. American Society Microbiology, Washington (2001)
Makita, Y., Nakao, M., Ogasawara, N., Nakai, K.: DBTBS: Database of transcriptional regulation in Bacillus Subtilis and its contribution to comparative genomics. Nucl. Acids Res. 32, 75–77 (2004)
Ng, Y.K., Ono, H., Shinohara, T.: Measuring over-generalization in the minimal multiple generalizations of biosequences. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds.) DS 2005. LNCS (LNAI), vol. 3735, pp. 176–188. Springer, Heidelberg (2005)
Rigoutsos, I., Floratos, A.: Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm. Bioinformatics 14(1), 55–67 (1998)
Shinohara, T.: Polynomial time inference of extended regular pattern languages. In: Goto, E., Nakajima, R., Yonezawa, A., Nakata, I., Furukawa, K. (eds.) RIMS 1982. LNCS, vol. 147, pp. 115–127. Springer, Heidelberg (1983)
Sigrist, C.J., Cerutti, L., Hulo, N., Gattiker, A., Falquet, L., Pagni, M., Bairoch, A., Bucher, P.: PROSITE: A documented database using patterns and profiles as motif descriptors. Brief. Bioinform., 3, 265–274 (2002)
Takae, T., Kasai, T., Arimura, H., Shinohara, T.: Knowledge discovery in biosequences using sort regular patterns. In: Workshop on Applied Learning Theory (1998)
Yamaguchi, M., Shimozono, S., Shinohara, T.: Finding minimal multiple generalization over regular patterns with alphabet indexing. In: Proceedings of the Seventh Workshop on Genome Informatics, vol. 7, pp. 51–60. Universal Academy Press, Tokyo (1996)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ng, Y.K., Shinohara, T. (2006). Finding Consensus Patterns in Very Scarce Biosequence Samples from Their Minimal Multiple Generalizations. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science(), vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_63
Download citation
DOI: https://doi.org/10.1007/11731139_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer ScienceComputer Science (R0)