Abstract
In biological sequence analysis, many DNA and RNA sequences discovered in laboratory experiments are not properly identified. The focus here is on using clustering algorithms to impose structure on such data. The approach is interdisciplinary, using domain knowledge to identify the sequences. The enormous volume and high dimensionality of unidentified biological sequence data present a challenge. Nonetheless, useful and interesting results have been obtained, both directly and indirectly, by applying clustering to the data.
Work supported by a UK EPSRC MRes studentship and ESPRIT HPCN Project No. 22693.
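The abstract does not say how raw sequences are turned into data that a clustering algorithm can use. The following is a minimal sketch, assuming each sequence is summarised by its k-mer frequency profile and then grouped with plain k-means. Both choices are illustrative assumptions, not the authors' own method, and the helper names `kmer_profile` and `kmeans` are hypothetical. The profile has 4**k dimensions (16 for k = 2, 4096 for k = 6), which shows why high dimensionality quickly becomes an issue for sequence data.

```python
import math
from itertools import product


def kmer_profile(seq: str, k: int = 2) -> list[float]:
    """Return the relative frequency of each of the 4**k possible DNA k-mers."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # ignore k-mers containing ambiguity codes such as N
            counts[j] += 1
    total = sum(counts) or 1.0  # avoid division by zero for very short sequences
    return [c / total for c in counts]


def kmeans(points: list[list[float]], n_clusters: int, iterations: int = 20) -> list[int]:
    """Plain k-means with deterministic farthest-first initialisation (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < n_clusters:
        # Next centroid: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(n_clusters), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy input: three AT-rich and three GC-rich fragments (made up for illustration).
sequences = ["ATATATATAT", "TATATATATA", "ATTATAATAT",
             "GCGCGCGCGC", "CGCGCGCGCG", "GGCCGCGCGC"]
profiles = [kmer_profile(s) for s in sequences]
labels = kmeans(profiles, n_clusters=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On real, unidentified sequence data the clusters would be far less clean, and domain knowledge would still be needed to interpret what each group represents.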
Keywords
- Reference Code
- Weak Cluster
- Biological Sequence Analysis
- Minimum Message Length
- EMBL Nucleotide Sequence Database
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Manning, A.M., Keane, J.A., Brass, A., Goble, C.A. (1997). Clustering techniques in biological sequence analysis. In: Komorowski, J., Zytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1997. Lecture Notes in Computer Science, vol 1263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63223-9_130
DOI: https://doi.org/10.1007/3-540-63223-9_130
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63223-8
Online ISBN: 978-3-540-69236-2
eBook Packages: Springer Book Archive