Abstract
A novel data mining approach for similarity search and knowledge discovery in protein structure databases is proposed. PADS (Protein structure Alignment by Directional shape Signatures) takes the three-dimensional coordinates of the main atoms of each amino acid and extracts a geometrical shape signature along with the direction of each amino acid. As a result, each protein structure is represented by a series of multidimensional feature vectors that capture the local geometry, shape, direction, and biological properties of its amino acid molecules. A distance matrix is then calculated and incorporated into a local alignment dynamic programming algorithm to find the similar portions of two given protein structures, followed by a sequence alignment step for more efficient filtration. The optimal superimposition of the detected similar regions is used to assess the quality of the results. The proposed algorithm is fast and accurate and hence could be used for analysis and knowledge discovery in large protein structure databases. The method has been compared with the results from CE, DALI, and CTSS using a representative sample of PDB structures. PADS detects several new structures that the other methods do not.
This research was supported by the NSF grants under CNF-04-23336, IIS02-23022, IIS02-09112, and EIA00-80134.
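The distance-matrix and local-alignment steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `distance_matrix` and `local_align`, the Euclidean distance, the scoring rule (a fixed `threshold` minus the distance), and the linear `gap` penalty are all assumptions chosen for illustration, and the feature vectors are toy values rather than real shape signatures.

```python
import math


def distance_matrix(a, b):
    """Pairwise Euclidean distances between two lists of feature vectors."""
    return [[math.dist(u, v) for v in b] for u in a]


def local_align(dist, threshold=1.0, gap=1.0):
    """Smith-Waterman-style local alignment driven by a distance matrix.

    Assumption: a residue pair scores (threshold - distance), so pairs closer
    than `threshold` are rewarded and more distant pairs are penalised.
    Returns (best_score, aligned index pairs (i, j)).
    """
    n = len(dist)
    m = len(dist[0]) if dist else 0
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,                                          # start a new local region
                H[i - 1][j - 1] + threshold - dist[i - 1][j - 1],  # match residues
                H[i - 1][j] - gap,                            # gap in second structure
                H[i][j - 1] - gap,                            # gap in first structure
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the local score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + threshold - dist[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


# Toy feature vectors: only the two middle residues resemble each other.
a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (9.0, 9.0)]
b = [(5.0, 5.0), (1.0, 1.0), (2.0, 2.0), (7.0, 7.0)]
score, pairs = local_align(distance_matrix(a, b))
print(score, pairs)
```

On this toy input the alignment picks out the two middle residues of each chain. In a real pipeline the aligned region would then be superimposed to assess its quality, as the abstract describes.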
References
Protein Data Bank (PDB) (2004), http://www.rcsb.org/pdb/holdings.html
Aghili, S.A., Agrawal, D., El Abbadi, A.: PADS: Protein structure alignment using directional shape signatures. Technical Report 2004-12, UCSB (May 2004)
Aghili, S.A., Agrawal, D., El Abbadi, A.: Similarity search of protein structures using geometrical features. In: Proceedings of Thirteenth Conference on Information and Knowledge Management (CIKM), pp. 148–149 (2004)
Bradley, P., Kim, P., Berger, B.: Trilogy: Discovery of sequence-structure patterns across diverse proteins. Proc. Natl. Academy of Science 99(13), 8500–8505 (2002)
Can, T., Wang, Y.: CTSS: A robust and efficient method for protein structure alignment based on local geometrical and biological features. In: IEEE Computer Society Bioinformatics Conf., pp. 169–179 (2003)
Çamoğlu, O., Kahveci, T., Singh, A.: Towards index-based similarity search for protein structure databases. In: IEEE Computer Society Bioinformatics Conf., pp. 148–158 (2003)
Dayhoff, M., Schwartz, R.: Atlas of protein sequence and structure. Nat. Biomed. Res. Found (1978)
Gibrat, J., Madej, T., Bryant, S.: Surprising similarities in structure comparison. Current Opinion in Structural Biology 6(3), 377–385 (1996)
Godzik, A.: The structural alignment between two proteins: is there a unique answer? Protein Sci. 5, 1325–1338 (1996)
Higgins, D., Taylor, W.: Bioinformatics: Sequence, Structure and Databanks. Oxford University Press, Oxford (2000)
Hobohm, U., Scharf, M., Schneider, R., Sander, C.: Selection of representative protein data sets. Protein Science 1, 409–417 (1992)
Holm, L., Sander, C.: Protein structure comparison by alignment of distance matrices. J. Molecular Biology 233(1), 123–138 (1993)
Holm, L., Sander, C.: 3-D lookup: Fast protein database structure searches at 90% reliability. In: ISMB, pp. 179–185 (1995)
Lu, G.: TOP: a new method for protein structure comparisons and similarity searches. J. Applied Crystallography 33(1), 176–183 (2000)
Madej, T., Gibrat, J., Bryant, S.: Threading a database of protein cores. Proteins 23, 356–369 (1995)
Needleman, S., Wunsch, C.: General method applicable to the search for similarities in the amino acid sequence of two proteins. J. Molecular Biology 48, 443–453 (1970)
Pennec, X., Ayache, N.: A geometric algorithm to find small but highly similar 3D substructures in proteins. Bioinformatics 14(6), 516–522 (1998)
Shindyalov, I., Bourne, P.: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Engineering 11(9), 739–747 (1998)
Singh, A., Brutlag, D.: Hierarchical protein structure superposition using both secondary structure and atomic representations. In: Proc. Int. Conf. Intelligent System Mol. Bio., pp. 284–293 (1997)
Smith, T., Waterman, M.: Identification of common molecular subsequences. J. Mol. Bio. 147(1), 195–197 (1981)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aghili, S.A., Agrawal, D., El Abbadi, A. (2005). PADS: Protein Structure Alignment Using Directional Shape Signatures. In: Zhou, L., Ooi, B.C., Meng, X. (eds) Database Systems for Advanced Applications. DASFAA 2005. Lecture Notes in Computer Science, vol 3453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11408079_5
DOI: https://doi.org/10.1007/11408079_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25334-1
Online ISBN: 978-3-540-32005-0
eBook Packages: Computer Science, Computer Science (R0)