ABSTRACT
In this paper we present a cohesive structural itemset miner that discovers interesting patterns in a set of data objects within a multidimensional spatial structure by combining the cohesion and the support of each pattern. We demonstrate the usefulness of the algorithm by applying it to find patterns of amino acids in spatial proximity within a set of proteins, based on their atomic coordinates in the protein molecular structure. The experiments show that several patterns found by the miner contain amino acids that frequently co-occur in the spatial structure, even when they are distant in the primary protein sequence and are brought together only by protein folding. Furthermore, there are various indications that some of the discovered patterns appear to represent common underlying support structures within the proteins.