Protein motifs retrieval by SS terns occurrences
Highlights
► Understanding protein structure is central to predicting protein function and evolution.
► We describe a new approach to analyzing proteins in 3D through their Secondary Structures (SSs).
► We use the Generalized Hough Transform (GHT), taking SS triplets as the primitive structural elements.
► The goal is the retrieval of structural blocks (motifs) composed of three to five SSs.
► Experiments on over 7.5 million cases show good performance in both precision and computation time.
Introduction
Many evolutionarily and functionally meaningful links between proteins come to light through the analysis of their 3D spatial structures. Protein structure and morphology are key to understanding and predicting protein function (Shuoyong et al., 2007). Protein structure comparison is therefore an important issue: together with structure retrieval, it helps biologists understand various aspects of protein function and evolution, including phylogenetic relationships and the tasks proteins perform, i.e., their role in the machinery of life.
The 3D structure of a protein is vitally important in many biological applications, such as rational drug design. A protein's 3D structure can be obtained by experimental methods or predicted by bioinformatics methods. X-ray crystallography is a powerful experimental tool, but it is time-consuming, expensive, and not feasible for all proteins (e.g., so far very few membrane protein structures have been determined). Nuclear magnetic resonance (NMR) is another tool that can be employed to determine the 3D structures of membrane proteins, although it too is time-consuming and costly. To acquire structural information in a timely manner, various bioinformatics tools can be adopted (see, e.g., Li et al., 2011; Ma et al., 2012; Wang and Chou, 2011; Chou et al., 1997; Wang and Chou, 2012; and the review by Chou, 2005). The present study develops a novel method to search a database of protein structures for 3D patterns of secondary structure elements.
Structural comparison and protein structure retrieval problems have been studied in the structural biology community. In most cases, the protein is represented simply by a set of SS elements. Can and Wang (2003) present a method for protein structure similarity searches that applies differential geometry to the 3D structure to extract "signatures" such as curvature, torsion, and SS type. To find similarities in a protein database, Camoglu et al. (2003) build an R-tree-based index over triplets of SS elements. Chionh et al. (2003) propose the SCALE algorithm, which compares protein 3D structures through matrices of angles and distances between SS elements. Krissinel and Henrick (2004) describe the Secondary Structure Matching (SSM) algorithm for 3D comparison, which includes an original procedure for matching graphs built on the protein's SS elements, followed by an iterative 3D alignment of the protein backbone atoms. Chi et al. (2004) design a fast system for protein structural block retrieval that uses image-based distance matrices and multidimensional indices. A 1D string representation of local protein structure retains a degree of structural information, and this type of representation can be a powerful tool for comparison and classification. Friedberg et al. (2006) use a particular structure fragment library, denoted KL-strings, for the 1D representation of protein structure, and develop an infrastructure for comparing structures in this 1D representation. Shuoyong et al. (2007) develop ProSMoS (Protein Structure Motif Search), a program that finds fold-level structural similarities and searches for the presence of structural motifs; the package searches a library of protein structures for user-defined 3D patterns of SS elements. A web server for pattern-based search, using an interaction-matrix representation of protein structures, has also been developed (Shuoyong et al., 2009).
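Several of the methods above describe a pair of SS elements by its relative geometry. As a minimal sketch (not the SCALE algorithm itself), assuming each SS element is approximated by a straight segment from a start point to an end point (for example, the first and last Cα of a helix or strand, in Å), the distance between segment midpoints and the angle between segment axes can be collected into two symmetric matrices. The helper names `sse_pair_features` and `pair_matrices` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
Segment = Tuple[Point, Point]  # (start, end) of an SS element axis: an assumed representation


def midpoint(seg: Segment) -> Point:
    (x1, y1, z1), (x2, y2, z2) = seg
    return ((x1 + x2) / 2, (y1 + y2) / 2, (z1 + z2) / 2)


def direction(seg: Segment) -> Point:
    (x1, y1, z1), (x2, y2, z2) = seg
    return (x2 - x1, y2 - y1, z2 - z1)


def sse_pair_features(a: Segment, b: Segment) -> Tuple[float, float]:
    """Return (midpoint distance, inter-axis angle in degrees) for two SS elements."""
    da, db = direction(a), direction(b)
    cos_t = sum(p * q for p, q in zip(da, db)) / (math.hypot(*da) * math.hypot(*db))
    cos_t = max(-1.0, min(1.0, cos_t))  # clamp rounding noise before acos
    return math.dist(midpoint(a), midpoint(b)), math.degrees(math.acos(cos_t))


def pair_matrices(sses: List[Segment]) -> Tuple[List[List[float]], List[List[float]]]:
    """Symmetric distance and angle matrices over all pairs of SS elements."""
    n = len(sses)
    dist = [[0.0] * n for _ in range(n)]
    ang = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d, t = sse_pair_features(sses[i], sses[j])
            dist[i][j] = dist[j][i] = d
            ang[i][j] = ang[j][i] = t
    return dist, ang
```

For a helix along x from (0, 0, 0) to (10, 0, 0) and a strand along z from (0, 5, 0) to (0, 5, 10), `sse_pair_features` gives a midpoint distance of about 8.66 and an angle of 90°.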
Albrecht et al. (2008) propose a different approach: they apply data reduction techniques directly to the protein structure, converting 3D data into 2D and thereby accelerating structural comparisons. Zotenko et al. (2007) speed up protein comparison by mapping a protein structure to a high-dimensional vector and approximating structural similarity by suitable distances between the corresponding vectors. Zhang et al. (2009) developed FDOD (Function of Degree of Disagreement), a scoring scheme that measures protein similarity using a transition probability matrix and a set of structural characteristic vectors of proteins. Nguyen and Madhusudhan (2011) propose CLICK, an algorithm that optimally superimposes a pair of protein structures independently of their topology; it can generally be applied to compare any pair of molecular structures represented in Cartesian coordinates, as exemplified by the RNA structure superimposition benchmark. Cantoni and Mattia (2012) and Cantoni et al. (2012) studied the retrieval of structural motifs using the GHT and range trees. This approach is novel in that the analysis is based on the 3D spatial distribution of the SSs.
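Albrecht et al. (2008) and Zotenko et al. (2007) both replace direct 3D comparison with a cheaper representation. The sketch below is only a generic illustration of that idea, not either group's published method: it assumes a structure is given as a list of 3D points (for example, SS element midpoints), builds a 2D pairwise distance map, flattens its upper triangle into a vector, and compares two structures by the Euclidean distance between their vectors. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_map(coords: Sequence[Point]) -> List[List[float]]:
    """2D map: entry (i, j) is the Euclidean distance between points i and j."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def map_to_vector(dmap: List[List[float]]) -> List[float]:
    """Flatten the upper triangle (i < j) of a symmetric map into a feature vector."""
    n = len(dmap)
    return [dmap[i][j] for i in range(n) for j in range(i + 1, n)]


def vector_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("structures must have the same number of points")
    return math.dist(u, v)
```

Because the map contains only internal distances, it does not change when the structure is rotated or translated, so two vectors can be compared without first superimposing the structures. The sketch does assume that both structures have the same number of points listed in corresponding order.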
In this paper, a new approach to structural block retrieval based on protein SS comparison is proposed. Triangles joining the midpoints of SS triplets are considered as "structural elements", and all the triangles of the block are compared with all the triangles of the macromolecule. The paper focuses on retrieving an existing structural block that is completely and precisely known. The block can be defined without constraints such as adjacency, distance limits, or homogeneity; the only constraint is that its SS components exist in the protein macromolecule.
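The all-against-all triangle comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: SS elements are reduced to their midpoints, a triangle is described only by its three side lengths (sorted, so the description does not depend on vertex order), and two triangles are taken to match when corresponding sorted sides differ by at most a hypothetical tolerance `tol` (in Å). A fuller method would presumably also take SS types and orientations into account.

```python
import itertools
import math
from typing import Dict, Iterator, List, Sequence, Tuple

Point = Tuple[float, float, float]
Triplet = Tuple[int, int, int]


def triangles(midpoints: Sequence[Point]) -> Iterator[Tuple[Triplet, Tuple[float, ...]]]:
    """Yield (SS index triplet, sorted side lengths) for every triplet of SS midpoints."""
    for i, j, k in itertools.combinations(range(len(midpoints)), 3):
        a, b, c = midpoints[i], midpoints[j], midpoints[k]
        sides = sorted((math.dist(a, b), math.dist(b, c), math.dist(a, c)))
        yield (i, j, k), tuple(sides)


def match_triangles(block: Sequence[Point], protein: Sequence[Point],
                    tol: float = 1.0) -> Dict[Triplet, List[Triplet]]:
    """Compare every block triangle with every protein triangle (all against all)."""
    protein_tris = list(triangles(protein))
    matches: Dict[Triplet, List[Triplet]] = {}
    for b_idx, b_sides in triangles(block):
        matches[b_idx] = [p_idx for p_idx, p_sides in protein_tris
                          if all(abs(x - y) <= tol for x, y in zip(b_sides, p_sides))]
    return matches
```

Sorted side lengths make each test invariant to rotation, translation, and also reflection, and matching triangles one at a time does not check that the matches are mutually consistent in space; in a GHT-based method, that consistency comes from the voting step.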
The rest of the paper is organized as follows. Section II introduces the GHT and the triangle-based approach. Section III presents the experiments and their results. Finally, Section IV provides a brief discussion and outlines future work.
Section snippets
Methodology
In this paper, a novel GHT-based approach to motif retrieval is proposed. The GHT is used to compare and search for structural similarity between a given structural block (a motif, a domain, or an entire protein) and the proteins of a database such as the PDB. Note that if the searched structure is only a component of a protein (such as a structural motif or a domain), the same algorithm also supports detecting these components and estimating their statistical distribution. The primitive patterns to which...
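The GHT voting step can be illustrated with triangles as primitives. The following is a minimal sketch, assuming that the motif's reference point is the centroid of its SS midpoints and that each model triangle's R-table entry stores that reference point in a right-handed local frame attached to the ordered triangle. Every protein triangle whose ordered side lengths match an entry (within a hypothetical tolerance `tol`) casts a vote for the reference-point position it predicts, in a 3D accumulator quantized with a hypothetical cell size `cell`; the default values are arbitrary. Names such as `build_r_table` and `ght_vote` are illustrative and are not taken from the paper.

```python
import itertools
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]
Frame = Tuple[Vec, Vec, Vec]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit(v: Vec) -> Optional[Vec]:
    n = math.sqrt(dot(v, v))
    return None if n < 1e-9 else (v[0] / n, v[1] / n, v[2] / n)


def frame(a: Vec, b: Vec, c: Vec) -> Optional[Frame]:
    """Right-handed orthonormal frame of the ordered triangle (a, b, c); None if degenerate."""
    e1 = unit(sub(b, a))
    e3 = unit(cross(sub(b, a), sub(c, a)))
    if e1 is None or e3 is None:
        return None
    return e1, cross(e3, e1), e3


def ordered_sides(a: Vec, b: Vec, c: Vec) -> Vec:
    return (math.dist(a, b), math.dist(b, c), math.dist(c, a))


def build_r_table(model: Sequence[Vec]) -> List[Tuple[Vec, Vec]]:
    """R-table: (ordered sides, reference point in local frame) for each model triangle."""
    ref = tuple(sum(p[i] for p in model) / len(model) for i in range(3))  # centroid
    table = []
    for a, b, c in itertools.combinations(model, 3):
        f = frame(a, b, c)
        if f is not None:
            d = sub(ref, a)
            table.append((ordered_sides(a, b, c), (dot(d, f[0]), dot(d, f[1]), dot(d, f[2]))))
    return table


def ght_vote(protein: Sequence[Vec], table, tol: float = 0.5, cell: float = 1.0) -> Counter:
    """Each matching protein triangle votes for the reference point it predicts."""
    acc: Counter = Counter()
    for tri in itertools.combinations(protein, 3):
        for a, b, c in itertools.permutations(tri):  # try every vertex correspondence
            f = frame(a, b, c)
            if f is None:
                continue
            sides = ordered_sides(a, b, c)
            for m_sides, (x, y, z) in table:
                if all(abs(s - m) <= tol for s, m in zip(sides, m_sides)):
                    # back to global coordinates: a + x*e1 + y*e2 + z*e3
                    p = tuple(a[i] + x * f[0][i] + y * f[1][i] + z * f[2][i] for i in range(3))
                    acc[tuple(round(v / cell) for v in p)] += 1
    return acc
```

If the motif occurs in the protein, the accumulator cell containing its reference point collects one vote per matching model triangle, so a peak equal to the number of model triangles signals a complete occurrence. Trying all six vertex orderings recovers the vertex correspondence, and because the frames are right-handed, a mirrored triangle that happens to match within `tol` votes at a different position. Predictions lying near a cell boundary can split their votes between neighboring cells, which is one reason the choice of `cell` matters.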
Experiments and performances
The aim of this experiment is to test precision and computation time of the proposed method.
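Both quantities named here are simple to measure. As a hedged sketch, assuming retrieval results and ground-truth occurrences are given as collections of identifiers, precision is the fraction of retrieved occurrences that are correct, TP / (TP + FP), and wall-clock time can be taken with `time.perf_counter`. The helper `timed` is hypothetical.

```python
import time
from typing import Any, Callable, Iterable, Tuple


def precision(retrieved: Iterable[str], relevant: Iterable[str]) -> float:
    """TP / (TP + FP): the share of retrieved occurrences that are true occurrences."""
    retrieved_set = set(retrieved)
    if not retrieved_set:
        return 0.0  # convention chosen here: nothing retrieved gives precision 0
    return len(retrieved_set & set(relevant)) / len(retrieved_set)


def timed(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```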
To assess statistical performance, three cross-validation methods are often used: the independent dataset test, the subsampling (or K-fold cross-validation) test, and the jackknife test (Chou and Zhang, 1995). Among these, the jackknife test is considered the least arbitrary, since it always yields a unique result for a given dataset. The rationale is: (i) for the independent dataset test,...
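The jackknife test is the leave-one-out scheme: each sample in turn is held out, the predictor is built from the remaining samples, and the held-out sample is predicted. Because no random split is involved, the result is unique for a given dataset. The sketch below assumes samples are (feature vector, label) pairs and uses a 1-nearest-neighbor predictor purely as a placeholder; the snippet does not say which predictor or features are evaluated in the paper.

```python
import math
from typing import Hashable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], Hashable]  # (feature vector, class label)


def nearest_neighbor(train: List[Sample], x: Sequence[float]) -> Hashable:
    """Placeholder predictor: label of the closest training sample."""
    return min(train, key=lambda s: math.dist(s[0], x))[1]


def jackknife_accuracy(data: List[Sample]) -> float:
    """Leave-one-out test: predict each sample from all the others."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if nearest_neighbor(train, x) == label:
            correct += 1
    return correct / len(data)
```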
Conclusions
Comparing protein structures and retrieving motifs remain active areas of development in structural biology. The new approach is based on the structural analysis of the 3D distribution of SSs. This paper considers the problem of combining SS triplets to search for general motifs in protein structure datasets (details are given for motifs of three, four, and five SSs). The comparison is conducted by considering triangles as primitives (i.e., as basic structural elements), using motif and...
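The cost of the triangle-based comparison grows with the number of SS triplets. A motif with k SSs yields C(k, 3) triangles, so the three-, four-, and five-SS cases give 1, 4, and 10 triangles, and a protein with N SSs yields C(N, 3). The short sketch below computes these counts and, assuming every motif triangle is compared with every protein triangle as described in the Introduction, the number of triangle comparisons; the protein size of 40 SSs is an arbitrary example.

```python
from math import comb


def triangle_count(n_sse: int) -> int:
    """Number of SS triplets (triangles) formed by n_sse secondary structures."""
    return comb(n_sse, 3)


def triangle_comparisons(motif_sse: int, protein_sse: int) -> int:
    """All-against-all comparisons between motif triangles and protein triangles."""
    return triangle_count(motif_sse) * triangle_count(protein_sse)


for k in (3, 4, 5):
    print(k, triangle_count(k), triangle_comparisons(k, 40))
# 3 1 9880
# 4 4 39520
# 5 10 98800
```

For N = 40, C(40, 3) = 9880, so even a five-SS motif needs fewer than 10^5 triangle comparisons per protein in this all-against-all scheme.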
References (32)
- et al. Classification of proteins based on similarity of two-dimensional protein maps. Bioph. Chem. (2008)
- Some remarks on protein attribute prediction and pseudo amino acid composition. J. Theor. Biol. (2011)
- et al. Prediction of the tertiary structure and substrate binding site of caspase-8. FEBS Lett. (1997)
- et al. Using the concept of Chou’s pseudo amino acid composition for risk type prediction of human papillomaviruses. J. Theor. Biol. (2010)
- Multi-kernel transfer learning based on Chou’s PseAAC formulation for protein submitochondria localization. J. Theor. Biol. (2012)
- et al. Use of information discrepancy measure to compare protein secondary structures. J. Mol. Struct.: THEOCHEM (2009)
- et al. PSI: Indexing protein structures for fast similarity search. Bioinformatics (2003)
- Can, T., Wang, Y.F., 2003. CTSS: A robust and efficient method for protein structure alignment based on local...
- et al. Protein structure analysis through Hough transform and range tree. New tools and methods for pattern recognition in complex biological systems. Nuovo Cimento C (2012)
- et al. Protein motif retrieval through secondary structure spatial co-occurrences. New tools and methods for pattern recognition in complex biological systems. Nuovo Cimento C (2012)
- Dual-layer wavelet svm for predicting protein structural class via the general form of Chou’s pseudo amino acid composition. Protein Pept. Lett.
- A fast protein structure retrieval system using image based distance matrices and multidimensional index. Internat. J. Software Eng. Knowl. Eng.
- Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Struct. Funct. Genetics
- Coupling interaction between thromboxane A2 receptor and alpha-13 subunit of guanine nucleotide-binding protein. J. Proteome Res.
- Review: Recent advances in developing web-servers for predicting protein attributes. Nat. Sci.
Cited by (9)
Pattern recognition and beyond: Alfredo Petrosino's scientific results
2020, Pattern Recognition Letters
Citation Excerpt: In [13] and [15], a novel 3D structural representations of the proteins is proposed and adopted to exploit the learning capabilities of unsupervised (SOM) and supervised (G-NN) techniques, respectively. For searching protein structural similarities in databases, in [14], protein motifs retrieval is tackled based on the use of the generalized 3D Hough transform. Recent research finds its application in the design of biometric systems.
World Competitive Contests (WCC) algorithm: A novel intelligent optimization algorithm for biological and non-biological problems
2016, Informatics in Medicine Unlocked
Citation Excerpt: Some of the other related works that we can refer to are as follows: Specifications of the monocyte activating motif in the mycobacterium (mycobacterial) tuberculosis [17], motifs retrieval by the Secondary Structure terns occurrences [18], and the interaction of binding motif within the nucleocapsid protein of porcine reproductive and respiratory syndrome virus and the host cellular signaling proteins [19]. Proposed optimization algorithm starts with the first population of teams [20].
Geometrical motifs search in proteins: A parallel approach
2015, Parallel Computing
Citation Excerpt: The computational complexity of the GHT can be quite relevant, since it depends on the number elements that make up the model, (that is, the cardinality of the reference table), on the number of feature elements present in the feature space to be analyzed, and on the resolution at which the voting space is quantized. The Secondary Structures Co-occurrences (SSC) [2,3] and the Secondary Structures Triplets (SST) [14] are two algorithms, based on the GHT, which search for geometrical motifs (patterns) of SSEs (feature elements) inside a given protein (search space). The SSC algorithm uses pairs (co-occurrences) of SSEs, while SST uses terns.
A method of protein model classification and retrieval using bag-of-visual-features
2014, Computational and Mathematical Methods in Medicine
Motifs and structural blocks retrieval by GHT
2014, European Physical Journal Plus
CCMS: A greedy approach to motif extraction
2013, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)