Dividing protein interaction networks for modular network comparative analysis☆
Introduction
With the exponential increase of data on protein interactions obtained from advanced technologies, data on thousands of interactions in human and most model species have become available (e.g., Bader et al., 2001, Xenarios et al., 2002). Protein–protein interaction (PPI) networks offer a powerful representation for better understanding the modular organization of cells, for predicting biological functions, and for providing insight into a variety of biochemical processes.
Recent studies consider a comparative approach for the analysis of PPI networks from different species in order to discover common protein groups, called conserved complexes, which are likely to be related and to share similar functionality in a cell (Sharan and Ideker, 2006, Srinivasan et al., 2007). This problem is known as protein network alignment. Algorithms for this task typically model the problem by means of a merged graph representation of the networks to be compared, called an alignment (or orthology) graph, and then formalize the search for (merged) conserved complexes in the alignment graph as an optimization problem. Because the resulting optimization problem is computationally intractable, greedy algorithms are commonly used.
One can identify two main categories of network alignment: local network alignment, which identifies the best local mapping for each local region of similarity between the input networks, and global network alignment, which searches for the best single mapping across all parts of the input networks, even if it is locally sub-optimal in some regions. A method that aligns the networks of just two species is called pairwise network alignment, while a method that can handle more than two networks is called multiple network alignment.
Many methods for network alignment have been proposed. We describe them briefly in the next section on related work.
The aim of this paper is not to propose yet another network alignment algorithm, but to show how PPI networks can be divided, prior to their alignment, into small sub-graphs that are likely to cover conserved complexes.
Conserved complexes discovered by computational techniques are in general small (in number of proteins) compared to the PPI network they belong to. Moreover, PPI networks are known to have a scale-free topology, where most proteins participate in a small number of interactions while a few proteins, called hubs, participate in a high number of interactions. As indicated by a recent study, hubs whose removal disconnects a PPI network (articulation hubs) are likely to appear in conserved interaction patterns (Pržulj, 2005).
These observations motivate the introduction of an algorithm for dividing PPI networks, called Divide, that combines biological (orthology) and graph theoretical (articulation) information: it detects small groups of ortholog articulations, called centers, which are then expanded into subsets of ortholog nodes. This algorithm has the desirable property of being parameterless.
The effectiveness and robustness of Divide are assessed experimentally in the following three ways.
First, we show that the sub-graphs generated by Divide indeed cover “true” conserved protein complexes. This is done by measuring the overlap of these sub-graphs with MIPS curated functional complexes restricted to those proteins belonging to an orthologous pair.
Next, we show that the generated sub-graphs cover computationally predicted protein complexes. Specifically, we compare these sub-graphs with the conserved complexes predicted by a state-of-the-art pairwise local alignment algorithm, called MaWish (Koyutürk et al., 2006b). We investigate experimentally how Divide biases the search process of MaWish, and whether the generated sub-graphs contain information that can be used for discovering new conserved complexes. Results of an extensive experimental analysis indicate that Divide indeed generates sub-graphs containing conserved complexes that are not detected by MaWish.
Finally, we consider two case studies of modular network alignment. In the first case study, Divide is used to generate sub-graphs, which are then merged pairwise using the network merging model of MaWish. We apply iterative exact search to the resulting alignment graphs. The experimental results show that this approach detects a high number of accurate conserved complexes. In the second case study, Divide is used for enhancing an existing method for discovering conserved functional complexes, called MNAligner (Li et al., 2007). MNAligner consists of two main steps: first, candidate functional complexes within one species are detected using a clustering algorithm (MCODE); next, an exact optimization algorithm is applied for matching the resulting candidate functional complexes with sub-graphs of the other species in order to extract conserved complexes. The experimental results show that applying Divide to orthologous nodes prior to clustering enhances the performance of this algorithm.
To the best of our knowledge, we propose the first algorithm which directly tackles the modularity issue in network alignment by showing that Divide generates sub-graphs that cover conserved complexes and can be used for performing modular pairwise network alignment.
In general, these results substantiate the important role of the notions of orthology and articulation in modular comparative PPI network analysis.
This paper contains and extends material from two previous conference papers (Jancura et al., 2008a, Jancura et al., 2008b). It is organized as follows. In the next section we discuss related work. Section 3 describes the graph-theoretic terminology used in the paper. The Divide algorithm is introduced in Section 4. Section 5 summarizes the data and the type of assessment employed in the experimental analysis. In Section 6 the robustness of Divide is assessed by analysing how the generated sub-graphs cover “true” complexes. In Section 7 the sub-graphs generated by Divide are compared with the complexes predicted by MaWish. In Section 8 modular network alignment is performed on the two case studies described above. Finally, we conclude and briefly address future work in Section 9.
Section snippets
Related work
Since the first formulation of network alignment by Kelley et al. (2003), overviews of approaches and issues in the comparative analysis of biological networks have been presented (Sharan and Ideker, 2006, Srinivasan et al., 2007).
In general, network alignment methods have been proposed for discovering conserved metabolic pathways, conserved functional complexes, and for detecting functional orthologs. For instance, Kelley et al. (2003) introduced an approach for detecting conserved
Graph theoretic background
Given a graph G = (U,E), nodes joined by an edge are called adjacent. A neighbor of a node u is a node adjacent to u. The degree of u is the number of elements in E containing the node u.
A graph G = (U,E) is called undirected if uu′ in E implies u′u also in E; otherwise G is called directed. A directed acyclic graph is a directed graph that contains no cycles.
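The definitions above can be illustrated with a minimal Python sketch. The adjacency-set representation, the function names and the toy edge list are assumptions made for illustration only; they do not come from the paper. The sketch also assumes a simple graph (no self-loops or repeated edges), so that the degree of a node equals its number of neighbors.

```python
from collections import defaultdict


def build_undirected(edges):
    """Build adjacency sets for an undirected graph from pairs (u, v)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)  # store both directions: uu' in E implies u'u in E
    return dict(adj)


def neighbors(adj, u):
    """Nodes adjacent to u."""
    return adj.get(u, set())


def degree(adj, u):
    """Number of edges containing u (simple graph assumed)."""
    return len(neighbors(adj, u))


def is_undirected(adj):
    """Check the symmetry condition: every edge appears in both directions."""
    return all(u in adj.get(v, set()) for u, nbrs in adj.items() for v in nbrs)


# Hypothetical toy network with four proteins.
toy_edges = [("A", "B"), ("B", "C"), ("B", "D")]
adj = build_undirected(toy_edges)
print(degree(adj, "B"))    # 3
print(is_undirected(adj))  # True
```

Adjacency sets make both neighbor lookup and the symmetry check straightforward, which is convenient for the sparse graphs typical of PPI data.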
A sub-graph H(V,F) of an undirected graph G(U,E) is said to be induced by the set of nodes V ⊂ U if and only if the set of edges F ⊂ E
Divide algorithm
Suppose we are given the PPI networks G and G1 of two species. Let G = (U,E), and let O ⊆ U be the set of vertices of G that are orthologous to vertices of G1. Suppose O contains n elements. The Divide algorithm is shown in pseudo-code in Algorithm 1. It generates centers from orthologous articulations and expands them into centered sub-trees containing only orthologous proteins. The main steps of Divide are described in detail below.
Computing articulations (Line 1). Computation of articulations can be
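As an illustration of the articulation step, the following Python sketch finds the articulation nodes of an undirected graph with the standard depth-first "lowpoint" technique and then keeps those that are orthologous. It is not the authors' implementation: whether Divide uses exactly this routine is not confirmed by the text shown here, and the function name, the toy graph and the ortholog set `orthologs` are hypothetical. The recursive form is meant for small examples; a large network would need an iterative traversal.

```python
def articulation_points(adj):
    """Return the articulation nodes of a simple undirected graph.

    adj maps every node to the set of its neighbors.
    """
    disc = {}      # discovery time of each visited node
    low = {}       # lowest discovery time reachable from the node's DFS subtree
    result = set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # Non-root u is an articulation if the subtree of v
                # cannot reach any node discovered before u.
                if parent is not None and low[v] >= disc[u]:
                    result.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])
        # The root is an articulation if it has more than one DFS child.
        if parent is None and children > 1:
            result.add(u)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return result


# Hypothetical toy network: a triangle A-B-C with a tail C-D-E.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
orthologs = {"A", "C", "E"}  # assumed set O of orthologous nodes
orthologous_articulations = articulation_points(adj) & orthologs
print(sorted(orthologous_articulations))  # ['C']
```

In this toy graph both C and D disconnect the network when removed, but only C belongs to the assumed ortholog set, so only C would be a candidate for a center.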
Experimental analysis
The effectiveness and robustness of the proposed pre-processing method are assessed experimentally in the following three ways.
First, we show that the sub-graphs generated by Divide indeed cover “true” conserved protein complexes. This is done by measuring the overlap of the generated sub-graphs with yeast MIPS curated functional complexes restricted to those proteins belonging to an orthologous pair.
Next, we show that the resulting sub-graphs cover protein complexes computationally predicted by
Divide generates sub-graphs covering “true” protein conserved complexes
Let Divide sub-graphs denote the sub-graphs generated by Divide. We compared Divide sub-graphs with “true” conserved protein complexes. To this end, we evaluated the quality of the sub-graphs generated by Divide using known yeast complexes catalogued in the MIPS database (Güldener et al., 2005). Category 550, which was obtained from high throughput experiments, is excluded and we retained only manually annotated complexes up to depth 3 in the
Comparison of Divide sub-graphs with predicted conserved complexes
Here we investigate how Divide constrains the search process of MaWish, and whether the sub-graphs generated by Divide cover those produced by MaWish. To this end, we used the conserved complexes predicted by this alignment method and processed by the clique-rule merging procedure, where complexes consisting of one or two proteins were filtered out. We call the resulting sets MaWish complexes.
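A coverage check of this kind can be sketched in a few lines of Python. The sketch below treats complexes and sub-graphs as plain sets of protein identifiers, drops complexes with one or two proteins, and reports which remaining complexes are fully contained in some sub-graph. The function names, the toy data and the overlap fraction `best_overlap` are assumptions for illustration; the paper's actual evaluation measure is not reproduced here.

```python
def filter_small(complexes, min_size=3):
    """Drop complexes consisting of fewer than min_size proteins."""
    return [c for c in complexes if len(c) >= min_size]


def fully_covered(complexes, subgraphs):
    """Complexes whose proteins all lie inside a single sub-graph."""
    return [c for c in complexes if any(c <= s for s in subgraphs)]


def best_overlap(complex_, subgraphs):
    """Largest fraction of a complex contained in any one sub-graph."""
    return max((len(complex_ & s) / len(complex_) for s in subgraphs), default=0.0)


# Hypothetical predicted complexes and generated sub-graphs.
complexes = [{"p1", "p2", "p3"}, {"p4", "p5"}, {"p6", "p7", "p8", "p9"}]
subgraphs = [{"p1", "p2", "p3", "p10"}, {"p6", "p7"}]

kept = filter_small(complexes)           # the two-protein complex is removed
covered = fully_covered(kept, subgraphs)
print(len(kept), len(covered))           # 2 1
print(best_overlap(kept[1], subgraphs))  # 0.5
```

Reporting a partial overlap alongside full coverage helps distinguish complexes that are split across sub-graphs from those that are missed entirely.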
In the right column of Fig. 4 one can observe that a number of yeast MaWish complexes are fully covered
Applications of modular network alignment
In this section, we investigate the ability of Divide to enhance the performance of alignment methods.
Specifically, we apply Divide to two different alignment methods.
In the first case, we consider an instance of modular local network alignment, called DivAfull (Jancura et al., 2008a). DivAfull employs Divide to generate sub-graphs, the MaWish alignment model to align them, and iterative exact search to detect all possible solutions from the generated alignments. Therefore, application of Divide
Conclusion
This paper introduced a heuristic algorithm, Divide, for dividing protein interaction networks in such a way that conserved functional complexes are covered by generated sub-graphs. To the best of our knowledge, this is the first algorithm for this task, which can be used to perform modular network alignment of protein interaction networks (Jancura et al., 2008a, Jancura et al., 2008b).
The selection of centers is based on orthology information, but another property could be used instead.
Acknowledgments
The authors thank Mehmet Koyutürk for providing the MaWish code.
References (64)
- et al., 2009. Functionally guided alignment of protein interaction networks for module detection. Bioinformatics.
- et al., 2000. Gene ontology: tool for the unification of biology. Nat. Genet.
- et al., 2003. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform.
- et al., 2001. Bind – the biomolecular interaction network database. Nucleic Acids Res.
- et al., 2006. Systematic identification of functional orthologs based on protein network comparison. Genome Res.
- et al., 2006. Cross-species analysis of biological networks by Bayesian alignment. Proc. Natl. Acad. Sci.
- Blin, G., Sikora, F., Vialette, S., 2009. Querying protein–protein interaction networks. In: ISBRA ’09: Proceedings of...
- et al., 1973. Algorithm 457: finding all cliques of an undirected graph. Commun. ACM.
- et al., 2009. Torque: topology-free querying of protein interaction networks. Nucleic Acids Res.
- Cheng, Q., Berman, P., Harrison, R., Zelikovsky, A., 2008. Fast alignments of metabolic networks. In: BIBM ’08:...
- Local optimization for global alignment of protein interaction networks. Pacific Symp. Biocomput.
- Prediction of protein function using protein–protein interaction data. J. Comput. Biol.
- Qnet: a tool for querying protein interaction networks. J. Comput. Biol.
- Identification of functional modules from conserved ancestral protein–protein interactions. Bioinformatics.
- What properties characterize the hub proteins of the protein–protein interaction network of Saccharomyces cerevisiae? Genome Biol.
- Graemlin: general and robust alignment of multiple large interaction networks. Genome Res.
- Automatic parameter learning for multiple local network alignment. J. Comput. Biol.
- CYGD: the comprehensive yeast genome database. Nucleic Acids Res.
- Domain-oriented edge-based alignment of protein interaction networks. Bioinformatics.
- Computational cluster validation in post-genomic data analysis. Bioinformatics.
- Identification of conserved protein complexes based on a model of protein network evolution. Bioinformatics.
- Algorithm 447: efficient algorithms for graph manipulation. Commun. ACM.
- Divide, align and full-search for discovering conserved protein complexes.
- Lethality and centrality in protein networks. Nature.
- Fast and accurate alignment of multiple protein networks. J. Comput. Biol.
- Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc. Natl. Acad. Sci.
- A new graph-based method for pairwise global network alignment. BMC Bioinform.
- Pairwise local alignment of protein interaction networks guided by models of evolution.
- Detecting conserved interaction patterns in biological networks. J. Comput. Biol.
- Pairwise alignment of protein interaction networks. J. Comput. Biol.
☆ This manuscript is an extended version of the conference paper Jancura et al. (2008b) presented at the Third IAPR International Conference on Pattern Recognition in Bioinformatics, Melbourne, Australia, 2008.