Dividing protein interaction networks for modular network comparative analysis

doi:10.1016/j.patrec.2010.04.005

Pattern Recognition Letters

Volume 31, Issue 14, 15 October 2010, Pages 2083-2096

Abstract

The increasing growth of data on protein–protein interaction (PPI) networks has boosted research on their comparative analysis. In particular, recent studies proposed models and algorithms for performing network alignment, that is, the comparison of networks across species for discovering conserved functional complexes. In this paper, we present an algorithm for dividing PPI networks, prior to their alignment, into small sub-graphs that are likely to cover conserved complexes. This allows one to perform network alignment in a modular fashion, by acting on pairs of resulting small sub-graphs from different species. The proposed dividing algorithm combines a graph-theoretical property (articulation) with a biological one (orthology). Extensive experiments on various PPI networks are conducted in order to assess how well the sub-graphs generated by this dividing algorithm cover protein functional complexes and whether the proposed pre-processing step can be used for enhancing the performance of network alignment algorithms. Source code of the dividing algorithm is available upon request for academic use.

Introduction

With the exponential increase of data on protein interactions obtained from advanced technologies, data on thousands of interactions in human and most model species have become available (e.g., Bader et al., 2001, Xenarios et al., 2002). PPI networks offer a powerful representation for better understanding modular organization of cells, for predicting biological functions and for providing insight into a variety of biochemical processes.

Recent studies take a comparative approach to the analysis of PPI networks from different species in order to discover common protein groups, called conserved complexes, which are likely to be related and to share similar functionality in a cell (Sharan and Ideker, 2006; Srinivasan et al., 2007). This problem is known as protein network alignment. Algorithms for this task typically model the problem by means of a merged graph representation of the networks to be compared, called the alignment (or orthology) graph, and then formalize the search for (merged) conserved complexes in the alignment graph as an optimization problem. Because the resulting optimization problem is computationally intractable, greedy algorithms are commonly used.

One can identify two main network alignment categories: local network alignment, which identifies the best local mapping for each local region of similarity between the input networks, and global network alignment, which searches for the best single mapping across all parts of the input networks, even if it is locally sub-optimal in some regions. A method that aligns the networks of just two species is called pairwise network alignment, while a method that can handle more than two networks is called multiple network alignment.

Many methods for network alignment have been proposed. We describe them briefly in the next section on related work.

The aim of this paper is not to propose yet another network alignment algorithm, but to show how PPI networks can be divided, prior to their alignment, into small sub-graphs that are likely to cover conserved complexes.

Conserved complexes discovered by computational techniques are in general small (in number of proteins) compared with the PPI network they belong to. Moreover, PPI networks are known to have a scale-free topology in which most proteins participate in a small number of interactions, while a few proteins, called hubs, participate in a large number of interactions. As indicated by a recent study, hubs whose removal disconnects a PPI network (articulation hubs) are likely to appear in conserved interaction patterns (Pržulj, 2005).
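To make the notion of an articulation concrete, the following minimal Python sketch reports a node as an articulation when deleting it increases the number of connected components. The toy network, the function names, and the brute-force strategy are assumptions made for illustration only; they are not taken from the paper, and a linear-time depth-first-search method would normally be preferred on real PPI networks.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def connected_component_count(adj, removed=None):
    """Count connected components, ignoring the node `removed` if given."""
    seen = set() if removed is None else {removed}
    count = 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return count


def articulations(adj):
    """Nodes whose removal increases the number of connected components."""
    base = connected_component_count(adj)
    return {u for u in adj if connected_component_count(adj, removed=u) > base}


# Hypothetical toy network: a triangle A-B-C with a tail C-D-E.
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
adj = build_adjacency(toy_edges)
cut_nodes = articulations(adj)
degrees = {u: len(adj[u]) for u in cut_nodes}
print(sorted(cut_nodes), degrees)
```

In this toy network, C and D are articulations; C also has the highest degree, so it plays the role of an articulation hub.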

These observations motivate the introduction of an algorithm for dividing PPI networks, called Divide, that combines biological (orthology) and graph theoretical (articulation) information: it detects small groups of ortholog articulations, called centers, which are then expanded into subsets of ortholog nodes. This algorithm has the desirable property of being parameterless.

The effectiveness and robustness of Divide are assessed experimentally in the following three ways.

First, we show that the sub-graphs generated by Divide indeed cover “true” conserved protein complexes. This is done by measuring the overlap of these sub-graphs with MIPS curated functional complexes restricted to those proteins belonging to an orthologous pair.

Next, we show that the generated sub-graphs cover computationally predicted protein complexes. Specifically, we compare these sub-graphs with the conserved complexes predicted by a state-of-the-art pairwise local alignment algorithm, MaWish (Koyutürk et al., 2006b). We investigate experimentally how Divide biases the search process of MaWish, and whether the generated sub-graphs contain information that can be used for discovering new conserved complexes. The results of an extensive experimental analysis indicate that Divide does generate sub-graphs containing conserved complexes that are not detected by MaWish.

Finally, we consider two case studies of modular network alignment. In the first case study, Divide is used to generate sub-graphs, which are then merged pairwise using the network merging model of MaWish. We apply iterative exact search to the resulting alignment graphs. The experimental results show that this approach detects a high number of accurate conserved complexes. In the second case study, Divide is used to enhance an existing method for discovering conserved functional complexes, called MNAligner (Li et al., 2007). MNAligner consists of two main steps: first, candidate functional complexes within one species are detected using a clustering algorithm (MCODE); next, an exact optimization algorithm is applied to match the resulting candidate functional complexes with sub-graphs of the other species in order to extract conserved complexes. The experimental results show that applying Divide to orthologous nodes prior to clustering enhances the performance of this algorithm.

To the best of our knowledge, this is the first algorithm that directly tackles the modularity issue in network alignment: we show that Divide generates sub-graphs that cover conserved complexes and that can be used for performing modular pairwise network alignment.

In general, these results substantiate the important role of the notions of orthology and articulation in modular comparative PPI network analysis.

This paper contains and extends material from two previous conference papers (Jancura et al., 2008a, Jancura et al., 2008b). It is organized as follows. In the next section we discuss related work. Section 3 describes the graph-theoretic terminology used in the paper. The Divide algorithm is introduced in Section 4. Section 5 summarizes the data and the type of assessment employed in the experimental analysis. In Section 6 the robustness of Divide is assessed by analysing how the generated sub-graphs cover “true” complexes. In Section 7 the sub-graphs generated by Divide are compared with the complexes predicted by MaWish. In Section 8 modular network alignment is performed on the two case studies described above. Finally, we conclude and briefly address future work in Section 9.

Section snippets

Related work

Overviews of approaches and issues in the comparative analysis of biological networks, covering the work done since the first formulation of network alignment by Kelley et al. (2003), have been presented by Sharan and Ideker (2006) and Srinivasan et al. (2007).

In general, network alignment methods have been proposed for discovering conserved metabolic pathways, conserved functional complexes, and for detecting functional orthologs. For instance, Kelley et al. (2003) introduced an approach for detecting conserved

Graph theoretic background

Given a graph G = (U,E), nodes joined by an edge are called adjacent. A neighbor of a node u is a node adjacent to u. The degree of u is the number of elements in E containing the vertex u.
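The definitions above can be checked on a small example. The sketch below is only an illustration under assumed names (`edges`, `neighbors`, `degree`) and a hypothetical four-node graph; it stores each undirected edge as a two-element `frozenset` and counts the edges that contain a given node.

```python
def neighbors(edges, u):
    """Nodes adjacent to u, i.e. joined to u by some edge."""
    return {v for e in edges if u in e for v in e if v != u}


def degree(edges, u):
    """Number of elements of the edge set that contain the node u."""
    return sum(1 for e in edges if u in e)


# Hypothetical toy graph: each undirected edge is a frozenset of two nodes.
edges = {
    frozenset(("a", "b")),
    frozenset(("a", "c")),
    frozenset(("b", "c")),
    frozenset(("c", "d")),
}

print(neighbors(edges, "c"), degree(edges, "c"))
```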

A graph G = (U,E) is called undirected if uu′ ∈ E implies u′u ∈ E; otherwise G is called directed. A directed acyclic graph is a directed graph that contains no cycles.

A sub-graph H(V,F) of an undirected graph G(U,E) is said to be induced by the set of nodes V ⊂ U if and only if the set of edges F ⊂ E

Divide algorithm

Suppose we are given the PPI networks G and G₁ of two species. Let G = (U,E), and let O ⊆ U be the set of vertices of G that are orthologous to vertices of G₁. Suppose O contains n elements. The Divide algorithm is shown in pseudo-code in Algorithm 1. It generates centers from orthologous articulations and expands them into centered sub-trees containing only orthologous proteins. The main steps of Divide are described in detail below.
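The expansion idea described above can be sketched roughly as follows. This is not Algorithm 1, which is not reproduced here: it is a simplified illustration that assumes each center is a single orthologous articulation supplied by the caller, and that trees are grown by a multi-source breadth-first search restricted to orthologous nodes, with each node assigned to the first center that reaches it. The function name `expand_centers` and the toy data are hypothetical.

```python
from collections import deque


def expand_centers(adj, orthologs, centers):
    """Grow one breadth-first tree per center over orthologous nodes only.

    Returns {center: set of nodes in its tree}. Each orthologous node
    joins the tree of the first center that reaches it (an assumption
    of this sketch, not a rule stated in the paper).
    """
    owner = {c: c for c in centers}
    queue = deque(centers)
    while queue:
        node = queue.popleft()
        for nb in sorted(adj[node]):  # sorted for a deterministic result
            if nb in orthologs and nb not in owner:
                owner[nb] = owner[node]
                queue.append(nb)
    trees = {c: set() for c in centers}
    for node, center in owner.items():
        trees[center].add(node)
    return trees


# Hypothetical toy network: a path a-b-c-d-e plus a non-orthologous node x.
adj = {
    "a": {"b"},
    "b": {"a", "c", "x"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
    "x": {"b"},
}
orthologs = {"a", "b", "c", "d", "e"}  # the set O in the text
centers = ["b", "d"]                   # assumed orthologous articulations

trees = expand_centers(adj, orthologs, centers)
print(trees)
```

The non-orthologous node x is never added to a tree, which mirrors the requirement that the generated sub-trees contain only orthologous proteins.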

Computing articulations (Line 1). Computation of articulations can be

Experimental analysis

The effectiveness and robustness of the proposed pre-processing method are assessed experimentally in the following three ways.

First, we show that the sub-graphs generated by Divide indeed cover “true” conserved protein complexes. This is done by measuring the overlap of the generated sub-graphs with yeast MIPS curated functional complexes restricted to those proteins belonging to an orthologous pair.

Next, we show that the resulting sub-graphs cover protein complexes computationally predicted by

Divide generates sub-graphs covering “true” protein conserved complexes

Let Divide sub-graphs denote the sub-graphs generated by Divide. We compared Divide sub-graphs with “true” conserved protein complexes. To this end, we evaluated the quality of the sub-graphs generated by Divide using known yeast complexes catalogued in the MIPS database (Güldener et al., 2005). Category 550, which was obtained from high-throughput experiments, was excluded, and we retained only manually annotated complexes up to depth 3 in the

Comparison of Divide sub-graphs with predicted conserved complexes

Here we investigate how Divide constrains the search process of MaWish, and whether the sub-graphs generated by Divide cover those produced by MaWish. To this end, we used the conserved complexes predicted by this alignment method and processed by the clique-rule merging procedure, where complexes consisting of one or two proteins were filtered out. We call the resulting sets MaWish complexes.
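The filtering and coverage check described above can be illustrated with plain sets. The sketch below assumes that complexes and sub-graphs are represented as sets of protein identifiers; the identifiers, the function names, and the `best_coverage` fraction are hypothetical and are not claimed to be the measure used in the paper.

```python
def filter_small(complexes, min_size=3):
    """Drop complexes consisting of one or two proteins."""
    return [c for c in complexes if len(c) >= min_size]


def fully_covered(complexes, subgraphs):
    """Complexes whose proteins all lie inside at least one sub-graph."""
    return [c for c in complexes if any(c <= s for s in subgraphs)]


def best_coverage(complex_, subgraphs):
    """Largest fraction of a complex found inside a single sub-graph."""
    return max(len(complex_ & s) / len(complex_) for s in subgraphs)


# Hypothetical predicted complexes and Divide sub-graphs.
predicted = [{"p1", "p2"}, {"p1", "p2", "p3"}, {"p4", "p5", "p6"}]
subgraphs = [{"p1", "p2", "p3", "p7"}, {"p4", "p5"}]

kept = filter_small(predicted)
covered = fully_covered(kept, subgraphs)
partial = best_coverage({"p4", "p5", "p6"}, subgraphs)
print(len(kept), covered, partial)
```

Here the two-protein complex is filtered out, one of the remaining complexes is fully covered, and the other is only partially covered by its best-matching sub-graph.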

In the right column of Fig. 4 one can observe that a number of yeast MaWish complexes are fully covered

Applications of modular network alignment

In this section, we investigate the ability of Divide to enhance the performance of alignment methods.

Specifically, we apply Divide to two different alignment methods.

In the first case, we consider an instance of modular local network alignment, called DivAfull (Jancura et al., 2008a). DivAfull employs Divide to generate sub-graphs, the MaWish alignment model to align them, and iterative exact search to detect all possible solutions from the generated alignments. Therefore, application of Divide

Conclusion

This paper introduced a heuristic algorithm, Divide, for dividing protein interaction networks in such a way that conserved functional complexes are covered by generated sub-graphs. To the best of our knowledge, this is the first algorithm for this task, which can be used to perform modular network alignment of protein interaction networks (Jancura et al., 2008a, Jancura et al., 2008b).

The selection of centers is biased by orthology information, but another property could be used in its place.

Acknowledgments

The authors thank Mehmet Koyutürk for providing the MaWish code.

References (64)

W. Ali et al. Functionally guided alignment of protein interaction networks for module detection. Bioinformatics (2009)

M. Ashburner et al. Gene ontology: tool for the unification of biology. Nat. Genet. (2000)

G. Bader et al. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform. (2003)

G.D. Bader et al. Bind – the biomolecular interaction network database. Nucleic Acid Res. (2001)

S. Bandyopadhyay et al. Systematic identification of functional orthologs based on protein network comparison. Genome Res. (2006)

J. Berg et al. Cross-species analysis of biological networks by Bayesian alignment. Proc. Natl. Acad. Sci. (2006)

Blin, G., Sikora, F., Vialette, S., 2009. Querying protein–protein interaction networks. In: ISBRA ’09: Proceedings of...

C. Bron et al. Algorithm 457: finding all cliques of an undirected graph. Commun. ACM (1973)

S. Bruckner et al. Torque: topology-free querying of protein interaction networks. Nucleic Acid Res. (2009)

Cheng, Q., Berman, P., Harrison, R., Zelikovsky, A., 2008. Fast alignments of metabolic networks. In: BIBM ’08:...

L. Chindelevitch et al. Local optimization for global alignment of protein interaction networks. Pacific Symp. Biocomput. (2010)

M. Deng et al. Prediction of protein function using protein–protein interaction data. J. Comput. Biol. (2003)

B. Dost et al. Qnet: a tool for querying protein interaction networks. J. Comput. Biol. (2008)

J. Dutkowski et al. Identification of functional modules from conserved ancestral protein–protein interactions. Bioinformatics (2007)

D. Ekman et al. What properties characterize the hub proteins of the protein–protein interaction network of Saccharomyces cerevisiae? Genome Biol. (2006)

Evans, P., Sandler, T., Ungar, L., 2008. Protein–protein interaction network alignment by quantitative simulation. In:...

J. Flannick et al. Graemlin: general and robust alignment of multiple large interaction networks. Genome Res. (2006)

J. Flannick et al. Automatic parameter learning for multiple local network alignment. J. Comput. Biol. (2009)

U. Güldener et al. CYGD: the comprehensive yeast genome database. Nucleic Acid Res. (2005)

X. Guo et al. Domain-oriented edge-based alignment of protein interaction networks. Bioinformatics (2009)

J. Handl et al. Computational cluster validation in post-genomic data analysis. Bioinformatics (2005)

E. Hirsh et al. Identification of conserved protein complexes based on a model of protein network evolution. Bioinformatics (2007)

J. Hopcroft et al. Algorithm 447: efficient algorithms for graph manipulation. Commun. ACM (1973)

P. Jancura et al. Divide, align and full-search for discovering conserved protein complexes

Jancura, P., Heringa, J., Marchiori, E., 2008b. Dividing protein interaction networks by growing orthologous...

H. Jeong et al. Lethality and centrality in protein networks. Nature (2001)

M. Kalaev et al. Fast and accurate alignment of multiple protein networks. J. Comput. Biol. (2009)

B.P. Kelley et al. Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc. Natl. Acad. Sci. (2003)

G. Klau. A new graph-based method for pairwise global network alignment. BMC Bioinform. (2009)

M. Koyutürk et al. Pairwise local alignment of protein interaction networks guided by models of evolution

M. Koyutürk et al. Detecting conserved interaction patterns in biological networks. Journal of Computational Biology (2006)

M. Koyutürk et al. Pairwise alignment of protein interaction networks. J. Comput. Biol. (2006)

^☆: This manuscript is an extended version of the conference paper Jancura et al. (2008b) presented at the Third IAPR International Conference on Pattern Recognition in Bioinformatics, Melbourne, Australia, 2008.
