Pattern Recognition Letters

Volume 31, Issue 14, 15 October 2010, Pages 2083-2096
Dividing protein interaction networks for modular network comparative analysis

https://doi.org/10.1016/j.patrec.2010.04.005

Abstract

The increasing growth of data on protein–protein interaction (PPI) networks has boosted research on their comparative analysis. In particular, recent studies proposed models and algorithms for performing network alignment, that is, the comparison of networks across species for discovering conserved functional complexes. In this paper, we present an algorithm for dividing PPI networks, prior to their alignment, into small sub-graphs that are likely to cover conserved complexes. This allows one to perform network alignment in a modular fashion, by acting on pairs of resulting small sub-graphs from different species. The proposed dividing algorithm combines a graph-theoretical property (articulation) with a biological one (orthology). Extensive experiments on various PPI networks are conducted in order to assess how well the sub-graphs generated by this dividing algorithm cover protein functional complexes and whether the proposed pre-processing step can be used for enhancing the performance of network alignment algorithms. Source code of the dividing algorithm is available upon request for academic use.

Introduction

With the exponential increase of data on protein interactions obtained from advanced technologies, data on thousands of interactions in human and most model species have become available (e.g., Bader et al., 2001, Xenarios et al., 2002). PPI networks offer a powerful representation for better understanding the modular organization of cells, for predicting biological functions, and for providing insight into a variety of biochemical processes.

Recent studies consider a comparative approach for the analysis of PPI networks from different species in order to discover common protein groups, called conserved complexes, which are likely to be related and to share similar functionality in a cell (Sharan and Ideker, 2006, Srinivasan et al., 2007). This problem is known as protein network alignment. Algorithms for this task typically model the problem by means of a merged graph representation of the networks to be compared, called the alignment (or orthology) graph, and then formalize the search for (merged) conserved complexes in the alignment graph as an optimization problem. Because the resulting optimization problem is computationally intractable, greedy algorithms are commonly used.

One can identify two main categories of network alignment: local network alignment, which identifies the best local mapping for each local region of similarity between the input networks, and global network alignment, which searches for the best single mapping across all parts of the input networks, even if it is locally sub-optimal in some regions. A method that aligns the networks of just two species is called pairwise network alignment, while a method that can handle more than two networks is called multiple network alignment.

Many methods for network alignment have been proposed. We describe them briefly in the next section on related work.

The aim of this paper is not to propose yet another network alignment algorithm, but to show how PPI networks can be divided, prior to their alignment, into small sub-graphs that are likely to cover conserved complexes.

Conserved complexes discovered by computational techniques are generally small (in number of proteins) compared to the PPI network they belong to. Moreover, PPI networks are known to have a scale-free topology in which most proteins participate in a small number of interactions while a few proteins, called hubs, participate in a large number of interactions. As indicated by a recent study, hubs whose removal disconnects a PPI network (articulation hubs) are likely to appear in conserved interaction patterns (Pržulj, 2005).

These observations motivate the introduction of an algorithm for dividing PPI networks, called Divide, that combines biological (orthology) and graph theoretical (articulation) information: it detects small groups of ortholog articulations, called centers, which are then expanded into subsets of ortholog nodes. This algorithm has the desirable property of being parameterless.

The effectiveness and robustness of Divide is assessed experimentally in the following three ways.

First, we show that the sub-graphs generated by Divide indeed cover “true” conserved protein complexes. This is done by measuring the overlap of these sub-graphs with MIPS curated functional complexes restricted to those proteins belonging to an orthologous pair.

Next, we show that the generated sub-graphs cover computationally predicted protein complexes. Specifically, we compare these sub-graphs with the conserved complexes predicted by a state-of-the-art pairwise local alignment algorithm, called MaWish (Koyutürk et al., 2006b). We investigate experimentally how Divide biases the search process of MaWish, and whether the generated sub-graphs contain information that can be used for discovering new conserved complexes. Results of an extensive experimental analysis indicate that Divide indeed generates sub-graphs containing conserved complexes that are not detected by MaWish.

Finally, we consider two case studies of modular network alignment. In the first case study, Divide is used to generate sub-graphs, which are then merged pairwise using the network merging model of MaWish. We apply iterative exact search to the resulting alignment graphs. Experimental results show that this approach is able to detect a high number of accurate conserved complexes. In the second case study, Divide is used for enhancing an existing method for discovering conserved functional complexes, called MNAligner (Li et al., 2007). MNAligner consists of two main steps: first, candidate functional complexes within one species are detected using a clustering algorithm (MCODE); next, an exact optimization algorithm is applied for matching the resulting candidate functional complexes with sub-graphs of the other species in order to extract conserved complexes. Experimental results show that applying Divide to orthologous nodes prior to clustering enhances the performance of this algorithm.

To the best of our knowledge, we propose the first algorithm which directly tackles the modularity issue in network alignment by showing that Divide generates sub-graphs that cover conserved complexes and can be used for performing modular pairwise network alignment.

In general, these results substantiate the important role of the notions of orthology and articulation in modular comparative PPI network analysis.

This paper contains and extends material from two previous conference papers (Jancura et al., 2008a, Jancura et al., 2008b). It is organized as follows. In the next section we discuss related work. Section 3 describes the graph-theoretic terminology used in the paper. The Divide algorithm is introduced in Section 4. Section 5 summarizes the data and the type of assessment employed in the experimental analysis. In Section 6 the robustness of Divide is assessed by analysing how the generated sub-graphs cover “true” complexes. In Section 7 the sub-graphs generated by Divide are compared with the complexes predicted by MaWish. In Section 8 modular network alignment is performed on the two case studies described above. Finally, we conclude and briefly address future work in Section 9.

Section snippets

Related work

Recent overviews of approaches and issues in comparative biological network analysis have been presented by Sharan and Ideker (2006) and Srinivasan et al. (2007), following the first formulation of network alignment introduced by Kelley et al. (2003).

In general, network alignment methods have been proposed for discovering conserved metabolic pathways, conserved functional complexes, and for detecting functional orthologs. For instance, Kelley et al. (2003) introduced an approach for detecting conserved

Graph theoretic background

Given a graph G = (U,E), nodes joined by an edge are called adjacent. A neighbor of a node u is a node adjacent to u. The degree of u is the number of elements in E containing the vertex u.

A graph G = (U,E) is called undirected if uv ∈ E implies vu ∈ E; otherwise G is called directed. A directed acyclic graph is a directed graph that contains no cycles.

A sub-graph H(V,F) of an undirected graph G(U,E) is said to be induced by the set of nodes V ⊆ U if and only if the set of edges F ⊆ E

Divide algorithm

Suppose we are given the PPI networks G and G1 of two species. Let G = (U,E) and let O ⊆ U be the set of vertices of G that are orthologous to vertices of G1. Suppose O contains n elements. The Divide algorithm is shown in pseudo-code in Algorithm 1. It generates centers from orthologous articulations and expands them into centered sub-trees containing only orthologous proteins. The main steps of Divide are described in detail below.
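The idea summarized above can be illustrated with a short Python sketch. It is not a reproduction of Algorithm 1, which is not shown in this excerpt. It assumes that every orthologous articulation is used as its own center and that all centers are grown simultaneously, one breadth-first layer at a time, over orthologous nodes only; the actual grouping of articulations into centers and the expansion order used by Divide may differ. The function names `articulation_points` and `divide_sketch` are hypothetical.

```python
from collections import deque


def articulation_points(adj):
    """Return the articulation points of an undirected simple graph.

    adj maps each node to the collection of its neighbours.
    Iterative depth-first search using discovery indices and low-links.
    """
    index, low, parent = {}, {}, {}
    points = set()
    counter = 0
    for root in adj:
        if root in index:
            continue
        parent[root] = None
        index[root] = low[root] = counter
        counter += 1
        root_children = 0
        stack = [(root, iter(adj[root]))]
        while stack:
            node, neighbours = stack[-1]
            advanced = False
            for nxt in neighbours:
                if nxt not in index:
                    # Tree edge: descend into the unvisited neighbour.
                    parent[nxt] = node
                    index[nxt] = low[nxt] = counter
                    counter += 1
                    if node == root:
                        root_children += 1
                    stack.append((nxt, iter(adj[nxt])))
                    advanced = True
                    break
                if nxt != parent[node]:
                    # Back edge: update the low-link of the current node.
                    low[node] = min(low[node], index[nxt])
            if not advanced:
                stack.pop()
                par = parent[node]
                if par is not None:
                    low[par] = min(low[par], low[node])
                    # A non-root node is an articulation if some child
                    # cannot reach above it without passing through it.
                    if par != root and low[node] >= index[par]:
                        points.add(par)
        # The root is an articulation if it has more than one DFS child.
        if root_children > 1:
            points.add(root)
    return points


def divide_sketch(adj, orthologs):
    """Grow BFS trees of orthologous nodes from orthologous articulations.

    Assumption (not from the paper): each orthologous articulation is a
    center, and centers expand simultaneously, layer by layer.
    Returns a dict mapping each center to the node set of its tree.
    """
    centers = sorted(articulation_points(adj) & set(orthologs))
    owner = {c: c for c in centers}
    queue = deque(centers)
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v in orthologs and v not in owner:
                owner[v] = owner[u]
                queue.append(v)
    trees = {c: set() for c in centers}
    for node, center in owner.items():
        trees[center].add(node)
    return trees


# Toy network: a path p1 - p2 - p3 - p4 - p5 where p3 has no ortholog.
example_adj = {
    "p1": {"p2"},
    "p2": {"p1", "p3"},
    "p3": {"p2", "p4"},
    "p4": {"p3", "p5"},
    "p5": {"p4"},
}
example_orthologs = {"p1", "p2", "p4", "p5"}
print(divide_sketch(example_adj, example_orthologs))
```

In the toy network, p2, p3 and p4 are articulations, but only p2 and p4 are orthologous, so the sketch yields two sub-trees, one around each of them. Because expansion is restricted to orthologous nodes, the non-orthologous p3 separates the two trees.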

Computing articulations (Line 1). Computation of articulations can be

Experimental analysis

The effectiveness and robustness of the proposed pre-processing method is assessed experimentally in the following three ways.

First, we show that the sub-graphs generated by Divide indeed cover “true” conserved protein complexes. This is done by measuring the overlap of the generated sub-graphs with yeast MIPS curated functional complexes restricted to those proteins belonging to an orthologous pair.

Next, we show that the resulting sub-graphs cover protein complexes computationally predicted by

Divide generates sub-graphs covering “true” protein conserved complexes

Let Divide sub-graphs denote the sub-graphs generated by Divide. We compared Divide sub-graphs with “true” protein conserved complexes. To this aim, we evaluated the quality of sub-graphs generated by Divide using known yeast complexes catalogued in the MIPS database (Güldener et al., 2005). Category 550, which was obtained from high-throughput experiments, was excluded, and we retained only manually annotated complexes up to depth 3 in the

Comparison of Divide sub-graphs with predicted conserved complexes

Here we investigate how Divide constrains the search process of MaWish, and whether the sub-graphs generated by Divide cover those produced by MaWish. To this end, we used the conserved complexes predicted by this alignment method and processed by the clique-rule merging procedure, where complexes consisting of one or two proteins were filtered out. We call the resulting sets MaWish complexes.
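A coverage check of this kind can be sketched in a few lines of Python. The exact measure used in the paper is not given in this excerpt, so the sketch below makes two assumptions: a complex counts as fully covered when all of its proteins lie inside a single sub-graph, and partial coverage is reported as the largest fraction of the complex found in any one sub-graph. The function names and the `min_size` parameter are hypothetical; `min_size=3` mirrors the filtering of complexes with one or two proteins.

```python
def fully_covered(complexes, subgraphs, min_size=3):
    """Return the complexes that are contained in at least one sub-graph.

    Complexes with fewer than min_size proteins are discarded first.
    """
    node_sets = [set(s) for s in subgraphs]
    kept = [frozenset(c) for c in complexes if len(c) >= min_size]
    return [c for c in kept if any(c <= s for s in node_sets)]


def best_coverage(complex_, subgraphs):
    """Largest fraction of a complex that falls inside a single sub-graph."""
    proteins = set(complex_)
    if not proteins:
        return 0.0
    return max((len(proteins & set(s)) / len(proteins) for s in subgraphs),
               default=0.0)


# Toy data: three predicted complexes and two generated sub-graphs.
example_complexes = [{"a", "b", "c"}, {"a", "b"}, {"c", "d", "e", "f"}]
example_subgraphs = [{"a", "b", "c", "d"}, {"e", "f"}]
print(fully_covered(example_complexes, example_subgraphs))
print(best_coverage({"c", "d", "e", "f"}, example_subgraphs))
```

In the toy data, the two-protein complex is filtered out, the first complex is fully covered by the first sub-graph, and the third complex is split evenly between the two sub-graphs.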

In the right column of Fig. 4 one can observe that a number of yeast MaWish complexes are fully covered

Applications of modular network alignment

In this section, we investigate the ability of Divide to enhance the performance of alignment methods.

Specifically, we apply Divide to two different alignment methods.

In the first case, we consider an instance of modular local network alignment, called DivAfull (Jancura et al., 2008a). DivAfull employs Divide to generate sub-graphs, the MaWish alignment model to align them, and iterative exact search to detect all possible solutions from the generated alignments. Therefore, application of Divide

Conclusion

This paper introduced a heuristic algorithm, Divide, for dividing protein interaction networks in such a way that conserved functional complexes are covered by generated sub-graphs. To the best of our knowledge, this is the first algorithm for this task, which can be used to perform modular network alignment of protein interaction networks (Jancura et al., 2008a, Jancura et al., 2008b).

The selection of centers is based on orthology information, but another property could be used instead.

Acknowledgments

The authors thank Mehmet Koyutürk for providing the MaWish code.

References (64)

  • W. Ali et al. Functionally guided alignment of protein interaction networks for module detection. Bioinformatics (2009)
  • M. Ashburner et al. Gene ontology: tool for the unification of biology. Nat. Genet. (2000)
  • G. Bader et al. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform. (2003)
  • G.D. Bader et al. Bind – the biomolecular interaction network database. Nucleic Acids Res. (2001)
  • S. Bandyopadhyay et al. Systematic identification of functional orthologs based on protein network comparison. Genome Res. (2006)
  • J. Berg et al. Cross-species analysis of biological networks by Bayesian alignment. Proc. Natl. Acad. Sci. (2006)
  • Blin, G., Sikora, F., Vialette, S., 2009. Querying protein–protein interaction networks. In: ISBRA ’09: Proceedings of...
  • C. Bron et al. Algorithm 457: finding all cliques of an undirected graph. Commun. ACM (1973)
  • S. Bruckner et al. Torque: topology-free querying of protein interaction networks. Nucleic Acids Res. (2009)
  • Cheng, Q., Berman, P., Harrison, R., Zelikovsky, A., 2008. Fast alignments of metabolic networks. In: BIBM ’08:...
  • L. Chindelevitch et al. Local optimization for global alignment of protein interaction networks. Pacific Symp. Biocomput. (2010)
  • M. Deng et al. Prediction of protein function using protein–protein interaction data. J. Comput. Biol. (2003)
  • B. Dost et al. Qnet: a tool for querying protein interaction networks. J. Comput. Biol. (2008)
  • J. Dutkowski et al. Identification of functional modules from conserved ancestral protein–protein interactions. Bioinformatics (2007)
  • D. Ekman et al. What properties characterize the hub proteins of the protein–protein interaction network of Saccharomyces cerevisiae? Genome Biol. (2006)
  • Evans, P., Sandler, T., Ungar, L., 2008. Protein–protein interaction network alignment by quantitative simulation. In:...
  • J. Flannick et al. Graemlin: general and robust alignment of multiple large interaction networks. Genome Res. (2006)
  • J. Flannick et al. Automatic parameter learning for multiple local network alignment. J. Comput. Biol. (2009)
  • U. Güldener et al. CYGD: the comprehensive yeast genome database. Nucleic Acids Res. (2005)
  • X. Guo et al. Domain-oriented edge-based alignment of protein interaction networks. Bioinformatics (2009)
  • J. Handl et al. Computational cluster validation in post-genomic data analysis. Bioinformatics (2005)
  • E. Hirsh et al. Identification of conserved protein complexes based on a model of protein network evolution. Bioinformatics (2007)
  • J. Hopcroft et al. Algorithm 447: efficient algorithms for graph manipulation. Commun. ACM (1973)
  • P. Jancura et al. Divide, align and full-search for discovering conserved protein complexes
  • Jancura, P., Heringa, J., Marchiori, E., 2008b. Dividing protein interaction networks by growing orthologous...
  • H. Jeong et al. Lethality and centrality in protein networks. Nature (2001)
  • M. Kalaev et al. Fast and accurate alignment of multiple protein networks. J. Comput. Biol. (2009)
  • B.P. Kelley et al. Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc. Natl. Acad. Sci. (2003)
  • G. Klau. A new graph-based method for pairwise global network alignment. BMC Bioinform. (2009)
  • M. Koyutürk et al. Pairwise local alignment of protein interaction networks guided by models of evolution
  • M. Koyutürk et al. Detecting conserved interaction patterns in biological networks. J. Comput. Biol. (2006)
  • M. Koyutürk et al. Pairwise alignment of protein interaction networks. J. Comput. Biol. (2006)

    This manuscript is an extended version of the conference paper Jancura et al. (2008b) presented at the Third IAPR International Conference on Pattern Recognition in Bioinformatics, Melbourne, Australia, 2008.
