Software Tool Article

CySpanningTree: Minimal Spanning Tree computation in Cytoscape

[version 1; peer review: 1 approved, 1 approved with reservations]
PUBLISHED 05 Aug 2015

This article is included in the Cytoscape gateway.

Abstract

Simulating graph models for real world networks is made easy using software tools like Cytoscape. In this paper, we present the open-source CySpanningTree app for Cytoscape that creates a minimal/maximal spanning tree network for a given Cytoscape network. CySpanningTree provides two classical algorithms for calculating a spanning tree: Prim’s and Kruskal’s. Minimal spanning tree discovery in a given graph is a fundamental problem with diverse applications such as the spanning tree network optimization protocol, cost-effective design of various kinds of networks, approximation algorithms for some NP-hard problems, cluster analysis, and reducing data storage in sequencing amino acids in a protein. This article demonstrates the procedure for extracting a spanning tree from complex data sets such as gene expression data and a world network of capital cities. The article also provides an approximate solution to the traveling salesman problem using a minimum spanning tree heuristic. CySpanningTree for Cytoscape 3 is available from the Cytoscape app store.

Keywords

minimum spanning tree, gene expression data, Euclidean distance, Hamiltonian cycle

Introduction

Graph theory is widely used for network analysis in various fields [1]. Extraction of various kinds of subnetworks is one way to identify functional modules within complex networks [2]. A tree is a subnetwork with minimal connections. Specifically, in graph theory, a tree is a graph with exactly one path between every two nodes. In other words, any connected graph without simple cycles is a tree. Given a connected graph that is not a tree, one can extract a tree from it by removing edges that lie on cycles. A spanning tree contains all the nodes of the graph and has (N-1) edges, where N is the number of nodes in the given graph. Extracting a spanning tree gets interesting when the edges of the given graph have weights. The weight of a spanning tree is the sum of the weights of its edges. In finding the minimal/maximal spanning tree, one extracts the tree whose weight is minimum/maximum, respectively. There may be several minimum spanning trees of the same weight; in particular, if all the edge weights of a given graph are the same, every spanning tree of that graph is minimal. If each edge has a distinct weight, there is exactly one minimum spanning tree.

In this paper, we present CySpanningTree, an app for Cytoscape 3 [3] that extracts a spanning tree from a given graph. Once the user has imported a dataset, clicking the “Create spanning tree” button of the app creates a new spanning tree network in the network panel of Cytoscape. Historically, spanning trees have been used in various applications such as constructing a road network between cities at minimum cost, as a heuristic for the traveling salesman problem (TSP), for the spanning tree network optimization protocol in networking, and for clustering gene expression data. Three of the mentioned cases are demonstrated in the use cases section.

Methods

Implementation

CySpanningTree is a Java implementation of Prim’s [4] and Kruskal’s [5] algorithms for extracting a minimal spanning tree (MST), built on the Cytoscape 3 API and Java 7. An MST for a given graph might not be unique; however, within a given Cytoscape session, the tie-breaking approach for selecting edges of equal weight is deterministic. The user therefore gets the same spanning tree throughout a Cytoscape session unless the network is reloaded.
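As an illustration of deterministic tie-breaking, the following Python sketch implements Kruskal’s algorithm with a union-find structure. It is not the app’s Java code: the function name `kruskal_mst`, the edge-tuple format and the rule “equal weights keep their input order” are assumptions made for this example.

```python
def kruskal_mst(nodes, edges):
    """Return the edges of a minimum spanning forest (Kruskal's algorithm).

    nodes: iterable of hashable node labels.
    edges: iterable of (u, v, weight) tuples for an undirected graph.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Path halving: point each visited node at its grandparent.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    # sorted() is stable, so edges of equal weight keep their input order.
    # This is the (assumed) deterministic tie-breaking rule of this sketch.
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so no cycle
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


# Hypothetical graph in which three edges share the same weight.
edges = [("A", "B", 1), ("B", "C", 1), ("A", "C", 1), ("C", "D", 2)]
mst = kruskal_mst("ABCD", edges)
```

With this input, three spanning trees of total weight 4 exist; because ties are resolved by input order, repeated runs on the same edge list always return the same one.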

This tool also has a “Create Hamiltonian cycle” button, which invokes the computation of a Hamiltonian cycle [6]. To compute this cycle, the app first finds an MST using Prim’s algorithm and then performs a pre-order traversal on it. This pre-order traversal is a modified version of the depth-first search algorithm and results in a Hamiltonian path. The last node of this path is then connected to the first node to make a cycle. Users are advised to run the Hamiltonian cycle algorithm on a fully connected graph, so that no edge needed between consecutive nodes of the traversal is missing from the input network.
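The procedure described above (Prim’s MST, pre-order traversal, closing the path) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the app’s implementation: the names `prim_mst` and `hamiltonian_cycle`, the dictionary-of-dictionaries input and the heap-based variant of Prim’s algorithm are all choices made for this example.

```python
import heapq


def prim_mst(weights, root):
    """Return MST adjacency lists for a connected graph given as {u: {v: w}}."""
    tree = {u: [] for u in weights}
    visited = {root}
    # Heap entries are (weight, from_node, to_node); when weights tie, the
    # node labels are compared, so labels must be mutually comparable.
    heap = [(w, root, v) for v, w in weights[root].items()]
    heapq.heapify(heap)
    while heap and len(visited) < len(weights):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        tree[u].append(v)
        tree[v].append(u)
        for x, wx in weights[v].items():
            if x not in visited:
                heapq.heappush(heap, (wx, v, x))
    return tree


def hamiltonian_cycle(weights, root):
    """Pre-order walk of the MST, closed by returning to the root."""
    tree = prim_mst(weights, root)
    order, seen, stack = [], set(), [root]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        order.append(u)
        # Push neighbours in reverse so they are visited in insertion order.
        stack.extend(reversed(tree[u]))
    return order + [root]


# Hypothetical fully connected graph on four nodes with symmetric weights.
weights = {
    "A": {"B": 1, "C": 4, "D": 3},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"A": 3, "B": 5, "C": 1},
}
cycle = hamiltonian_cycle(weights, "A")
cycle_length = sum(weights[u][v] for u, v in zip(cycle, cycle[1:]))
```

Consecutive nodes of the pre-order walk are not always adjacent in the tree, which is why the input graph should contain an edge between every pair of nodes.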

Table 1 lists the complexities of the algorithms used in the app and the uniqueness of their outputs, where V is the number of nodes and E is the number of edges. Prim’s algorithm runs on an adjacency list representation of the graph and is implemented with a complexity of $O(V^2)$. Kruskal’s algorithm runs on an adjacency matrix of the graph and has a complexity of $O(EV^2(E+V))$. The Hamiltonian cycle computation first calculates a spanning tree using Prim’s algorithm with a complexity of $O(V^2)$ and then runs a depth-first search with a complexity of $O(E+V)$.

Table 1. Comparison of algorithms used in CySpanningTree.

| Algorithm | Complexity | Uniqueness |
|---|---|---|
| Prim’s | $O(V^2)$ | not unique |
| Kruskal’s | $O(EV^2(E+V))$ | not unique |
| Hamiltonian cycle | $O(V^2 + E)$ | not unique |
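To show where a quadratic bound for Prim’s algorithm comes from, here is a hedged Python sketch of the textbook array-based variant. It works on a weight matrix, which may differ from the data structure used in the app; the function name `prim_dense` and the use of `math.inf` for absent edges are assumptions of this example.

```python
import math


def prim_dense(matrix, root=0):
    """Prim's algorithm on a symmetric weight matrix in O(V^2) time.

    matrix[i][j] is the weight of edge (i, j), or math.inf if absent.
    Returns a list of (parent, child, weight) tree edges.
    """
    n = len(matrix)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge into each node
    parent = [None] * n
    best[root] = 0
    edges = []
    for _ in range(n):  # V iterations
        # Linear scan for the cheapest node not yet in the tree: O(V).
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        if math.isinf(best[u]):
            raise ValueError("graph is not connected")
        in_tree[u] = True
        if parent[u] is not None:
            edges.append((parent[u], u, matrix[parent[u]][u]))
        # Update the cheapest edge into every node outside the tree: O(V).
        for v in range(n):
            if not in_tree[v] and matrix[u][v] < best[v]:
                best[v] = matrix[u][v]
                parent[v] = u
    return edges


# Hypothetical 4-node graph; inf on the diagonal means "no self-loop".
INF = math.inf
matrix = [
    [INF, 1, 4, 3],
    [1, INF, 2, 5],
    [4, 2, INF, 1],
    [3, 5, 1, INF],
]
tree_edges = prim_dense(matrix)
```

The outer loop runs V times and each iteration performs two linear scans, which gives the $O(V^2)$ total.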

Graphical user interface

The GUI component of CySpanningTree is a tabbed panel in the control panel of Cytoscape. Cytoscape takes care of loading the input network. The CySpanningTree menu (Figure 1) loads in the control panel of Cytoscape when it is selected from the App menu. Currently the app runs only on connected networks. When the user tries to execute a spanning tree algorithm on a disconnected graph, an error message pops up. For weighted graphs, the user has to select the edge attribute from the drop-down list; the default, “None”, treats all edges as having the same weight.
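A connectivity requirement like the one above can be checked with a breadth-first search before any spanning tree algorithm runs. The sketch below is only an illustration of such a check; `is_connected` and the adjacency-dictionary input are hypothetical and do not reflect the app’s code.

```python
from collections import deque


def is_connected(adjacency):
    """Return True if the undirected graph {node: neighbours} is connected."""
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    # Connected exactly when the search reached every node.
    return len(seen) == len(adjacency)


connected = is_connected({"A": ["B"], "B": ["A", "C"], "C": ["B"]})
disconnected = is_connected({"A": ["B"], "B": ["A"], "C": []})
```

A spanning tree exists only when this check succeeds; otherwise the best one can obtain is a spanning forest with one tree per connected component.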


Figure 1. User interface of CySpanningTree.

Setting the root node for Prim’s spanning tree

Prim’s algorithm starts from a root node, so the user is asked to specify one when the Prim’s Spanning Tree button is pressed. If the user enters a node that is not in the network, an error message is shown and the program terminates.

Visualizations

The resultant MST or Hamiltonian cycle network has the same layout as the input network, with nodes positioned at the same locations and edges scaled down. When spanning tree subnetworks are created, the corresponding spanning edges are highlighted in the input network. In Figure 2, the input network is a fully connected graph of capital cities of countries in the world, containing 203 cities and 20503 connections between them. The resultant networks, “Kruskal’s Spanning Tree”, “Prim’s Spanning Tree” and “Hamiltonian Cycle”, are connected graphs containing all 203 cities and only 202, 202 and 203 edges, respectively. Spanning trees are extracted as separate Cytoscape networks under the same network collection, as shown in Figure 2.
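The edge counts quoted above follow directly from the number of nodes. As a short worked check, with N = 203 cities:

```latex
\begin{align*}
\text{fully connected graph:} \quad & \frac{N(N-1)}{2} = \frac{203 \cdot 202}{2} = 20503 \text{ edges} \\
\text{spanning tree:} \quad & N - 1 = 202 \text{ edges} \\
\text{Hamiltonian cycle:} \quad & N = 203 \text{ edges}
\end{align*}
```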


Figure 2. New networks created dynamically in Control panel.

Use cases

In this section, we present spanning tree results for datasets in four scenarios: a gene expression matrix, building a cost-efficient road network when all possible costs are known, an approximate solution to the travelling salesman problem, and connecting a 10-home village with phone lines using minimum wiring. In each scenario, the contents of the network are introduced first and then the extraction of spanning trees is demonstrated.

MST of gene expression data

The expression levels of genes exposed to various environmental conditions are recorded at different times with different samples. This data is called gene expression data and is analyzed to extract the similarities between genes. Gene expression data $G = (g_1, g_2, \ldots, g_n)$ for $n$ genes is multi-dimensional data with each $g_i = (d_{i1}, d_{i2}, \ldots, d_{im})$ for $m$ given expression levels. Here $g_i$ represents the $i$th gene and $d_{ij}$ represents the $j$th expression level of this $i$th gene.

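$$
G = \begin{bmatrix}
d_{11} & d_{12} & d_{13} & \cdots & d_{1m} \\
d_{21} & d_{22} & d_{23} & \cdots & d_{2m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d_{n1} & d_{n2} & d_{n3} & \cdots & d_{nm}
\end{bmatrix}
$$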

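This data has been modeled as a graph in which the nodes are genes and each edge is weighted by the genetic distance between the two genes it connects. Genetic distance is defined here as a measure of how similar two genes are: the smaller the distance, the more similar their expression profiles.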

Euclidean distance between genes $g_i$ and $g_j$ $= \sqrt{(d_{i1} - d_{j1})^2 + (d_{i2} - d_{j2})^2 + \cdots + (d_{im} - d_{jm})^2}$

This genetic distance is calculated for each pair of genes, which gives a fully connected graph. The data set [7] has been taken from the Saccharomyces Genome Database and contains expression levels of budding yeast (S. cerevisiae) with a total of 6149 genes (http://downloads.yeastgenome.org/expression/microarray/Cho_1998_PMID_9702192/). It is typically difficult to visualize a large graph of 6149 nodes in which each node is connected to every other node. A spanning tree of the gene expression data makes it possible to visualize such a large network, as shown in Figure 3.
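The construction of the fully connected distance graph can be sketched in a few lines of Python. This is an illustrative sketch, not the preprocessing actually used for the yeast data: the function names and the three-gene expression table are hypothetical.

```python
import math
from itertools import combinations


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of expression levels")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_distance_graph(expression):
    """Return weighted edges (gene_i, gene_j, distance) for every gene pair."""
    return [
        (gi, gj, euclidean_distance(expression[gi], expression[gj]))
        for gi, gj in combinations(expression, 2)
    ]


# Hypothetical expression levels for three genes at m = 2 measurements.
expression = {"g1": [0.0, 0.0], "g2": [3.0, 4.0], "g3": [6.0, 8.0]}
edges = complete_distance_graph(expression)
```

For n genes this produces n(n-1)/2 edges, so a 6149-gene data set yields 18,902,026 edges, which is why a spanning tree with only 6148 edges is far easier to visualize.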


Figure 3. Spanning tree obtained from graph of S. cerevisiae expression data; Layout: Allegro Spring-Electric layout using Allegro Layout app in Cytoscape.

  • Input network: A fully connected graph of S. cerevisiae expression data

  • Nodes: Genes of S. cerevisiae

  • Edges: Euclidean distance between genes calculated using expression levels

  • Output network (Figure 3): Kruskal’s spanning tree of the input gene expression data

Although many edges are removed from the network during the process of creating a spanning tree, no essential information is lost [8]. A spanning tree is a better way to visualize large networks than a fully connected graph. We observed that genes with similar functionalities are connected closely in the resultant spanning tree. Many clustering algorithms have been applied to gene expression data [8,9]; we are currently working on clustering using minimum spanning trees for the next release of CySpanningTree.

MST on world network

This dataset [10] consists of nodes, which are the capital cities of all countries in the world, and edges between them representing the distance in kilometers. These distances are measured using the latitude and longitude coordinates of the cities (http://privatewww.essex.ac.uk/~ksg/data-5.html). This dataset, when imported into Cytoscape, results in a fully connected graph, as the distance is calculated for each pair of capital cities. Prim’s algorithm has been executed on this dataset to produce an MST network, as shown in Figure 5.

  • Input network: Fully connected graph of capital cities as shown in Figure 4

  • Nodes: Capital cities of all countries in the world

  • Edges: Displacement between cities

  • Output minimum spanning tree: Network with minimum cost such that each city is connected. Cities separated by large distances are represented with strong edges as shown in Figure 5
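The dataset’s documentation, not this article, defines how the distances were derived from coordinates. As one common way to obtain a distance in kilometers from latitude and longitude, the haversine great-circle formula can be sketched as follows; the function name `haversine_km` and the Earth radius of 6371 km are assumptions of this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Two points on the equator, 90 degrees of longitude apart:
# a quarter of the circumference of the assumed sphere.
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)
```

Applying such a function to every pair of capitals gives the edge weights of the fully connected input graph.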


Figure 4. Fully connected graph of the capital city network; Layout: Allegro Spring-Electric layout using Allegro Layout app in Cytoscape.


Figure 5. Minimum Spanning Tree of the capital city network; Layout: Allegro Spring-Electric layout using Allegro Layout app in Cytoscape.


Figure 6. Fully connected graph of 5 cities and their displacements.


Figure 7. MST of the network in Figure 6.

Furthermore, this solution can be used for drawing a Hamiltonian cycle, which gives an approximate solution to the travelling salesman problem. Drawing a Hamiltonian cycle for a smaller network is discussed in the next subsection.

MST as a heuristic solution for the TSP

The TSP is a well-known combinatorial optimization problem. The goal is to find the shortest tour that visits each city in a given list exactly once and returns to the starting city. Though the problem statement looks simple, the TSP is NP-hard, and its decision version is NP-complete [11]. Even though the problem is computationally difficult, a large number of heuristic solutions [12] are known, owing to its many applications [13] such as planning, logistics, DNA sequencing and predicting protein functions.

Pre-order traversal on a minimum spanning tree is one of the heuristic solutions for the TSP [5,14]. In this subsection, a Hamiltonian cycle is drawn from a spanning tree to show that the resultant cycle is a near-optimal solution to the TSP. The optimal TSP tour in Figure 9 is about 17% shorter than the Hamiltonian cycle obtained using the spanning tree in Figure 8. On executing the Hamiltonian cycle algorithm on the input network, the software creates both Prim’s spanning tree and the Hamiltonian cycle. Five nodes from the capital city network above are used for the TSP use case.
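The quality of this heuristic can be bounded when the edge weights satisfy the triangle inequality, as distances between cities are generally expected to do. The following is the standard textbook argument, stated here as background rather than as a result of this article. Let $T$ be a minimum spanning tree, $H$ the cycle obtained from its pre-order traversal, $H^*$ an optimal tour, and $w(\cdot)$ the total edge weight.

```latex
\begin{align*}
w(T) &\le w(H^*)
  && \text{(deleting one edge of } H^* \text{ leaves a spanning tree)} \\
w(H) &\le 2\,w(T)
  && \text{(shortcutting the full walk around } T \text{, by the triangle inequality)} \\
\Rightarrow \quad w(H) &\le 2\,w(H^*)
\end{align*}
```

The cycle produced by the heuristic is therefore at most twice as long as the optimal tour under this assumption.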


Figure 8. Hamiltonian cycle drawn from the spanning tree with USA as starting node.


Figure 9. Optimal TSP tour from USA.

  • Input network: Fully connected graph of 5 capital cities

  • Nodes: Capital cities of countries: USA, Brazil, South Africa, India and Italy

  • Edges: Displacement between cities shown in kilometers

Connecting a 10-home village with phone lines

This dataset consists of houses depicted as nodes, and the edges are the means by which one house can be wired up to another. The weights of the edges represent the distances between the houses. The task of the telephone company is to wire all houses using the least amount of telephone wiring possible.

  • Input network: Houses in village depicted as graph as shown in Figure 10

  • Nodes: Houses H1 to H10

  • Edges: Distance between the houses

  • Output MST: Network which connects the houses via wires with least possible wiring. Figure 11 and Figure 12 are the spanning trees obtained using Prim’s (H1 as root node) and Kruskal’s algorithm, respectively.


Figure 10. Houses depicted as nodes.


Figure 11. MST using Prim’s algorithm.


Figure 12. MST using Kruskal’s algorithm.

Summary

In this paper, we presented the CySpanningTree app for Cytoscape 3. CySpanningTree fills an important need for many Cytoscape users and researchers in obtaining spanning trees across different types of networks. CySpanningTree makes effective use of the Cytoscape 3 API in extracting the subnetwork and creating it as a separate network. In the near future, we will explore MST-based clustering and further datasets for which spanning tree evaluation is significant.

Software availability

The CySpanningTree app can be downloaded from the Cytoscape app store.

Archived source code as at the time of publication

http://dx.doi.org/10.5281/zenodo.19668 [15]

Licence: GNU Lesser General Public License 3.0

https://www.gnu.org/licenses/lgpl.html

how to cite this article
Shaik F, Bezawada S and Goveas N. CySpanningTree: Minimal Spanning Tree computation in Cytoscape [version 1; peer review: 1 approved, 1 approved with reservations] F1000Research 2015, 4:476 (https://doi.org/10.12688/f1000research.6797.1)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 29 Mar 2016
Ankush Sharma, Institute of Clinical Physiology, National Research Council, Siena, Italy; LISM, Institute of Clinical Physiology, Siena, Italy; Faculty of Information Technology, United Arab Emirates University, Al-Ain, United Arab Emirates
Approved with Reservations
In this research article entitled "CySpanningTree: Minimal Spanning Tree computation in Cytoscape", the authors describe the app for Cytoscape version 3 that creates minimal/maximal spanning tree for a given network using Prim’s and Kruskal’s algorithms. The CySpanningTree app appears to be ...
HOW TO CITE THIS REPORT
Sharma A. Reviewer Report For: CySpanningTree: Minimal Spanning Tree computation in Cytoscape [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2015, 4:476 (https://doi.org/10.5256/f1000research.7304.r12115)
Reviewer Report 14 Sep 2015
Shaillay Dogra, Vishuo BioMedical Pte Ltd, Singapore, Singapore 
Approved
The authors have come up with a useful plug-in for Cytoscape. Different algorithms have been implemented to reduce a cluttered network to a more meaningful one. Such efforts are welcome and potentially useful especially for those working in network analysis ...
HOW TO CITE THIS REPORT
Dogra S. Reviewer Report For: CySpanningTree: Minimal Spanning Tree computation in Cytoscape [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2015, 4:476 (https://doi.org/10.5256/f1000research.7304.r10256)