Prioritization of candidate disease genes by enlarging the seed set and fusing information of the network topology and gene expression

Shao-Wu Zhang; Dong-Dong Shao; Song-Yao Zhang; Yi-Bin Wang

doi:10.1039/C3MB70588A

Shao-Wu Zhang,*^a Dong-Dong Shao,^a Song-Yao Zhang^a and Yi-Bin Wang^a

Author affiliations

* Corresponding authors

^a College of Automation, Northwestern Polytechnical University, Xi'an, China
E-mail: zhangsw@nwpu.edu.cn

Abstract

Identifying disease genes is important not only for understanding gene function and the cellular mechanisms that drive human disease, but also for improving disease diagnosis and treatment. Recently, high-throughput techniques have been applied to detect dozens or even hundreds of candidate genes. However, experimental validation of so many candidates is usually time-consuming, tedious and expensive, and sometimes lacks reproducibility. Therefore, numerous theoretical and computational methods (e.g. network-based approaches) have been developed to prioritize candidate disease genes. Many network-based approaches implicitly rely on the observation that genes causing the same or similar diseases tend to correlate with each other in gene–protein relationship networks. Of these network approaches, the random walk with restart (RWR) algorithm is considered to be a state-of-the-art approach. To further improve the performance of RWR, we propose a novel method named ESFSC to identify disease-related genes by enlarging the seed set according to the centrality of disease genes in a network and by fusing information from protein–protein interaction (PPI) network topological similarity and gene expression correlation. The ESFSC algorithm restarts at all of the nodes in the seed set, which consists of the known disease genes and their k-nearest neighbor nodes, and then walks in the global network twice: once guided by the similarity transition matrix constructed from PPI network topological similarity properties, and once guided by the correlational transition matrix constructed from the gene expression profiles. Finally, all the genes in the network are ranked by a weighted fusion of the results of the two RWR runs. Comprehensive simulation results for 10 diseases with 97 known disease genes collected from the Online Mendelian Inheritance in Man (OMIM) database show that ESFSC outperforms existing methods for prioritizing candidate disease genes. The top prediction results for Alzheimer's disease are consistent with previous literature reports.
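The procedure described in the abstract (enlarge the seed set, run RWR once per transition matrix, fuse the two score vectors) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify how neighbors are selected, how the matrices are normalized, or which weights are used, so everything below is an assumption made for illustration: the helper names (`row_normalize`, `enlarge_seeds`, `rwr`, `fused_scores`), the choice of the k most similar neighbors per seed, the restart probability `restart`, the fusion weight `alpha`, and the toy 5-gene matrices.

```python
def row_normalize(weights):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    out = []
    for row in weights:
        total = sum(row)
        out.append([w / total if total else 0.0 for w in row])
    return out


def enlarge_seeds(seeds, similarity, k):
    """Assumed rule: add each seed's k most similar neighbors to the seed set."""
    enlarged = set(seeds)
    for s in seeds:
        neighbors = [(similarity[s][j], j) for j in range(len(similarity))
                     if j != s and similarity[s][j] > 0]
        neighbors.sort(key=lambda t: (-t[0], t[1]))  # highest weight first
        enlarged.update(j for _, j in neighbors[:k])
    return sorted(enlarged)


def rwr(transition, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on a row-stochastic transition matrix."""
    n = len(transition)
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [restart * p0[j]
               + (1 - restart) * sum(p[i] * transition[i][j] for i in range(n))
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def fused_scores(similarity, correlation, seeds, k=1, alpha=0.5, restart=0.7):
    """Weighted fusion of two RWR runs that share one enlarged seed set."""
    enlarged = enlarge_seeds(seeds, similarity, k)
    p_sim = rwr(row_normalize(similarity), enlarged, restart)
    p_cor = rwr(row_normalize(correlation), enlarged, restart)
    return [alpha * a + (1 - alpha) * b for a, b in zip(p_sim, p_cor)]


# Toy data (hypothetical): 5 genes, symmetric non-negative weights.
similarity = [
    [0.0, 0.9, 0.2, 0.0, 0.0],
    [0.9, 0.0, 0.5, 0.0, 0.0],
    [0.2, 0.5, 0.0, 0.4, 0.0],
    [0.0, 0.0, 0.4, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.7, 0.0],
]
correlation = [
    [0.0, 0.6, 0.0, 0.3, 0.0],
    [0.6, 0.0, 0.4, 0.0, 0.0],
    [0.0, 0.4, 0.0, 0.5, 0.2],
    [0.3, 0.0, 0.5, 0.0, 0.8],
    [0.0, 0.0, 0.2, 0.8, 0.0],
]

enlarged = enlarge_seeds([0], similarity, k=1)
scores = fused_scores(similarity, correlation, seeds=[0], k=1)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(enlarged, ranking)
```

In this sketch the fusion weight `alpha` trades off the topology-guided walk against the expression-guided walk. Candidate genes would be ranked by their fused score after excluding the seed genes themselves.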

Article information

DOI: https://doi.org/10.1039/C3MB70588A
Article type: Paper
Submitted: 13 Dec 2013
Accepted: 08 Feb 2014
First published: 10 Feb 2014

Mol. BioSyst., 2014, 10, 1400-1408

S. Zhang, D. Shao, S. Zhang and Y. Wang, Mol. BioSyst., 2014, 10, 1400 DOI: 10.1039/C3MB70588A

Molecular BioSystems