Issue 5, 2021

Genetic variant effect prediction by supervised nonnegative matrix tri-factorization

Abstract

Discriminating between deleterious and neutral mutations among numerous non-synonymous single nucleotide variants (nsSNVs) that may be observed through whole exome sequencing (WES) is considered a great challenge. In this regard, many machine learning methods have been developed for the prediction of variant consequences based on the analysis of either protein amino acid sequences or protein structures or their integration with features extracted from various gene level data and phenotype information. Due to the availability of a high number of features and heterogeneity of sources, implementing a suitable integration method plays an important role in predictive models. In this study, we proposed a novel supervised nonnegative matrix tri-factorization (sNMTF) algorithm to integrate current variant prediction scores into the gene level data and disease networks. In this regard, a new feature space was constructed by the integration of all input data using sNMTF to provide appropriate inputs for training a classifier. For the assessment of the proposed model, we utilized two benchmark datasets. The first one contained 11 207 deleterious and 19 839 neutral nsSNPs, whereas for the other dataset we used 4416 and 4960 deleterious and neutral nsSNPs, respectively. In general, the evaluation of our proposed supervised NMTF method on both datasets indicated that, in comparison with the existing nsSNV effect prediction approaches, regardless of whether they are ensemble-based or not, our method exhibited a better performance, which resulted in a higher prediction accuracy on average of 15% than other ensemble scores. In addition, excluding any kind of data that were integrated into the final model led to a substantial decrease in deleterious variant prediction. The proposed model can be used as an extensible framework for integrating more hetergeneous sources.

Graphical abstract: Genetic variant effect prediction by supervised nonnegative matrix tri-factorization

Article information

Article type
Research Article
Submitted
08 Feb 2021
Accepted
19 May 2021
First published
24 Jun 2021

Mol. Omics, 2021,17, 740-751

Genetic variant effect prediction by supervised nonnegative matrix tri-factorization

A. A. Arani, M. Sehhati and M. A. Tabatabaiefar, Mol. Omics, 2021, 17, 740 DOI: 10.1039/D1MO00038A

To request permission to reproduce material from this article, please go to the Copyright Clearance Center request page.

If you are an author contributing to an RSC publication, you do not need to request permission provided correct acknowledgement is given.

If you are the author of this article, you do not need to request permission to reproduce figures and diagrams provided correct acknowledgement is given. If you want to reproduce the whole article in a third-party publication (excluding your thesis/dissertation for which permission is not required) please go to the Copyright Clearance Center request page.

Read more about how to correctly acknowledge RSC content.

Social activity

Spotlight

Advertisements