Machine learning model for predicting Major Depressive Disorder using RNA-Seq data: optimization of classification approach

Verma, Pragya; Shakya, Madhvi

doi:10.1007/s11571-021-09724-8

Research Article
Published: 22 September 2021

Cognitive Neurodynamics, Volume 16, pages 443–453 (2022)

Abstract

Among human brain disorders, Major Depressive Disorder (MDD) is regarded as a potentially lethal disease in which a person may progress to suicidal behavior. Physical detection of MDD is imprecise, but machine learning can aid classification of the disease. The present study used RNA-seq data from three classes to classify differentially expressed genes (DEGs) and then trained a random forest (RF) machine learning model on the key gene data. The three classes are 29 CON (sudden-death healthy controls), 21 MDD-S (MDD suicides), and 9 MDD (non-suicide MDD). PCA yielded 99 key genes, which account for 47.1% of the variability in the data. Training the model on these 99 genes indicated improved classification. The RF classification model has an accuracy of 61.11% on the test data and 97.56% on the training data, and it was more accurate than the k-nearest neighbors (KNN) method. The 99 genes were annotated using DAVID and ClueGO. Important pathways and functions observed in the study were glutamatergic synapse, GABA receptor activation, long-term synaptic depression, and morphine addiction. Of the 99 genes, four (DLGAP1, GNG2, GRIA1, and GRIA4) were predominantly involved in the glutamatergic synapse pathway. Two genes, GABBR2 and GNG2, were linked to GABA receptor activation. The genes associated with long-term synaptic depression were GRIA1, MAPT, and PTEN, and the morphine addiction pathway comprised three genes: GABBR2, GNG2, and PDE4D. The authors propose that this approach can serve as a gold standard for massive datasets. The CON, MDD, and MDD-S cases are physically distinct. The expression levels of 12 genes were dysregulated. These 12 genes are possible biomarkers for Major Depressive Disorder and open a new avenue for further study of depressed subjects.
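
The classification step described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python with scikit-learn, not the authors' pipeline: the expression matrix is random synthetic data, and the hyperparameters (500 trees, k = 5), the feature scaling for KNN, and the 41/18 train/test split are all assumptions. The split sizes were chosen only because 40/41 ≈ 97.56% and 11/18 ≈ 61.11% are consistent with the reported accuracies; the abstract does not state how the 59 samples were divided.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Class sizes from the abstract: 29 CON (0), 21 MDD-S (1), 9 MDD (2).
y = np.repeat([0, 1, 2], [29, 21, 9])

# Synthetic stand-in for the 59 x 99 key-gene expression matrix (NOT real data).
# The first 10 columns get a class-dependent shift so there is some signal.
X = rng.normal(size=(y.size, 99))
X[:, :10] += y[:, None] * 1.0

# Assumed split: 18 test samples and 41 training samples, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=18, stratify=y, random_state=0
)

# Random forest; n_estimators=500 is an illustrative choice.
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)

# KNN baseline; features are standardized because KNN is distance-based.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)

# Collect (train accuracy, test accuracy) for each model.
results = {
    "RF": (rf.score(X_train, y_train), rf.score(X_test, y_test)),
    "KNN": (knn.score(X_train, y_train), knn.score(X_test, y_test)),
}
for name, (train_acc, test_acc) in results.items():
    print(f"{name}: train accuracy {train_acc:.4f}, test accuracy {test_acc:.4f}")
```

Because the data here are random, the printed accuracies will not match the reported figures. With only 59 samples, a single split also gives a noisy estimate, so repeated or cross-validated splits would be a reasonable extension of this sketch.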

Data availability

The authors declare that no data are available.

Author information

Authors and Affiliations

Department of Mathematics, Bioinformatics and Computer Applications, Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, 462003, India
Pragya Verma & Madhvi Shakya

Corresponding author

Correspondence to Pragya Verma.

About this article

Cite this article

Verma, P., Shakya, M. Machine learning model for predicting Major Depressive Disorder using RNA-Seq data: optimization of classification approach. Cogn Neurodyn 16, 443–453 (2022). https://doi.org/10.1007/s11571-021-09724-8

Received: 12 June 2021
Revised: 28 August 2021
Accepted: 12 September 2021
Published: 22 September 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s11571-021-09724-8
