Machine learning model for predicting Major Depressive Disorder using RNA-Seq data: optimization of classification approach

  • Research Article
  • Cognitive Neurodynamics

Abstract

Among human brain disorders, Major Depressive Disorder (MDD) is considered potentially lethal because it can drive a person to suicidal behavior. Detecting MDD by physical assessment is imprecise, and machine learning can help improve classification of the disease. The present study used RNA-seq data from three classes to identify differentially expressed genes (DEGs) and then trained a random forest (RF) classifier on the key genes. The three classes are 29 CON (sudden-death healthy controls), 21 MDD-S (MDD patients who died by suicide), and 9 MDD (non-suicide MDD patients). Principal component analysis (PCA) yielded 99 key genes, which account for 47.1% of the variability in the data. Training the model on these 99 genes improved classification. The RF classification model has an accuracy of 61.11% on the test data and 97.56% on the training data, and the RF method was more accurate than the k-nearest neighbors (KNN) method. The 99 genes were annotated using the DAVID and ClueGO packages. Important pathways and functions observed in the study include glutamatergic synapse, GABA receptor activation, long-term synaptic depression, and morphine addiction. Of the 99 genes, four (DLGAP1, GNG2, GRIA1, and GRIA4) were predominantly involved in the glutamatergic synapse pathway. GABA receptor activation involved two genes, GABBR2 and GNG2. The genes associated with long-term synaptic depression were GRIA1, MAPT, and PTEN, and the morphine addiction pathway comprised three genes: GABBR2, GNG2, and PDE4D. The authors expect this approach to serve as a gold standard for massive datasets. The CON, MDD, and MDD-S cases are physically distinct. The expression levels of 12 genes were dysregulated. These 12 genes are possible biomarkers for Major Depressive Disorder and open a new avenue for further study of depressed subjects.
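The workflow summarized in the abstract (PCA on a 99-gene matrix, then RF versus KNN on a held-out test set) can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the expression matrix is synthetic random data, the class signal, the hyperparameters (`n_estimators=500`, `n_neighbors=5`), and the random seeds are hypothetical, and the 41/18 train/test split is an inference rather than something the abstract states. Only the class sizes (29, 21, 9) and the gene count (99) come from the abstract.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Class sizes and gene count taken from the abstract.
class_sizes = {"CON": 29, "MDD-S": 21, "MDD": 9}
n_genes = 99

# Synthetic stand-in for normalized expression values (NOT real data).
labels = np.repeat(list(class_sizes), list(class_sizes.values()))
X = rng.normal(size=(labels.size, n_genes))
for shift, name in enumerate(class_sizes):
    # Hypothetical class signal on the first 10 genes so the task is learnable.
    X[labels == name, :10] += shift

# Fraction of total variance captured by the first two principal components.
pca = PCA(n_components=2).fit(X)
explained = pca.explained_variance_ratio_.sum()

# Assumed split: 18 test samples, 41 training samples, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=18, stratify=labels, random_state=0
)

models = {
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model and record train and test accuracy side by side.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train": accuracy_score(y_train, model.predict(X_train)),
        "test": accuracy_score(y_test, model.predict(X_test)),
    }

print(f"variance explained by 2 PCs: {explained:.3f}")
for name, acc in results.items():
    print(f"{name}: train={acc['train']:.4f} test={acc['test']:.4f}")
```

The split sizes were chosen because the reported accuracies are arithmetically consistent with them: 11/18 ≈ 61.11% and 40/41 ≈ 97.56%, and 18 + 41 = 59 samples in total. Reporting train and test accuracy together, as the abstract does, makes any gap between the two visible; with a test set this small, each misclassified sample moves test accuracy by roughly 5.6 percentage points.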

Data availability

The authors declare that no data are available.

Author information

Corresponding author

Correspondence to Pragya Verma.

About this article

Cite this article

Verma, P., Shakya, M. Machine learning model for predicting Major Depressive Disorder using RNA-Seq data: optimization of classification approach. Cogn Neurodyn 16, 443–453 (2022). https://doi.org/10.1007/s11571-021-09724-8

  • DOI: https://doi.org/10.1007/s11571-021-09724-8
