Abstract
Among human brain disorders, Major Depressive Disorder (MDD) is potentially lethal because it can lead to suicidal behavior. Clinical detection of MDD is imprecise, and machine learning can help improve classification of the disease. This study used RNA-seq data from three sample classes to identify differentially expressed genes (DEGs) and then trained a random forest (RF) classifier on the key genes. The three classes were 29 CON (sudden-death healthy controls), 21 MDD-S (MDD suicide), and 9 MDD (non-suicide MDD). Principal component analysis (PCA) yielded 99 key genes, which account for 47.1% of the variability in the data. Training the model on these 99 genes improved classification. The RF model reached an accuracy of 97.56% on the training data and 61.11% on the test data, and it was more accurate than the k-nearest neighbors (KNN) method. The 99 genes were annotated using DAVID and ClueGO. Important pathways and functions observed in the study were glutamatergic synapse, GABA receptor activation, long-term synaptic depression, and morphine addiction. Of the 99 genes, four (DLGAP1, GNG2, GRIA1, and GRIA4) were predominantly involved in the glutamatergic synapse pathway. Two genes, GABBR2 and GNG2, were linked to GABA receptor activation. Three genes, GRIA1, MAPT, and PTEN, were associated with long-term synaptic depression, and three genes, GABBR2, GNG2, and PDE4D, were associated with the morphine addiction pathway. The authors suggest that this approach can serve as a gold standard for large datasets. The CON, MDD, and MDD-S cases were distinct from one another. The expression levels of 12 genes were dysregulated.
These 12 genes are possible biomarkers for Major Depressive Disorder and open a new path for further study of depressed subjects.
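The classification step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the expression matrix is random synthetic data, and the 70/30 stratified split, the two PCA components, the class-specific mean shift, and the hyperparameters (`n_estimators=200`, `n_neighbors=5`) are all assumptions chosen for illustration. Only the class sizes (29, 21, 9) and the gene count (99) come from the abstract. The sketch uses scikit-learn and compares RF with KNN on the same split.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Class sizes and gene count as reported in the abstract.
CLASS_SIZES = {"CON": 29, "MDD-S": 21, "MDD": 9}
N_GENES = 99


def make_synthetic_expression(seed=0):
    """Return a random (samples x genes) matrix and labels. NOT real RNA-seq data."""
    rng = np.random.default_rng(seed)
    blocks, labels = [], []
    for shift, (name, n) in enumerate(CLASS_SIZES.items()):
        # Hypothetical class-specific mean shift so the toy classes differ.
        blocks.append(rng.normal(loc=0.5 * shift, scale=1.0, size=(n, N_GENES)))
        labels += [name] * n
    return np.vstack(blocks), np.array(labels)


def compare_classifiers(X, y, seed=0):
    """Fit RF and KNN on one stratified split; return (train, test) accuracy per model."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed  # split ratio is assumed
    )
    models = {
        "RF": RandomForestClassifier(n_estimators=200, random_state=seed),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        scores[name] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    return scores, len(y_tr), len(y_te)


X, y = make_synthetic_expression()
# Fraction of total variance captured by the first two principal components.
explained = PCA(n_components=2).fit(X).explained_variance_ratio_.sum()
scores, n_train, n_test = compare_classifiers(X, y)

print(f"variance explained by 2 PCs: {explained:.3f}")
print(f"train samples: {n_train}, test samples: {n_test}")
for name, (train_acc, test_acc) in scores.items():
    print(f"{name}: train={train_acc:.4f} test={test_acc:.4f}")
```

With 59 samples, a 70/30 split gives 41 training and 18 test samples. The reported accuracies are arithmetically consistent with such a split (40/41 ≈ 97.56% and 11/18 ≈ 61.11%), although the abstract does not state the split that was actually used. The accuracies printed by this sketch depend entirely on the synthetic data and say nothing about the study's results.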
Data availability
The authors declare that no data are available.
Cite this article
Verma, P., Shakya, M. Machine learning model for predicting Major Depressive Disorder using RNA-Seq data: optimization of classification approach. Cogn Neurodyn 16, 443–453 (2022). https://doi.org/10.1007/s11571-021-09724-8
DOI: https://doi.org/10.1007/s11571-021-09724-8