
Smart voice recognition based on deep learning for depression diagnosis

  • Original Article
  • Published in: Artificial Life and Robotics

Abstract

Depressive disorder is a mental illness with a high incidence rate, driven by environmental stress and social factors. Depression affects mood and behavior, leading to problems in many domains such as education, family life, and the workplace. Suicide attempts also occur in severe cases. However, depression is a treatable condition when diagnosed by psychiatrists. In Thailand, many people who are aware of their mental disorder do not seek help from psychiatric hospitals because of long waiting times and high fees. We therefore aim to create an application that lets users perform self-assessment by collecting their voice signal data. In our experiment, we define the voice data obtained from depressive patients during therapy sessions at a psychiatric hospital as the positive class. The negative class is the voice data of non-depressive people obtained from interview sessions with university students. Each audio file is rendered as a spectrogram, a visual representation of the power spectrum over time. The power spectrum is built from Mel-frequency cepstral coefficients (MFCCs), which are extracted from the human voice using the fast Fourier transform (FFT) and the discrete cosine transform (DCT). Because some studies have claimed that the DCT discards spectral features, we empirically compare spectrogram sets generated with and without the DCT. Moreover, other studies have reported that a larger window captures more detail of speech activity in the power spectrum, which affects depression-detection performance, so we apply the Blackman-Harris and Blackman window functions to create different spectrogram sets and test this idea on a Thai speech dataset. Deep learning models based on the deep residual network (ResNet) are explored to assess their classification potential, and architectures with different numbers of convolutional layers (ResNet-34, ResNet-50, and ResNet-101) are examined. The experimental results show that ResNet-50 models trained on both spectrogram types achieve F1-scores above 70%, the best performance among the approaches tested. The model trained on spectrograms extracted with the Blackman window function and without the DCT provides the best sensitivity, at 74.45%. To the best of our knowledge, our approach yields the highest F1-score compared with state-of-the-art methods.
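
As a rough illustration of the feature-extraction idea described in the abstract, the sketch below renders one audio file as either a log-mel power spectrogram (the non-DCT variant) or an MFCC matrix (the DCT variant), using a selectable window function such as Blackman or Blackman-Harris. This is not the authors' code: librosa is assumed as the signal-processing library, and the file name, frame sizes, and mel/MFCC counts are illustrative choices only.

```python
# Minimal sketch (assumed pipeline, not the authors' implementation):
# windowed FFT -> mel power spectrum -> optional DCT (MFCCs).
import numpy as np
import librosa
import matplotlib.pyplot as plt

def voice_to_spectrogram(wav_path, window="blackman", apply_dct=True,
                         n_fft=2048, hop_length=512, n_mels=128, n_mfcc=40):
    """Render one recording as a 2-D feature array suitable for saving as an image."""
    y, sr = librosa.load(wav_path, sr=None)          # keep the native sampling rate

    # Mel power spectrum: FFT of windowed frames, then a mel filter bank.
    # 'blackman' and 'blackmanharris' are valid SciPy window names.
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length,
        n_mels=n_mels, window=window)

    if apply_dct:
        # DCT of the log-mel spectrum yields MFCCs (the "DCT" spectrogram set).
        return librosa.feature.mfcc(S=librosa.power_to_db(mel), n_mfcc=n_mfcc)
    # Skipping the DCT keeps the full log-mel detail (the "non-DCT" set).
    return librosa.power_to_db(mel, ref=np.max)

if __name__ == "__main__":
    # Hypothetical file name; save one image per recording for later training.
    feats = voice_to_spectrogram("interview_sample.wav",
                                 window="blackman", apply_dct=False)
    plt.imsave("interview_sample.png", feats, origin="lower", cmap="magma")
```

On the classification side, a ResNet can be adapted to the two classes (depressive vs. non-depressive) by replacing its final fully connected layer. The snippet below is a hedged sketch using torchvision's stock ResNet-50, not the authors' training setup; data loading, augmentation, and hyperparameters are omitted.

```python
# Sketch of a two-class ResNet-50 head swap (assumes PyTorch/torchvision).
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)  # ImageNet weights
model.fc = nn.Linear(model.fc.in_features, 2)  # positive (depressive) vs. negative class
```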




Data availability statement

The raw data that support the findings of this study are available from the first author upon reasonable request.


Author information


Corresponding author

Correspondence to Nuanwan Soonthornphisaj.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Suparatpinyo, S., Soonthornphisaj, N. Smart voice recognition based on deep learning for depression diagnosis. Artif Life Robotics 28, 332–342 (2023). https://doi.org/10.1007/s10015-023-00852-4

