Smart voice recognition based on deep learning for depression diagnosis

Suparatpinyo, Sukit; Soonthornphisaj, Nuanwan

doi:10.1007/s10015-023-00852-4

Original Article
Published: 19 January 2023

Artificial Life and Robotics, Volume 28, pages 332–342 (2023)

Sukit Suparatpinyo¹ &
Nuanwan Soonthornphisaj¹

Abstract

Depressive disorder is a mental illness with a high incidence rate, attributed to stress from environmental and social factors. Depression affects mood and behavior and leads to problems in domains such as education, family, and the workplace. Suicide attempts are also found in severe cases. However, depression is a treatable condition once it has been diagnosed by a psychiatrist. In Thailand, many people who are aware of mental disorders do not seek help from psychiatric hospitals because of long waiting times and high fees. We therefore aim to create an application that lets users perform a self-assessment from their voice signal data. In our experiments, the positive class is the voice data obtained from depressive patients during therapy sessions in a psychiatric hospital, and the negative class is the voice data of non-depressive people obtained from interview sessions with university students. Each audio file is rendered as a spectrograph, a visual representation of the power spectrum over time. Here the power spectrum is represented by Mel-frequency cepstral coefficients (MFCCs), which are extracted from the time-varying voice signal using the fast Fourier transform (FFT) and the discrete cosine transform (DCT). Because some research claims that the DCT causes some spectral features to be lost, we empirically compare spectrograph sets produced with and without the DCT. Moreover, some studies state that a larger window reveals more detail of speech activity in the power spectrum, which affects depression-detection performance, so we use the Blackman-Harris and Blackman window functions to create different sets of spectrographs and test that idea on a Thai speech dataset. Deep learning models based on the deep residual network (ResNet) are explored for classification, and variants of different depths (ResNet-34, ResNet-50, and ResNet-101) are examined.
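The feature pipeline named in this abstract (window function, FFT power spectrum, Mel filtering, and an optional DCT) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the frame length, hop size, number of Mel filters, sample rate, and synthetic test tone are all invented for the example, and the helper names (`frame_features`, `spectrograph`, and so on) are hypothetical. The window coefficients are the standard Blackman and 4-term Blackman-Harris definitions.

```python
import cmath
import math

def blackman(n):
    """Symmetric 3-term Blackman window of length n."""
    return [0.42 - 0.5 * math.cos(2 * math.pi * i / (n - 1))
            + 0.08 * math.cos(4 * math.pi * i / (n - 1)) for i in range(n)]

def blackman_harris(n):
    """Symmetric 4-term Blackman-Harris window of length n."""
    a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
    return [a0 - a1 * math.cos(2 * math.pi * i / (n - 1))
            + a2 * math.cos(4 * math.pi * i / (n - 1))
            - a3 * math.cos(6 * math.pi * i / (n - 1)) for i in range(n)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * n_bins
        for k in range(lo, mid):          # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling slope
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def dct2(v):
    """Unnormalised DCT-II of a real sequence."""
    n = len(v)
    return [sum(v[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
            for k in range(n)]

def frame_features(frame, window, bank, apply_dct):
    """Log-Mel energies for one frame, optionally followed by the DCT (MFCC-style)."""
    spectrum = fft([s * w for s, w in zip(frame, window)])
    power = [abs(c) ** 2 for c in spectrum[:len(frame) // 2 + 1]]
    log_mel = [math.log(sum(p * h for p, h in zip(power, row)) + 1e-10)
               for row in bank]
    return dct2(log_mel) if apply_dct else log_mel

def spectrograph(signal, frame_len, hop, window_fn, n_filters, sample_rate, apply_dct):
    """One feature row per frame; rows stacked over time form the image."""
    window = window_fn(frame_len)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    return [frame_features(signal[s:s + frame_len], window, bank, apply_dct)
            for s in range(0, len(signal) - frame_len + 1, hop)]

# Assumed demo input: a 440 Hz tone at 8 kHz, standing in for a voice recording.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(2048)]
non_dct = spectrograph(tone, 256, 128, blackman, 10, SAMPLE_RATE, apply_dct=False)
with_dct = spectrograph(tone, 256, 128, blackman_harris, 10, SAMPLE_RATE, apply_dct=True)
print(len(non_dct), "frames x", len(non_dct[0]), "features")
```

Switching `apply_dct` and `window_fn` gives the four feature variants the abstract compares. In practice a library such as SciPy or librosa would replace the hand-written FFT and filterbank; the pure-Python version is used here only to keep the sketch self-contained.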
The experimental results show that ResNet-50 models trained on either type of spectrograph achieve an F1-score above 70%, the best performance among the approaches compared. The model trained on spectrographs extracted with the Blackman window function and without the DCT provides the best sensitivity, at 74.45%. To the best of our knowledge, our approach gives the highest F1-score compared with state-of-the-art methods.
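For readers unfamiliar with the two metrics reported here, the following sketch shows how sensitivity and F1-score follow from a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not the paper's results.

```python
def sensitivity(tp, fn):
    """Recall of the positive (depressive) class: TP / (TP + FN)."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Fraction of positive predictions that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and sensitivity."""
    p, r = precision(tp, fp), sensitivity(tp, fn)
    return 2 * p * r / (p + r)

# Hypothetical confusion-matrix counts (not taken from the paper).
tp, fp, fn = 70, 20, 30
print(f"sensitivity = {sensitivity(tp, fn):.4f}")
print(f"F1-score    = {f1_score(tp, fp, fn):.4f}")
```

Sensitivity matters in a screening setting because a false negative means a depressive speaker is missed, while F1 also penalizes false alarms through precision.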

Data availability statement

Raw data that support the findings of this study are available from the first author, upon a reasonable request.

Author information

Authors and Affiliations

Department of Computer Science, Faculty of Science, Kasetsart University, Bangkok, Thailand
Sukit Suparatpinyo & Nuanwan Soonthornphisaj

Corresponding author

Correspondence to Nuanwan Soonthornphisaj.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Suparatpinyo, S., Soonthornphisaj, N. Smart voice recognition based on deep learning for depression diagnosis. Artif Life Robotics 28, 332–342 (2023). https://doi.org/10.1007/s10015-023-00852-4

Received: 13 April 2022
Accepted: 20 December 2022
Published: 19 January 2023
Issue Date: May 2023
DOI: https://doi.org/10.1007/s10015-023-00852-4
