Abstract
The accelerating pace of contemporary life and work raises psychological stress in general, and the prevalence of depression has been increasing in recent years. Effective prevention and diagnosis of depression have therefore become a focus of medical research. This paper proposes a depression prediction model based on BiAttention-GRU (Bimodal Attention and Gated Recurrent Unit) that analyzes text, speech, and facial expression features associated with depression. PAttention (Parallel Attention) extracts essential local features from each modality to reduce the influence of irrelevant information. FAttention (Fusion Attention) calculates the contribution of each modality and of their fused features. A GRU extracts temporal information from the preceding and following segments of the features. Finally, a Softmax layer produces the depression prediction. The proposed approach is compared with CNN-Attention, GRU-Attention, BiAttention (Bimodal Attention), BERT, and other models. The results show that it outperforms these models, achieving a prediction accuracy of 89.77%.
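The pipeline described in the abstract (per-modality attention, fusion attention, a GRU over segments, then Softmax) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract gives no equations, so the dot-product attention scoring, the common feature size `D`, the hidden size `H`, the random weights, and all function names (`attention_pool`, `gru_step`, `predict`) are assumptions made only for illustration. In particular, the fusion step here weights only the three modality vectors, which may be simpler than the paper's FAttention.

```python
import numpy as np

MODALITIES = ("text", "speech", "face")
D, H, N_CLASSES = 8, 6, 2  # feature size, GRU hidden size, classes (all hypothetical)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(items, w):
    # Score each row of `items` (n, d) against vector `w` (d,),
    # then return the attention weights and the weighted sum of rows.
    weights = softmax(items @ w)
    return weights, weights @ items

def gru_step(h, x, p):
    # One standard GRU update (biases omitted for brevity).
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)            # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)            # reset gate
    h_new = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))  # candidate state
    return (1.0 - z) * h + z * h_new

def init_params(rng):
    # Random, untrained weights; a real model would learn these.
    g = lambda *shape: rng.normal(scale=0.3, size=shape)
    return {
        "p_att": {m: g(D) for m in MODALITIES},
        "f_att": g(D),
        "gru": {"Wz": g(H, D), "Uz": g(H, H), "Wr": g(H, D),
                "Ur": g(H, H), "Wh": g(H, D), "Uh": g(H, H)},
        "W_out": g(N_CLASSES, H),
    }

def predict(segments, params):
    # segments: list of dicts mapping modality name -> (n_local, D) array.
    h = np.zeros(H)
    fusion_w = None
    for seg in segments:
        # Stand-in for PAttention: pool local features within each modality.
        modal_vecs = np.stack(
            [attention_pool(seg[m], params["p_att"][m])[1] for m in MODALITIES]
        )
        # Stand-in for FAttention: weight the modalities and fuse them.
        fusion_w, fused = attention_pool(modal_vecs, params["f_att"])
        # GRU carries temporal context across segments.
        h = gru_step(h, fused, params["gru"])
    # Softmax layer yields class probabilities; also return the fusion
    # weights of the last segment for inspection.
    return softmax(params["W_out"] @ h), fusion_w

rng = np.random.default_rng(0)
params = init_params(rng)
# Toy input: 4 segments, each with 5 local feature vectors per modality.
segments = [{m: rng.normal(size=(5, D)) for m in MODALITIES} for _ in range(4)]
probs, fusion_weights = predict(segments, params)
print(probs, fusion_weights)
```

With untrained weights the printed probabilities carry no meaning. The sketch only shows how the data would flow, and how the fusion weights could be read as per-modality contributions.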




Acknowledgements
The authors are very grateful to the editors and reviewers for their valuable comments and suggestions for improving the paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest that could influence the work reported in this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Cao, Y., Hao, Y., Li, B. et al. Depression prediction based on BiAttention-GRU. J Ambient Intell Human Comput 13, 5269–5277 (2022). https://doi.org/10.1007/s12652-021-03497-y
DOI: https://doi.org/10.1007/s12652-021-03497-y