SetembroBR: a social media corpus for depression and anxiety disorder prediction

  • Original Paper
Language Resources and Evaluation

Abstract

The present work introduces a novel dataset, hereafter called the SetembroBR corpus, for the study and development of predictive models of depression and anxiety disorder in the Portuguese language, based on information available prior to a diagnosis. The corpus comprises both text- and network-related information on 3.9 thousand Twitter users who self-reported a diagnosis or treatment for a mental disorder, and its use is illustrated by a number of experiments addressing depression and anxiety disorder prediction from social media data. Our present results are intended as a first step towards investigating how mental health statuses are expressed on Portuguese-speaking social media, and they pave the way for computational applications intended to assist with a pressing issue of great social interest.

Notes

  1. https://www.who.int/news-room/fact-sheets/detail/depression.

  2. Which are nevertheless possible when using questionnaires and interviews as well.

  3. Portuguese pronoun resolution has been the focus of Paraboni (1997), Paraboni and de Lima (1998) and others.

  4. https://drive.google.com/drive/folders/1MXFRs0u8iF1RNUWABTA0Oz8_Ix1skqZT?usp=sharing.

  5. The full list of original (Portuguese) query strings is available from https://drive.google.com/file/d/16eSFtUo9l1pGbpt0BDr_1ZCmNRGpw0Vn/view?usp=sharing.

  6. When two or more valid end points are available, as in a preliminary diagnosis given by a psychologist, which is confirmed later on by a psychiatrist, the earliest possible event is marked as [end].

  7. Similarly, in Yates et al. (2017), for instance, users with fewer than 100 posts are discarded.

  8. Not real Twitter data.

  9. Making 3903 partially overlapping diagnosed cases in total.

  10. Random message selection is intended to avoid making assumptions about the distribution of messages in time. More domain-dependent alternatives to this will be discussed in the subsequent sections.

  11. In a pilot study, we also attempted to use higher order n-gram models, but results remained essentially the same as those presently reported for unigrams.

  12. https://eli5.readthedocs.io/en/latest/.

  13. Recall that, in both tasks, the negative class consists of random data, and it is presently not under scrutiny.
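Notes 6 and 10 describe two steps of data preparation: when several valid end points exist, the earliest one is marked as [end], and messages are then selected at random. The following is only a minimal sketch of that idea, not the authors' implementation. The function names, the `(timestamp, text)` tuple layout, the strict "before" comparison, and the sample size `k` are all assumptions made for illustration, and the timeline is made-up data.

```python
import random
from datetime import datetime


def messages_before_end(messages, end_candidates):
    """Keep only messages posted before the earliest candidate [end] event.

    `messages` is a list of (timestamp, text) tuples (hypothetical layout).
    `end_candidates` holds one timestamp per valid diagnosis/treatment event.
    """
    end = min(end_candidates)  # earliest possible event, as in note 6
    return [(ts, text) for ts, text in messages if ts < end]


def sample_messages(messages, k, seed=0):
    """Pick up to k messages at random, with no assumption about their
    distribution in time (note 10). `k` and `seed` are illustrative."""
    rng = random.Random(seed)
    if len(messages) <= k:
        return list(messages)
    return rng.sample(messages, k)


# Made-up timeline, not real Twitter data.
timeline = [
    (datetime(2020, 1, 5), "msg a"),
    (datetime(2020, 3, 1), "msg b"),
    (datetime(2020, 6, 9), "msg c"),
    (datetime(2020, 9, 2), "msg d"),
]
# Two valid end points, e.g. a preliminary and a confirmed diagnosis.
ends = [datetime(2020, 8, 1), datetime(2020, 5, 1)]

prior = messages_before_end(timeline, ends)
subset = sample_messages(prior, k=1)
```

Fixing the seed keeps the random selection reproducible across runs, which is a design choice of this sketch rather than something stated in the notes.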

References

  • Aguilera, J., Farías, D. I. H., Ortega-Mendoza, R. M., & y Gómez, M. M. (2021). Depression and anorexia detection in social media as a one-class classification problem. Applied Intelligence, 51, 6088–6103. https://doi.org/10.1007/s10489-020-02131-2.

  • Al-Mosaiwi, M., & Johnstone, T. (2018). In an absolute state: Elevated use of absolutist words is a marker specific to anxiety, depression, and suicidal ideation. Clinical Psychological Science, 6(4), 529–542. https://doi.org/10.1177/2167702617747074.

  • Almouzini, S., Khemakhem, M., & Alageel, A. (2019). Detecting Arabic depressed users from Twitter data. Procedia Computer Science, 163, 257–265. https://doi.org/10.1016/j.procs.2019.12.107.

  • American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). American Psychiatric Association. https://doi.org/10.1176/appi.books.9780890425596.

  • Aragón, M.E., López-Monroy, A.P., González-Gurrola, L.C., & y Gómez, M.M. (2019). Detecting depression in social media using fine-grained emotions. In: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Minneapolis, USA (pp. 1481–1486). https://doi.org/10.18653/v1/N19-1151.

  • Aschbrenner, K.A., Naslund, J.A., Grinley, T., Bienvenida, J.C.M., Bartels, S.J., & Brunette, M. (2018). A survey of online and mobile technology use at peer support agencies. Psychiatric Quarterly (pp. 1–10).

  • Bak, M., Chin, J., & Chiu, C. (2022). Mental health pandemic during the COVID-19 outbreak: Calls for help on social media. https://doi.org/10.48550/ARXIV.2203.00237.

  • Birnbaum, M. L., Rizvi, A. F., Correll, C. U., Kane, J. M., & Confino, J. (2017). Role of social media and the internet in pathways to care for adolescents and young adults with psychotic disorders and nonpsychotic mood disorders. Early Intervention in Psychiatry, 11(4), 290–295.

  • Briciu, A., & Lupea, M. (2018). Studying the language of mental illness in Romanian social media. In IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP), (pp. 21–28), https://doi.org/10.1109/ICCP.2018.8516436.

  • Brunette, M., Achtyes, E., Pratt, S., Stilwell, K., Opperman, M., Guarino, S., & Kay-Lambkin, F. (2019). Use of smartphones, computers and social media among people with SMI: opportunity for intervention. Community Mental Health Journal (pp. 1–6).

  • Bucci, S., Schwannauer, M., & Berry, N. (2019). The digital revolution and its impact on mental health care. Psychology and Psychotherapy: Theory, Research and Practice, 92(2), 277–297.

  • Burdisso, S. G., Errecalde, M., & y Gómez, M. M. (2020). t-SS3: A text classifier with dynamic n-grams for early risk detection over text streams. Pattern Recognition Letters, 138, 130–137. https://doi.org/10.1016/j.patrec.2020.07.001.

  • Cacheda, F., Fernandez, D., Novoa, F. J., & Carneiro, V. (2019). Early detection of depression: Social network analysis and random forest techniques. Journal of Medical Internet Research, 21(6), e12554. https://doi.org/10.2196/12554.

  • Chancellor, S., & Choudhury, M. D. (2020). Methods in predictive techniques for mental health status on social media: A critical review. npj Digital Medicine. https://doi.org/10.1038/s41746-020-0233-7.

  • Choudhury, M.D., Gamon, M., Counts, S., & Horvitz, E. (2013). Predicting depression via social media. In: International AAAI Conference on Web and Social Media (ICWSM), AAAI.

  • Coello-Guilarte, L., Ortega-Mendoza, R.M., Villasenor-Pineda, L., & y Gómez, M.M. (2019). Crosslingual depression detection in Twitter using bilingual word alignments. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2019). Lecture Notes in Computer Science vol. 11696, Springer International Publishing, Cham, (pp. 49–61), https://doi.org/10.1007/978-3-030-28577-7_2.

  • Cohan, A., Desmet, B., Yates, A., Soldaini, L., MacAvaney, S., & Goharian, N. (2018). SMHD: a large-scale resource for exploring online language usage for multiple mental health conditions. 27th International Conference on Computational Linguistics (pp. 1485–1497). Santa Fe, USA: Association for Computational Linguistics.

  • Coppersmith, G., Dredze, M., Harman, C., Hollingshead, K., & Mitchell, M. (2015). CLPsych 2015 shared task: Depression and PTSD on Twitter. Second workshop on computational linguistics and clinical psychology: From linguistic signal to clinical reality (pp. 31–39). Association for Computational Linguistics.

  • Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), Association for Computational Linguistics, (pp. 4171–4186).

  • dos Santos, W. R., Funabashi, A. M. M., & Paraboni, I. (2020). Searching Brazilian Twitter for signs of mental health issues. 12th International Conference on Language Resources and Evaluation (LREC-2020) (pp. 6113–6119). Marseille, France: ELRA.

  • dos Santos, W.R., & Paraboni, I. (2019). Moral Stance Recognition and Polarity Classification from Twitter and Elicited Text. In: Recent Advances in Natural Language Processing (RANLP-2019), Varna, Bulgaria, (pp. 1069–1075), https://doi.org/10.26615/978-954-452-056-4_123.

  • dos Santos, W. R., Ramos, R. M. S., & Paraboni, I. (2020). Computational personality recognition from Facebook text: Psycholinguistic features, words and facets. New Review of Hypermedia and Multimedia, 25(4), 268–287. https://doi.org/10.1080/13614568.2020.1722761.

  • Dutta, S., & Choudhury, M. D. (2020). Characterizing anxiety disorders with online social and interactional networks. Knowledge and social media. HCI International 2020—Late breaking papers: Interaction (pp. 249–264). Springer International Publishing.

  • Ernala, S.K., Birnbaum, M.L., Candan, K.A., Rizvi, A.F., Sterling, W.A., Kane, J.M., & Choudhury, M.D. (2019). Methodological gaps in predicting mental health states from social media: Triangulating diagnostic signals. In: 2019 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, New York, USA, (pp. 1–16), https://doi.org/10.1145/3290605.3300364.

  • Giuntini, F. T., Cazzolato, M. T., de Jesus Dutra dos Reis, M., Campbell, A. T., Traina, A. J. M., & Ueyama, J. (2020). A review on recognizing depression in social networks: challenges and opportunities. Journal of Ambient Intelligence and Humanized Computing, 11, 4713–4729. https://doi.org/10.1007/s12652-020-01726-4.

  • Hartmann, N., Fonseca, E., Shulby, C., Treviso, M., Rodrigues, J., & Aluísio, S. (2017). Portuguese word embeddings: Evaluating on word analogies and natural language tasks. In: 11th Brazilian Symposium in Information and Human Language Technology - STIL, Uberlândia, Brazil, (pp. 122–131).

  • Katchapakirin, K., Wongpatikaseree, K., Yomaboot, P., & Kaewpitakkun, Y. (2018). Facebook social media for depression detection in the Thai community. In: 15th International Joint Conference on Computer Science and Software Engineering (JCSSE), (pp. 1–6), https://doi.org/10.1109/JCSSE.2018.8457362.

  • Kumar, A., Sharma, A., & Arora, A. (2019). Anxious depression prediction in real-time social data. In: International Conference on Advances in Engineering Science Management & Technology (ICAESMT), Dehradun, India.

  • Leis, A., Ronzano, F., Mayer, M. A., Furlong, L. I., & Sanz, F. (2019). Detecting signs of depression in Tweets in Spanish: Behavioral and linguistic analysis. Journal of Medical Internet Research, 21(6), e14199. https://doi.org/10.2196/14199.

  • Lin, C., Hu, P., Su, H., Li, S., Mei, J., Zhou, J., & Leung, H. (2020). SenseMood: Depression detection on social media (pp. 407–411). Association for Computing Machinery.

  • Losada, D. E., & Crestani, F. (2016). A test collection for research on depression and language use. Experimental IR meets multilinguality, multimodality, and interaction (pp. 28–39). Springer.

  • Losada, D. E., Crestani, F., & Parapar, J. (2017). eRISK 2017: CLEF lab on early risk prediction on the internet: Experimental foundations. Lecture Notes in Computer Science (Vol. 10456, pp. 346–360). Springer.

  • Losada, D. E., Crestani, F., & Parapar, J. (2018). Overview of eRisk: early risk prediction on the Internet. Lecture notes in computer science (Vol. 11018, pp. 343–361). Springer.

  • Losada, D.E., Crestani, F., & Parapar, J. (2019). Overview of eRisk 2019 Early Risk Prediction on the Internet. In: Lecture Notes in Computer Science vol 11696.

  • Loveys, K., Crutchley, P., Wyatt, E., & Coppersmith, G. (2017). Small but mighty: Affective micropatterns for quantifying mental health from social media language. In: Fourth Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, Association for Computational Linguistics, Vancouver, Canada, (pp. 85–95), https://doi.org/10.18653/v1/W17-3110.

  • Lynn, V., Goodman, A., Niederhoffer, K., Loveys, K., Resnik, P., & Schwartz, H.A. (2018). CLPsych 2018 shared task: Predicting current and future psychological health from childhood essays. In: Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, Association for Computational Linguistics, New Orleans, USA, (pp. 37–46), https://doi.org/10.18653/v1/W18-0604.

  • Mann, P., Paes, A., & Matsushima, E.H. (2020). See and read: Detecting depression symptoms in higher education students using multimodal social media data. In Proceedings of the International AAAI Conference on Web and Social Media, (pp. 440–451).

  • McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153–157. https://doi.org/10.1007/BF02295996.

  • McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1), 415–444. https://doi.org/10.1146/annurev.soc.27.1.415.

  • Ministério da Saúde do Brasil. (2022). Vigitel Brasil 2020: vigilância de fatores de risco e proteção para doenças crônicas por inquérito telefônico: estimativas sobre frequência e distribuição sociodemográfica de fatores de risco e proteção para doenças crônicas nas capitais dos 26 estados brasileiros e no Distrito Federal em 2021. Ministério da Saúde, Brasília: Tech. rep.

  • Nascimento, R., Parreira, P., dos Santos, G., & Guedes, G.P. (2018). Identificando sinais de comportamento depressivo em redes sociais. In: Anais do VII Brazilian Workshop on Social Network Analysis and Mining, SBC, Porto Alegre, Brazil, https://doi.org/10.5753/brasnam.2018.3597.

  • Naslund, J. A., Bondre, A., Torous, J., & Aschbrenner, K. A. (2020). Social media and mental health: Benefits, risks, and opportunities for research and practice. Journal of Technology in Behavioral Science, 5, 245–257. https://doi.org/10.1007/s41347-020-00134-x.

  • Paraboni, I. (1997). Uma arquitetura para a resolução de referências pronominais possessivas no processamento de textos em língua portuguesa. Master’s thesis, PUCRS, Porto Alegre.

  • Paraboni, I., & de Lima, V.L.S. (1998). Possessive pronominal anaphor resolution in Portuguese written texts. In Proceedings of the 17th international conference on Computational linguistics-Volume 2, Association for Computational Linguistics, (pp. 1010–1014).

  • Park, S., Lee, S. W., Kwak, J., Cha, M., & Jeong, B. (2013). Activities on Facebook reveal the depressive state of users. Journal of Medical Internet Research, 15(10), e217.

  • Pavan, M.C., dos Santos, W.R., & Paraboni, I. (2020). Twitter Moral Stance Classification using Long Short-Term Memory Networks. In: 9th Brazilian Conference on Intelligent Systems (BRACIS). LNAI 12319, Springer, (pp. 636–647), https://doi.org/10.1007/978-3-030-61377-8_45.

  • Seabrook, E. M., Kern, M. L., Fulcher, B. D., & Rickard, N. S. (2018). Predicting depression from language-based emotion dynamics: Longitudinal analysis of Facebook and Twitter status updates. Journal of Medical Internet Research, 20(5), e168. https://doi.org/10.2196/jmir.9267.

  • Semenov, A., Natekin, A., Nikolenko, S., Upravitelev, P., Trofimov, M., & Kharchenko, M. (2015). Discerning depression propensity among participants of suicide and depression-related groups of vk.com. In: Analysis of Images, Social Networks and Texts, Springer International Publishing, Cham, (pp. 24–35).

  • Shen, G., Jia, J., Nie, L., Feng, F., Zhang, C., Hu, T., Chua, T.S., & Zhu, W. (2017). Depression detection via harvesting social media: A multimodal dictionary learning solution. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, (pp. 3838–3844), https://doi.org/10.24963/ijcai.2017/536.

  • Shen, J.H., & Rudzicz, F. (2017). Detecting anxiety on Reddit. In Fourth Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, Association for Computational Linguistics, Vancouver, Canada, (pp. 58–65), https://doi.org/10.18653/v1/W17-3107.

  • Shen, T., Jia, J., Shen, G., Feng, F., He, X., Luan, H., Tang, J., Tiropanis, T., Chua, T.S., & Hall, W. (2018). Cross-domain depression detection via harvesting social media. In Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, International Joint Conferences on Artificial Intelligence Organization, (pp. 1611–1617), https://doi.org/10.24963/ijcai.2018/223.

  • Shrestha, A., & Spezzano, F. (2019). Detecting depressed users in online forums. In: 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), (pp. 945–951), https://doi.org/10.1145/3341161.3343511.

  • Song, H., You, J., Chung, J.W., & Park, J.C. (2018). Feature attention network: Interpretable depression detection from social media. In 32nd Pacific Asia Conference on Language, Information and Computation, Association for Computational Linguistics, Hong Kong.

  • Souza, F., Nogueira, R., & Lotufo, R. (2020a). BERTimbau: pretrained BERT models for Brazilian Portuguese. In 9th Brazilian Conference on Intelligent Systems (BRACIS) - LNCS 12319, Springer, Cham, https://doi.org/10.1007/978-3-030-61377-8_28.

  • Souza, V., Nobre, J., & Becker, K. (2020). Characterization of anxiety, depression, and their comorbidity from texts of social networks. Anais do XXXV Simpósio Brasileiro de Bancos de Dados (pp. 121–132). SBC.

  • Su, C., Xu, Z., Pathak, J., & Wang, F. (2020). Deep learning in mental health outcome research: A scoping review. Translational Psychiatry. https://doi.org/10.1038/s41398-020-0780-3.

  • Trifu, R., Nemes, B., Bodea-Hategan, C., & Cozman, D. (2017). Linguistic indicators of language in major depressive disorder (MDD). An evidence based research. Journal of Evidence-Based Psychotherapies, 17, 105–128. https://doi.org/10.24193/jebp.2017.1.7.

  • Trotzek, M., Koitka, S., & Friedrich, C.M. (2018). Utilizing neural networks and linguistic metadata for early detection of depression indications in text sequences. IEEE Transactions on Knowledge and Data Engineering.

  • Tsugawa, S., Kikuchi, Y., Kishino, F., Nakajima, K., Itoh, Y., & Ohsaki, H. (2015). Recognizing depression from Twitter activity. 33rd Annual ACM Conference on Human Factors in Computing Systems (pp. 3187–3196). New York, USA: Association for Computing Machinery.

  • Yates, A., Cohan, A., & Goharian, N. (2017). Depression and self-harm risk assessment in online forums. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Copenhagen, Denmark, (pp. 2968–2978), https://doi.org/10.18653/v1/D17-1322.

  • Yazdavar, A.H., Al-Olimat, H.S., Ebrahimi, M., Bajaj, G., Banerjee, T., Thirunarayan, K., Pathak, J., & Sheth, A. (2017). Semi-supervised approach to monitoring clinical depressive symptoms in social media. In IEEE/ACM International Conference on Advances in Social Network Analysis and Mining, (pp. 1191–1198), https://doi.org/10.1145/3110025.3123028.

  • Yazdavar, A. H., Mahdavinejad, M. S., Bajaj, G., Romine, W., Sheth, A., Monadjemi, A. H., et al. (2020). Multimodal mental health analysis in social media. PLoS ONE, 15(4), 1–27. https://doi.org/10.1371/journal.pone.0226248.

Acknowledgements

This work was carried out at the Center for Artificial Intelligence (C4AI-USP), with support by the São Paulo Research Foundation (FAPESP Grant #2019/07665-4) and by the IBM Corporation. The present work has been financed by the São Paulo Research Foundation (FAPESP Grant #2021/08213-0). The first author has been supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 - Grant # 88887.475847/2020-00.

Author information

Corresponding author

Correspondence to Ivandré Paraboni.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Examples of self-disclosure and discarded statements

The following are examples of self-disclosure tweets that would be selected for further inspection in the corpus collection task.

  • Acabo de ser diagnosticado com depressão e ansiedade. ("I have just been diagnosed with depression and anxiety.")

  • Comecei meu tratamento para ansiedade em 2017. ("I started my treatment for anxiety in 2017.")

  • Uau adivinha quem voltou da psicóloga com uma receita de antidepressivo? ("Wow guess who just came back from the psychologist’s with an antidepressant prescription?")

  • Fui diagnosticado com ansiedade no final do ano passado. ("I was diagnosed with anxiety at the end of last year.")

  • Fui ao médico e recebi diagnóstico de TAG. Ainda não estou acreditando. ("I went to the doctor’s and I received a GAD diagnosis. I still can’t believe it.")

The following are examples of tweets that would be discarded during the corpus collection task.

  • (unknown diagnosis date) Fui diagnosticado com depressão aos 15 anos de idade. ("I was diagnosed with depression at the age of 15.")

  • (recurrent condition) Comecei hoje um tratamento para depressão. Lá vamos nós outra vez! ("Today I started a treatment for depression. Here we go again!")

  • (irony/sarcasm) Depois de assistir a esse vídeo eu fui diagnosticado com depressão profunda. ("After watching this video I was diagnosed with severe depression.")

  • (no medical specialist, or irony/sarcasm) Acabo de ser diagnosticado com depressão de acordo com o teste do Ursinho Pooh. ("I have just been diagnosed with depression according to a Winnie-the-Pooh test.")

  • (no explicit depression/anxiety disclosure) Hoje comecei a tomar um antidepressivo para a enxaqueca. ("Today I started taking an antidepressant for migraine.")
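The two lists above suggest why candidate tweets are selected "for further inspection" rather than accepted automatically. The sketch below is a hypothetical keyword pre-filter, not the authors' query strings (those are available from the link in note 5). The pattern `CANDIDATE` and the helper `is_candidate` are assumptions introduced only for illustration.

```python
import re

# Hypothetical surface pattern: diagnosis, treatment or antidepressant mentions.
# It is NOT the original query set used to build the corpus.
CANDIDATE = re.compile(
    r"\b(diagn[oó]stic\w*|tratamento|antidepressiv\w*)\b",
    re.IGNORECASE,
)


def is_candidate(tweet: str) -> bool:
    """Return True if the tweet contains a disclosure-like keyword."""
    return CANDIDATE.search(tweet) is not None


# Example from the list of tweets that would be selected.
selected = "Fui ao médico e recebi diagnóstico de TAG. Ainda não estou acreditando."
# Example from the list of tweets that would be discarded (migraine, not depression).
discarded = "Hoje comecei a tomar um antidepressivo para a enxaqueca."
# Made-up unrelated message.
unrelated = "Hoje o dia está lindo."

flags = [is_candidate(t) for t in (selected, discarded, unrelated)]
```

Under this pattern both the selected and the discarded example are flagged, since they share the same surface keywords. Telling them apart depends on the criteria listed above (diagnosis date, recurrence, irony, medical specialist, explicit disclosure), which a keyword match alone does not capture.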

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Santos, W.R.d., de Oliveira, R.L. & Paraboni, I. SetembroBR: a social media corpus for depression and anxiety disorder prediction. Lang Resources & Evaluation 58, 273–300 (2024). https://doi.org/10.1007/s10579-022-09633-0

  • DOI: https://doi.org/10.1007/s10579-022-09633-0
