Abstract
Depression is a highly prevalent mental health problem and is frequently comorbid with other mental, physical, and behavioural disorders. The internet allows individuals who are depressed, or who care for someone who is depressed, to connect with others through online communities; however, the characteristics of these discussions have not yet been fully explored. This work explores the textual cues of online communities interested in depression. A total of 5,000 posts were randomly selected from 24 online communities. Five subgroups of online communities were identified: Depression, Bipolar Disorder, Self-Harm, Grief/Bereavement, and Suicide. Psycholinguistic features and content topics were extracted from the posts and analysed. Machine learning techniques were used to discriminate the online conversations in the Depression communities from those in the other subgroups. Topics and psycholinguistic features were found to be highly valid predictors of community subgroup. Clear discrimination between linguistic features and topics, alongside good predictive power, is an important step in understanding social media and its use in mental health.
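To make the idea of psycholinguistic features concrete, the following is a minimal sketch that assumes a dictionary-based word-count approach, in which each feature is the share of a post's words that fall into a given category. The abstract does not state which tool or categories were used, so the `LEXICON` contents, the category names, and the function `psycholinguistic_features` are hypothetical and serve only as an illustration.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; it is NOT the
# dictionary used in the study, which this section does not specify.
LEXICON = {
    "negative_emotion": {"sad", "hopeless", "alone", "hurt"},
    "first_person": {"i", "me", "my", "myself"},
    "social": {"friend", "family", "mother", "talk"},
}


def psycholinguistic_features(post):
    """Return, per category, the fraction of tokens in `post` that belong to it."""
    # Lowercase the post and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", post.lower())
    total = len(tokens)

    # Count how many tokens fall into each category.
    counts = Counter()
    for token in tokens:
        for category, words in LEXICON.items():
            if token in words:
                counts[category] += 1

    # Normalise by post length so long and short posts are comparable.
    return {
        category: (counts[category] / total if total else 0.0)
        for category in LEXICON
    }


if __name__ == "__main__":
    print(psycholinguistic_features("I feel sad and alone"))
```

Feature vectors of this kind, possibly combined with per-post topic proportions, could then be passed to any standard classifier to predict the community subgroup of a post.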







Notes
All 50 topics learned from the corpus by latent Dirichlet allocation (LDA) are available at http://bit.ly/1JKY2vo
Cite this article
Nguyen, T., O’Dea, B., Larsen, M. et al. Using linguistic and topic analysis to classify sub-groups of online depression communities. Multimed Tools Appl 76, 10653–10676 (2017). https://doi.org/10.1007/s11042-015-3128-x
DOI: https://doi.org/10.1007/s11042-015-3128-x