BRIEF RESEARCH REPORT article

Front. Psychiatry, 01 August 2023
Sec. Digital Mental Health
This article is part of the Research Topic AI Approach to the Psychiatric Diagnosis and Prediction.

Beyond human expertise: the promise and limitations of ChatGPT in suicide risk assessment

Zohar Elyoseph1,2* and Inbar Levkovich3

  • 1Department of Psychology and Educational Counseling, The Center for Psychobiological Research, Max Stern Yezreel Valley College, Emek Yezreel, Israel
  • 2Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
  • 3Faculty of Graduate Studies, Oranim Academic College, Kiryat Tiv'on, Israel

ChatGPT, an artificial intelligence language model developed by OpenAI, holds the potential for contributing to the field of mental health. Nevertheless, although ChatGPT theoretically shows promise, its clinical abilities in suicide prevention, a significant mental health concern, have yet to be demonstrated. To address this knowledge gap, this study aims to compare ChatGPT’s assessments of mental health indicators to those of mental health professionals in a hypothetical case study that focuses on suicide risk assessment. Specifically, ChatGPT was asked to evaluate a text vignette describing a hypothetical patient with varying levels of perceived burdensomeness and thwarted belongingness. The ChatGPT assessments were compared to the norms of mental health professionals. The results indicated that ChatGPT rated the risk of suicide attempts lower than did the mental health professionals in all conditions. Furthermore, ChatGPT rated mental resilience lower than the norms in most conditions. These results imply that gatekeepers, patients or even mental health professionals who rely on ChatGPT for evaluating suicidal risk or as a complementary tool to improve decision-making may receive an inaccurate assessment that underestimates the actual suicide risk.

Introduction

ChatGPT is a tool developed by OpenAI that is based on GPT language model technology and is freely available to the public (1). In the few months since its launch on 30 November 2022, ChatGPT has gained a remarkable 100 million users, making it the most rapidly growing consumer application to date (2). ChatGPT is a highly sophisticated chatbot that can handle text-based requests ranging from simple queries to more advanced tasks. It comprehends and interprets user requests and generates appropriate responses in nearly natural human language (3). This ability to generate human-like language and perform complex tasks makes ChatGPT a significant breakthrough in the fields of natural language processing and artificial intelligence (2, 3). Most studies conducted on ChatGPT have focused on its use in academia (4), whereas exploration of its applications in the field of applied psychology has been limited. While ChatGPT demonstrates promising theoretical potential (5), its clinical capabilities in the field of mental health remain unclear, particularly its ability to address critical issues such as suicide prevention, a significant mental health concern.

Suicide constitutes a major health problem and cause of death across the globe. In Israel, approximately 90–100 suicide attempts per 100,000 individuals occur each year, and the suicide death rate is 7.9 per 100,000 (6, 7). In the United States, the suicide death rate is 14.2 per 100,000 (8). Psychiatric diseases are at least 10 times more prevalent among individuals who attempt or die by suicide than in the general population (9). Suicide attempts are often impulsive and occur at times of crisis as a response to being unable to manage daily stresses and demands (10). Among patients discharged from the hospital following a suicide attempt, the risk of a subsequent attempt during the first 3 years after hospitalization is 12%–30% (11).

In response, considerable efforts have been invested in developing effective suicide prevention strategies to reduce the risk of recurring attempts (12). Early identification of individuals at risk of suicide is a fundamental prevention strategy, particularly among the high-risk population of mental health patients (13). Hence, clinicians’ ability to recognize indicators of suicide potential is essential to implement appropriate crisis management and suicide intervention strategies, especially during times of acute crisis (14). Given the magnitude of the problem, significant investment in suicide prevention programs is crucial. Recent initiatives have focused on training gatekeepers from various community groups (e.g., teachers, policymakers, and military commanders) to detect indicators of suicide risk (15, 16). These efforts aim to broaden the pool of professionals capable of evaluating suicide risk beyond psychiatrists and clinical psychologists working in hospital settings. The term “gatekeeper” refers to individuals who are in direct contact with those who may be at risk of suicide and can be trained to identify significant suicide risk factors. Gatekeeper training programs are often part of broader suicide prevention initiatives aimed at enhancing knowledge and providing training for identifying risk factors and suicidal behaviors. These programs also aim to equip participants with the necessary skills to assess an individual’s level of risk and to manage the situation effectively by providing access to appropriate resources and treatment referrals (17). The role of gatekeepers is to facilitate access to appropriate care and treatment for those at risk of suicide (17). Research suggests that such gatekeepers can be useful in reducing suicide as part of a systematic approach to suicide prevention (18–20). In theory, artificial intelligence (AI) has the potential to support gatekeepers in their decision-making processes and improve the effectiveness of formal psychometric tools and clinical assessments in predicting suicidal behavior. Currently, these methods are often found to have insufficient predictive capabilities (21, 22).

The current study

In the current research, we investigated the limitations of ChatGPT in evaluating suicide risk and identifying associated factors. In addition, we examined whether ChatGPT’s suicide risk assessments reflect fundamental principles of the Interpersonal Theory of Suicide (ITS), a well-established and empirically supported theoretical framework proposed by Joiner (23, 24) for assessing the risk of suicide and identifying associated factors. Specifically, we examined how the two core dimensions of ITS, namely perceived burdensomeness and thwarted belongingness, influence therapists’ perceptions and evaluations of suicidal ideation and behavior. The term perceived burdensomeness refers to a psychological state in which individuals perceive that their very existence imposes a burden on their family, friends, and/or society. According to Joiner et al. (25), this perception leads individuals to believe that their death would be more beneficial than their continued life, in terms of the impact on their family, friends, society, and so on. As defined by Van Orden et al. (24), the concept of thwarted belongingness refers to a feeling of estrangement from others, with particular emphasis on the distressing sensation of being excluded from one’s family, friends, or other cherished groups. This construct appears to play a crucial role in suicide, as it represents a basic psychological requirement; failure to meet this need may lead to an inescapable sense of distress (26). We specifically selected perceived burdensomeness and thwarted belongingness, excluding other risk factors such as gender, age, and access to weapons, because they provide a closer reflection of the patient’s mental experience. By adopting this approach, we aimed to investigate ChatGPT’s limitations in assessing suicidal risk based on how the patient interprets their interpersonal experiences.

The objective of the current study:

To explore the limitations of using ChatGPT as a suicide risk assessment tool across two variables: perceived burdensomeness and thwarted belongingness.

Methods

Artificial intelligence procedure

Using the free version of ChatGPT (OpenAI, San Francisco), we conducted three evaluations during the period of 20–30 March 2023 (ChatGPT March 14 version) to examine how perceived burdensomeness and thwarted belongingness affect the performance of ChatGPT in assessing various aspects of suicidal risk compared to the performance of human therapists.
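For readers who wish to reproduce a comparable query programmatically rather than through the web interface used in the study, a minimal sketch with the 2023-era OpenAI Python client might look as follows. The model name, prompt wording, and zero-temperature setting are illustrative assumptions and are not part of the study’s procedure.

```python
# Illustrative sketch only: the study used the free ChatGPT web interface, not the API.
# The model name, prompt text, and temperature below are assumptions made for illustration.
import openai  # pre-1.0 interface of the openai package (2023)

openai.api_key = "YOUR_API_KEY"  # placeholder

vignette = "..."  # general background plus one burdensomeness and one belongingness paragraph
question = ("On a scale of 1 (very slight) to 7 (very high), "
            "what is the likelihood Jane will attempt suicide?")

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # assumed stand-in for the free March 2023 version
    messages=[{"role": "user", "content": f"{vignette}\n\n{question}"}],
    temperature=0,          # reduces run-to-run variability across repeated evaluations
)
print(response["choices"][0]["message"]["content"])
```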

Input source

In the present investigation, we adopted the text vignette methodology described by Bresin et al. (27) to examine the observer’s perspective of suicidal acts. A specific vignette from Levi-Belz and Gamliel (28) was entered into the ChatGPT interface. The chosen vignette contains a general background section and two additional paragraphs: one manipulating low or high perceived burdensomeness and the other manipulating low or high thwarted belongingness.

General background

Jane is a 37-year-old single woman without children, currently living alone, but who in the past spent a long time living with her father, who passed away 2 years ago following a severe illness. Jane holds a B.A. degree and has kept a steady job for 10 years. Recently, however, her manager informed her that, due to financial cuts, she would have to be dismissed. Jane had a close, intimate relationship for the past year with a boyfriend, her first serious and meaningful one; she was very excited about the relationship, hoping it would lead to marriage. However, following a period of repeated quarrels, her boyfriend informed her he no longer loved her, and he left her. Jane was left very sad and hurt. In recent weeks she keeps thinking about the relationship over and over again, trying to analyze what happened. Jane is experiencing a heavy emotional load; she is very tired, lacking energy, and finding it hard to sleep at night. Her routine tasks have become difficult for her, and she finds it hard to concentrate and to function properly. She dreads the future, and it seems to her that there is no chance she will find a love like that again.

Low perceived burdensomeness condition

In general, Jane feels she has abilities. She feels she has something to contribute to the world and to others around her, and sometimes people come to her for help and advice. She feels important and meaningful, and she seems to believe in herself.

High perceived burdensomeness condition

Jane feels worthless. Often she experiences herself as a burden to the people around her and she feels that it would be better if she were gone. She often sees herself as having no real meaning in other people’s lives and she experiences feelings of self-hatred. For example, she is angry at herself and thinks that the departure of her boyfriend was her fault.

Low thwarted belongingness condition

Jane has some close friends, and she feels that she can tell them about what she has been going through in recent weeks. In general she makes connections relatively easily and, despite the recent crisis, she feels lucky because there are people around who care for her.

High thwarted belongingness condition

Jane is experiencing significant loneliness in general, and she has few, if any, close friends around her. She is very withdrawn, distant from others, and she feels that there are no people to whom she could turn in times of trouble. Therefore, she has not shared her feelings with anyone since the breakup.

Measures

After each of the four conditions (1. low burdensomeness and low belongingness; 2. low burdensomeness and high belongingness; 3. high burdensomeness and low belongingness; 4. high burdensomeness and high belongingness), we submitted six questions to ChatGPT taken from Levi-Belz and Gamliel (28). Each question was categorized into one of four subscales: (1) Psychache (i.e., psychological pain) (Q1: “What do you think is Jane’s level of distress?”; Q2: “What do you think is Jane’s level of mental pain?”); (2) Suicidal ideation (Q3: “What is the likelihood Jane will have suicidal thoughts?”); (3) Risk of suicide attempt (Q4: “What is the likelihood Jane will attempt suicide?”; Q5: “What is the likelihood Jane will make a serious suicide attempt?”); and (4) Resilience (Q6: “How do you assess Jane’s level of resilience?”). All six questions employed a seven-point Likert-type scale estimating intensity (Q1, Q2, Q6) or likelihood (Q3–Q5), ranging from 1 (very slight) to 7 (very high).
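For clarity, the 2 × 2 design and the question-to-subscale mapping described above can be summarized in a small data structure. The sketch below is purely organizational; the variable names are hypothetical, and “belongingness” follows the paper’s shorthand for the thwarted belongingness manipulation.

```python
# Organizational sketch of the 2 x 2 vignette design and the six questions quoted above.
# Names and structure are illustrative, not original study materials; "belongingness" in the
# paper's condition labels refers to the thwarted belongingness manipulation.
CONDITIONS = {
    1: ("low perceived burdensomeness", "low thwarted belongingness"),
    2: ("low perceived burdensomeness", "high thwarted belongingness"),
    3: ("high perceived burdensomeness", "low thwarted belongingness"),
    4: ("high perceived burdensomeness", "high thwarted belongingness"),
}

QUESTIONS = {  # question text -> subscale
    "What do you think is Jane's level of distress?": "psychache",
    "What do you think is Jane's level of mental pain?": "psychache",
    "What is the likelihood Jane will have suicidal thoughts?": "suicidal ideation",
    "What is the likelihood Jane will attempt suicide?": "risk of suicide attempt",
    "What is the likelihood Jane will make a serious suicide attempt?": "risk of suicide attempt",
    "How do you assess Jane's level of resilience?": "resilience",
}
# All answers are requested on a 7-point Likert-type scale (1 = very slight, 7 = very high).
```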

Scoring

ChatGPT’s performance was scored according to Levi-Belz and Gamliel (28). We then compared the ChatGPT scores to the norms of 379 participants trained in mental health professions (21% male, mean age 36 ± 8.8 years): 53 were psychology graduate students, 266 held a master’s degree, and 60 held a doctorate. In terms of professional roles, 43 participants held certifications as supervisors in their mental health specialty, 108 were certified experts, 128 were interns, and 100 either had not begun their internship or were in professions not mandating an internship. The majority of the sample (84%) comprised practicing mental health professionals, while the remaining individuals had previous experience in the mental health field but were not currently active (28).

Statistical analysis

The data are presented as means ± SDs across the first, second, and third evaluations, and as percentiles relative to the norms of the mental health professionals. Two-sample t-tests were used to evaluate the differences between the average ChatGPT performance on the three evaluations and the norms of the mental health professionals reported by Levi-Belz and Gamliel (28).
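Because only summary statistics are involved (ChatGPT’s mean and SD over the three evaluations versus the published professional norms), the two-sample t-test can be computed directly from those statistics. The sketch below uses scipy with placeholder numbers rather than the study’s actual values.

```python
# Minimal sketch of a two-sample t-test computed from summary statistics, as one might
# compare ChatGPT's repeated ratings with the professionals' norms. All numbers are placeholders.
from scipy import stats

chatgpt_mean, chatgpt_sd, chatgpt_n = 1.5, 0.0, 3   # e.g., three identical ChatGPT evaluations
norm_mean, norm_sd, norm_n = 4.2, 1.3, 379          # hypothetical professional norms

t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=chatgpt_mean, std1=chatgpt_sd, nobs1=chatgpt_n,
    mean2=norm_mean, std2=norm_sd, nobs2=norm_n,
    equal_var=True,                                  # pooled-variance (Student's) t-test
)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```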

Results

Table 1 depicts ChatGPT’s performance (mean ± SD) for all four conditions (1. low burdensomeness and low belongingness; 2. low burdensomeness and high belongingness; 3. high burdensomeness and low belongingness; 4. high burdensomeness and high belongingness) for the four dependent variables (1. psychache; 2. suicidal ideation; 3. risk of suicide attempt, and 4. resilience) compared to the norms of the health professionals reported by Levi-Belz and Gamliel (28).

Table 1. Descriptive statistics of the four variables—psychache, suicidal ideation, risk of suicide attempt, and resilience—as a function of perceived burdensomeness and perceived thwarted belongingness.

Psychache

In the low burdensomeness and low belongingness condition, ChatGPT assessed the level of psychache as higher than did the sample of mental health professionals (75th percentile, t = −6.3, p < 0.001). In the other conditions, no significant differences were found (percentile range 55–61). The ChatGPT mean scores and standard deviations were identical (6 ± 0) in the first three conditions.

Suicidal ideation

As can be seen in Figure 1, in the low burdensomeness and low belongingness condition, ChatGPT assessed the level of suicidal ideation as higher than did the sample of mental health professionals (62nd percentile, t = −2.8, p < 0.01). In the low burdensomeness and high belongingness condition, ChatGPT assessed the level of suicidal ideation as lower than did the mental health professionals (22nd percentile, t = 7.00, p < 0.001). In the other two conditions, no significant differences were found (percentile range 41–48). The ChatGPT mean scores and standard deviations were identical (4 ± 0) in the first three conditions.

Figure 1. ChatGPT’s performance in all four conditions on the suicidal ideation variable, compared to the norms of mental health professionals; *p < 0.01.

Risk of suicide attempt

Figure 2 shows that in all four conditions, ChatGPT assessed the level of risk of suicide attempts significantly lower than did the mental health professionals sample (percentile range 5–22; t-test range = 6.36–12.34; p < 0.001 for all conditions). In the condition reflecting the highest level of risk (high burdensomeness and high belongingness), the ChatGPT assessment was ranked in the lowest percentile (5th). The ChatGPT mean scores and standard deviations were the same (1.5 ± 0) in the first three conditions.

Figure 2. ChatGPT’s performance in all four conditions on the risk for suicide attempt variable, compared to the norms of mental health professionals; *p < 0.001.

Resilience

In all conditions except the low burdensomeness and high belongingness condition, ChatGPT assessed the level of resilience as significantly lower than did the mental health professionals sample (percentile range 4–35; t-test range = 3.49–13.87; p < 0.001 for the high burdensomeness and low belongingness and the high burdensomeness and high belongingness conditions; p < 0.05 for the low burdensomeness and low belongingness condition). In the low burdensomeness and high belongingness condition, no significant difference was found (18th percentile).

Discussion

This study aimed to explore the limitations of ChatGPT in evaluating suicidal behavior and its contributing factors, in comparison to the norms established by Levi-Belz and Gamliel (28) for mental health professionals. This study offers a unique contribution by evaluating ChatGPT’s ability to assess suicidal risk in the context of perceived burdensomeness and thwarted belongingness. To the best of our knowledge, this issue has not been explored in previous research.

The findings show that ChatGPT consistently underestimated the likelihood of suicide attempts across all scenarios when compared to mental health professionals. Specifically, in the condition involving high burdensomeness and high belongingness, which carries the highest risk level according to the Interpersonal Theory of Suicide (ITS), ChatGPT ranked the risk in the lowest percentile (5th). Nevertheless, ChatGPT’s absolute rating of suicide attempt risk was higher in this condition than in the other three, suggesting that ChatGPT takes perceived burdensomeness and thwarted belongingness into account to some extent, albeit to a lesser degree than what is suggested by theory and clinical experience (23, 24, 28). While the assessments of mental health professionals were influenced by the presence of either perceived burdensomeness or thwarted belongingness, ChatGPT’s assessment was only affected when both factors were present, and even then, only to a minimal extent.

In most conditions, ChatGPT tended to rate mental resilience lower than the norms. In addition, its ratings of the psychache and suicidal ideation variables were broadly similar to the norms in most conditions. These results indicate that ChatGPT’s assessment of the risk of a suicide attempt is less influenced by factors such as resilience (28–30). Based on the findings, we can deduce that ChatGPT’s overall evaluation might not fully encompass the complexities of assessing suicidal risk, and its capacity to identify potential risk and protective factors may deviate from established clinical expertise. While chatbots show promise in improving the insufficient predictive capability of clinical assessments and suicide risk questionnaires (21, 22), it is evident that, within the context of this study, they are unable to effectively fulfill this role.

These results differ from those reported for contemporary artificial intelligence (AI) models for evaluating suicidal risk (31, 32). Consequently, the study emphasizes the importance of exercising prudence and further refining AI models for appraising suicidal risk. The purpose of this study was to explore the potential of artificial intelligence (AI) in evaluating mental health risks beyond its theoretical proficiency and semantic knowledge, as discussed in previous literature (33, 34). Specifically, ChatGPT shows major potential for contributing to assessment processes in mental health (35), among other things due to its vast knowledge, high accessibility on any phone or computer, and ability to reduce the feeling of stigma and shame associated with psychological or psychiatric settings. Previous research on the application of AI in mental health has focused primarily on its potential to aid in technical tasks, thereby reducing the need for clinical interactions. This approach suggests that AI technologies can assist in completing non-personalized tasks, thus freeing up clinicians’ time to focus on delivering more empathic care and “humanizing” their practice, as the quality of mental health care often relies on the clinician-patient relationship (36). Scholars have proposed several potential applications of AI in the mental health field, including assisting clinicians with time-consuming tasks such as updating medical records, improving diagnostic and prognostic accuracy, promoting understanding of the mechanisms of mental illness, and enhancing treatment based on biological feedback (37–40). The findings of the current study suggest that using ChatGPT to evaluate suicide risk through vignettes, or as a tool for improved decision-making among clinicians or gatekeepers, is not in line with clinical experience and is not recommended. The implication is that ChatGPT should instead be used to direct individuals who raise questions about suicide toward mental health treatment or assistance.

The present study has several limitations. First, it focused on the March 14 version of ChatGPT. More recent versions have since been released, and forthcoming studies should investigate this issue using these updates. Second, the study focused on a vignette featuring a female character, whereas the incidence of suicide death is higher among males. Therefore, we recommend conducting additional studies to investigate vignettes featuring various demographic groups, including male characters, psychiatric patients, adolescents, and older individuals. Third, we compared the ChatGPT data with a sample of data from mental health professionals in Israel. Therefore, we recommend investigating the appraisals of therapists from other countries to assess cross-cultural differences. Fourth, the research on which our comparison was based (28) did not report inter-rater reliability; accordingly, the variability in human responses can only be roughly estimated from the reported standard deviations. Lastly, this research delved into the intricate realm of suicide risk assessment using artificial intelligence. However, in order to establish a more expansive understanding, further studies are necessary. These subsequent investigations should explore supplementary risk factors, incorporate additional large language models, analyze data at various time points, and compare findings with a wider range of clinical samples.

Conclusion

The objective of this research was to investigate the extent of ChatGPT’s ability to evaluate suicide risk and identify associated factors compared to the assessments of mental health professionals, and to explore how this assessment is influenced by perceived burdensomeness and thwarted belongingness. The findings revealed that in all scenarios ChatGPT consistently assessed the risk of a suicide attempt as lower than did mental health professionals. The results of this study imply that ChatGPT’s evaluation of suicidal ideation and suicide attempt risk may be less affected by elements such as resilience, perceived burdensomeness, and thwarted belongingness. The finding that ChatGPT underestimates the risk of suicide attempts in comparison to mental health care professionals, especially in the most severe condition, is a significant cause for concern. It implies that gatekeepers, patients, or even mental health professionals who are considering using ChatGPT for evaluating suicidal risk may receive an inaccurate assessment that underestimates the actual risk. Despite the theoretical possibility that ChatGPT’s assessments are more precise, it is incumbent upon mental health professionals to prioritize human life and to treat ChatGPT’s assessments in this domain as non-professional until further evidence becomes available.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

ZE: conception and design of the study, acquisition and analysis of data, and drafting of a significant portion of the manuscript and figures. IL: conception and design of the study, acquisition and analysis of data, and drafting of a significant portion of the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This research was supported by Oranim College (no. 5100260).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Kirmani, AR. Artificial intelligence-enabled science poetry. ACS Energy Letters. (2022) 8:574–76.

2. Liu, Y, Deng, G, Xu, Z, Li, Y, Zheng, Y, Zhang, Y, et al. Jailbreaking chatgpt via prompt engineering: An empirical study. arXiv Preprint. (2023) arXiv:2305.13860.

3. Emenike, ME, and Emenike, BU. Was this title generated by ChatGPT? Considerations for artificial intelligence text-generation software programs for chemists and chemistry educators. J Chem Educ. (2023) 100:1413–8. doi: 10.1021/acs.jchemed.3c00063

4. Khalil, M., and Er, E. Will ChatGPT get you caught? Rethinking plagiarism detection. (2023). arXiv [Preprint]. arXiv:2302.04335.

5. Guo, B., Zhang, X., Wang, Z., Jiang, M., Nie, J., Ding, Y., et al. How close is ChatGPT to human experts? Comparison corpus, evaluation, and detection. (2023). arXiv [Preprint] arXiv:2301.07597.

6. Lew, B, Lester, D, Mustapha, FI, Yip, P, Chen, Y, Panirselvam, RR, et al. Decriminalizing suicide attempt in the 21st century: an examination of suicide rates in countries that penalize suicide, a critical review. BMC Psychiatry. (2022) 22:424. doi: 10.1186/s12888-022-04060-5

7. Zalsman, G. Suicide: epidemiology, etiology, treatment and prevention. Harefuah. (2019) 158:468–72.

8. Stone, DM, Jones, CM, and Mack, KA. Changes in suicide rates—United States, 2018–2019. Morb Mortal Wkly Rep. (2021) 70:261–8. doi: 10.15585/mmwr.mm7008a1

9. Bachmann, S. Epidemiology of suicide and the psychiatric perspective. Int J Environ Res Public Health. (2018) 15:1425. doi: 10.3390/ijerph15071425

10. Gvion, Y, and Apter, A. Aggression, impulsivity, and suicide behavior: a review of the literature. Arch Suicide Res. (2011) 15:93–112. doi: 10.1080/13811118.2011.565265

11. Chung, DT, Ryan, CJ, Hadzi-Pavlovic, D, Singh, SP, Stanton, C, and Large, MM. Suicide rates after discharge from psychiatric facilities: a systematic review and meta-analysis. JAMA Psychiat. (2017) 74:694–702. doi: 10.1001/jamapsychiatry.2017.1044

12. Zalsman, G, Hawton, K, Wasserman, D, van Heeringen, K, Arensman, E, Sarchiapone, M, et al. Suicide prevention strategies revisited: 10-year systematic review. Lancet Psychiatry. (2016) 3:646–59. doi: 10.1016/S2215-0366(16)30030-X

13. Bolton, JM, Gunnell, D, and Turecki, G. Suicide risk assessment and intervention in people with mental illness. BMJ. (2015) 351:4978. doi: 10.1136/bmj.h4978

14. Wasserman, D, Rihmer, Z, Rujescu, D, Sarchiapone, M, Sokolowski, M, Titelman, D, et al. The european psychiatric association (EPA) guidance on suicide treatment and prevention. Eur Psychiatry. (2012) 27:129–41. doi: 10.1016/j.eurpsy.2011.06.003

15. Burnette, C, Ramchand, R, and Ayer, L. Gatekeeper training for suicide prevention: a theoretical model and review of the empirical literature. Rand Health Q. (2015) 5:16. doi: 10.7249/j.ctt13x1ftq

16. Stickl Haugen, J, Sutter, CC, Tinstman Jones, JL, and Campbell, LO. Teachers as youth suicide prevention gatekeepers: an examination of suicide prevention training and exposure to students at risk of suicide. Child Youth Care Forum. (2022) 52:583–601.

17. Morton, M, Wang, S, Tse, K, Chung, C, Bergmans, Y, Ceniti, A, et al. Gatekeeper training for friends and family of individuals at risk of suicide: a systematic review. J Community Psychol. (2021) 49:1838–71. doi: 10.1002/jcop.22624

18. Holmes, G, Clacy, A, Hermens, DF, and Lagopoulos, J. The long-term efficacy of suicide prevention gatekeeper training: a systematic review. Arch Suicide Res. (2021) 25:177–207. doi: 10.1080/13811118.2019.1690608

19. Levkovich, I, and Vigdor, I. How school counsellors cope with suicide attempts among adolescents—a qualitative study in Israel. J Psychol Couns Sch. (2021) 31:63–75. doi: 10.1017/jgc.2020.14

20. Torok, M, Calear, AL, Smart, A, Nicolopoulos, A, and Wong, Q. Preventing adolescent suicide: a systematic review of the effectiveness and change mechanisms of suicide prevention gatekeeping training programs for teachers and parents. J Adolesc. (2019) 73:100–12. doi: 10.1016/j.adolescence.2019.04.005

21. Franklin, JC, Ribeiro, JD, Fox, KR, Bentley, KH, Kleiman, EM, Huang, X, et al. Risk factors for suicidal thoughts and behaviors: a meta-analysis of 50 years of research. Psychol Bull. (2017) 143:187–232. doi: 10.1037/bul0000084

22. Vera-Varela, C, Manrique Mirón, PC, Barrigón, ML, Álvarez-García, R, Portillo, P, Chamorro, J, et al. Low level of agreement between self-report and clinical assessment of passive suicidal ideation. Arch Suicide Res. (2022) 26:1895–910. doi: 10.1080/13811118.2021.1945984

23. Joiner, TE. Why people die by suicide. Cambridge, MA: Harvard University Press (2005).

24. Van Orden, KA, Witte, TK, Gordon, KH, Bender, TW, and Joiner, TE. Suicidal desire and the capability for suicide: tests of the interpersonal-psychological theory of suicidal behavior among adults. J Consult Clin Psychol. (2008) 76:72–83. doi: 10.1037/0022-006X.76.1.72

25. Joiner, TE, Van Orden, KA, Witte, TK, Selby, EA, Ribeiro, JD, Lewis, R, et al. Main predictions of the interpersonal–psychological theory of suicidal behavior: empirical tests in two samples of young adults. J Abnorm Psychol. (2009) 118:634–46. doi: 10.1037/a0016500

26. Shneidman, E. S. A psychological approach to suicide. Paper presented at the American Psychological Association convention, 1986, Washington, DC, US; this chapter is based upon one of the 1986 master lectures that were presented at the aforementioned convention. (1987).

27. Bresin, K, Sand, E, and Gordon, KH. Non-suicidal self-injury from the observer’s perspective: A vignette study. Arch Suicide Res. (2013) 17:185–95.

28. Levi-Belz, Y, and Gamliel, E. The effect of perceived burdensomeness and thwarted belongingness on therapists' assessment of patients' suicide risk. Psychother Res. (2016) 26:436–45. doi: 10.1080/10503307.2015.1013161

29. Gautam, A, and Nagle, YK. Moderating effect of resilience: thwarted belongingness and perceived burdensomeness as predictors of suicidal ideation. J Psychosoc Res. (2016) 11:459.

30. Kinory, D, Aisenberg, D, and Levi-Belz, Y. The cost of being apart: the contribution of perceived burdensomeness and thwarted belongingness to suicide ideation among Israeli older adults. J Nerv Ment Dis. (2020) 208:663–70. doi: 10.1097/NMD.0000000000001198

31. Bernert, RA, Hilberg, AM, Melia, R, Kim, JP, Shah, NH, and Abnousi, F. Artificial intelligence and suicide prevention: a systematic review of machine learning investigations. Int J Environ Res Public Health. (2020) 17:5929. doi: 10.3390/ijerph17165929

32. Fonseka, TM, Bhat, V, and Kennedy, SH. The utility of artificial intelligence in suicide risk prediction and the management of suicidal behaviors. Austral New Z J Psychiatry. (2019) 53:954–64. doi: 10.1177/0004867419864428

33. Kung, TH, Cheatham, M, Medenilla, A, Sillos, C, De Leon, L, Elepaño, C, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health. (2023) 2:e0000198. doi: 10.1371/journal.pdig.0000198

34. Rudolph, J, Tan, S, and Tan, S. ChatGPT: bullshit spewer or the end of traditional assessments in higher education? J Appl Learn Teach. (2023) 6:342–63. doi: 10.37074/jalt.2023.6.1.9

35. Elyoseph, Z, Hadar Shoval, D, Asraf, K, and Lvovsky, M. ChatGPT outperforms humans in emotional awareness evaluations. Front Psychol. (2023) 14:2116. doi: 10.3389/fpsyg.2023.1199058

36. Topol, E. Deep medicine: How artificial intelligence can make healthcare human again. UK: Hachette (2019).

37. Braun, U, Schaefer, A, Betzel, RF, Tost, H, Meyer-Lindenberg, A, and Bassett, DS. From maps to multi-dimensional network mechanisms of mental disorders. Neuron. (2018) 97:14–31. doi: 10.1016/j.neuron.2017.11.007

38. Bzdok, D, and Meyer-Lindenberg, A. Machine learning for precision psychiatry: opportunities and challenges. Biol Psychiatry Cogn Neurosci Neuroimag. (2018) 3:223–30. doi: 10.1016/j.bpsc.2017.11.007

39. Doraiswamy, PM, Blease, C, and Bodner, K. Artificial intelligence and the future of psychiatry: insights from a global physician survey. Artif Intell Med. (2020) 102:101753. doi: 10.1016/j.artmed.2019.101753

40. Lee, EE, Torous, J, De Choudhury, M, Depp, CA, Graham, SA, Kim, HC, et al. Artificial intelligence for mental health care: clinical applications, barriers, facilitators, and artificial wisdom. Biol Psychiatry Cogn Neurosci Neuroimag. (2021) 6:856–64. doi: 10.1016/j.bpsc.2021.02.001

Keywords: artificial intelligence, ChatGPT, diagnosis, psychological assessment, suicide risk, risk assessment, text vignette

Citation: Elyoseph Z and Levkovich I (2023) Beyond human expertise: the promise and limitations of ChatGPT in suicide risk assessment. Front. Psychiatry. 14:1213141. doi: 10.3389/fpsyt.2023.1213141

Received: 28 April 2023; Accepted: 19 July 2023;
Published: 01 August 2023.

Edited by:

Beth Krone, Icahn School of Medicine at Mount Sinai, United States

Reviewed by:

M. David Rudd, University of Memphis, United States
Amna Mohyud Din Chaudhary, Case Western Reserve University, United States

Copyright © 2023 Elyoseph and Levkovich. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Zohar Elyoseph, zohare@yvc.ac.il

These authors have contributed equally to this work
