Featured Article
Use of artificial intelligence for gender bias analysis in letters of recommendation for general surgery residency candidates

https://doi.org/10.1016/j.amjsurg.2021.09.034

Highlights

  • Letters of recommendation (LoR) are important in surgical resident selection.

  • Gender bias can implicitly alter LoR strength.

  • LoR readers can use artificial intelligence to improve gender bias detection.

  • Implicit bias detection can provide deeper meaning and guide resident selection.

Abstract

Background

Letters of recommendation (LoRs) play an important role in resident selection. Authors' language implicitly differs for male and female applicants. We examined gender bias in LoRs written for surgical residency candidates across three decades at one institution.

Methods

We performed a retrospective analysis of LoRs written for general surgery residency candidates between 1980 and 2011, using artificial intelligence (AI) to conduct natural language processing (NLP) and sentiment analysis, and computer-based algorithms to detect gender bias. Applicants were grouped by scaled clerkship grades and USMLE scores. Data were compared among groups with t-tests, ANOVA, and non-parametric tests, as appropriate.

Results

A total of 611 LoRs were analyzed for 171 applicants (16.4% female); 95.3% of letter authors were male. Scaled USMLE scores and scaled clerkship grades (SCGs) were similar for both genders (p > 0.05 for both). Average word count across all letters was 290 words and did not differ significantly between genders (p = 0.18). LoRs written before 2000 were significantly shorter than those written after, for applicants of both genders (female p = 0.004; male p < 0.001). Gender bias analysis revealed more gendered wording in female LoRs than in male LoRs (p = 0.04), most prominently among females with lower SCGs (9.5 vs 5.1, p = 0.01). Sentiment analysis revealed that male LoRs with female authors had significantly more positive sentiment than female LoRs (p = 0.02), and that males with higher SCGs had more positive sentiment than those with lower SCGs (9.4 vs 8.2, p = 0.03). NLP detected more “fear” in male LoRs with lower SCGs (0.11 vs 0.09, p = 0.02). Female LoRs with higher SCGs had more positive sentiment (0.78 vs 0.83, p = 0.03) and more “joy” (0.60 vs 0.63, p = 0.02), although those written before 2000 had less “joy” (0.5 vs 0.63, p = 0.006).

Conclusion

AI and computer-based algorithms detected linguistic differences and gender bias in LoRs written for general surgery residency applicants, even following stratification by clerkship grades and when analyzed by decade.

Introduction

Letters of recommendation (LoRs) are valuable assets in general surgery (GS) resident selection, ranked second in importance by general surgery program directors, behind only United States Medical Licensing Examination (USMLE) scores, as determinants for interviewing applicants.1 LoRs complement academic measures such as clerkship grades and USMLE scores by accentuating non-cognitive factors, holistically highlighting candidates' communication skills, work ethic, teamwork, technical performance, and personal characteristics, among other attributes.2 LoR authors elucidate applicant traits readers otherwise might not know, engaging in advocacy while remaining accountable to their audience.

Despite their importance, LoRs unfortunately may not advocate equitably for all applicants. Analyses of large textual corpora by Mondorf and by Argamon et al., in 2002 and 2003, respectively, showed numerous gender-specific differences in underlying lexical and syntactic properties that implicitly communicate gender bias.3,4 Although these differences have been reported repeatedly, their tacit nature often keeps readers from detecting them.5,6 In medical education, several studies using different analytic techniques have elucidated implicit social biases in LoRs, including race- and gender-based biases.7, 8, 9 Implicit gender bias in GS LoRs has been described at different levels of training.10, 11, 12, 13

The first step in mitigating gender bias is detection. Natural language processing (NLP) is a rapidly deployable, malleable tool that uses codable algorithms for bias detection.14 NLP can be deployed through computer-based algorithms (CBAs) or cloud-based networks to perform subtasks such as analyzing sentiment, emotion, tone, or personality within text to detect implicit biases.15, 16, 17 Among cloud-based systems, artificial intelligence (AI) services have been adopted across several industries for their automation, speed, adaptability, and machine learning capabilities.18, 19, 20, 21 To date, no published study has examined the gender bias analysis capabilities of AI for GS LoRs. We describe our implementation of AI and CBAs to analyze NLP variables and gender bias in GS LoRs across three decades at one academic institution, examining changes over time by LoR, applicant, and author traits. We hypothesized that over time, gender bias in GS LoRs would decrease, with a corresponding increase in positive sentiment toward females, signifying gradual awareness of gender bias and improved opinion of GS residency candidates.
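The study's analysis pipeline is not published in the visible text; as a concrete illustration of the sentiment and gendered-wording subtasks described above, the Python sketch below scores one letter with NLTK's VADER sentiment lexicon and counts stereotypically gendered words against small illustrative lists. The word lists and the choice of VADER are assumptions for exposition, not the authors' actual toolchain.

```python
# A minimal sketch (not the study's actual pipeline): lexicon-based
# sentiment scoring plus a simple gendered-wording count for one letter.
import re

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon fetch

# Illustrative word lists only; published LoR-bias studies use validated
# lexicons (e.g., communal vs. agentic terms) that are not reproduced here.
FEMALE_ASSOC = {"compassionate", "caring", "warm", "delightful", "helpful"}
MALE_ASSOC = {"outstanding", "confident", "ambitious", "decisive", "leader"}

def analyze_letter(text: str) -> dict:
    """Return sentiment scores and gendered-word counts for one LoR body."""
    sia = SentimentIntensityAnalyzer()
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "sentiment": sia.polarity_scores(text),  # neg/neu/pos/compound
        "female_assoc": sum(t in FEMALE_ASSOC for t in tokens),
        "male_assoc": sum(t in MALE_ASSOC for t in tokens),
        "word_count": len(tokens),
    }
```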

Section snippets

Materials and methods

Following institutional review board approval, we performed a retrospective analysis of 611 LoRs written for 171 categorical GS residency applicants who successfully matched at one tertiary academic institution between 1980 and 2011. Study data were collected in 2019 but extended only through 2011 to avoid including data for residents currently matriculated in the program. LoRs were scanned from print copies, then converted to text files with headings, greetings, and signatures removed
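The exact rules used to strip headings, greetings, and signatures are not given in the snippet above; a minimal preprocessing sketch, assuming simple pattern-based heuristics (the regular expressions are hypothetical), might look like this:

```python
# Heuristic cleanup of an OCR'd letter: drop greeting lines and everything
# from the sign-off onward, keeping only the letter body (an assumed rule,
# not the study's published procedure).
import re

GREETING = re.compile(r"^(dear|to whom it may concern)", re.IGNORECASE)
SIGNOFF = re.compile(r"^(sincerely|respectfully|best regards|yours truly)",
                     re.IGNORECASE)

def strip_letter_boilerplate(raw: str) -> str:
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    body = []
    for ln in lines:
        if GREETING.match(ln):
            continue          # skip salutation lines
        if SIGNOFF.match(ln):
            break             # discard signature block and anything after
        body.append(ln)
    return " ".join(body)
```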

Applicant and author traits

APs 1, 2, and 3 comprised 306, 154, and 151 LoRs, respectively, written for 89, 40, and 42 applicants, respectively. The mean number of LoRs per applicant was 3.57 (±0.95) and remained similar over time (p = 0.11). SCGs were reported by 93% of applicants, while 64.9% had either a three-digit USMLE score or an NBME Part I examination percentile to comprise an SBQ. SCG distribution did not differ significantly across APs (p = 0.30) or by applicant gender (p = 0.45), nor by applicant gender within
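The group comparisons reported throughout (t-tests, ANOVA, or non-parametric tests "as appropriate") can be reproduced in spirit with SciPy; the normality-based decision rule below is an assumption, since the paper's exact criterion is not stated in the visible text:

```python
# Illustrative two-group comparison mirroring the Methods' "as appropriate"
# choice between parametric and non-parametric tests (assumed decision rule).
from scipy import stats

def compare_two_groups(a, b, alpha: float = 0.05):
    """Welch's t-test if both samples look normal, else Mann-Whitney U."""
    normal = (stats.shapiro(a).pvalue > alpha
              and stats.shapiro(b).pvalue > alpha)
    if normal:
        return stats.ttest_ind(a, b, equal_var=False)
    return stats.mannwhitneyu(a, b, alternative="two-sided")

# Hypothetical usage with per-letter word counts grouped by applicant gender:
# result = compare_two_groups(female_word_counts, male_word_counts)
# print(result.pvalue)
```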

Discussion

Our study is the first to examine surgical residency candidate LoRs with AI and a gender bias algorithm. Both sentiment algorithms detected improved author opinion of applicants, indicated by higher sentiment, over time and with increasing LoR length. Furthermore, our data suggest that LoRs for both genders contained more overall female bias in earlier APs, with a gradual shift to male bias. This finding has been seen in other realms of graduate-level education but has not been

Conclusions

General surgery residency LoRs have made significant strides toward reducing female bias over the past three decades. However, the underlying maladaptation to male bias as the “norm” for describing success is problematic and should be addressed. More research is required to determine how best to mitigate implicit biases in the resident selection process. With appropriate datasets, time, and dedicated analytics, gender bias detection can improve and gender bias itself can be minimized, while simultaneously enabling

References (39)

  • B.V. Chapman et al. Linguistic biases in letters of recommendation for radiation oncology residency applicants from 2015 to 2019 (2020)

  • F. Lin et al. Gender-based differences in letters of recommendation written for ophthalmology residency applicants. BMC Med Educ (2019)

  • L.J. Grimm et al. Gender and racial bias in radiology residency letters of recommendation. J Am Coll Radiol (2020)

  • F.E. Turrentine et al. Influence of gender on surgical residency applicants' recommendation letters. J Am Coll Surg (2019)

  • E. Dinan et al. Multi-dimensional gender bias classification

  • B. Liu. Sentiment Analysis and Opinion Mining (May 2012)

  • Z. Nanli et al. Sentiment analysis: a literature review. In: 2012 International Symposium on...

  • K. Lee et al. Who will retweet this? Automatically identifying and engaging strangers on Twitter to spread information

  • H. Liang et al. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nat Med (2019)