The Association for Surgical Education
Assessment of medical student clinical reasoning by “lay” vs physician raters: inter-rater reliability using a scoring guide in a multidisciplinary objective structured clinical examination

https://doi.org/10.1016/j.amjsurg.2011.08.003

Abstract

Background

To determine whether a “lay” rater could assess clinical reasoning, interrater reliability was measured between physician and lay raters of patient notes written by medical students as part of an 8-station objective structured clinical examination (OSCE).

Methods

Seventy-five notes were rated independently by physician and lay raters on core elements of clinical reasoning, using a scoring guide developed by physician consensus. Twenty-five notes were rerated by a 2nd physician rater as an expert control. Kappa statistics and simple percentage agreement were calculated in 3 areas: evidence supporting each diagnosis, evidence against each diagnosis, and the diagnostic workup.
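
The article does not include its analysis code. As a minimal illustrative sketch (not the study's method or data), simple percentage agreement and Cohen's κ for two raters scoring a dichotomous element on each note could be computed as follows; all ratings shown are hypothetical.

# Illustrative sketch only: percentage agreement and Cohen's kappa for two
# raters scoring the same notes on one dichotomous element (e.g., credit
# given for supporting evidence). The ratings below are hypothetical and
# are not data from this study.
from collections import Counter

def percent_agreement(rater_a, rater_b):
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)        # observed agreement
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    # chance agreement from each rater's marginal category frequencies
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

physician = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical physician scores
lay       = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]  # hypothetical lay-rater scores
print(percent_agreement(physician, lay))    # 0.8
print(cohens_kappa(physician, lay))         # ~0.52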

Results

Agreement between physician and lay raters for the top diagnosis was as follows: supporting evidence, 89% (κ = .72); evidence against, 89% (κ = .81); and diagnostic workup, 79% (κ = .58). Agreement between the 2 physician raters was 83% (κ = .59), 92% (κ = .87), and 96% (κ = .87), respectively.
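
As a back-of-envelope check using these reported, rounded figures (not a calculation from the article), Cohen's κ corrects observed agreement p_o for chance agreement p_e; for the diagnostic workup, the implied chance agreement is roughly .50, which is why 79% raw agreement still yields only a moderate κ:

\kappa = \frac{p_o - p_e}{1 - p_e}
\qquad\Longrightarrow\qquad
p_e = \frac{p_o - \kappa}{1 - \kappa} = \frac{.79 - .58}{1 - .58} \approx .50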

Conclusions

Using a comprehensive scoring guide, interrater reliability for physician and lay raters was comparable with reliability between 2 expert physician raters.

Section snippets

Methods

A comprehensive scoring guide was developed to assess medical student patient notes for a multidisciplinary abdominal pain case. The case, briefly presented in Figure 1, involves a 42-year-old woman with acute left lower quadrant abdominal pain. It was specifically designed to support several plausible differential diagnoses spanning multiple clinical specialties and was used as 1 of 8 cases in a high-stakes (passing grade required for graduation) OSCE administered to

Results

Cronbach's α coefficient was low for both the physician rater (.54) and the lay rater (.58), suggesting that the individual domains of clinical reasoning may be relatively independent. Agreement between the physician and lay rater in the initial 25-note sample was as follows: supporting evidence, 84% (κ = .69); evidence against, 71% (κ = .62); and diagnostic workup, 73% (κ = .69). After additional training and consensus development, agreement improved substantially for evidence against (87%; κ
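
For readers unfamiliar with the internal-consistency statistic above, the following is a minimal sketch of Cronbach's α computed across scoring domains, assuming one rater's scores on 3 domains per note; the values and domain layout are hypothetical placeholders, not the study's data.

# Illustrative sketch only: Cronbach's alpha across scoring domains for one
# rater. Rows = patient notes, columns = domains (e.g., supporting evidence,
# evidence against, diagnostic workup); the scores are hypothetical.
import numpy as np

def cronbach_alpha(scores):
    scores = np.asarray(scores, dtype=float)     # shape: (notes, domains)
    k = scores.shape[1]                          # number of domains
    item_vars = scores.var(axis=0, ddof=1)       # variance of each domain
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

ratings = [
    [2, 1, 2],
    [1, 1, 1],
    [2, 2, 1],
    [0, 1, 1],
    [2, 2, 2],
    [1, 0, 1],
]
print(round(cronbach_alpha(ratings), 2))  # 0.74 for this toy data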

Comments

Clinical reasoning is a complex entity that is not easily operationalized or assessed. Performance on the USMLE Step 2 Clinical Knowledge section has been shown to have minimal redundancy with performance on the Step 2 Clinical Skills section; therefore, it is essential that both components be adequately assessed.13 The OSCE-based USMLE Step 2 Clinical Skills section evaluates competency in the following categories: integrated clinical encounter (including data gathering from history and

Conclusions

The findings of this study suggest that with adequate training, lay raters may act as examiners in the assessment of the patient note clinical reasoning score in a multidisciplinary, high-stakes OSCE.

References (18)

  • S.R. Simon et al. The relationship between second-year medical students' OSCE scores and USMLE Step 2 scores. J Eval Clin Pract (2007)
  • A. Cuschieri et al. A new approach to a final examination in surgery: use of the objective structured clinical examination. Ann R Coll Surg Engl (1979)
  • R.M. Harden et al. Assessment of clinical competence using an objective structured clinical examination (OSCE). Med Educ (1979)
  • R.M. Harden et al. Assessment of clinical competence using objective structured examination. Br Med J (1975)
  • J. Wallenstein et al. A core competency-based objective structured clinical examination (OSCE) can predict future resident performance. Acad Emerg Med (2010)
  • E. Friedman et al. Taking note of the perceived value and impact of medical student chart documentation on education and patient care. Acad Med (2010)
  • G. Regehr et al. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med (1998)
  • C. Mertler. Designing scoring rubrics for your classroom. Pract Assess Res Eval (2001)
  • M.F. Ben-David et al. Issues of validity and reliability concerning who scores the post-encounter patient-progress note. Acad Med (1997)
There are more references available in the full text version of this article.

Cited by (11)

  • (En)trust me: Validating an assessment rubric for documenting clinical encounters during a surgery clerkship clinical skills exam

    2020, American Journal of Surgery
    Citation excerpt:

    There are various types of assessment tools that can take on holistic or analytic scoring forms and there has been some literature to support validity and reliability differences based on the type of scoring system utilized.18 Analytic scoring systems have been found to produce more reliable results.18 In the development of our rubric, we sought to include both analytic and holistic elements.

  • Does objective structured clinical examinations score reflect the clinical reasoning ability of medical students?

    2015, American Journal of the Medical Sciences
    Citation excerpt:

    Although global rating by experts is regarded as the “gold standard” for clinical reasoning assessment,23 here the authors used analytic scoring to evaluate clinical reasoning ability. Compared with global rating, analytic scoring is known to be an effective method of giving feedback and to have increased reliability over global ratings.4 Furthermore, because the analytic score was rated by physicians, the scoring system to evaluate clinical reasoning might be more reliable than other analytic scoring systems.
